manaestras committed
Commit 0070faf · verified · 1 Parent(s): 28be5b7

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +22 -19
README.md CHANGED
@@ -1,16 +1,20 @@
---
+ base_model:
+ - tencent/Hunyuan-4B-Pretrain
library_name: transformers
---


+
<p align="center">
<img src="https://dscache.tencent-cloud.cn/upload/uploader/hunyuan-64b418fd052c033b228e04bc77bbc4b54fd7f5bc.png" width="400"/> <br>
</p><p></p>


+
<p align="center">
🤗&nbsp;<a href="https://huggingface.co/tencent/"><b>HuggingFace</b></a>&nbsp;|&nbsp;
- 🤖&nbsp;<a href="https://modelscope.cn/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct"><b>ModelScope</b></a>&nbsp;|&nbsp;
+ 🤖&nbsp;<a href="https://modelscope.cn/models/Tencent-Hunyuan/"><b>ModelScope</b></a>&nbsp;|&nbsp;
🪡&nbsp;<a href="https://github.com/Tencent/AngelSlim/tree/main"><b>AngelSlim</b></a>
</p>

@@ -21,14 +25,13 @@ library_name: transformers
</p>

<p align="center">
- <a href="https://github.com/Tencent-Hunyuan/Hunyuan-7B"><b>GITHUB</b></a> |
- <a href="https://cnb.cool/tencent/hunyuan/Hunyuan-7B"><b>cnb.cool</b></a> |
- <a href="https://github.com/Tencent-Hunyuan/Hunyuan-7B/blob/main/LICENSE"><b>LICENSE</b></a> |
+ <a href="https://github.com/Tencent-Hunyuan/"><b>GITHUB</b></a> |
+ <a href="https://cnb.cool/tencent/hunyuan/"><b>cnb.cool</b></a> |
+ <a href="https://github.com/Tencent-Hunyuan/Hunyuan-1.8B/blob/main/LICENSE"><b>LICENSE</b></a> |
<a href="https://raw.githubusercontent.com/Tencent-Hunyuan/Hunyuan-A13B/main/assets/1751881231452.jpg"><b>WeChat</b></a> |
<a href="https://discord.gg/bsPcMEtV7v"><b>Discord</b></a>
</p>

-
## Model Introduction

Hunyuan is Tencent's open-source efficient large language model series, designed for versatile deployment across diverse computational environments. From edge devices to high-concurrency production systems, these models deliver optimal performance with advanced quantization support and ultra-long context capabilities.
@@ -49,7 +52,7 @@ We have released a series of Hunyuan dense models, comprising both pre-trained a

## Benchmark

- Note: The following benchmarks are evaluated by TRT-LLM-backend on several **base models**.
+ Note: The following benchmarks were evaluated with the TRT-LLM backend on several **base models**.

| Model | Hunyuan-0.5B-Pretrain | Hunyuan-1.8B-Pretrain | Hunyuan-4B-Pretrain | Hunyuan-7B-Pretrain |
|:------------------:|:---------------:|:--------------:|:-------------:|:---------------:|
@@ -87,7 +90,7 @@ First, please install transformers. We will merge it into the main branch later.
```SHELL
pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
```
- Our model defaults to using slow-thinking reasoning, and there are two ways to disable CoT reasoning.
+ Our model defaults to slow-thinking reasoning, and there are two ways to disable CoT reasoning (see the sketch after this list):
1. Pass **"enable_thinking=False"** when calling apply_chat_template.
2. Adding **"/no_think"** before the prompt will force the model not to perform CoT reasoning. Similarly, adding **"/think"** before the prompt will force the model to perform CoT reasoning.
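
For instance, a minimal sketch of both options, assuming the `tokenizer` and `messages` from the quickstart below are already set up:

```python
# Option 1: disable CoT reasoning via the chat template.
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=False,  # slow thinking off for this request
)

# Option 2: steer per request with a prompt prefix instead.
messages = [{"role": "user", "content": "/no_think Summarize GPTQ in one sentence."}]
```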

@@ -110,7 +113,7 @@ messages = [
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt",
    enable_thinking=True  # Toggle thinking mode (default: True)
)

outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)

output_text = tokenizer.decode(outputs[0])
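
If you want to separate the reasoning from the final answer, here is a hedged sketch; it assumes the model wraps its chain of thought in `<think>...</think>` tags, which you should verify against the actual chat template:

```python
import re

# Split the (assumed) <think> block from the final answer.
match = re.search(r"<think>(.*?)</think>", output_text, re.DOTALL)
reasoning = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", output_text, flags=re.DOTALL).strip()
print(answer)
```
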
@@ -271,7 +274,7 @@ We use FP8-static quantization, FP8 quantization adopts 8-bit floating point for
### Int4 Quantization
We use the GPTQ and AWQ algorithms to achieve W4A16 quantization.

- GPTQ processes the model weights layer by layer, uses a small amount of calibration data to minimize the reconfiguration error of the quantized weights, and adjusts the weights layer by layer by the optimization process of approximating the Hessian inverse matrix. The process eliminates the need to retrain the model and requires only a small amount of calibration data to quantize the weights, improving inference efficiency and lowering the deployment threshold.
+ GPTQ processes the model weights layer by layer, using a small amount of calibration data to minimize the reconstruction error of the quantized weights; each layer is adjusted through an optimization procedure based on an approximation of the inverse Hessian matrix. The process eliminates the need to retrain the model, improving inference efficiency and lowering the deployment threshold.
AWQ likewise works from a small amount of calibration data (no training required): it statistically estimates the amplitude of the activation values and, for each weight channel, computes a scaling coefficient s that expands the numerical range of important weights so that more information is retained during quantization.
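
In symbols (a standard formulation of AWQ, stated here for clarity rather than taken from this card): instead of quantizing a salient weight channel directly, AWQ quantizes a scaled copy and undoes the scale on the activation side,

$$
Q(\mathbf{w})\,\mathbf{x} \;\approx\; Q(\mathbf{w}\cdot s)\cdot\frac{\mathbf{x}}{s},\qquad s>1,
$$

which shrinks the relative quantization error of the important channel at no extra inference cost once the scales are folded into the weights.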

You can use [AngelSlim](https://github.com/tencent/AngelSlim) to quantize the model, or you can directly download and use our open-source quantized models: [LINK](https://huggingface.co/).
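
As an illustration (not AngelSlim-specific), a GPTQ-quantized checkpoint published on the Hub can typically be loaded straight through transformers, provided a GPTQ backend such as gptqmodel is installed; the repo id below is hypothetical and stands in for the actual quantized release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tencent/Hunyuan-7B-Instruct-GPTQ-Int4"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", trust_remote_code=True)
```
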
@@ -293,19 +296,19 @@ This subsection describes the Benchmark metrics for the Hunyuan quantitative mod

For deployment, you can use frameworks such as **TensorRT-LLM**, **vLLM**, or **SGLang** to serve the model and create an OpenAI-compatible API endpoint.

Image: https://hub.docker.com/r/hunyuaninfer/hunyuan-7B/tags


### TensorRT-LLM

#### Docker Image

We provide a pre-built Docker image based on the latest version of TensorRT-LLM.

We use tencent/Hunyuan-7B-Instruct as an example.
- To get started:

https://hub.docker.com/r/hunyuaninfer/hunyuan-large/tags

```
docker pull hunyuaninfer/hunyuan-7B:hunyuan-moe-7B-trtllm
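# (sketch, not from this card) start an interactive container from the image,
# with all GPUs visible; mount or download the model inside as needed
docker run --gpus all -it --rm hunyuaninfer/hunyuan-7B:hunyuan-moe-7B-trtllm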
@@ -356,14 +359,14 @@ trtllm-serve \
Please use vLLM version v0.10.0 or higher for inference.

We use tencent/Hunyuan-7B-Instruct as an example.
- Download the model file:
  - Hugging Face: downloaded automatically by vLLM.
  - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-7B-Instruct`

- Model downloaded from Hugging Face:
```shell
export MODEL_PATH=tencent/Hunyuan-7B-Instruct
```

- Model downloaded from ModelScope:
```shell
@@ -383,7 +386,7 @@ python3 -m vllm.entrypoints.openai.api_server \
    --quantization experts_int8 \
    --served-model-name hunyuan \
    2>&1 | tee log_server.txt
```
- After the service script is running successfully, run the request script:
```shell
curl http://0.0.0.0:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{
@@ -471,7 +474,7 @@ python3 -m vllm.entrypoints.openai.api_server \

### SGLang

#### Docker Image

We also provide a pre-built Docker image based on the latest version of SGLang.
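
A typical launch command inside that container (a sketch; the exact flags used by the official image may differ) relies on SGLang's standard server entry point:

```shell
python3 -m sglang.launch_server \
    --model-path tencent/Hunyuan-7B-Instruct \
    --host 0.0.0.0 \
    --port 30000 \
    --trust-remote-code
```
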
@@ -501,4 +504,4 @@ docker run --entrypoint="python3" --gpus all \

## Contact Us

- If you would like to leave a message for our R&D and product teams, Welcome to contact our open-source team . You can also contact us via email (hunyuan_opensource@tencent.com).
+ If you would like to leave a message for our R&D and product teams, you are welcome to contact our open-source team. You can also reach us by email at hunyuan_opensource@tencent.com.
 