---
license: mit
---
# Chuxin-1.6B-1M
<br>
## Introduction
**Chuxin-1.6B-Base** is a model with 1.6 billion parameters. Chuxin-1.6B is built entirely on open-source data, and after training on large-scale data it is highly competitive on a range of downstream tasks.

**Chuxin-1.6B-1M** is the result of further training Chuxin-1.6B with a 1M context window. Needle-in-a-haystack experiments show that it has strong in-context retrieval ability.

For more details on the Chuxin-1.6B open-source model, see our [technical report](https://arxiv.org/pdf/2405.04828).
<br>
## Quickstart
You can call the model with the following code:
```python
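from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model from the Hugging Face Hub.
# Note: bf16=True is kept from the original snippet. If your transformers version rejects it,
# torch_dtype=torch.bfloat16 (with `import torch`) is the standard way to request bfloat16.
tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", device_map="auto", trust_remote_code=True, bf16=True).eval()
# Few-shot prompt in Chinese: "The capital of Mongolia is Ulaanbaatar / The capital of Iceland is Reykjavik / The capital of Ethiopia is"
inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
inputs = inputs.to(model.device)
# Greedy decoding (do_sample=False) of up to 15 new tokens.
pred = model.generate(**inputs, max_new_tokens=15, do_sample=False)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
# Expected output (the prompt followed by the completion):
# 蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是亚的斯亚贝巴(Addis Ababa)...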
```
## Evaluation
### Common Sense Reasoning and Reading Comprehension Tasks
| Model | Size | ARC-c | ARC-e | BoolQ | COPA | HellaSwag | OpenbookQA | PIQA | SciQ | WinoGrande | Avg |
|:--------------|:----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| Chuxin-1.6B-Base | 1.6B | 39.68 | 71.38 | 71.25 | 83 | 66.09 | 35.00 | 77.09 | 95 | 63.54 | 66.89 |
| Chuxin-1.6B-32k | 1.6B | 39.16 | 70.66 | 67.71 | 81 | 65.69 | 35.8 | 76.88 | 94.2 | 62.51 | 65.96 |
| Chuxin-1.6B-64k | 1.6B | 38.48 | 70.24 | 67.52 | 82 | 65.6 | 35.2 | 76.61 | 94.3 | 63.3 | 65.92 |
| Chuxin-1.6B-128k | 1.6B | 39.08 | 69.4 | 67.71 | 80 | 65.74 | 35.4 | 76.39 | 94.1 | 63.3 | 65.68 |
| Chuxin-1.6B-256k | 1.6B | 40.19 | 70.75 | 69.3 | 78 | 65.85 | 35.8 | 76.88 | 93.5 | 63.85 | 66.01 |
| Chuxin-1.6B-512k | 1.6B | 40.61 | 71.21 | 67.77 | 78 | 64.82 | 34.8 | 76.88 | 93.6 | 61.88 | 65.51 |
| Chuxin-1.6B-1M | 1.6B | 41.13 | 72.26 | 62.08 | 75 | 64.59 | 34.8 | 76.71 | 93.33 | 62.43 | 64.7 |
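
The Avg column appears to be the unweighted mean of the nine task scores. A minimal sketch that recomputes it from the rows above, assuming that definition; the variable names are illustrative and not taken from the original evaluation scripts:

```python
# Task scores copied from the table above, in column order:
# ARC-c, ARC-e, BoolQ, COPA, HellaSwag, OpenbookQA, PIQA, SciQ, WinoGrande
scores = {
    "base": [39.68, 71.38, 71.25, 83, 66.09, 35.00, 77.09, 95, 63.54],
    "32k":  [39.16, 70.66, 67.71, 81, 65.69, 35.8, 76.88, 94.2, 62.51],
    "64k":  [38.48, 70.24, 67.52, 82, 65.6, 35.2, 76.61, 94.3, 63.3],
    "128k": [39.08, 69.4, 67.71, 80, 65.74, 35.4, 76.39, 94.1, 63.3],
    "256k": [40.19, 70.75, 69.3, 78, 65.85, 35.8, 76.88, 93.5, 63.85],
    "512k": [40.61, 71.21, 67.77, 78, 64.82, 34.8, 76.88, 93.6, 61.88],
    "1M":   [41.13, 72.26, 62.08, 75, 64.59, 34.8, 76.71, 93.33, 62.43],
}

# Avg values as reported in the table.
reported_avg = {
    "base": 66.89, "32k": 65.96, "64k": 65.92, "128k": 65.68,
    "256k": 66.01, "512k": 65.51, "1M": 64.7,
}


def mean(values):
    """Plain arithmetic mean (assumed definition of the Avg column)."""
    return sum(values) / len(values)


recomputed = {name: mean(vals) for name, vals in scores.items()}

for name, avg in recomputed.items():
    print(f"{name:>5}: recomputed {avg:.2f}, reported {reported_avg[name]}")
```

Under this assumption, every recomputed mean agrees with the reported Avg to within rounding.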
### Open LLM Leaderboard
| Model | Size | ARC-c | HellaSwag | MMLU | TruthfulQA | WinoGrande | GSM-8k | Avg | Avg w/o GSM-8k |
|:--------------|:----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| Chuxin-1.6B-Base | 1.6B | 39.68 | 66.09 | 41.07 | 37.65 | 63.54 | 12.66 | 43.45 | 49.61 |
| Chuxin-1.6B-32k | 1.6B | 39.16 | 65.69 | 38.63 | 35.66 | 62.51 | 11.6 | 42.21 | 48.33 |
| Chuxin-1.6B-64k | 1.6B | 38.48 | 65.6 | 38.43 | 35.07 | 63.3 | 11.9 | 42.13 | 48.18 |
| Chuxin-1.6B-128k | 1.6B | 39.08 | 65.74 | 37.65 | 34.89 | 63.3 | 11.07 | 41.96 | 48.13 |
| Chuxin-1.6B-256k | 1.6B | 40.19 | 65.85 | 37.16 | 35.2 | 63.85 | 10.16 | 42.07 | 48.45 |
| Chuxin-1.6B-512k | 1.6B | 40.61 | 64.82 | 36.66 | 33.66 | 61.88 | 8.11 | 40.96 | 47.53 |
| Chuxin-1.6B-1M | 1.6B | 41.13 | 64.59 | 35.76 | 34.67 | 62.43 | 6.82 | 40.9 | 47.72 |
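
The last two columns appear to be the mean over all six benchmarks and the mean over the five benchmarks other than GSM-8k. A small sketch of that arithmetic for the base and 1M rows, assuming those definitions (`GSM_INDEX` and `averages` are hypothetical names used only for illustration):

```python
# Scores copied from the table above, in column order:
# ARC-c, HellaSwag, MMLU, TruthfulQA, WinoGrande, GSM-8k
leaderboard = {
    "base": [39.68, 66.09, 41.07, 37.65, 63.54, 12.66],
    "1M":   [41.13, 64.59, 35.76, 34.67, 62.43, 6.82],
}
GSM_INDEX = 5  # position of GSM-8k in each row


def averages(row):
    """Return (mean over all benchmarks, mean excluding GSM-8k)."""
    avg_all = sum(row) / len(row)
    without_gsm = [s for i, s in enumerate(row) if i != GSM_INDEX]
    avg_wo_gsm = sum(without_gsm) / len(without_gsm)
    return avg_all, avg_wo_gsm


results = {name: averages(row) for name, row in leaderboard.items()}

for name, (avg_all, avg_wo_gsm) in results.items():
    print(f"{name:>5}: Avg {avg_all:.2f}, Avg w/o GSM-8k {avg_wo_gsm:.2f}")
```

Reporting both averages separates the change on GSM-8k from the change on the other five benchmarks.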
### Needle in a Haystack
<p align="center">
<img src="niah.png" style="width: 1200px"/>
</p>
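
For readers unfamiliar with the test, the following is a minimal sketch of how a needle-in-a-haystack prompt can be constructed. It is not the protocol used for the figure above: the filler text, the needle, the character-based length (real evaluations typically count tokens), and the helper names `build_haystack` and `is_retrieved` are all assumptions made for illustration.

```python
def build_haystack(filler: str, needle: str, total_chars: int, depth: float) -> str:
    """Repeat `filler` up to `total_chars` characters and insert `needle`
    at relative position `depth` (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    repeats = total_chars // len(filler) + 1
    haystack = (filler * repeats)[:total_chars]
    position = int(len(haystack) * depth)
    return haystack[:position] + needle + haystack[position:]


def is_retrieved(model_answer: str, expected: str) -> bool:
    """Simple scoring rule: the expected string appears in the answer."""
    return expected.lower() in model_answer.lower()


# Hypothetical filler, needle, and question.
filler = "The grass is green. The sky is blue. "
needle = "The secret passphrase is 'violet-otter-42'. "
question = "What is the secret passphrase?"

context = build_haystack(filler, needle, total_chars=2000, depth=0.5)
prompt = f"{context}\n\nQuestion: {question}\nAnswer:"

# `prompt` would then be passed to the model, and the generated text
# checked with is_retrieved(generated_text, "violet-otter-42").
print(len(context), context.index(needle))
```

A full evaluation typically repeats this over a grid of context lengths and needle depths and records whether the needle was retrieved at each point.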
## Citation
If you find our work helpful, please consider citing it.
```
@article{chuxin,
title={CHUXIN: 1.6B TECHNICAL REPORT},
author={Xiaomin Zhuang and Yufan Jiang and Qiaozhi He and Zhihua Wu},
journal={arXiv preprint arXiv:2405.04828},
year={2024}
}
```
<br>