---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
- EleutherAI/pile
language:
- en
---

### RWKV EagleX 7B v2 Model
> **Important: This repository is not meant to be used with the Hugging Face transformers library.**
> [Use the Hugging Face variant instead, found here (v5-EagleX-v2-7B-HF)](https://huggingface.co/RWKV/v5-EagleX-v2-7B-HF)
>
> This is the raw representation of the EagleX 7B v2 model, intended for use with our own set of trainers.
>
> This is not an instruction-tuned model! (One is coming soon.)
## Quickstart with the Hugging Face transformers library
[See the Hugging Face version here (v5-EagleX-v2-7B-HF)](https://huggingface.co/RWKV/v5-EagleX-v2-7B-HF)
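```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers load the RWKV model code shipped with the repo
model = AutoModelForCausalLM.from_pretrained("RWKV/v5-EagleX-v2-7B-HF", trust_remote_code=True).to(torch.float32)
tokenizer = AutoTokenizer.from_pretrained("RWKV/v5-EagleX-v2-7B-HF", trust_remote_code=True)
```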
## Evaluation
The following table shows the model's benchmark accuracy as training progressed from 1.1T to 2.25T tokens.

|Model |Eagle-7B-HF|EagleX-7B-HF-v1|EagleX-7B-HF-v2|
|----------------------|-----------|---------------|---------------|
|Param Count |7.52 B |7.52 B |7.52 B |
|Tokens Trained |1.1 T |1.7 T |2.25 T |
|avg_acc |0.4822 |0.5391 |0.5495 |
|glue (acc) |0.5752 |0.7463 |0.7439 |
|anli (acc) |0.3594 |0.4847 |0.5097 |
|mnli (acc) |0.3802 |0.7928 |0.7884 |
|mnli_mismatch (acc) |0.3687 |0.7985 |0.784 |
|swag (acc) |0.568 |0.5814 |0.5905 |
|lambada_standard (acc)|0.685 |0.686 |0.7004 |
|lambada_openai (acc) |0.7425 |0.7522 |0.7502 |
|mmlu (acc) |0.3321 |0.4014 |0.438 |
|winogrande (acc) |0.674 |0.7206 |0.7332 |
|wnli (acc) |0.4225 |0.4648 |0.493 |
|truthfulqa (acc) |0.3303 |0.3268 |0.3401 |
|logiqa (acc) |0.2458 |0.2458 |0.2458 |
|logiqa2 (acc) |0.2494 |0.2595 |0.2621 |
|sciq (acc) |0.955 |0.96 |0.93 |
|piqa (acc) |0.7704 |0.7758 |0.7764 |
|arc_easy (acc) |0.7382 |0.7555 |0.7445 |
|arc_challenge (acc) |0.3951 |0.4087 |0.4155 |
|hellaswag (acc) |0.5264 |0.5411 |0.56 |
|openbookqa (acc) |0.302 |0.296 |0.304 |
|mathqa (acc) |0.26 |0.26 |0.2593 |
|arithmetic (acc) |0.245 |0.0634 |0.1703 |
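
The `avg_acc` row appears to match an unweighted mean of the 21 benchmark rows beneath it. The sketch below checks that assumption for the EagleX-7B-HF-v2 column, using scores copied from the table; the averaging method is inferred from the numbers and is not documented here.

```python
# Scores copied from the EagleX-7B-HF-v2 column of the table above.
# Assumption: avg_acc is the plain (unweighted) mean of these 21 benchmarks.
EAGLEX_V2 = {
    "glue": 0.7439, "anli": 0.5097, "mnli": 0.7884, "mnli_mismatch": 0.784,
    "swag": 0.5905, "lambada_standard": 0.7004, "lambada_openai": 0.7502,
    "mmlu": 0.438, "winogrande": 0.7332, "wnli": 0.493, "truthfulqa": 0.3401,
    "logiqa": 0.2458, "logiqa2": 0.2621, "sciq": 0.93, "piqa": 0.7764,
    "arc_easy": 0.7445, "arc_challenge": 0.4155, "hellaswag": 0.56,
    "openbookqa": 0.304, "mathqa": 0.2593, "arithmetic": 0.1703,
}


def mean_accuracy(scores: dict[str, float]) -> float:
    """Unweighted mean over every benchmark in `scores`."""
    return sum(scores.values()) / len(scores)


avg = mean_accuracy(EAGLEX_V2)
print(len(EAGLEX_V2), round(avg, 4))  # 21 0.5495
```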

The following table compares EagleX-7B-HF-v2 against other top-performing models in the same weight class.

|Model |OLMo-7B |falcon-7b |Llama-2-7b-hf|EagleX-7B-HF-v2|Mistral-7B-v0.1|
|----------------------|---------------|----------------|-------------|---------------|---------------|
|Param Count |6.89 B |6.92 B |6.74 B |7.52 B |7.24 B |
|Tokens Trained |2.5 T |1.5 T |2 T |2.25 T |2 - 7 T? |
|avg_acc |0.4578 |0.4775 |0.5045 |0.5495 |0.5676 |
|glue (acc) |0.474 |0.4578 |0.4289 |0.7439 |0.515 |
|anli (acc) |0.3478 |0.3541 |0.3697 |0.5097 |0.3803 |
|mnli (acc) |0.3294 |0.3893 |0.4269 |0.7884 |0.4542 |
|mnli_mismatch (acc) |0.3348 |0.404 |0.4395 |0.784 |0.4632 |
|swag (acc) |0.5512 |0.5685 |0.5658 |0.5905 |0.5756 |
|lambada_standard (acc)|0.6396 |0.6868 |0.6808 |0.7004 |0.6944 |
|lambada_openai (acc) |0.6872 |0.746 |0.7353 |0.7502 |0.7553 |
|mmlu (acc) |0.2812 |0.2512 |0.4077 |0.438 |0.5964 |
|winogrande (acc) |0.6725 |0.6709 |0.6914 |0.7332 |0.7364 |
|wnli (acc) |0.5775 |0.4789 |0.4648 |0.493 |0.5775 |
|truthfulqa (acc) |0.3015 |0.2826 |0.3205 |0.3401 |0.3537 |
|logiqa (acc) |0.2335 |0.2151 |0.2535 |0.2458 |0.2427 |
|logiqa2 (acc) |0.2506 |0.2252 |0.2564 |0.2621 |0.3022 |
|sciq (acc) |0.927 |0.944 |0.939 |0.93 |0.959 |
|piqa (acc) |0.7878 |0.7949 |0.7807 |0.7764 |0.8052 |
|arc_easy (acc) |0.7353 |0.7479 |0.7643 |0.7445 |0.8081 |
|arc_challenge (acc) |0.3677 |0.4027 |0.4309 |0.4155 |0.5009 |
|hellaswag (acc) |0.5572 |0.5772 |0.5713 |0.56 |0.6131 |
|openbookqa (acc) |0.292 |0.306 |0.316 |0.304 |0.33 |
|mathqa (acc) |0.26 |0.2884 |0.2801 |0.2593 |0.3554 |
|arithmetic (acc) |0.0069 |0.2367 |0.4703 |0.1703 |0.9004 |
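
Averages can hide where the differences come from, so it can help to compare two columns benchmark by benchmark. The sketch below, using values copied from the table, counts how many of the 21 benchmarks EagleX-7B-HF-v2 scores higher on than Llama-2-7b-hf. It is a plain head-to-head count with no significance testing.

```python
# Per-benchmark accuracy copied from the comparison table above:
# (EagleX-7B-HF-v2, Llama-2-7b-hf)
SCORES = {
    "glue": (0.7439, 0.4289), "anli": (0.5097, 0.3697),
    "mnli": (0.7884, 0.4269), "mnli_mismatch": (0.784, 0.4395),
    "swag": (0.5905, 0.5658), "lambada_standard": (0.7004, 0.6808),
    "lambada_openai": (0.7502, 0.7353), "mmlu": (0.438, 0.4077),
    "winogrande": (0.7332, 0.6914), "wnli": (0.493, 0.4648),
    "truthfulqa": (0.3401, 0.3205), "logiqa": (0.2458, 0.2535),
    "logiqa2": (0.2621, 0.2564), "sciq": (0.93, 0.939),
    "piqa": (0.7764, 0.7807), "arc_easy": (0.7445, 0.7643),
    "arc_challenge": (0.4155, 0.4309), "hellaswag": (0.56, 0.5713),
    "openbookqa": (0.304, 0.316), "mathqa": (0.2593, 0.2801),
    "arithmetic": (0.1703, 0.4703),
}

# Benchmarks where EagleX scores strictly higher / strictly lower than Llama-2.
wins = sorted(name for name, (eaglex, llama) in SCORES.items() if eaglex > llama)
losses = sorted(name for name, (eaglex, llama) in SCORES.items() if eaglex < llama)

print(f"EagleX higher on {len(wins)}/{len(SCORES)}, lower on {len(losses)}")  # EagleX higher on 12/21, lower on 9
print("lower on:", ", ".join(losses))
```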

For full details on this model, see the launch blog post: [https://blog.rwkv.com/p/eaglex-v2-soaring-past-llama2-7b](https://blog.rwkv.com/p/eaglex-v2-soaring-past-llama2-7b)
## Links
- [Our wiki](https://wiki.rwkv.com)
- [Full eval data](https://docs.google.com/spreadsheets/d/1CBLU6yKkW-8FMvGD4INO3qjeHZ0qkKnZFcM6n6lWNOs/edit#gid=912381775)
- [Recursal.AI Cloud Platform](https://recursal.ai)
- [HF Gradio Demo](https://huggingface.co/spaces/RWKV/v5-EagleX-v2-7B-gradio)
- [Blog article, detailing our model launch](https://blog.rwkv.com/p/eaglex-v2-soaring-past-llama2-7b)
## Acknowledgement
We are grateful for the help and support from the following key groups:
- [Recursal.ai](https://recursal.ai) team for financing the GPU resources and managing the training of this foundation model. You can run the Eagle line of RWKV models on their cloud or on-premises platform today.
- EleutherAI for their support, especially with the v5/v6 Eagle/Finch paper
- Linux Foundation AI & Data group for supporting and hosting the RWKV project