Update README.md
README.md
CHANGED
@@ -7,3 +7,51 @@ library_name: transformers
---

# Model Card for Notbad v1.1 Mistral 24B

This model has better IFEval scores than our previous model
[Notbad v1.0 Mistral 24B](https://huggingface.co/notbadai/notbad_v1_0_mistral_24b).

Notbad v1.1 Mistral 24B is a reasoning model trained on math and Python coding.
It is built upon
[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)
and has been further trained with reinforcement learning on math and coding.

One of the key features of Notbad v1.1 is its ability to produce shorter and cleaner reasoning outputs.
We used open datasets and employed reinforcement learning techniques that continue
our earlier work on
[Quiet Star](https://arxiv.org/abs/2403.09629)
and are similar to
[Dr. GRPO](https://arxiv.org/abs/2503.20783).
The reasoning capabilities of this model come from self-improvement and are not distilled from any other model.
It is the result of fine-tuning on data sampled from several of our RL models, starting from
[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).

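The card does not publish the exact training recipe, but the core idea behind Dr. GRPO-style updates is a group-relative advantage that drops GRPO's per-group standard-deviation normalization. The sketch below is only illustrative; the function name, reward scheme, and group size are assumptions, not our implementation.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages for completions sampled from one prompt.

    `rewards` holds one scalar reward per sampled completion. Each advantage
    is the reward minus the group mean; unlike the original GRPO, the
    Dr. GRPO formulation does not divide by the group's standard deviation.
    """
    return rewards - rewards.mean()

# Toy example: four sampled answers to one math prompt, rewarded 1.0 when the
# final answer is correct and 0.0 otherwise.
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # tensor([ 0.5000, -0.5000, -0.5000,  0.5000])
```

In a GRPO-style trainer, these advantages would then weight the policy-gradient loss over each completion's tokens.
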
Special thanks to [Lambda](https://lambda.ai/) and [Deep Infra](https://deepinfra.com/)
for their help with compute resources for our research and for training this model.

You can try the model on **[chat.labml.ai](https://chat.labml.ai)**.

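For local inference with `transformers` (the `library_name` declared in the metadata above), a minimal sketch is shown below. The repository id `notbadai/notbad_v1_1_mistral_24b` is assumed from the naming of the v1.0 model linked above; adjust it, the dtype, and the device settings to match your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, following the naming of the v1.0 model linked above.
model_id = "notbadai/notbad_v1_1_mistral_24b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 24B parameters: expect to need multiple GPUs or quantization
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve 2x + 3 = 11 and show your reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
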
## Benchmark results

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| mmlu_pro   | 0.673                   | 0.642                   | 0.663                           | 0.536       | 0.666         | 0.683       | 0.617                  |
| gpqa_main  | 0.467                   | 0.447                   | 0.453                           | 0.344       | 0.531         | 0.404       | 0.377                  |

**Math & Coding**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| humaneval  | 0.872                   | 0.869                   | 0.848                           | 0.732       | 0.854         | 0.909       | 0.890                  |
| math       | 0.749                   | 0.752                   | 0.706                           | 0.535       | 0.743         | 0.819       | 0.761                  |

**Instruction following**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| ifeval     | 0.779                   | 0.514                   | 0.829                           | 0.8065      | 0.8835        | 0.8401      | 0.8499                 |

**Note**:

- Benchmarks are from the
  [Mistral-Small-24B-Instruct-2501 Model Card](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).