---

# Model Card for Notbad v1.1 Mistral 24B

This model has better IFEval scores than our previous model,
[Notbad v1.0 Mistral 24B](https://huggingface.co/notbadai/notbad_v1_0_mistral_24b).

Notbad v1.1 Mistral 24B is a reasoning model trained on math and Python coding.
It is built upon
[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)
and has been further trained with reinforcement learning on math and coding.

One of the key features of Notbad v1.1 is its ability to produce shorter and cleaner reasoning outputs.
We used open datasets and reinforcement learning techniques that continue
our work on
[Quiet-STaR](https://arxiv.org/abs/2403.09629)
and are similar to
[Dr. GRPO](https://arxiv.org/abs/2503.20783).
The reasoning capabilities of this model come from self-improvement and were not distilled from any other model.
The model was fine-tuned on data sampled from several of our RL models, each starting from
[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).

Special thanks to [Lambda](https://lambda.ai/) and [Deep Infra](https://deepinfra.com/)
for providing the compute resources for our research and for training this model.

You can try the model on **[chat.labml.ai](https://chat.labml.ai)**.
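Since the card's metadata lists `library_name: transformers`, the model can presumably be loaded with the standard `transformers` text-generation API. Below is a minimal sketch; the repository id is an assumption inferred from the v1.0 naming pattern above, and the prompt and generation parameters are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the v1.0 naming pattern; verify before use.
model_id = "notbadai/notbad_v1_1_mistral_24b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The 24B weights need a large GPU (or quantization/offloading);
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "What is 12 * 17? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```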

## Benchmark results

**Reasoning & knowledge**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| mmlu_pro   | 0.673                   | 0.642                   | 0.663                           | 0.536       | 0.666         | 0.683       | 0.617                  |
| gpqa_main  | 0.467                   | 0.447                   | 0.453                           | 0.344       | 0.531         | 0.404       | 0.377                  |

**Math & Coding**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| humaneval  | 0.872                   | 0.869                   | 0.848                           | 0.732       | 0.854         | 0.909       | 0.890                  |
| math       | 0.749                   | 0.752                   | 0.706                           | 0.535       | 0.743         | 0.819       | 0.761                  |

**Instruction following**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| ifeval     | 0.779                   | 0.514                   | 0.829                           | 0.8065      | 0.8835        | 0.8401      | 0.8499                 |

**Note**:

- Benchmark scores for the comparison models are from the [Mistral-Small-24B-Instruct-2501 model card](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).