Update README.md
README.md
CHANGED
@@ -7,3 +7,51 @@ library_name: transformers
---

# Model Card for Notbad v1.1 Mistral 24B

This model has better IFEval scores than our previous model
[Notbad v1.0 Mistral 24B](https://huggingface.co/notbadai/notbad_v1_0_mistral_24b).

Notbad v1.1 Mistral 24B is a reasoning model trained on math and Python coding.
It is built upon
[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)
and has been further trained with reinforcement learning on math and coding.

One of the key features of Notbad v1.1 is its ability to produce shorter and cleaner reasoning outputs.
We used open datasets and employed reinforcement learning techniques that continue
our earlier work on
[Quiet Star](https://arxiv.org/abs/2403.09629)
and are similar to
[Dr. GRPO](https://arxiv.org/abs/2503.20783).
The reasoning capabilities of this model come from self-improvement and are not distilled from any other model.
It is the result of fine-tuning on data sampled from several of our RL models, starting from
[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).

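The card does not publish the exact training recipe, but the core idea behind Dr. GRPO-style updates is a group-relative advantage that drops GRPO's per-group standard-deviation normalization. The sketch below is only illustrative; the function name, reward scheme, and group size are assumptions, not our implementation.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages for completions sampled from one prompt.

    `rewards` holds one scalar reward per sampled completion. Each advantage
    is the reward minus the group mean; unlike the original GRPO, the
    Dr. GRPO formulation does not divide by the group's standard deviation.
    """
    return rewards - rewards.mean()

# Toy example: four sampled answers to one math prompt, rewarded 1.0 when the
# final answer is correct and 0.0 otherwise.
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # tensor([ 0.5000, -0.5000, -0.5000,  0.5000])
```

In a GRPO-style trainer, these advantages would then weight the policy-gradient loss over each completion's tokens.
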
Special thanks to [Lambda](https://lambda.ai/) and [Deep Infra](https://deepinfra.com/)
for their help with compute resources for our research and for training this model.

You can try the model on **[chat.labml.ai](https://chat.labml.ai)**.

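For local inference with `transformers` (the `library_name` declared in the metadata above), a minimal sketch is shown below. The repository id `notbadai/notbad_v1_1_mistral_24b` is assumed from the naming of the v1.0 model linked above; adjust it, the dtype, and the device settings to match your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, following the naming of the v1.0 model linked above.
model_id = "notbadai/notbad_v1_1_mistral_24b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 24B parameters: expect to need multiple GPUs or quantization
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve 2x + 3 = 11 and show your reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
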
## Benchmark results

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| mmlu_pro   | 0.673                   | 0.642                   | 0.663                           | 0.536       | 0.666         | 0.683       | 0.617                  |
| gpqa_main  | 0.467                   | 0.447                   | 0.453                           | 0.344       | 0.531         | 0.404       | 0.377                  |

**Math & Coding**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| humaneval  | 0.872                   | 0.869                   | 0.848                           | 0.732       | 0.854         | 0.909       | 0.890                  |
| math       | 0.749                   | 0.752                   | 0.706                           | 0.535       | 0.743         | 0.819       | 0.761                  |

**Instruction following**

| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|-------------------------|-------------------------|---------------------------------|-------------|---------------|-------------|------------------------|
| ifeval     | 0.779                   | 0.514                   | 0.829                           | 0.8065      | 0.8835        | 0.8401      | 0.8499                 |

**Note**:

- Benchmarks are from the
  [Mistral-Small-24B-Instruct-2501 Model Card](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).