|
--- |
|
license: apache-2.0 |
|
base_model: |
|
- mistralai/Mistral-Small-24B-Instruct-2501 |
|
pipeline_tag: text-generation |
|
library_name: transformers |
|
--- |
|
|
|
# Model Card for Notbad v1.1 Mistral 24B |
|
|
|
This model has better IFEval scores than our previous model |
|
[Notbad v1.0 Mistral 24B](https://huggingface.co/notbadai/notbad_v1_0_mistral_24b). |
|
|
|
Notbad v1.1 Mistral 24B is a reasoning model trained on math and Python coding.

It is built upon

[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)

and has been further trained with reinforcement learning on math and coding.
|
|
|
One of the key features of Notbad v1.1 is its ability to produce shorter and cleaner reasoning outputs.
|
We used open datasets and employed reinforcement learning techniques that continue

our work on

[Quiet-STaR](https://arxiv.org/abs/2403.09629)

and are similar to

[Dr. GRPO](https://arxiv.org/abs/2503.20783).
|
The reasoning capabilities of this model come from self-improvement and are not distilled from any other model.

It is the result of fine-tuning on data sampled from several of our RL models, each starting from

[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).
|
|
|
Special thanks to [Lambda](https://lambda.ai/) and [Deep Infra](https://deepinfra.com/) |
|
for providing help with compute resources for our research and training this model. |
|
|
|
You can try the model on **[chat.labml.ai](https://chat.labml.ai)**. |
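To run the model locally with the `transformers` library, here is a minimal sketch. The repo id is assumed from the naming convention of the v1.0 release, and the dtype/device settings are illustrative; adjust them to your hardware.

```python
# A minimal usage sketch, assuming the repo id follows the v1.0 naming
# convention (notbadai/notbad_v1_1_mistral_24b) -- verify before use.

MODEL_ID = "notbadai/notbad_v1_1_mistral_24b"  # assumed repo id


def build_messages(prompt: str) -> list:
    """Wrap a user prompt in the chat-message format the pipeline expects."""
    return [{"role": "user", "content": prompt}]


def chat(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a reply; loading the 24B weights needs a large GPU."""
    from transformers import pipeline  # heavy dependency, imported lazily

    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype="auto",
        device_map="auto",
    )
    outputs = generator(build_messages(prompt), max_new_tokens=max_new_tokens)
    # The chat pipeline returns the full transcript; take the last message.
    return outputs[0]["generated_text"][-1]["content"]


# Example (downloads the full model weights):
# print(chat("What is 12 * 13? Reason step by step."))
```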
|
|
|
## Benchmark results |
|
|
|
| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|
|------------|-------------------------|-------------------------|---------------------------------|--------------|---------------|-------------|------------------------| |
|
| mmlu_pro | 0.673 | 0.642 | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 | |
|
| gpqa_main | 0.467 | 0.447 | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 | |
|
|
|
**Math & Coding** |
|
|
|
| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|
|------------|-------------------------|-------------------------|---------------------------------|--------------|---------------|-------------|------------------------| |
|
| humaneval | 0.872 | 0.869 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 | |
|
| math | 0.749 | 0.752 | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 | |
|
|
|
**Instruction following** |
|
|
|
| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|
|------------|-------------------------|-------------------------|---------------------------------|--------------|---------------|-------------|------------------------| |
|
| ifeval | 0.779 | 0.514 | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 | |
|
|
|
**Note**: |
|
|
|
- Benchmark scores for the baseline models are

from the [Mistral-Small-24B-Instruct-2501 Model Card](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).