|
--- |
|
license: apache-2.0 |
|
base_model: |
|
- mistralai/Mistral-Small-24B-Instruct-2501 |
|
pipeline_tag: text-generation |
|
library_name: transformers |
|
--- |
|
|
|
# Model Card for Notbad v1.1 Mistral 24B |
|
|
|
This model has better IFEval scores than our previous model |
|
[Notbad v1.0 Mistral 24B](https://huggingface.co/notbadai/notbad_v1_0_mistral_24b). |
|
|
|
Notbad v1.1 Mistral 24B is a reasoning model trained on math and Python coding.

It is built upon

[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)

and has been further trained with reinforcement learning on math and coding.
|
|
|
One of the key features of Notbad v1.1 is its ability to produce shorter and cleaner reasoning outputs.
|
We used open datasets and employed reinforcement learning techniques that continue

our work on

[Quiet-STaR](https://arxiv.org/abs/2403.09629)

and are similar to

[Dr. GRPO](https://arxiv.org/abs/2503.20783).
|
The reasoning capabilities of this model come from self-improvement and are not distilled from any other model.

It is the result of fine-tuning on data sampled from several of our RL models, each starting from

[Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).
|
|
|
Special thanks to [Lambda](https://lambda.ai/) and [Deep Infra](https://deepinfra.com/) |
|
for providing help with compute resources for our research and training this model. |
|
|
|
You can try the model on **[chat.labml.ai](https://chat.labml.ai)**. |
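To run the model locally with the `transformers` library, here is a minimal sketch. The repo id is assumed from the naming convention of the v1.0 release, and the dtype/device settings are illustrative; adjust them to your hardware.

```python
# A minimal usage sketch, assuming the repo id follows the v1.0 naming
# convention (notbadai/notbad_v1_1_mistral_24b) -- verify before use.

MODEL_ID = "notbadai/notbad_v1_1_mistral_24b"  # assumed repo id


def build_messages(prompt: str) -> list:
    """Wrap a user prompt in the chat-message format the pipeline expects."""
    return [{"role": "user", "content": prompt}]


def chat(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a reply; loading the 24B weights needs a large GPU."""
    from transformers import pipeline  # heavy dependency, imported lazily

    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype="auto",
        device_map="auto",
    )
    outputs = generator(build_messages(prompt), max_new_tokens=max_new_tokens)
    # The chat pipeline returns the full transcript; take the last message.
    return outputs[0]["generated_text"][-1]["content"]


# Example (downloads the full model weights):
# print(chat("What is 12 * 13? Reason step by step."))
```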
|
|
|
## Benchmark results |
|
|
|
| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|
|------------|-------------------------|-------------------------|---------------------------------|--------------|---------------|-------------|------------------------| |
|
| mmlu_pro | 0.673 | 0.642 | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 | |
|
| gpqa_main | 0.467 | 0.447 | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 | |
|
|
|
**Math & Coding** |
|
|
|
| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|
|------------|-------------------------|-------------------------|---------------------------------|--------------|---------------|-------------|------------------------| |
|
| humaneval | 0.872 | 0.869 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 | |
|
| math | 0.749 | 0.752 | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 | |
|
|
|
**Instruction following** |
|
|
|
| Evaluation | notbad_v1_1_mistral_24b | notbad_v1_0_mistral_24b | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|
|------------|-------------------------|-------------------------|---------------------------------|--------------|---------------|-------------|------------------------| |
|
| ifeval | 0.779 | 0.514 | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 | |
|
|
|
**Note**: |
|
|
|
- Benchmark scores for the baseline models are

from the [Mistral-Small-24B-Instruct-2501 Model Card](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).