---
library_name: transformers
tags:
- math
- qwen2
- aimo
license: mit
datasets:
- Floppanacci/QWQ-LongCOT-AIMO
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
language:
- en
---

# DeepSeek-R1-Distill-Qwen-7B Fine-tuned for AIMO Math Problems

This model is a fine-tuned version of `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` on the [`Floppanacci/QWQ-LongCOT-AIMO`](https://huggingface.co/datasets/Floppanacci/QWQ-LongCOT-AIMO) dataset.

## Model Description

The model was fine-tuned to improve mathematical reasoning, particularly step-by-step (chain-of-thought) solutions to problems like those in the [AI Mathematical Olympiad (AIMO)](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2) competition.

It was trained on roughly 29.5k math questions paired with detailed solutions.

An [AWQ quantized version](https://huggingface.co/Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci-AWQ) is also available for faster inference and reduced memory usage.
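
To give a rough sense of the memory savings, the sketch below estimates the size of the weights alone at two precisions. The parameter count (about 7.6 billion) and the 4-bit width for AWQ are assumptions not stated in this card, and the estimate ignores quantization scales, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: approximate parameter count for a Qwen 7B-class model.
ASSUMED_PARAMS = 7.6e9

bf16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)  # bfloat16: 16 bits per weight
awq_gb = weight_memory_gb(ASSUMED_PARAMS, 4)    # assumption: 4-bit AWQ weights

print(f"bf16 weights:  ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{awq_gb:.1f} GB")
```

Under these assumptions the quantized weights take about a quarter of the bf16 footprint.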

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16, # or torch.float16
    device_map="auto"
)

# Example Prompt (adjust based on how the model expects input)
prompt = "Question: What is the value of $2+2$? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

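# Generate with sampling; long chain-of-thought solutions need a large token budget
outputs = model.generate(**inputs, max_new_tokens=8192, temperature=0.7, do_sample=True)
# Decode the full sequence (the prompt followed by the generated text)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)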

print(response)
```
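
For competition-style use you usually want only the final answer rather than the full reasoning trace. The helper below is a minimal sketch that assumes the model ends its solution with a LaTeX `\boxed{...}` expression; this card does not confirm that output format, and `extract_boxed_answer` is a hypothetical name introduced here for illustration.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)  # use the last occurrence as the final answer
    if start == -1:
        return None

    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i].strip()
    return None  # unbalanced braces, e.g. the generation was cut off


# Stand-in string; in practice pass the decoded `response` from the example above.
print(extract_boxed_answer(r"So $2+2 = 4$. The answer is \boxed{4}."))  # 4
```

Counting braces instead of using a simple regular expression keeps nested expressions such as `\boxed{\frac{1}{2}}` intact.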

## Training Data

The model was fine-tuned on the train split of the [`Floppanacci/QWQ-LongCOT-AIMO`](https://huggingface.co/datasets/Floppanacci/QWQ-LongCOT-AIMO) dataset (29.5k examples).