lbourdois committed
Commit 0ea6108 · verified · 1 Parent(s): 118e6e4

Improve language tag


Hi! As the model is multilingual, this is a PR to add languages other than English to the language tag, to improve how the model is referenced. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
1. README.md (+255 -241)
README.md CHANGED
---
license: apache-2.0
library_name: transformers
base_model:
- Qwen/Qwen2.5-32B
pipeline_tag: text-generation
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
<p align="center">
  <img src="images/deep-cogito-logo.png" alt="Logo" width="40%">
</p>


# Cogito v1 preview - 32B

[Blog Post](https://www.deepcogito.com/research/cogito-v1-preview)

The Cogito LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use.

- Cogito models are hybrid reasoning models: each model can answer directly (like a standard LLM) or self-reflect before answering (like a reasoning model).
- The LLMs are trained using **Iterated Distillation and Amplification (IDA)**, a scalable and efficient alignment strategy for superintelligence based on iterative self-improvement.
- The models have been optimized for coding, STEM, instruction following, and general helpfulness, and have significantly stronger multilingual, coding, and tool-calling capabilities than size-equivalent counterparts.
- In both standard and reasoning modes, Cogito v1-preview models outperform their size-equivalent counterparts on common industry benchmarks.
- Each model is trained in over 30 languages and supports a context length of 128k tokens.

# Evaluations
We compare our models against state-of-the-art, size-equivalent models in both direct mode and reasoning mode. For direct mode, we compare against the Llama / Qwen instruct counterparts. For reasoning mode, we use DeepSeek's R1 distilled counterparts and Qwen's QwQ model.

<p align="left">
  <img src="images/32b_benchmarks.png" alt="Benchmark results" width="90%">
</p>

**Livebench Global Average:**
<p align="left">
  <img src="images/livebench_global_average.png" alt="Livebench global average" width="80%">
</p>

For detailed evaluations, please refer to the [Blog Post](https://www.deepcogito.com/research/cogito-v1-preview).


# Usage
Here is a snippet for usage with Transformers:

```python
import transformers
import torch

model_id = "deepcogito/cogito-v1-preview-qwen-32B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Give me a short introduction to LLMs."},
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
)

print(outputs[0]["generated_text"][-1])
```

## Implementing extended thinking
- By default, the model answers in standard mode.
- To enable thinking, use either of the following methods:
  - Add a specific system prompt, or
  - Set `enable_thinking=True` while applying the chat template.


### Method 1 - Add a specific system prompt
To enable thinking, simply set the system prompt to `system_instruction = 'Enable deep thinking subroutine.'`.

If you already have a `system_instruction`, then use `system_instruction = 'Enable deep thinking subroutine.' + '\n\n' + system_instruction`.

Here is an example:

```python
import transformers
import torch

model_id = "deepcogito/cogito-v1-preview-qwen-32B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION},
    {"role": "user", "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."},
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
)

print(outputs[0]["generated_text"][-1])
```

Similarly, if you already have a system prompt, you can prepend the `DEEP_THINKING_INSTRUCTION` to it like this:

```python
DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

system_prompt = "Reply to each prompt with only the actual code - no explanations."
prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION + '\n\n' + system_prompt},
    {"role": "user", "content": prompt}
]
```
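
The resulting `messages` can then be passed to the pipeline from the earlier example unchanged, for instance:

```python
# Same generation call as in the example above.
outputs = pipeline(
    messages,
    max_new_tokens=512,
)

print(outputs[0]["generated_text"][-1])
```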

### Method 2 - Set enable_thinking=True in the tokenizer
If you are using Hugging Face tokenizers, you can simply add the argument `enable_thinking=True` when applying the chat template (the option is built into the chat template).

Here is an example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v1-preview-qwen-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to LLMs."
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
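
To answer the same conversation in standard (non-thinking) mode, simply leave the flag out; as noted above, the model answers directly by default. A minimal sketch of the changed call:

```python
# Standard (direct) mode: omit enable_thinking so the default behavior applies.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
```

The rest of the generation code stays the same.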

# Tool Calling
Cogito models support tool calling (single, parallel, multiple, and parallel_multiple) in both standard and extended thinking modes.

Here is a snippet:

```python
# First, define a tool.
# (This snippet reuses the `model` and `tokenizer` loaded in the example above.)
def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.0  # A real function should probably actually get the temperature!

# Next, create a chat and apply the chat template with the tool definition
messages = [
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

text = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
output_text = tokenizer.batch_decode(outputs)[0][len(text):]
print(output_text)
```

This will result in the output:
```
<tool_call>
{"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
</tool_call><|im_end|>
```

You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so:

```python
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
```

and then call the tool and append the result, with the `tool` role, like so:

```python
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```

After that, you can `generate()` again to let the model use the tool result in the chat:

```python
text = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
output_text = tokenizer.batch_decode(outputs)[0][len(text):]
```

This should result in the string:
```
'The current temperature in Paris is 22.0 degrees.<|im_end|>'
```
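
Since parallel and multiple tool calls are supported, you can pass several tools through the same `tools` argument. The sketch below is only illustrative: `get_current_humidity` is a hypothetical helper, not part of this repository.

```python
# Hypothetical second tool, used only to illustrate passing multiple tools.
def get_current_humidity(location: str) -> float:
    """
    Get the current relative humidity (in percent) at a location.

    Args:
        location: The location to get the humidity for, in the format "City, Country"
    Returns:
        The current relative humidity at the specified location, as a float.
    """
    return 55.0  # Placeholder value.

messages = [
    {"role": "user", "content": "What are the temperature and humidity in Paris right now?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature, get_current_humidity],
    add_generation_prompt=True,
    tokenize=False,
)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.batch_decode(outputs)[0][len(text):])
```

Depending on the request, the model may emit more than one `<tool_call>` block; each call can be appended to `messages` and answered with a `tool` message exactly as shown above.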

## License
This repository and the model weights are licensed under the Apache 2.0 License Agreement.

## Contact
If you would like to reach out to our team, send an email to [contact@deepcogito.com](mailto:contact@deepcogito.com).