Safetensors
GGUF
qwen3
conversational
Commit 7215f29 by AndyGulp · verified · 1 parent: 2b86fc5

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -18,7 +18,7 @@ This requires the model to reason through multiple tool invocations (e.g., weath
 
 Our training pipeline leverages:
 
-- (**Dr. GRPO**)[https://arxiv.org/abs/2503.20783] for stable and sample-efficient reinforcement learning.
+- [**Dr. GRPO**](https://arxiv.org/abs/2503.20783) for stable and sample-efficient reinforcement learning.
 - **Synthetic multi-step MCP interactions** with strong tool chaining behavior, generated using our internal data engine.
 - **SGLang + VeRL** for efficient multi-turn rollout environments, built on top of Qwen3-4B for its function-calling capabilities.
 
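The only change in this commit replaces the reversed Markdown link syntax `(text)[url]`, which does not produce a link, with the standard `[text](url)` form. As a hedged illustration that is not part of the repository, the same fix can be applied mechanically with a small Python sketch. The regex and the function name `fix_reversed_links` are hypothetical. The sketch assumes link text contains no parentheses and URLs contain no square brackets. It would also rewrite non-link prose such as `(see note)[1]`, so its output should be reviewed before committing.

```python
import re

# Hypothetical pattern for the reversed form "(text)[url]".
# Assumption: the link text has no parentheses and the URL has no
# square brackets or whitespace; nested or escaped cases are not handled.
REVERSED_LINK = re.compile(r"\(([^()]+)\)\[([^\[\]\s]+)\]")


def fix_reversed_links(markdown: str) -> str:
    """Rewrite each "(text)[url]" occurrence as a standard "[text](url)" link."""
    return REVERSED_LINK.sub(r"[\1](\2)", markdown)


if __name__ == "__main__":
    old = "- (**Dr. GRPO**)[https://arxiv.org/abs/2503.20783] for stable and sample-efficient reinforcement learning."
    # Prints the corrected README line from the "+" side of the diff.
    print(fix_reversed_links(old))
```

Because the pattern requires a `[` to come directly after the closing parenthesis, already-correct `[text](url)` links do not match, so running the function twice has the same effect as running it once.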