Update README.md
README.md
@@ -18,7 +18,7 @@ This requires the model to reason through multiple tool invocations (e.g., weath
 
 Our training pipeline leverages:
 
--
+- [**Dr. GRPO**](https://arxiv.org/abs/2503.20783) for stable and sample-efficient reinforcement learning.
 - **Synthetic multi-step MCP interactions** with strong tool chaining behavior, generated using our internal data engine.
 - **SGLang + VeRL** for efficient multi-turn rollout environments, built on top of Qwen3-4B for its function-calling capabilities.
 
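The commit adds Dr. GRPO to the pipeline. The sketch below contrasts GRPO-style and Dr. GRPO-style computations, following the two changes described in the linked paper: the advantage is only mean-centered within a group (no division by the group's reward std), and token losses are no longer averaged per response (no 1/|o_i| length normalization). It is a minimal illustration, not this project's training code or VeRL's implementation. It assumes one scalar reward per rollout and precomputed per-token losses. The function names and the `max_len` constant (a hypothetical fixed generation budget used as the normalizer) are assumptions made for illustration.

```python
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: center on the group mean, then divide by the group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


def dr_grpo_advantages(rewards: List[float]) -> List[float]:
    """Dr. GRPO-style advantage: keep the mean-centering, drop the std division."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def grpo_loss(token_losses: List[List[float]]) -> float:
    """GRPO-style aggregation: average tokens within each response, then across responses."""
    per_response = [sum(seq) / len(seq) for seq in token_losses]
    return sum(per_response) / len(per_response)


def dr_grpo_loss(token_losses: List[List[float]], max_len: int) -> float:
    """Dr. GRPO-style aggregation (assumed constant normalizer): sum every token's loss
    and divide by group size * max_len, so a response's weight does not depend on its length."""
    total = sum(sum(seq) for seq in token_losses)
    return total / (len(token_losses) * max_len)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. pass/fail rewards for a group of 4 rollouts
    print(grpo_advantages(rewards))
    print(dr_grpo_advantages(rewards))
    # A short and a long response with the same per-token loss of 1.0.
    losses = [[1.0] * 2, [1.0] * 8]
    print(grpo_loss(losses), dr_grpo_loss(losses, max_len=8))
```

Under the per-response average, each token of the 2-token response carries four times the weight of a token in the 8-token response. The constant normalizer gives every token the same weight, which is the length bias that Dr. GRPO sets out to remove.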