rStar2-Agent-14B: Advanced Agentic Reasoning Model

Model Description

This is a reproduced version of rStar2-Agent, a 14B-parameter math reasoning model that achieves performance comparable to the 671B DeepSeek-R1 through pure agentic reinforcement learning. The model excels at planning, reasoning, and autonomously using coding tools to efficiently explore, verify, and reflect while solving complex problems.

Usage

The following is an example of basic usage. To reproduce the math evaluation results in the technical report, please refer to @microsoft/rstar.

1. Start SGLang Server

First, serve the model using SGLang with the following command:

python -m sglang.launch_server \
    --model-path rstar2-reproduce/rstar2-agent \
    --port 30000 \
    --tensor-parallel-size 4 \
    --tool-call-parser qwen25

Parameters:

  • --model-path: Path to the rStar2-Agent model
  • --port: Server port (default: 30000)
  • --tensor-parallel-size: Number of GPUs for parallel processing
  • --tool-call-parser: Parser for tool calls (use "qwen25" for this model)

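Before moving on, it helps to confirm the server is actually up. The sketch below assumes the server exposes the standard OpenAI-compatible `/v1/models` listing endpoint (which SGLang's OpenAI-compatible server does) at the port used in the command above; the `server_ready` helper is hypothetical, not part of the repo.

```python
import json
import urllib.request

def server_ready(base_url="http://localhost:30000"):
    """Return True if an OpenAI-compatible server answers /v1/models at base_url."""
    try:
        with urllib.request.urlopen(base_url + "/v1/models", timeout=2) as resp:
            # A healthy server returns a JSON object with a non-empty "data" list
            return bool(json.load(resp).get("data"))
    except (OSError, ValueError):
        # Connection refused, timeout, or non-JSON response: not ready
        return False

print(server_ready())  # True once the SGLang server above is serving
```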
2. Use with OpenAI-compatible API

from openai import OpenAI
import json

# Initialize OpenAI client pointing to SGLang server
client = OpenAI(
    base_url="http://localhost:30000/v1",  # SGLang server URL
    api_key="EMPTY"  # No API key required for local server
)

# Define Python code execution tool for the model
tools = [
    {
        "type": "function", 
        "function": {
            "name": "execute_python_code_with_standard_io",
            "description": "Execute Python code with standard input and capture standard output.\nThis function takes a Python code string and an input string, provides the input string\nthrough standard input (stdin) to the code, and captures and returns any output produced\nthrough standard output (stdout). If the executed code raises an exception, the error\nmessage will be captured and returned instead.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "A string containing Python code to be executed. The code can read from standard input using the input() function."
                    },
                    "input": {
                        "type": "string", 
                        "description": "A string that will be provided as standard input to the code when it calls input()."
                    }
                },
                "required": ["code", "input"]
            }
        }
    }
]
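One detail worth knowing about this schema: the OpenAI-compatible API delivers `tool_call.function.arguments` as a JSON-encoded *string*, not a dict, so it must be decoded before use (the conversation loop below does this with `json.loads`). A minimal sketch, using a hypothetical example payload:

```python
import json

# Hypothetical example of the arguments string the model emits for the
# execute_python_code_with_standard_io tool defined above.
raw_arguments = '{"code": "print(sum(range(5)))", "input": ""}'

args = json.loads(raw_arguments)

# Both fields required by the schema should be present after decoding
missing = {"code", "input"} - args.keys()
print(sorted(missing))  # [] when both required fields are present
```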

# Define Python code execution function
def execute_python_code_with_standard_io(code, input_data):
    """
    Execute Python code with standard input and capture output.
    
    Args:
        code (str): Python code to execute
        input_data (str): Input data to provide to the code
        
    Returns:
        str: Output from the executed code or error message
    """
    import subprocess
    import sys
    
    try:
        # Create subprocess to execute Python code
        process = subprocess.Popen(
            [sys.executable, "-c", code],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True
        )
        
        # Send input and get output
        stdout, stderr = process.communicate(input=input_data)
        
        if stderr:
            return f"Error: {stderr}"
        return stdout.strip()
        
    except Exception as e:
        return f"Execution error: {str(e)}"

# Example: Create a math problem conversation
messages = [
    {
        "role": "user", 
        "content": "You must put your answer inside <answer> </answer> tags, i.e., <answer> answer here </answer>. And your final answer will be extracted automatically by the \\boxed{} tag. Solve this math problem: Find the sum of all prime numbers less than 20."
    }
]

# Main conversation loop - handle tool calls until completion
turn_idx = 0
while True:
    print(f'========== Turn: {turn_idx} ==========')
    turn_idx += 1
    
    # Get model response with tool support
    response = client.chat.completions.create(
        model="rstar2-reproduce/rstar2-agent",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # Let model decide when to use tools
        temperature=0.6      # Adjust for creativity vs consistency
    )
    
    # Add the assistant's response to conversation history
    messages.append(response.choices[0].message)
    
    print(f'{response.choices[0].message.content}')
    
    # Check if model wants to use tools
    if response.choices[0].message.tool_calls:        
        # Process each tool call
        for tool_call in response.choices[0].message.tool_calls:
            function_args = json.loads(tool_call.function.arguments)
            
            print(f">>> Executing Code:\n{function_args['code']}")
            input_text = function_args.get('input', '')
            print(f">>> With Input: {input_text if input_text else '(no input)'}")
            
            # Execute the Python code
            result = execute_python_code_with_standard_io(
                function_args["code"], 
                function_args.get("input", "")
            )
            
            print(f">>> Tool result: {result}")
            
            # Add tool response to conversation
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
    else:
        # No more tool calls, conversation finished
        print("✅ No more tool calls. Conversation finished.")
        break
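Once the loop exits, the final assistant message still needs the answer pulled out of the `<answer> </answer>` tags and `\boxed{}` wrapper that the prompt requested. A minimal sketch; `extract_answer` is a hypothetical helper, not part of the repo:

```python
import re

def extract_answer(text):
    """Pull the final answer out of <answer> ... </answer>, preferring \\boxed{}."""
    m = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    span = m.group(1) if m else text
    # Prefer the \boxed{...} contents if present, else the raw answer span
    boxed = re.search(r"\\boxed\{([^}]*)\}", span)
    return boxed.group(1).strip() if boxed else span.strip()

# Example final message for the prime-sum problem above
print(extract_answer(r"So the sum is <answer> \boxed{77} </answer>"))  # → 77
```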

Citation

If you use this model in your research, please cite:

@misc{shang2025rstar2agentagenticreasoningtechnical,
      title={rStar2-Agent: Agentic Reasoning Technical Report}, 
      author={Ning Shang and Yifei Liu and Yi Zhu and Li Lyna Zhang and Weijiang Xu and Xinyu Guan and Buze Zhang and Bingcheng Dong and Xudong Zhou and Bowen Zhang and Ying Xin and Ziming Miao and Scarlett Li and Fan Yang and Mao Yang},
      year={2025},
      eprint={2508.20722},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.20722}, 
}

License

This model is released under the MIT License.
