- Accelerating RL for LLM Reasoning with Optimal Advantage Regression
  Paper • 2505.20686 • Published • 2
- Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
  Viewer • Updated • 7.47k • 9
- Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
  Viewer • Updated • 7.47k • 6
- Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
  Viewer • Updated • 7.47k • 1
Cornell-AGI
university
AI & ML interests
Reinforcement Learning from Human Feedback
Organization Card
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
  Paper • 2410.04612 • Published
- Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
  8B • Updated • 4
- Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
  8B • Updated • 6
- Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
  Viewer • Updated • 64.6k • 9 • 2
models 20

Cornell-AGI/apo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 6

Cornell-AGI/ppo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 7

Cornell-AGI/rebel_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 8

Cornell-AGI/grpo_math_qwen2.5_3b
Text Generation • 3B • Updated • 7

Cornell-AGI/grpo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 8

Cornell-AGI/ppo_math_qwen2.5_3b
Text Generation • 3B • Updated • 11

Cornell-AGI/rebel_math_qwen2.5_3b
Text Generation • 3B • Updated • 7

Cornell-AGI/apo_math_qwen2.5_3b
Text Generation • 3B • Updated • 7

Cornell-AGI/grpo_math_qwen2.5_7b
Text Generation • 8B • Updated • 8

Cornell-AGI/ppo_math_qwen2.5_7b
Text Generation • 8B • Updated • 8
datasets 15

Cornell-AGI/math_size_qwen2.5_7b_eval
Viewer • Updated • 7.5k • 31

Cornell-AGI/math_size_qwen2.5_3b_eval
Viewer • Updated • 7.5k • 3

Cornell-AGI/math_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.5k • 13

Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 1

Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 6

Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 9

Cornell-AGI/amazon_movie_tv_item_mxbai
Viewer • Updated • 10.5k • 9

Cornell-AGI/amazon_movie_tv_llama_mxbai
Viewer • Updated • 17.1k • 10

Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2
Viewer • Updated • 116k • 6 • 1

Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 9 • 2