stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 73
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13 • 13
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13 • 11
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 88
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13 • 7
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13 • 8
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 131
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13 • 16
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13 • 18
stellalisy/rethink_rlvr_reproduce-random-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 70
stellalisy/rethink_rlvr_reproduce-random-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13 • 7
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated Jun 13 • 235
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated Jun 13 • 11
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated Jun 13 • 18
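
The repository IDs above appear to follow a single naming pattern. Each ID starts with the owner and `rethink_rlvr_reproduce-`. It then gives a reward label (`incorrect`, `format`, `majority_vote`, `random`, `ground_truth`), the base model (`qwen2.5_math_7b`), `lr5e-7`, `kl0.00`, and `step<N>`. Reading those fields as reward signal, learning rate, KL coefficient, and training step is an assumption based on the names alone, not on any documentation. The Python sketch below decodes the IDs and groups the available checkpoint steps by reward label. It uses the hypothetical helpers `parse_repo_id` and `steps_by_reward`, inspects strings only, and does not contact the Hub.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Checkpoint(NamedTuple):
    """Fields decoded from one repo ID (meanings are inferred from the names)."""
    owner: str
    reward: str      # reward label, e.g. "ground_truth" or "majority_vote"
    base_model: str  # e.g. "qwen2.5_math_7b"
    lr: float        # assumed learning rate, parsed from "lr5e-7"
    kl: float        # assumed KL coefficient, parsed from "kl0.00"
    step: int        # assumed training step, parsed from "step150"


# Hypothetical pattern inferred from the listed names; not an official spec.
REPO_RE = re.compile(
    r"^(?P<owner>[^/]+)/rethink_rlvr_reproduce-"
    r"(?P<reward>[a-z_]+)-"
    r"(?P<base>.+?)-"
    r"lr(?P<lr>[0-9.eE+-]+)-"
    r"kl(?P<kl>[0-9.]+)-"
    r"step(?P<step>\d+)$"
)

# Rebuilds the 14 repo IDs listed above from their shared prefix and suffix.
_PREFIX = "stellalisy/rethink_rlvr_reproduce-"
_SUFFIX = "-qwen2.5_math_7b-lr5e-7-kl0.00-step"
LISTED_REPOS = [
    f"{_PREFIX}{reward}{_SUFFIX}{step}"
    for reward, steps in [
        ("incorrect", (150, 100, 50)),
        ("format", (150, 100, 50)),
        ("majority_vote", (150, 100, 50)),
        ("random", (150, 100)),
        ("ground_truth", (150, 100, 50)),
    ]
    for step in steps
]


def parse_repo_id(repo_id: str) -> Checkpoint:
    """Split one repo ID into its fields; raise ValueError if it does not fit."""
    m = REPO_RE.match(repo_id)
    if m is None:
        raise ValueError(f"unrecognized repo id: {repo_id!r}")
    return Checkpoint(
        owner=m["owner"],
        reward=m["reward"],
        base_model=m["base"],
        lr=float(m["lr"]),
        kl=float(m["kl"]),
        step=int(m["step"]),
    )


def steps_by_reward(repo_ids: Iterable[str]) -> Dict[str, List[int]]:
    """Map each reward label to its checkpoint steps, sorted ascending."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for rid in repo_ids:
        ckpt = parse_repo_id(rid)
        groups[ckpt.reward].append(ckpt.step)
    return {reward: sorted(steps) for reward, steps in groups.items()}


if __name__ == "__main__":
    for reward, steps in steps_by_reward(LISTED_REPOS).items():
        print(f"{reward:>13}: steps {steps}")
```

Grouping the IDs this way makes gaps easy to spot. In this listing, the `random` run has checkpoints only at steps 100 and 150, while each of the other four reward labels has steps 50, 100, and 150.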