Models in Adaptive Length Penalty Paper
models (11)
RLAIF/grpo_5e-7_4_1.7B-best • 2B
RLAIF/Qwen3-1.7B_grpo_lr2e-7_n4_step30 • 2B
RLAIF/reward-model-grpo • 0.8B • 2
RLAIF/llama-3b-open-r1-50k-sft • 4B • 2
RLAIF/sft-external • Text Generation • 8B
RLAIF/sft-llama-3.1-8b-external • Text Generation • 8B
RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64 • Text Generation • 9B
RLAIF/sft-llama8b-prm-800k-correct-only • Text Generation • 8B
RLAIF/22-sequential-temp-0-verifier-no-best-oracle-in-context-train-8 • 8B
RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking • 8B
datasets (59)
RLAIF/val-grm • 2k • 8
RLAIF/train-grm • 20k • 9
RLAIF/val-policy-filtered • 3.49k • 7
RLAIF/train-policy-filtered • 20k • 7
RLAIF/multi-model-judge-comparison-question-view • 100 • 94
RLAIF/multi-model-judge-comparison-flat • 200 • 88
RLAIF/multi-model-judge-comparison-20250729-154625 • 200 • 71
RLAIF/multi-model-judge-comparison • 200 • 72
RLAIF/genrm-uf-qwen3-4b-angel-judge-qwen-3-4b-jt07-j200-n200-20250729-192837 • 200 • 70
RLAIF/genrm-uf-qwen3-4b-angel-judge-qwen-3-14b-jt07-j200-n200-20250729-192716 • 200 • 68