RLAIF

Team

community

AI & ML interests

None defined yet.

Recent Activity

AngelRaychev updated a model 13 minutes ago

RLAIF/grpo_5e-7_4_1.7B-best

AngelRaychev published a model 15 minutes ago

RLAIF/grpo_5e-7_4_1.7B-best

AngelRaychev updated a model about 2 hours ago

RLAIF/Qwen3-1.7B_grpo_lr2e-7_n4_step30

Collections 2

models 11

RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64

Text Generation • 9B • Updated Oct 30, 2024

RLAIF/sft-llama8b-prm-800k-correct-only

Text Generation • 8B • Updated Oct 24, 2024

RLAIF/22-sequential-temp-0-verifier-no-best-oracle-in-context-train-8

8B • Updated Oct 13, 2024

RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking

8B • Updated Oct 11, 2024

datasets 59

RLAIF/val-grm

Viewer • Updated 5 days ago • 2k • 8

RLAIF/train-grm

Viewer • Updated 5 days ago • 20k • 9

RLAIF/val-policy-filtered

Viewer • Updated 5 days ago • 3.49k • 7

RLAIF/train-policy-filtered

Viewer • Updated 5 days ago • 20k • 7

RLAIF/multi-model-judge-comparison-question-view

Viewer • Updated 7 days ago • 100 • 94

RLAIF/multi-model-judge-comparison-flat

Viewer • Updated 7 days ago • 200 • 88

RLAIF/multi-model-judge-comparison-20250729-154625

Viewer • Updated 7 days ago • 200 • 71

RLAIF/multi-model-judge-comparison

Viewer • Updated 7 days ago • 200 • 72

RLAIF/genrm-uf-qwen3-4b-angel-judge-qwen-3-4b-jt07-j200-n200-20250729-192837

Viewer • Updated 7 days ago • 200 • 70

RLAIF/genrm-uf-qwen3-4b-angel-judge-qwen-3-14b-jt07-j200-n200-20250729-192716

Viewer • Updated 7 days ago • 200 • 68

Collections 2

SynthLabsAI/ALP_DeepScaleR_1.5B_C16K

SynthLabsAI/ALP_R1_Qwen1.5B

RLAIF/CODE-BEHAVIOR-NUMINA-V1-Blocks

models 11

RLAIF/grpo_5e-7_4_1.7B-best

RLAIF/Qwen3-1.7B_grpo_lr2e-7_n4_step30

RLAIF/reward-model-grpo

RLAIF/llama-3b-open-r1-50k-sft

RLAIF/sft-external

RLAIF/sft-llama-3.1-8b-external

RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64

RLAIF/sft-llama8b-prm-800k-correct-only

RLAIF/22-sequential-temp-0-verifier-no-best-oracle-in-context-train-8

RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking

Team members 9
