---
license: apache-2.0
tags:
- reasoning
- mathematics
- reinforcement-learning
datasets:
- AIME
- AMC
- Omni-Math
base_model: DeepScaleR-1.5B
---
# ALP_DeepScaleR_1.5B_C16K
DeepScaleR-1.5B trained with an Adaptive Length Penalty (ALP). ALP reduces token usage by roughly 50% on average while maintaining accuracy.
## Training
- 100 steps GRPO, batch 512, LR 1e-6, β=1e-7
- 16 rollouts/prompt for difficulty estimation (see the reward sketch after this list)
- 16K context window
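For illustration, below is a minimal sketch of how a difficulty-adaptive length penalty could be computed from the per-prompt rollouts. The exact ALP formula and coefficients are not specified in this card; the scaling by solve rate and the `alpha` weight are assumptions made for the example.

```python
# Hedged sketch: one plausible form of a difficulty-adaptive length penalty.
# Assumptions (not from this card): the length penalty is the normalized
# response length scaled by the per-prompt solve rate, and `alpha` is a
# hypothetical penalty coefficient.
from typing import List

def alp_rewards(correct: List[bool], lengths: List[int],
                max_len: int = 16384, alpha: float = 0.1) -> List[float]:
    """Compute rewards for one prompt's group of rollouts (e.g. 16).

    correct  -- whether each rollout reached the right final answer
    lengths  -- token count of each rollout
    max_len  -- context window used to normalize lengths (16K here)
    alpha    -- hypothetical weight of the length penalty
    """
    # Difficulty estimate: fraction of rollouts that solved the problem.
    solve_rate = sum(correct) / len(correct)

    rewards = []
    for ok, n_tokens in zip(correct, lengths):
        accuracy_reward = 1.0 if ok else 0.0
        # Easy prompts (high solve rate) are penalized more for long answers;
        # hard prompts (low solve rate) keep more of their token budget.
        length_penalty = alpha * solve_rate * (n_tokens / max_len)
        rewards.append(accuracy_reward - length_penalty)
    return rewards

# Example: an easy prompt solved by most rollouts is pushed toward shorter answers.
print(alp_rewards([True, True, True, False], [900, 500, 2000, 2500]))
```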
## Performance (Pass@1)
- MATH-500: 0.80
- AIME: 0.24
- OlympiadBench: 0.51
## Token Usage

Average tokens per response, before → after ALP training:

| Benchmark | Before | After | Reduction |
|---|---|---|---|
| MATH | 2326 | 646 | -72% |
| AIME | 3906 | 2254 | -42% |
| Olympiad | 3309 | 2107 | -36% |
## Usage

```python
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
```
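A minimal inference sketch with Hugging Face `transformers` is shown below. The repository id is a placeholder (this card does not state the published path), and the sampling settings are illustrative rather than recommended values.

```python
# Minimal inference sketch. `model_id` is a hypothetical placeholder; replace it
# with the actual repository path for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ALP_DeepScaleR_1.5B_C16K"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

problem = "What is the sum of the first 100 positive integers?"
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,   # well under the 16K context window
    do_sample=True,
    temperature=0.6,       # illustrative sampling settings
    top_p=0.95,
)
# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```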