TRL documentation
Paper Index
You are viewing the main version, which requires installation from source. If you'd like
a regular pip install, check out the latest stable version (v0.21.0).
Section under construction. Feel free to contribute!
Group Sequence Policy Optimization
📜 Paper: https://huggingface.co/papers/2507.18071
GSPO is a GRPO variant that computes importance sampling weights at the sequence level instead of per-token. To reproduce the paper’s setting, use this configuration:
from trl import GRPOConfig

training_args = GRPOConfig(
    importance_sampling_level="sequence",
    loss_type="grpo",
    beta=0.0,  # GSPO sets KL regularization to zero: https://github.com/volcengine/verl/pull/2775#issuecomment-3131807306
    epsilon=3e-4,  # GSPO paper (v2), section 5.1
    epsilon_high=4e-4,  # GSPO paper (v2), section 5.1
    gradient_accumulation_steps=1,
    steps_per_generation=4,  # Partition each rollout batch into 4 mini-batches (GSPO paper (v2), section 5.1). Must be 4 times gradient_accumulation_steps.
)
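
To illustrate the difference between per-token and sequence-level importance weights, here is a minimal pure-Python sketch. It is not TRL's implementation, which works on batched tensors with completion masks. The function names and sample log-probabilities are hypothetical. The sequence-level weight is assumed to be the length-normalized ratio described in the GSPO paper, that is, the exponential of the mean per-token log ratio, and the clip range is assumed to be [1 - epsilon, 1 + epsilon_high], using the values from the configuration above.

```python
import math


def token_level_ratios(new_logps, old_logps):
    """Per-token importance ratios: one weight for every token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps, old_logps):
    """Single weight per sequence: exp of the mean per-token log ratio."""
    log_ratios = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(log_ratios) / len(log_ratios))


def clip_ratio(ratio, epsilon=3e-4, epsilon_high=4e-4):
    """Clamp a ratio to the assumed range [1 - epsilon, 1 + epsilon_high]."""
    return min(max(ratio, 1.0 - epsilon), 1.0 + epsilon_high)


# Hypothetical per-token log-probabilities of one completion under the
# current policy (new) and the policy that generated it (old).
new_logps = [-1.0, -2.0, -0.5]
old_logps = [-1.1, -1.9, -0.5]

per_token = token_level_ratios(new_logps, old_logps)
per_sequence = sequence_level_ratio(new_logps, old_logps)

print("token-level ratios:", per_token)
print("sequence-level ratio:", per_sequence)
print("clipped sequence-level ratio:", clip_ratio(per_sequence))
```

In this sketch the per-token ratios move in opposite directions for the first two tokens, while the sequence-level ratio averages them in log space into a single weight shared by every token in the completion.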