thomasjhuang/qwen2-rloo-countdown-step250

Text Generation

reinforcement-learning

qwen2-rloo-countdown-step250 / merges.txt

RLOO checkpoint at optimizer step 250 - Fixed prompt format, temp=0.1, lr=3e-6

e4ad155 verified 2 months ago

1.67 MB