Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 878
Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL By toslali-ibm and 5 others • Jun 3 • 88
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient Paper • 2410.08893 • Published Oct 11, 2024