Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29, 2025
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL Paper • 2504.15077 • Published Apr 21, 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18, 2025
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era Paper • 2412.18702 • Published Dec 24, 2024
Illustrating Reinforcement Learning from Human Feedback (RLHF) Article • By natolambert and 3 others • Dec 9, 2022