Collections including paper arxiv:2508.01191

about 6 hours ago

Snowflake/Arctic-Text2SQL-R1-7B

8B • Updated May 29 • 5.64k • 42
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 271
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published 12 days ago • 22
Diffusion Language Models Know the Answer Before Decoding

Paper • 2508.19982 • Published 7 days ago • 21
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published 9 days ago • 14
Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published 13 days ago • 242

research-catchup

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report

Paper • 2508.01059 • Published Aug 1 • 33
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 233
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published 27 days ago • 169
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published 26 days ago • 171

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published 20 days ago • 91
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 233
Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models

Paper • 2508.03363 • Published 30 days ago • 1

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 233

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 31
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 138
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 134
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 87

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 233

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 233

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 233

Papers Pertinent or Protuberant

The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models

Paper • 2507.23313 • Published Jul 31 • 1
SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering

Paper • 2508.03448 • Published 30 days ago • 1
C3D-AD: Toward Continual 3D Anomaly Detection via Kernel Attention with Learnable Advisor

Paper • 2508.01311 • Published Aug 2 • 2
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model

Paper • 2505.21179 • Published May 27 • 13
