Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning • Paper • arXiv:2508.09726 • Published Aug 2025 • 13 upvotes
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens • Paper • arXiv:2508.05305 • Published Aug 2025 • 45 upvotes
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance • Paper • arXiv:2507.22448 • Published Jul 30, 2025 • 65 upvotes
Skywork-Reward-V2 • Collection • Scaling preference data curation to the extreme • 9 items • Updated Jul 4, 2025 • 23 upvotes
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models • Paper • arXiv:2506.06751 • Published Jun 7, 2025 • 72 upvotes
Exploring the Latent Capacity of LLMs for One-Step Text Generation • Paper • arXiv:2505.21189 • Published May 27, 2025 • 62 upvotes
Quartet: Native FP4 Training Can Be Optimal for Large Language Models • Paper • arXiv:2505.14669 • Published May 20, 2025 • 78 upvotes
Falcon-H1 • Collection • Falcon-H1 family of hybrid-head language models (Transformer-SSM), including 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B (pretrained & instruction-tuned) • 38 items • Updated Jul 31, 2025 • 53 upvotes
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures • Paper • arXiv:2505.09343 • Published May 14, 2025 • 69 upvotes
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers • Paper • arXiv:2504.20752 • Published Apr 29, 2025 • 93 upvotes
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning • Paper • arXiv:2504.17192 • Published Apr 24, 2025 • 115 upvotes
RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts • Paper • arXiv:2504.06947 • Published Apr 9, 2025 • 4 upvotes
Gemma 3 QAT INT4 (from Flax) • Collection • Converted from the official QAT INT4 Flax checkpoints on Kaggle; supported formats: AutoAWQ, GGUF • 12 items • Updated Apr 6, 2025 • 6 upvotes
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback • Paper • arXiv:2406.09279 • Published Jun 13, 2024 • 3 upvotes
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs • Paper • arXiv:2503.01307 • Published Mar 3, 2025 • 39 upvotes
Slam • Collection • All resources for the SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide the tokenizer, LM, and datasets • 7 items • Updated May 22, 2025 • 13 upvotes