EgoTwin: Dreaming Body and View in First Person • Paper • 2508.13013 • Published 26 days ago • 19
4DNeX: Feed-Forward 4D Generative Modeling Made Easy • Paper • 2508.13154 • Published 26 days ago • 58
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning • Paper • 2505.17022 • Published May 22 • 27
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models • Paper • 2505.10554 • Published May 15 • 120
Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family • Paper • 2504.18225 • Published Apr 25 • 13
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse • Paper • 2503.16365 • Published Mar 20 • 40
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing • Paper • 2503.10639 • Published Mar 13 • 52
Gemini Embedding: Generalizable Embeddings from Gemini • Paper • 2503.07891 • Published Mar 10 • 43
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models • Paper • 2407.12772 • Published Jul 17, 2024 • 35
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference • Paper • 2502.18411 • Published Feb 25 • 74