Collections including paper arxiv:2506.07491
-
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Paper • 2506.07491 • Published • 49
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Paper • 2508.09983 • Published • 67
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Paper • 2503.01710 • Published • 6
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
Paper • 2507.21809 • Published • 126
-
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Paper • 2506.07491 • Published • 49
Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
Paper • 2506.16504 • Published • 26
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Paper • 2507.23478 • Published • 15
-
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper • 2505.09568 • Published • 96
Qwen3 Technical Report
Paper • 2505.09388 • Published • 290
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Paper • 2505.11049 • Published • 60
Emerging Properties in Unified Multimodal Pretraining
Paper • 2505.14683 • Published • 133
-
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Paper • 2506.07491 • Published • 49
RynnEC: Bringing MLLMs into Embodied World
Paper • 2508.14160 • Published • 18
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper • 2504.10479 • Published • 290
-
A Dataset for Crucial Object Recognition in Blind and Low-Vision Individuals' Navigation
Paper • 2407.16777 • Published
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Paper • 2506.07491 • Published • 49
Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection
Paper • 2308.05991 • Published
jxu124/objects365
Viewer • Updated • 1.82M • 90 • 2
-
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
Paper • 2506.07044 • Published • 112
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Paper • 2506.09513 • Published • 98
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 101
-
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Paper • 2506.07491 • Published • 49
manycore-research/SpatialLM1.1-Qwen-0.5B
Text Generation • 0.6B • Updated • 4.19k • 21
manycore-research/SpatialLM1.1-Llama-1B
Text Generation • 1B • Updated • 349 • 13
manycore-research/SpatialLM-Qwen-0.5B
Text Generation • 0.5B • Updated • 378 • 91