-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 45 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 24
Collections
Collections including paper arxiv:2505.07263
-
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Paper • 2504.16656 • Published • 58 -
Skywork/Skywork-R1V2-38B
Image-Text-to-Text • 38B • Updated • 72 • 126 -
Skywork/Skywork-R1V2-38B-AWQ
Image-Text-to-Text • Updated • 42 • 11 -
Skywork/Skywork-VL-Reward-7B
Image-Text-to-Text • 8B • Updated • 523 • 45
-
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper • 2503.10615 • Published • 17 -
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Paper • 2503.10630 • Published • 6 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 35 -
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 89
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
-
shuttleai/shuttle-3.5
Text Generation • 33B • Updated • 1.1k • 44 -
Tesslate/UIGEN-T2-7B-Q8_0-GGUF
Text Generation • 8B • Updated • 38 • 134 -
nvidia/OpenCodeReasoning-Nemotron-32B
Text Generation • 33B • Updated • 1.19k • 72 -
nvidia/OpenCodeReasoning-Nemotron-14B
Text Generation • 15B • Updated • 745 • 18
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 5.88k • 1.16k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 14 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 1.15k • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 61
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 9 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 40