Sourav Mishra

srvmishra832

AI & ML interests

LLMs and VLMs

Recent Activity

upvoted 11 papers 8 days ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published 11 days ago • 179

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published 23 days ago • 54

Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

Paper • 2508.17437 • Published 16 days ago • 35

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published 16 days ago • 57

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published Feb 20 • 27

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Paper • 2506.04614 • Published Jun 5 • 18

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20, 2024 • 135

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published 10 days ago • 39

Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 241

VibeVoice Technical Report

Paper • 2508.19205 • Published 10 days ago • 120

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published 15 days ago • 130