OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer Paper • 2406.16620 • Published Jun 24, 2024 • 3
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head Paper • 2403.06892 • Published Mar 11, 2024 • 2
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection Paper • 2308.13177 • Published Aug 25, 2023 • 1
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing Paper • 2306.11300 • Published Jun 20, 2023 • 2
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection Paper • 2312.15043 • Published Dec 22, 2023 • 2
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations Paper • 2207.00221 • Published Jul 1, 2022 • 2
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network Paper • 2209.05946 • Published Sep 10, 2022 • 2
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding Paper • 2407.04923 • Published Jul 6, 2024 • 2
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration Paper • 2411.16044 • Published Nov 25, 2024 • 2
omlab/VLM-R1-Qwen2.5VL-3B-Math-0305 Visual Question Answering • 4B • Updated Apr 14 • 1.91k • 6
omlab/Qwen2.5VL-3B-VLM-R1-REC-500steps Zero-Shot Object Detection • 4B • Updated Apr 14 • 1.78k • 23
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model Paper • 2506.20923 • Published Jun 26 • 4
HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v2 Feature Extraction • 0.5B • Updated Jun 28 • 605 • 25
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 186
VLM R1 Referral Expression • Runtime error • 72 • Mark regions in images based on text descriptions
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published Apr 10 • 32