
Qwen/Qwen2.5-Omni-7B • Any-to-Any • 11B • Updated • 116k • 1.73k
This collection includes all the models, datasets, and Spaces mentioned in the blog post "Vision Language Models: 2025 Update".
Generate text and speech responses from various inputs
A unified multimodal understanding and generation model.
Chat with images, videos, or PDFs to generate text
Generate answers by combining text and images
Generate answers by combining text and images
Generate responses using images and text input
Annotate and describe images with text prompts
Generate text or segment objects from an image
Demo for ShieldGemma 2, a multimodal safety model
Check if text and images are safe
Interact with a multimodal chatbot using text and images
Chat with images and videos using Qwen
Generate responses to video or image inputs