Article Introducing Marvis TTS: Real-Time Streaming Speech Synthesis By prince-canuma • 4 days ago • 5
Article NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset By nvidia and 4 others • 11 days ago • 15
Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder Paper • 2508.04107 • Published 25 days ago • 4
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation Paper • 2411.10086 • Published Nov 15, 2024 • 2
FLAIR-HUB: Large-scale Multimodal Dataset for Land Cover and Crop Mapping Paper • 2506.07080 • Published Jun 8 • 6
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts Paper • 2508.07785 • Published 20 days ago • 25
InternVL3.5 Collection This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated 2 days ago • 75
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated 10 days ago • 259
Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • 20 days ago • 68
Article How To Build a News Agent with GPT-OSS, Hugging Face Inference & Gradio By fdaudens • 17 days ago • 21
MM Grounding DINO Collection See: https://github.com/huggingface/transformers/pull/37925 • 8 items • Updated Jun 26 • 4
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face By abidlabs and 4 others • Jul 29 • 164
Article Build an AI Shopping Assistant with Gradio MCP Servers By freddyaboulton • Jul 31 • 51
Article Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨ By Wauplin and 2 others • Jul 25 • 80
Article Unlocking Healthcare AI: I'm Releasing State-of-the-Art Medical Models for Free. Forever. By MaziyarPanahi • Jul 16 • 135
Article TimeScope: How Long Can Your Video Large Multimodal Model Go? By orrzohar and 3 others • Jul 23 • 39