94 433

Shikhar Singh

AxAI · axe--

AI & ML interests

Commonsense & Language Grounding

Recent Activity

liked a model about 21 hours ago

apple/FastVLM-7B

upvoted a paper 4 days ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

liked a dataset 8 days ago

lmms-lab/DocVQA

Organizations

None yet

upvoted a paper 4 days ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published 5 days ago • 161

upvoted an article 16 days ago

Article

Faster fine-tuning using TRL & Unsloth

By

•

Jan 10, 2024

• 69

upvoted a collection 19 days ago

Qwen2.5-VL (All Versions)

All versions of Qwen2.5-VL including the new 32B version and 4-bit, 16-bit and more! • 16 items • Updated 9 days ago • 20

upvoted a collection 20 days ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 533

upvoted an article about 2 months ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

By

and 3 others •

Mar 12

• 457

upvoted 4 articles 2 months ago

Article

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

By

and 11 others •

Jun 27

• 28

Article

Vision Language Models (Better, Faster, Stronger)

By

and 4 others •

May 12

• 519

Article

Gemma 3n fully available in the open-source ecosystem!

By

and 7 others •

Jun 26

• 115

Article

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

By

•

Jul 29, 2024

• 355

upvoted a collection 4 months ago

Describe Anything

Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated about 15 hours ago • 55

upvoted a paper 5 months ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 117

upvoted an article 5 months ago

Article

Welcome Llama 4 Maverick & Scout on Hugging Face!

By

and 6 others •

Apr 5

• 146

upvoted a collection 5 months ago

Whisper Release

Whisper includes both English-only and multilingual checkpoints for ASR and ST, ranging from 38M params for the tiny models to 1.5B params for large. • 12 items • Updated Sep 13, 2023 • 130

upvoted a collection 6 months ago

Gemma 3 Release

28 items • Updated 19 days ago • 490

upvoted 5 papers 6 months ago

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published Feb 20 • 91

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 193

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 146

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 106

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 203

upvoted an article 6 months ago

Article

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

By

and 2 others •

Feb 19

• 72