kevin1020's Collections

Efficient VLM via Image Token Compression

updated Dec 4, 2024

  • An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

    Paper • 2403.06764 • Published Mar 11, 2024 • 28

  • TokenPacker: Efficient Visual Projector for Multimodal LLM

    Paper • 2407.02392 • Published Jul 2, 2024 • 24

  • Efficient Inference of Vision Instruction-Following Models with Elastic Cache

    Paper • 2407.18121 • Published Jul 25, 2024 • 17

  • Don't Look Twice: Faster Video Transformers with Run-Length Tokenization

    Paper • 2411.05222 • Published Nov 7, 2024 • 2