ZyangLee's Collections

Multimodal

updated Aug 7, 2024

  • Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

    Paper • 2406.17294 • Published Jun 25, 2024 • 11

  • TokenPacker: Efficient Visual Projector for Multimodal LLM

    Paper • 2407.02392 • Published Jul 2, 2024 • 24

  • Understanding Alignment in Multimodal LLMs: A Comprehensive Study

    Paper • 2407.02477 • Published Jul 2, 2024 • 24

  • InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Paper • 2407.03320 • Published Jul 3, 2024 • 96

  • Unveiling Encoder-Free Vision-Language Models

    Paper • 2406.11832 • Published Jun 17, 2024 • 55

  • ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

    Paper • 2407.04172 • Published Jul 4, 2024 • 27

  • LLaVA-OneVision: Easy Visual Task Transfer

    Paper • 2408.03326 • Published Aug 6, 2024 • 61