EAustino's Collections

LMM

updated Jan 13, 2024

  • COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

    Paper • 2401.00849 • Published Jan 1, 2024 • 17

  • Learning Vision from Models Rivals Learning Vision from Data

    Paper • 2312.17742 • Published Dec 28, 2023 • 16

  • Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models

    Paper • 2312.17661 • Published Dec 29, 2023 • 15

  • A Vision Check-up for Language Models

    Paper • 2401.01862 • Published Jan 3, 2024 • 11

  • Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

    Paper • 2401.01974 • Published Jan 3, 2024 • 7

  • DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

    Paper • 2401.06066 • Published Jan 11, 2024 • 56

  • LEGO:Language Enhanced Multi-modal Grounding Model

    Paper • 2401.06071 • Published Jan 11, 2024 • 13