
---
title: LLaVA Chat
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit
---

# LLaVA Chat

A lightweight implementation of LLaVA (Large Language and Vision Assistant) optimized for Hugging Face Spaces deployment.

## Features

- Efficient model loading with 8-bit quantization
- Memory-optimized inference
- FastAPI backend with Gradio interface
- Support for image understanding and visual conversations
- Optimized for deployment on Hugging Face Spaces

## Quick Start

1. Visit the Hugging Face Space
2. Upload an image
3. Ask questions about the image
4. Get AI-powered responses

## Local Development

1. Clone the repository:

   ```bash
   git clone https://github.com/Prashant-ambati/llava-implementation.git
   cd llava-implementation
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python llava-chat/app.py
   ```

## Model Architecture

- **Vision Model:** CLIP ViT-Base
- **Language Model:** TinyLlama-1.1B-Chat
- **Projection Layer:** MLP with configurable hidden dimensions
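The projection layer can be sketched as a small two-layer MLP that maps vision features into the language model's embedding space. The exact dimensions in this repo are configurable and not stated here; the values below are assumptions based on the named models (CLIP ViT-Base emits 768-d patch features, TinyLlama-1.1B uses a 2048-d hidden size):

```python
import torch
import torch.nn as nn

class ProjectionMLP(nn.Module):
    """Maps CLIP vision features into the language model's token-embedding space."""

    def __init__(self, vision_dim=768, hidden_dim=2048, llm_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, llm_dim),
        )

    def forward(self, image_features):
        # image_features: (batch, num_patches, vision_dim)
        return self.net(image_features)

# Project a batch of CLIP patch features into LLM-sized "visual tokens"
proj = ProjectionMLP()
tokens = proj(torch.randn(1, 196, 768))
print(tokens.shape)  # torch.Size([1, 196, 2048])
```

The projected features are then concatenated with text embeddings and fed to the language model, which is the standard LLaVA recipe.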

## Memory Optimization

The implementation includes several memory-optimization techniques:

- 8-bit quantization for the language model
- Efficient image processing
- Gradient checkpointing
- Memory-efficient attention
- Automatic mixed precision
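A minimal sketch of the first two techniques using the Hugging Face Transformers API (the model name matches the architecture above; the exact loading code in this repo may differ):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization via bitsandbytes: weights are stored in int8,
# roughly halving memory versus fp16.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

def load_llm(name="TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
    model = AutoModelForCausalLM.from_pretrained(
        name,
        quantization_config=quant_config,
        device_map="auto",  # spread layers across available GPU/CPU memory
    )
    # Gradient checkpointing: recompute activations in the backward pass
    # instead of storing them, trading compute for memory.
    model.gradient_checkpointing_enable()
    return model
```

Automatic mixed precision is typically layered on top with `torch.autocast` around the forward pass.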

## API Endpoints

- `POST /process_image`: Process an image with a prompt
- `GET /status`: Check model and application status
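A hypothetical client for these endpoints; the base URL and multipart field names (`image`, `prompt`) are assumptions, so check them against `app.py` before use:

```python
import requests

def ask_about_image(image_path, prompt, base_url="http://localhost:7860"):
    """POST an image and prompt to /process_image and return the JSON reply."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            f"{base_url}/process_image",
            files={"image": f},        # assumed field name
            data={"prompt": prompt},   # assumed field name
        )
    resp.raise_for_status()
    return resp.json()

def check_status(base_url="http://localhost:7860"):
    """GET /status to check model and application health."""
    resp = requests.get(f"{base_url}/status")
    resp.raise_for_status()
    return resp.json()
```

For example, `ask_about_image("photo.jpg", "What is in this image?")` would return the model's answer once the server is running locally.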

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
- Uses models from Hugging Face Transformers
- Built with FastAPI and Gradio