---
title: LLaVA Chat
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit
---
# LLaVA Chat
A lightweight implementation of LLaVA (Large Language and Vision Assistant) optimized for Hugging Face Spaces deployment.
## Features
- Efficient model loading with 8-bit quantization
- Memory-optimized inference
- FastAPI backend with Gradio interface
- Support for image understanding and visual conversations
- Optimized for deployment on Hugging Face Spaces
## Quick Start

1. Visit the Hugging Face Space
2. Upload an image
3. Ask questions about the image
4. Get AI-powered responses
## Local Development

1. Clone the repository:

   ```bash
   git clone https://github.com/Prashant-ambati/llava-implementation.git
   cd llava-implementation
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python llava-chat/app.py
   ```
## Model Architecture

- **Vision Model**: CLIP ViT-Base
- **Language Model**: TinyLlama-1.1B-Chat
- **Projection Layer**: MLP with configurable hidden dimensions
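For orientation, here is a minimal sketch of how such a projection layer might be wired. The class name and the 768/2048 dimensions (CLIP ViT-Base's output width and TinyLlama-1.1B's hidden size) are illustrative assumptions, not the exact code in this repository:

```python
import torch
import torch.nn as nn


class VisionProjector(nn.Module):
    """Maps CLIP image features into the language model's embedding space.

    Dimensions are illustrative: 768 matches CLIP ViT-Base's output width,
    2048 matches TinyLlama-1.1B's hidden size; hidden_dim is the configurable knob.
    """

    def __init__(self, vision_dim: int = 768, hidden_dim: int = 2048, llm_dim: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        return self.mlp(image_features)
```

A two-layer MLP with a GELU in between is the common LLaVA-style choice; the hidden dimension is the configurable parameter mentioned above.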
## Memory Optimization
The implementation includes several memory-optimization techniques (a loading sketch follows the list):
- 8-bit quantization for the language model
- Efficient image processing
- Gradient checkpointing
- Memory-efficient attention
- Automatic mixed precision
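The following sketch shows how the 8-bit quantization, gradient checkpointing, and mixed-precision pieces typically fit together with Hugging Face Transformers; the checkpoint name and argument choices are assumptions, not necessarily what `app.py` does:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

# Load the language model in 8-bit to cut its memory footprint roughly in half
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
    torch_dtype=torch.float16,  # mixed precision for the non-quantized layers
)

# Trade extra compute for lower activation memory
model.gradient_checkpointing_enable()
```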
## API Endpoints

- `POST /process_image`: Process an image with a prompt
- `GET /status`: Check model and application status
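As a usage illustration, calling these endpoints might look like the following; the base URL, form-field names (`image`, `prompt`), and response shape are assumptions, since the README does not specify the request schema:

```python
import requests

BASE_URL = "https://your-space.hf.space"  # placeholder: replace with the deployed Space URL

# GET /status: check model and application status
status = requests.get(f"{BASE_URL}/status", timeout=30)
print(status.json())

# POST /process_image: send an image together with a text prompt
with open("example.jpg", "rb") as image_file:
    response = requests.post(
        f"{BASE_URL}/process_image",
        files={"image": image_file},                       # assumed form-field name
        data={"prompt": "What is shown in this image?"},   # assumed form-field name
        timeout=120,
    )
print(response.json())
```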
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
- Uses models from Hugging Face Transformers
- Built with FastAPI and Gradio