---
title: LLaVA Chat
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit
---
# LLaVA Chat
A lightweight implementation of LLaVA (Large Language and Vision Assistant) optimized for Hugging Face Spaces deployment.
## Features
- Efficient model loading with 8-bit quantization
- Memory-optimized inference
- FastAPI backend with Gradio interface
- Support for image understanding and visual conversations
- Optimized for deployment on Hugging Face Spaces
## Quick Start

1. Visit the Hugging Face Space
2. Upload an image
3. Ask questions about the image
4. Get AI-powered responses
## Local Development

1. Clone the repository:

   ```bash
   git clone https://github.com/Prashant-ambati/llava-implementation.git
   cd llava-implementation
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python llava-chat/app.py
   ```
## Model Architecture

- **Vision Model**: CLIP ViT-Base
- **Language Model**: TinyLlama-1.1B-Chat
- **Projection Layer**: MLP with configurable hidden dimensions
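For orientation, here is a minimal sketch of how such a projection layer might be wired. The class name and the 768/2048 dimensions (CLIP ViT-Base's output width and TinyLlama-1.1B's hidden size) are illustrative assumptions, not the exact code in this repository:

```python
import torch
import torch.nn as nn


class VisionProjector(nn.Module):
    """Maps CLIP image features into the language model's embedding space.

    Dimensions are illustrative: 768 matches CLIP ViT-Base's output width,
    2048 matches TinyLlama-1.1B's hidden size; hidden_dim is the configurable knob.
    """

    def __init__(self, vision_dim: int = 768, hidden_dim: int = 2048, llm_dim: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        return self.mlp(image_features)
```

A two-layer MLP with a GELU in between is the common LLaVA-style choice; the hidden dimension is the configurable parameter mentioned above.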
## Memory Optimization
The implementation includes several memory-optimization techniques (a loading sketch follows the list):
- 8-bit quantization for the language model
- Efficient image processing
- Gradient checkpointing
- Memory-efficient attention
- Automatic mixed precision
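The following sketch shows how the 8-bit quantization, gradient checkpointing, and mixed-precision pieces typically fit together with Hugging Face Transformers; the checkpoint name and argument choices are assumptions, not necessarily what `app.py` does:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

# Load the language model in 8-bit to cut its memory footprint roughly in half
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
    torch_dtype=torch.float16,  # mixed precision for the non-quantized layers
)

# Trade extra compute for lower activation memory
model.gradient_checkpointing_enable()
```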
## API Endpoints

- `POST /process_image`: Process an image with a prompt
- `GET /status`: Check model and application status
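As a usage illustration, calling these endpoints might look like the following; the base URL, form-field names (`image`, `prompt`), and response shape are assumptions, since the README does not specify the request schema:

```python
import requests

BASE_URL = "https://your-space.hf.space"  # placeholder: replace with the deployed Space URL

# GET /status: check model and application status
status = requests.get(f"{BASE_URL}/status", timeout=30)
print(status.json())

# POST /process_image: send an image together with a text prompt
with open("example.jpg", "rb") as image_file:
    response = requests.post(
        f"{BASE_URL}/process_image",
        files={"image": image_file},                       # assumed form-field name
        data={"prompt": "What is shown in this image?"},   # assumed form-field name
        timeout=120,
    )
print(response.json())
```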
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
- Uses models from Hugging Face Transformers
- Built with FastAPI and Gradio