---
title: LLaVA Chat
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit
---
# LLaVA Chat

A lightweight implementation of LLaVA (Large Language and Vision Assistant), optimized for deployment on Hugging Face Spaces.
## Features

- Efficient model loading with 8-bit quantization
- Memory-optimized inference
- FastAPI backend with a Gradio interface (see the sketch below)
- Support for image understanding and visual conversations
- Optimized for deployment on Hugging Face Spaces
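The wiring between FastAPI and Gradio can look roughly like the following minimal sketch. The function `answer_question` and the interface layout are illustrative assumptions, not the exact code in `app.py`.

```python
# Minimal sketch of the FastAPI + Gradio wiring (names are illustrative).
import gradio as gr
from fastapi import FastAPI

app = FastAPI()

def answer_question(image, prompt):
    # Placeholder for the actual LLaVA inference call.
    return f"Model response for prompt: {prompt}"

demo = gr.Interface(
    fn=answer_question,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Prompt")],
    outputs=gr.Textbox(label="Response"),
    title="LLaVA Chat",
)

# Gradio 4.x can be mounted onto an existing FastAPI app.
app = gr.mount_gradio_app(app, demo, path="/")
```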
## Quick Start

1. Visit the [Hugging Face Space](https://huggingface.co/spaces/Prashant26am/llava-chat)
2. Upload an image
3. Ask questions about the image
4. Get AI-powered responses
## Local Development

1. Clone the repository:

```bash
git clone https://github.com/Prashant-ambati/llava-implementation.git
cd llava-implementation
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Run the application:

```bash
python llava-chat/app.py
```
## Model Architecture

- Vision Model: CLIP ViT-Base
- Language Model: TinyLlama-1.1B-Chat
- Projection Layer: MLP with configurable hidden dimensions
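As a rough illustration of the projection layer, a small MLP can map CLIP image features into the language model's embedding space. The class name and dimensions below are assumptions for illustration, not the exact configuration used here.

```python
import torch.nn as nn

class VisionProjector(nn.Module):
    """Illustrative MLP mapping CLIP patch features into the LLM embedding
    space. Dimensions are example values, not the actual config."""

    def __init__(self, vision_dim=768, hidden_dim=2048, llm_dim=2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, llm_dim),
        )

    def forward(self, image_features):
        # image_features: (batch, num_patches, vision_dim)
        return self.proj(image_features)  # (batch, num_patches, llm_dim)
```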
## Memory Optimization

The implementation includes several memory-optimization techniques:

- 8-bit quantization of the language model
- Efficient image processing
- Gradient checkpointing
- Memory-efficient attention
- Automatic mixed precision
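A minimal sketch of how the quantized loading and gradient checkpointing could be set up with Hugging Face Transformers and bitsandbytes; the model ID and options shown are illustrative, not necessarily the exact ones in this repository.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the language model in 8-bit (requires bitsandbytes); illustrative setup.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=quant_config,
    torch_dtype=torch.float16,   # mixed precision for non-quantized parts
    device_map="auto",
)

# Trade extra compute for lower activation memory.
model.gradient_checkpointing_enable()
```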
## API Endpoints

- `POST /process_image`: Process an image with a prompt
- `GET /status`: Check model and application status
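A quick client-side sketch of calling these endpoints with `requests`. The port, the multipart field name `file`, and the form field `prompt` are assumptions and may differ from the actual request schema in `app.py`.

```python
import requests

BASE_URL = "http://localhost:7860"  # adjust to your deployment

# Check that the model and application are up.
print(requests.get(f"{BASE_URL}/status").json())

# Send an image plus a prompt; field names are assumed for illustration.
with open("example.jpg", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/process_image",
        files={"file": f},
        data={"prompt": "What is shown in this image?"},
    )
print(resp.json())
```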
## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments

- Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
- Uses models from Hugging Face Transformers
- Built with FastAPI and Gradio