---
title: MiniCPM-o Video Analyzer
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.20.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎬 MiniCPM-o Video Analyzer

A video analysis tool powered by **MiniCPM-o 2.6**, a multimodal model that its developers describe as GPT-4o-level and that can analyze visual and audio content together.

## 🚀 Features

- **🎯 Frame-by-Frame Analysis**: Detailed narrative and visual analysis of each video frame
- **🎨 Visual Psychology**: Color, composition, and emotional trigger analysis
- **🚀 Marketing Mechanics**: Persuasion techniques and conversion strategy identification
- **📊 Comprehensive Summaries**: Executive-level insights for marketing effectiveness
- **🎵 Audio-Visual Integration**: Unified analysis of both visual and audio elements
- **⚡ Local Processing**: No external API calls - all processing happens locally

## 🎯 How It Works

1. **Upload Your Video**: Marketing videos up to 30 seconds work best
2. **Automatic Processing**: 
   - Extracts frames at 1fps
   - Extracts audio track
   - Analyzes with MiniCPM-o 2.6
3. **Get Insights**: Comprehensive analysis covering narrative, psychology, and marketing effectiveness
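
The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the helper names (`sample_indices`, `extract_frames`, `audio_extract_command`) are hypothetical, OpenCV and the `ffmpeg` CLI are assumed to be installed, and the 16 kHz mono audio format is an assumption rather than something this README specifies.

```python
from typing import List


def sample_indices(total_frames: int, native_fps: float, target_fps: float = 1.0) -> List[int]:
    """Return the frame indices to keep so roughly `target_fps` frames per second survive."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    step = max(1, round(native_fps / target_fps))  # e.g. every 30th frame of a 30 fps clip
    return list(range(0, total_frames, step))


def extract_frames(video_path: str, target_fps: float = 1.0) -> list:
    """Decode the video with OpenCV and return the sampled frames as RGB arrays."""
    import cv2  # imported lazily so the index helper works without OpenCV installed

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise ValueError(f"Could not open video: {video_path}")
    # Assumption: fall back to 30 fps when the container reports no frame rate.
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_indices(total, native_fps, target_fps))

    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index in wanted:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # OpenCV decodes as BGR
        index += 1
    cap.release()
    return frames


def audio_extract_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes mono WAV audio."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", str(sample_rate), wav_path]
```

The command list can be passed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting problems with unusual file names.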

## 💡 Key Advantages Over GPT-4o

- **💰 Cost-Effective**: No per-video API fees; the model runs inside the Space itself
- **🔒 Privacy**: Your videos never leave the processing environment
- **🎭 Multimodal**: Analyzes audio and visual elements together
- **⚡ Optimized**: Designed for efficiency on consumer hardware

## 📋 What You'll Get

### 📊 Analysis Report
- Processing time and technical details
- Comprehensive summary of findings
- Marketing effectiveness insights

### 🎬 Frame Analysis
- Detailed breakdown of each frame
- Visual psychology insights
- Narrative progression analysis

### 📝 Executive Summary
- High-level marketing strategy insights
- Conversion optimization recommendations
- Competitive analysis angles

## 🛠️ Technical Details

- **Model**: MiniCPM-o 2.6 (openbmb/MiniCPM-o-2_6)
- **Framework**: Gradio + PyTorch
- **Processing**: 1 frame/second extraction + audio analysis
- **Hardware**: Optimized for GPU acceleration
- **Memory**: Efficient memory usage with torch.float16
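
As a rough illustration of the half-precision setup listed above, the model could be loaded along these lines. This is a sketch under assumptions, not necessarily what `app.py` does: `build_load_kwargs` and `load_model` are hypothetical helper names, and the model's remote code may accept further options that are not shown here.

```python
MODEL_ID = "openbmb/MiniCPM-o-2_6"


def build_load_kwargs(dtype_name: str = "float16") -> dict:
    """Keyword arguments for `from_pretrained`; the dtype stays a name until torch is imported."""
    return {"trust_remote_code": True, "torch_dtype": dtype_name}


def load_model(dtype_name: str = "float16"):
    """Load the model and tokenizer, moving the model to a GPU when one is visible."""
    import torch  # heavy imports are deferred until a load is actually requested
    from transformers import AutoModel, AutoTokenizer

    kwargs = build_load_kwargs(dtype_name)
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])  # "float16" -> torch.float16
    model = AutoModel.from_pretrained(MODEL_ID, **kwargs).eval()
    if torch.cuda.is_available():
        model = model.cuda()
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    return model, tokenizer
```

`trust_remote_code=True` is needed because the model ships its own modeling code on the Hub; half precision on CPU may be slow or unsupported for some operations, so a GPU is the expected target.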

## 🔧 Deployment Instructions

### For Hugging Face Spaces:

1. **Create New Space**:
   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
   - Click "Create new Space"
   - Choose "Gradio" as SDK
   - Set to "Public" or "Private" based on your preference

2. **Upload Files**:
   - Upload `app.py`
   - Upload `requirements.txt`
   - Upload this `README.md`

3. **Configure Hardware**:
   - With an HF Pro account, upgrade to a GPU (T4 or better recommended)
   - Set the timeout to 30+ minutes for longer video processing

4. **Deploy**:
   - The Space builds and deploys automatically
   - The first run may take 5-10 minutes while the model downloads

### Hardware Requirements:
- **Minimum**: 8GB RAM, 4GB VRAM
- **Recommended**: 16GB RAM, 8GB+ VRAM
- **Optimal**: 32GB RAM, 12GB+ VRAM
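
To check a machine against these tiers, something like the following could be used. It is only a sketch: `classify_hardware` and `detect_vram_gb` are hypothetical helpers, the thresholds are copied from the list above, and the VRAM probe assumes PyTorch with CUDA support.

```python
def classify_hardware(ram_gb: float, vram_gb: float) -> str:
    """Map available RAM and VRAM (in GB) onto the tiers listed in this README."""
    tiers = [("optimal", 32, 12), ("recommended", 16, 8), ("minimum", 8, 4)]
    for name, min_ram, min_vram in tiers:
        # Both thresholds must be met for a tier to apply.
        if ram_gb >= min_ram and vram_gb >= min_vram:
            return name
    return "below minimum"


def detect_vram_gb() -> float:
    """Return the total memory of the first CUDA device in GiB, or 0.0 without a GPU."""
    import torch  # imported lazily so the classifier works without PyTorch

    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / 1024**3
```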

## 📈 Performance Expectations

- **Model Loading**: 2-5 minutes (first time)
- **30-second video**: 3-8 minutes processing
- **Frame Analysis**: ~10-30 seconds per frame
- **Summary Generation**: 1-2 minutes
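
These figures depend heavily on the hardware tier. One way to measure them on your own deployment is a small timing helper like the one below; the `timed` context manager and the stage names are illustrative assumptions, not part of `app.py`.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Add the wall-clock seconds spent inside the `with` block to `timings[stage]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (time.perf_counter() - start)


timings: Dict[str, float] = {}
for _ in range(2):
    with timed("frame_analysis", timings):
        time.sleep(0.01)  # stand-in for a real per-frame model call
with timed("summary", timings):
    pass  # stand-in for summary generation

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.2f}s")
```

Repeated stages accumulate, so `timings["frame_analysis"]` ends up holding the total across all frames.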

## 🔄 Comparison with Original System

| Feature | Original (GPT-4o) | MiniCPM-o Version |
|---------|-------------------|-------------------|
| **Cost** | $0.10-0.50/video | Free (after hardware) |
| **Privacy** | Sends video data to OpenAI | Fully local |
| **Multimodal** | Separate audio/visual | Integrated analysis |
| **Speed** | 2-3 minutes | 5-10 minutes |
| **Customization** | Limited | Fully customizable |

## 📝 Usage Tips

1. **Video Format**: MP4 works best; other formats are supported
2. **Duration**: 15-30 seconds optimal for detailed analysis
3. **Quality**: Higher resolution videos provide better insights
4. **Audio**: Include audio for comprehensive analysis
5. **Content**: Marketing/advertising videos work best

## 🐛 Troubleshooting

**Model Loading Issues**:
- Check internet connection for initial model download
- Ensure sufficient GPU memory (8GB+ recommended)
- Try restarting the Space if the model fails to load

**Video Processing Errors**:
- Ensure video file is valid and not corrupted
- Check file size (under 100MB recommended)
- Try converting to MP4 format
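
The size check and MP4 conversion above could be scripted roughly like this. It is a sketch under assumptions: the helper names are hypothetical, the 100MB limit is the guideline from this README, and the conversion relies on an `ffmpeg` binary with the `libx264` and `aac` encoders being available.

```python
import os
from typing import List

MAX_BYTES = 100 * 1024 * 1024  # the 100MB guideline from this README


def is_within_size_limit(path: str, max_bytes: int = MAX_BYTES) -> bool:
    """Return True when the file at `path` is no larger than `max_bytes`."""
    return os.path.getsize(path) <= max_bytes


def mp4_convert_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that re-encodes `src` as H.264 video with AAC audio."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst]
```

Run the command with `subprocess.run(mp4_convert_command("clip.mov", "clip.mp4"), check=True)`; a non-zero exit code usually means the input is corrupted or uses a codec that ffmpeg cannot read.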

**Memory Issues**:
- Reduce video length or resolution
- Close other applications if running locally
- Use GPU acceleration if available
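
If resolution is the problem, frames can be downscaled before they reach the model. The helper below is a hypothetical sketch; the 720-pixel default is an arbitrary illustration, not a value taken from `app.py`, and lower resolution may cost some visual detail.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 720) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most `max_side`, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave untouched
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With OpenCV, the result can be passed straight to `cv2.resize(frame, fit_within(w, h), interpolation=cv2.INTER_AREA)`, which expects the target size as `(width, height)`.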

## 🤝 Contributing

This is a test implementation for comparing MiniCPM-o with GPT-4o-based analysis.

Potential improvements:
- Add more sophisticated audio analysis
- Implement batch processing
- Add custom prompt templates
- Include more detailed performance metrics

## 📄 License

Licensed under Apache 2.0. See the LICENSE file for details.

## 🙏 Acknowledgments

- **MiniCPM-o Team**: For the excellent multimodal model
- **OpenBMB**: For open-sourcing the model
- **Hugging Face**: For the fantastic Spaces platform
- **Gradio**: For the user-friendly interface framework

---

*Built with ❤️ for testing MiniCPM-o capabilities vs GPT-4o*