---
title: MiniCPM-o Video Analyzer
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.20.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎬 MiniCPM-o Video Analyzer

A powerful video analysis tool powered by **MiniCPM-o 2.6** - a GPT-4o level multimodal model that can analyze both visual and audio content simultaneously.

## 🚀 Features

- **🎯 Frame-by-Frame Analysis**: Detailed narrative and visual analysis of each video frame
- **🎨 Visual Psychology**: Color, composition, and emotional trigger analysis
- **🚀 Marketing Mechanics**: Persuasion techniques and conversion strategy identification
- **📊 Comprehensive Summaries**: Executive-level insights for marketing effectiveness
- **🎵 Audio-Visual Integration**: Unified analysis of both visual and audio elements
- **⚡ Local Processing**: No external API calls - all processing happens locally

## 🎯 How It Works

1. **Upload Your Video**: Marketing videos up to 30 seconds work best
2. **Automatic Processing**:
   - Extracts frames at 1fps
   - Extracts audio track
   - Analyzes with MiniCPM-o 2.6
3. **Get Insights**: Comprehensive analysis covering narrative, psychology, and marketing effectiveness

## 💡 Key Advantages Over GPT-4o

- **💰 Cost-Effective**: No API costs - runs locally on HF Spaces
- **🔒 Privacy**: Your videos never leave the processing environment
- **🎭 Multimodal**: Analyzes audio and visual elements together
- **⚡ Optimized**: Designed for efficiency on consumer hardware

## 📋 What You'll Get

### 📊 Analysis Report

- Processing time and technical details
- Comprehensive summary of findings
- Marketing effectiveness insights

### 🎬 Frame Analysis

- Detailed breakdown of each frame
- Visual psychology insights
- Narrative progression analysis

### 📝 Executive Summary

- High-level marketing strategy insights
- Conversion optimization recommendations
- Competitive analysis angles

## 🛠️ Technical Details

- **Model**: MiniCPM-o 2.6 (openbmb/MiniCPM-o-2_6)
- **Framework**: Gradio + PyTorch
- **Processing**: 1 frame/second extraction + audio analysis
- **Hardware**: Optimized for GPU acceleration
- **Memory**: Efficient memory usage with torch.float16

## 🔧 Deployment Instructions

### For Hugging Face Spaces:

1. **Create New Space**:
   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
   - Click "Create new Space"
   - Choose "Gradio" as SDK
   - Set to "Public" or "Private" based on your preference
2. **Upload Files**:
   - Upload `app.py`
   - Upload `requirements.txt`
   - Upload this `README.md`
3. **Configure Hardware**:
   - With HF Pro account, upgrade to GPU (T4 or better recommended)
   - Set timeout to 30+ minutes for longer video processing
4. **Deploy**:
   - Space will automatically build and deploy
   - First run may take 5-10 minutes to download the model

### Hardware Requirements:

- **Minimum**: 8GB RAM, 4GB VRAM
- **Recommended**: 16GB RAM, 8GB+ VRAM
- **Optimal**: 32GB RAM, 12GB+ VRAM

## 📈 Performance Expectations

- **Model Loading**: 2-5 minutes (first time)
- **30-second video**: 3-8 minutes processing
- **Frame Analysis**: ~10-30 seconds per frame
- **Summary Generation**: 1-2 minutes

## 🔄 Comparison with Original System

| Feature | Original (GPT-4o) | MiniCPM-o Version |
|---------|-------------------|-------------------|
| **Cost** | $0.10-0.50/video | Free (after hardware) |
| **Privacy** | Sends to OpenAI | Fully local |
| **Multimodal** | Separate audio/visual | Integrated analysis |
| **Speed** | 2-3 minutes | 5-10 minutes |
| **Customization** | Limited | Fully customizable |

## 📝 Usage Tips

1. **Video Format**: MP4 works best, other formats supported
2. **Duration**: 15-30 seconds optimal for detailed analysis
3. **Quality**: Higher resolution videos provide better insights
4. **Audio**: Include audio for comprehensive analysis
5. **Content**: Marketing/advertising videos work best

## 🐛 Troubleshooting

**Model Loading Issues**:
- Check internet connection for initial model download
- Ensure sufficient GPU memory (8GB+ recommended)
- Try restarting the space if model fails to load

**Video Processing Errors**:
- Ensure video file is valid and not corrupted
- Check file size (under 100MB recommended)
- Try converting to MP4 format

**Memory Issues**:
- Reduce video length or resolution
- Close other applications if running locally
- Use GPU acceleration if available

## 🤝 Contributing

This is a test implementation for comparing MiniCPM-o with GPT-4o based analysis. Potential improvements:

- Add more sophisticated audio analysis
- Implement batch processing
- Add custom prompt templates
- Include more detailed performance metrics

## 📄 License

Licensed under Apache 2.0. See LICENSE file for details.

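The 1 frame/second extraction listed under Technical Details can be sketched roughly as follows. This is a minimal illustration and not the code in `app.py`: the function name `sample_frame_indices` and its parameters are hypothetical, and it only computes which frame indices would be kept for a given native frame rate, without decoding any video.

```python
def sample_frame_indices(total_frames: int, native_fps: float,
                         sample_fps: float = 1.0) -> list:
    """Return the indices of frames to keep when sampling at `sample_fps`.

    Hypothetical helper for illustration only; a real extractor would
    read `total_frames` and `native_fps` from the video container.
    """
    if total_frames <= 0 or native_fps <= 0 or sample_fps <= 0:
        return []

    # Never sample faster than the video's own frame rate,
    # otherwise the same frame index would be emitted twice.
    sample_fps = min(sample_fps, native_fps)

    indices = []
    k = 0
    while True:
        # Index of the frame closest to timestamp k / sample_fps seconds.
        idx = int(round(k * native_fps / sample_fps))
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


if __name__ == "__main__":
    # A 30-second clip at 30 fps has 900 frames.
    kept = sample_frame_indices(total_frames=900, native_fps=30.0)
    print(len(kept), kept[:3])
```

Under this sketch, a 30-second clip yields 30 sampled frames, which is why the per-frame analysis time dominates the overall processing time quoted under Performance Expectations.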
## 🙏 Acknowledgments

- **MiniCPM-o Team**: For the excellent multimodal model
- **OpenBMB**: For open-sourcing the model
- **Hugging Face**: For the fantastic Spaces platform
- **Gradio**: For the user-friendly interface framework

---

*Built with ❤️ for testing MiniCPM-o capabilities vs GPT-4o*