license: mit
title: 'Long-Form Text-to-Speech Generator'
sdk: gradio
emoji: π
colorFrom: indigo
colorTo: red
pinned: true
short_description: 'Unlimited text length: handle texts of any size.'
Long-Form Text-to-Speech Generator
A Hugging Face Space that converts text of any length into natural, human-like speech using free, open-source AI models.
Features
- Unlimited Text Length: Handle texts of any size, from short sentences to entire articles
- Human-like Voice: Uses Microsoft's SpeechT5 model for natural speech synthesis
- Smart Text Processing: Intelligent chunking preserves sentence flow and natural pauses
- Completely Free: Uses only open-source models, no API keys required
- Auto-preprocessing: Handles abbreviations, numbers, and text normalization
- Easy to Use: Simple web interface built with Gradio
How It Works
- Text Preprocessing: Cleans and normalizes input text, handling abbreviations and numbers
- Smart Chunking: Splits long text at natural sentence boundaries (max 500 chars per chunk)
- Speech Generation: Processes each chunk using SpeechT5 TTS model
- Audio Merging: Combines all audio segments with natural pauses between chunks
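The chunking step above can be sketched roughly as follows. This is a minimal illustration, not the Space's actual code: the function names (`split_sentences`, `split_long`, `chunk_text`) and the punctuation-based sentence splitter are assumptions; only the 500-character limit comes from the description above.

```python
import re

MAX_CHARS = 500  # per-chunk limit stated in "How It Works"


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a simple heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_long(sentence, max_chars):
    """Break a sentence that exceeds max_chars on word boundaries."""
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            pieces.append(current)
        # Hard-split a single word that is itself longer than the limit.
        while len(word) > max_chars:
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    chunks, current = [], ""
    for sentence in split_sentences(text):
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = split_long(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Packing greedily keeps each chunk as long as possible, which reduces the number of model calls while still ending chunks at sentence boundaries whenever the sentences themselves fit within the limit.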
Models Used
- Text-to-Speech: microsoft/speecht5_tts (high-quality neural TTS)
- Vocoder: microsoft/speecht5_hifigan (neural vocoder for audio generation)
- Speaker Embeddings: CMU Arctic dataset for consistent voice characteristics
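As a rough sketch, the models listed above can be loaded and called through Hugging Face Transformers as shown below. The helper names (`load_tts_components`, `synthesize_chunk`) are hypothetical, and the x-vector dataset and speaker index are assumptions borrowed from the public SpeechT5 model card example; this README only says the embeddings come from CMU Arctic.

```python
TTS_MODEL_ID = "microsoft/speecht5_tts"
VOCODER_MODEL_ID = "microsoft/speecht5_hifigan"
# Assumption: dataset and speaker index follow the SpeechT5 model card example;
# the Space may use a different source or speaker.
XVECTOR_DATASET_ID = "Matthijs/cmu-arctic-xvectors"
SPEAKER_INDEX = 7306


def load_tts_components():
    """Load processor, TTS model, vocoder and one speaker embedding."""
    # Imported lazily so the module can be imported without the heavy dependencies.
    import torch
    from datasets import load_dataset
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(TTS_MODEL_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(TTS_MODEL_ID)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_MODEL_ID)
    xvectors = load_dataset(XVECTOR_DATASET_ID, split="validation")
    # Shape (1, 512): a batch containing a single speaker embedding.
    speaker = torch.tensor(xvectors[SPEAKER_INDEX]["xvector"]).unsqueeze(0)
    return processor, model, vocoder, speaker


def synthesize_chunk(text, processor, model, vocoder, speaker):
    """Return a 1-D NumPy array of 16 kHz audio samples for one text chunk."""
    import torch

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        speech = model.generate_speech(
            inputs["input_ids"], speaker, vocoder=vocoder
        )
    return speech.numpy()
```

Reusing the same speaker embedding for every chunk is what keeps the voice consistent across a long text.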
Usage
- Enter or paste your text in the input box (no length limit!)
- Click "Generate Speech"
- Wait for processing (longer texts take more time)
- Download or play the generated audio
Tips for Best Results
- Use proper punctuation for natural pauses
- Well-formatted text produces better speech quality
- The system automatically handles common abbreviations
- Numbers are converted to spoken form
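To illustrate what abbreviation and number handling might look like, here is a small sketch. The abbreviation table, the function names, and the supported number range (whole numbers below one million) are all assumptions made for this example; the Space's real preprocessing may differ. Decimals such as 3.14 are deliberately left untouched.

```python
import re

# Hypothetical abbreviation table; extend as needed.
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "Mr.": "Mister",
    "Mrs.": "Missus",
    "vs.": "versus",
}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999,999."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[ones]}" if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = f" {number_to_words(rest)}" if rest else ""
        return f"{ONES[hundreds]} hundred{tail}"
    thousands, rest = divmod(n, 1000)
    tail = f" {number_to_words(rest)}" if rest else ""
    return f"{number_to_words(thousands)} thousand{tail}"


def normalize_text(text):
    """Expand known abbreviations, spell out whole numbers, tidy whitespace."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbreviation), expansion, text)

    def spell(match):
        value = int(match.group())
        return number_to_words(value) if value < 1_000_000 else match.group()

    # Match whole integers only; skip digits that belong to a decimal number.
    text = re.sub(r"(?<![\d.])\d+(?!\d|\.\d)", spell, text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_text("Dr. Lee has 42 cats.")` returns `"Doctor Lee has forty-two cats."`.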
Technical Details
- Architecture: Transformer-based neural TTS
- Sample Rate: 16 kHz
- Audio Format: WAV
- Processing: CPU-optimized (works on free Hugging Face hardware)
- Memory Efficient: Processes text in chunks to handle large documents
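The merging and output steps implied by these details (16 kHz mono audio written as WAV, with pauses between chunks) could look roughly like the sketch below. It uses only the Python standard library so it runs anywhere; an actual app would more likely concatenate NumPy arrays. The 0.3 second pause and the function names are assumptions, since the README does not state them.

```python
import io
import sys
import wave
from array import array

SAMPLE_RATE = 16_000  # Hz, as listed under "Technical Details"
PAUSE_SECONDS = 0.3   # assumption: the pause length is not given in this README


def merge_chunks(chunks, pause_seconds=PAUSE_SECONDS, sample_rate=SAMPLE_RATE):
    """Concatenate sample sequences (floats in [-1, 1]) with silence in between."""
    silence = [0.0] * round(pause_seconds * sample_rate)
    merged = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            merged.extend(silence)
        merged.extend(chunk)
    return merged


def write_wav(samples, destination, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM WAV data to a path or a binary file-like object."""
    # Clip to [-1, 1] and scale to the signed 16-bit range.
    pcm = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples in little-endian order
    with wave.open(destination, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


if __name__ == "__main__":
    audio = merge_chunks([[0.1] * 1600, [-0.1] * 1600])
    buffer = io.BytesIO()
    write_wav(audio, buffer)
    print(f"{len(audio)} samples, {len(buffer.getvalue())} bytes of WAV data")
```

Because each chunk is synthesized independently, only the per-chunk audio and the growing output need to be held in memory at any one time.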
Local Installation
git clone <your-space-url>
cd <your-space-name>
pip install -r requirements.txt
python app.py
License
This project uses open-source models and is available for free use. Please check individual model licenses:
- SpeechT5: MIT License (as listed on the Hugging Face model cards)
- CMU Arctic: permissive BSD-style license (see the dataset's own terms)
Contributing
Feel free to submit issues and enhancement requests!
Links
Built with ❤️ using Hugging Face Transformers and Gradio