metadata

license: mit
title: '🎤 Long-Form Text-to-Speech Generator'
sdk: gradio
emoji: 🚀
colorFrom: indigo
colorTo: red
pinned: true
short_description: 'Unlimited Text Length: Handle texts of any size.'

🎤 Long-Form Text-to-Speech Generator

A Hugging Face Space that converts text of any length into natural, human-like speech using freely available open-source models.

✨ Features

🚀 Unlimited Text Length: Handle texts of any size, from short sentences to entire articles
🤖 Human-like Voice: Uses Microsoft's SpeechT5 model for natural speech synthesis
⚡ Smart Text Processing: Intelligent chunking preserves sentence flow and natural pauses
🆓 Completely Free: Uses only open-source models, no API keys required
🔧 Auto-preprocessing: Handles abbreviations, numbers, and text normalization
📱 Easy to Use: Simple web interface built with Gradio

🛠️ How It Works

Text Preprocessing: Cleans and normalizes input text, handling abbreviations and numbers
Smart Chunking: Splits long text at natural sentence boundaries (max 500 chars per chunk)
Speech Generation: Processes each chunk using the SpeechT5 TTS model
Audio Merging: Combines all audio segments with natural pauses between chunks
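
The Smart Chunking step above can be sketched roughly as follows. This is a minimal illustration, not the Space's actual code: the function name `chunk_text` and the regex-based sentence splitter are assumptions; only the 500-character limit comes from the description above.

```python
import re


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    Hypothetical helper: the sentence splitter is a simple regex and will
    not handle every abbreviation or quotation style.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at word boundaries.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no space found: hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Keeping whole sentences together means each chunk ends at a natural pause, so the pauses inserted between audio segments fall where a speaker would pause anyway.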

🚀 Models Used

Text-to-Speech: microsoft/speecht5_tts - High-quality neural TTS
Vocoder: microsoft/speecht5_hifigan - HiFi-GAN neural vocoder that converts spectrograms into audio waveforms
Speaker Embeddings: Speaker vectors derived from the CMU Arctic dataset, for consistent voice characteristics
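
As a rough sketch of how these models fit together with the Transformers library: the helper names `load_speecht5` and `synthesize_chunk` are hypothetical, and how the speaker embedding is obtained is left to the caller, since this README does not say which embedding file the Space loads.

```python
def load_speecht5():
    """Load the processor, TTS model, and vocoder listed under Models Used."""
    # Imported lazily so the module can be imported without the heavy dependency.
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
    return processor, model, vocoder


def synthesize_chunk(text, speaker_embedding, processor, model, vocoder):
    """Turn one text chunk into a mono waveform (hypothetical helper)."""
    inputs = processor(text=text, return_tensors="pt")
    # Passing the vocoder makes generate_speech return a waveform
    # rather than a mel spectrogram.
    speech = model.generate_speech(
        inputs["input_ids"], speaker_embedding, vocoder=vocoder
    )
    return speech.detach().cpu().numpy()
```

Here `speaker_embedding` is expected to be a torch tensor of shape (1, 512). Reusing the same embedding for every chunk is what keeps the voice consistent across a long text.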

💻 Usage

Enter or paste your text in the input box (no length limit!)
Click "Generate Speech"
Wait for processing (longer texts take more time)
Download or play the generated audio

📝 Tips for Best Results

Use proper punctuation for natural pauses
Well-formatted text produces better speech quality
The system automatically handles common abbreviations
Numbers are converted to spoken form
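
For illustration, abbreviation and number handling along these lines could look like the sketch below. The abbreviation table, the function names `number_to_words` and `normalize_text`, and the digit-by-digit fallback for numbers of 1000 and above are all assumptions, not the Space's actual rules.

```python
import re

# Hypothetical abbreviation table; the real list may differ.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "Mrs.": "Missus", "vs.": "versus"}

_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer; 1000 and above is read digit by digit."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + (f"-{_ONES[ones]}" if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = f" {number_to_words(rest)}" if rest else ""
        return f"{_ONES[hundreds]} hundred{tail}"
    return " ".join(_ONES[int(digit)] for digit in str(n))


def normalize_text(text: str) -> str:
    """Expand abbreviations, spell out digits, and collapse whitespace."""
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_text("Dr.  Smith has 42 cats.")` yields `"Doctor Smith has forty-two cats."`.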

🔧 Technical Details

Architecture: Transformer-based neural TTS
Sample Rate: 16 kHz
Audio Format: WAV
Processing: Runs on CPU (works on free Hugging Face hardware)
Memory Efficient: Processes text in chunks to handle large documents
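
A minimal sketch of merging per-chunk audio and writing the 16 kHz WAV output, assuming each chunk is a mono float waveform in the range [-1, 1]. The helper names and the 0.3 second pause length are assumptions; the README only states that pauses are inserted between chunks.

```python
import wave

import numpy as np

SAMPLE_RATE = 16_000  # sample rate stated in Technical Details


def merge_segments(segments, pause_seconds=0.3, sample_rate=SAMPLE_RATE):
    """Concatenate waveforms, inserting silence between consecutive segments."""
    if not segments:
        return np.zeros(0, dtype=np.float32)
    silence = np.zeros(int(round(pause_seconds * sample_rate)), dtype=np.float32)
    parts = []
    for index, segment in enumerate(segments):
        if index > 0:
            parts.append(silence)
        parts.append(np.asarray(segment, dtype=np.float32))
    return np.concatenate(parts)


def write_wav(path, audio, sample_rate=SAMPLE_RATE):
    """Write a mono float waveform as 16-bit little-endian PCM WAV."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
```

Only one chunk needs to pass through the model at a time; the merged waveform itself is small (16,000 samples per second of audio), which is why long documents remain manageable on CPU hardware.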

🚀 Local Installation

git clone <your-space-url>
cd <your-space-name>
pip install -r requirements.txt
python app.py

📄 License

This project uses open-source models and is available for free use. Please check individual model licenses:

SpeechT5: MIT License (as listed on the microsoft/speecht5_tts model card)
CMU Arctic: Permissive BSD-style license (free for commercial and non-commercial use)

🤝 Contributing

Feel free to submit issues and enhancement requests!

🔗 Links

Built with ❤️ using Hugging Face Transformers and Gradio