# YourMT3+ Local Setup Guide

## Quick Start (Local Installation)

### 1. Install Dependencies

```bash
pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
```

### 2. Set Up Model Weights

- Download the YourMT3 model weights
- Place them in `amt/logs/2024/`
- Default expected checkpoint: `mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt`

### 3. Run the Setup Check

```bash
cd /path/to/YourMT3
python setup_local.py
```

### 4. Quick Test

```bash
python test_local.py
```

### 5. Launch the Web Interface

```bash
python app.py
```

Then open: http://127.0.0.1:7860
## New Features

### Instrument Conditioning

- **Problem:** YourMT3+ switches instruments mid-track (vocals → violin → guitar)
- **Solution:** Select the target instrument from a dropdown
- **Options:** Auto, Vocals, Guitar, Piano, Violin, Drums, Bass, Saxophone, Flute

### How It Works

1. Upload audio or paste a YouTube URL
2. Select an instrument from the dropdown menu
3. Click **Transcribe**
4. Get a focused transcription without instrument confusion
## Troubleshooting

### "Unknown event type: transcribe_singing"

This is expected! The error indicates that your model checkpoint doesn't have the special task tokens, which is normal. The system will:

- Try task tokens first (this may fail, and that's OK)
- Fall back to post-processing filtering
- Still give you better results

### Debug Output

Look for these messages in the console:

```
=== TRANSCRIBE FUNCTION CALLED ===
Audio file: /path/to/audio.wav
Instrument hint: vocals
=== INSTRUMENT CONDITIONING ACTIVATED ===
Model Task Configuration Debug:
✓ Model has task_manager
Task name: mc13_full_plus_256
Available subtask prefixes: ['default']
=== APPLYING INSTRUMENT FILTER ===
Found instruments in transcription: {0: 45, 100: 123, 40: 12}
Primary instrument: 100 (73% of notes)
Target program for vocals: 100
Converted 57 notes to primary instrument 100
```
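The filtering step shown in the log above can be sketched as a majority vote over MIDI programs, with every minority-program note reassigned to the dominant one. This is a minimal illustration only; the `apply_instrument_filter` name and the note representation are assumptions, not the project's actual API.

```python
from collections import Counter

def apply_instrument_filter(notes):
    """Reassign every note to the most common MIDI program.

    `notes` is assumed to be a list of dicts with a "program" key,
    a simplification of the real note objects.
    """
    counts = Counter(n["program"] for n in notes)
    primary, primary_count = counts.most_common(1)[0]
    print(f"Found instruments in transcription: {dict(counts)}")
    print(f"Primary instrument: {primary} ({primary_count / len(notes):.0%} of notes)")
    converted = 0
    for n in notes:
        if n["program"] != primary:
            n["program"] = primary  # force the note onto the primary instrument
            converted += 1
    print(f"Converted {converted} notes to primary instrument {primary}")
    return notes
```

With the counts from the log above (`{0: 45, 100: 123, 40: 12}`), program 100 wins the vote and the 57 notes on programs 0 and 40 are converted.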
### Common Issues

1. **Import errors**

   ```bash
   pip install torch torchaudio transformers gradio pytorch-lightning
   ```

2. **Model not found**
   - Download model weights to `amt/logs/2024/`
   - Check that the filename matches exactly

3. **No audio examples**
   - Place test audio files in the `examples/` folder
   - Supported formats: `.wav`, `.mp3`

4. **Port already in use**
   - The web interface runs on port 7860
   - If busy, it will try 7861, 7862, etc.
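The port fallback can be sketched with a plain availability check (an illustration only; the `find_free_port` helper is an assumption, and the app's actual launch logic may differ):

```python
import socket

def find_free_port(start=7860, attempts=10):
    """Return the first free TCP port at or after `start`."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port busy, try the next one
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")
```

A launcher could then pass the result to Gradio, e.g. `demo.launch(server_port=find_free_port())`.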
## Expected Results

### Before (Original YourMT3+)

- Vocals file → outputs vocals + violin + guitar tracks
- Saxophone solo → incomplete transcription
- Flute solo → single note only

### After (With Instrument Conditioning)

- Select "Vocals/Singing" → clean vocal transcription only
- Select "Saxophone" → complete saxophone solo
- Select "Flute" → full flute transcription
## Advanced Usage

### Command Line

```bash
python transcribe_cli.py audio.wav --instrument vocals --verbose
```

### Python API

```python
from model_helper import transcribe, load_model_checkpoint

# Load the model (model_args must be prepared beforehand)
model = load_model_checkpoint(args=model_args, device="cuda")

# Transcribe with instrument conditioning
midifile = transcribe(model, audio_info, instrument_hint="vocals")
```

### Confidence Tuning

- High confidence (0.8): strict instrument filtering
- Low confidence (0.4): allows more mixed content
- The threshold auto-adjusts based on task-token availability
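One way to read the confidence setting: a minority instrument survives only if it holds at least `confidence` times as many notes as the dominant one, so a higher value filters more aggressively. This is a hedged sketch of the behavior described above; the `filter_by_confidence` name, the note representation, and the exact heuristic are assumptions, not the project's actual implementation.

```python
from collections import Counter

def filter_by_confidence(notes, confidence=0.8):
    """Reassign minority-instrument notes to the primary instrument.

    `notes` is assumed to be a list of (program, pitch) tuples. A
    program keeps its notes only if its note count is at least
    `confidence` times the primary program's count; otherwise its
    notes are converted to the primary program.
    """
    counts = Counter(program for program, _ in notes)
    primary, primary_count = counts.most_common(1)[0]
    cutoff = confidence * primary_count
    return [
        (program if counts[program] >= cutoff else primary, pitch)
        for program, pitch in notes
    ]
```

With `confidence=0.8` a 10% violin track mixed into vocals gets folded into the vocal line; with a low setting the same violin notes are left untouched.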
## Files Modified

- `app.py` - Added instrument dropdown to the web interface
- `model_helper.py` - Enhanced transcription with conditioning
- `transcribe_cli.py` - New command-line interface
- `setup_local.py` - Local setup checker
- `test_local.py` - Quick functionality test
## Enjoy Better Transcriptions!

No more instrument confusion - you now have full control over what gets transcribed!