# YourMT3+ Local Setup Guide
## Quick Start (Local Installation)
### 1. Install Dependencies
```bash
pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
```
### 2. Setup Model Weights
- Download YourMT3 model weights
- Place them in: `amt/logs/2024/`
- Default expected: `mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt`
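Before running the setup check, it can help to confirm that something matching the expected checkpoint name is actually present. The following is a minimal sketch, not part of the project: `find_checkpoint_entries` is a hypothetical helper, and it assumes only that whatever you downloaded sits directly under `amt/logs/2024/` with a name that starts with the experiment name (the part before `@`).

```python
from pathlib import Path
from typing import List

# Directory and default name as quoted in this guide.
CKPT_DIR = Path("amt/logs/2024")
DEFAULT_CKPT = "mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt"


def find_checkpoint_entries(ckpt_dir: Path = CKPT_DIR, expected: str = DEFAULT_CKPT) -> List[Path]:
    """Hypothetical helper: list entries whose name starts with the experiment name."""
    prefix = expected.split("@")[0]  # experiment name without the "@last.ckpt" suffix
    if not ckpt_dir.is_dir():
        return []
    return sorted(p for p in ckpt_dir.iterdir() if p.name.startswith(prefix))


if __name__ == "__main__":
    matches = find_checkpoint_entries()
    if matches:
        for path in matches:
            print(f"Found: {path}")
    else:
        print(f"Nothing matching {DEFAULT_CKPT!r} under {CKPT_DIR}")
```

If the list comes back empty, re-check the download location and the exact name before moving on.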
### 3. Run Setup Check
```bash
cd /path/to/YourMT3
python setup_local.py
```
### 4. Quick Test
```bash
python test_local.py
```
### 5. Launch Web Interface
```bash
python app.py
```
Then open: http://127.0.0.1:7860
## 🎯 New Features
### Instrument Conditioning
- **Problem**: YourMT3+ switches instruments mid-track (vocals → violin → guitar)
- **Solution**: Select target instrument from dropdown
- **Options**: Auto, Vocals, Guitar, Piano, Violin, Drums, Bass, Saxophone, Flute
### How It Works
1. **Upload audio** or paste YouTube URL
2. **Select instrument** from dropdown menu
3. **Click Transcribe**
4. **Get focused transcription** without instrument confusion
## 🔧 Troubleshooting
### "Unknown event type: transcribe_singing"
**This is expected!** The error indicates your model doesn't have special task tokens, which is normal. The system will:
1. Try task tokens (may fail - that's OK)
2. Fall back to post-processing filtering
3. Still return a transcription focused on the instrument you selected
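The fallback order above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `model_helper.py`: the three callables are hypothetical stand-ins, and catching `ValueError` assumes the "Unknown event type" message is raised as that exception type.

```python
from typing import Callable, List


def transcribe_with_fallback(
    run_with_task_token: Callable[[str], List],   # hypothetical: decode using a task token
    run_default: Callable[[], List],              # hypothetical: plain transcription
    filter_notes: Callable[[List, str], List],    # hypothetical: post-processing filter
    instrument: str,
) -> List:
    """Try task-token decoding first; on failure, transcribe normally and filter."""
    try:
        return run_with_task_token(instrument)
    except ValueError as err:  # assumed exception type for "Unknown event type: ..."
        print(f"Task token unavailable ({err}); falling back to post-processing")
        return filter_notes(run_default(), instrument)
```

The point of the pattern is that a missing task token is treated as a normal condition rather than a fatal error, which is why the message in the console is harmless.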
### Debug Output
Look for these messages in console:
```
=== TRANSCRIBE FUNCTION CALLED ===
Audio file: /path/to/audio.wav
Instrument hint: vocals
=== INSTRUMENT CONDITIONING ACTIVATED ===
Model Task Configuration Debug:
✓ Model has task_manager
Task name: mc13_full_plus_256
Available subtask prefixes: ['default']
=== APPLYING INSTRUMENT FILTER ===
Found instruments in transcription: {0: 45, 100: 123, 40: 12}
Primary instrument: 100 (68% of notes)
Target program for vocals: 100
Converted 57 notes to primary instrument 100
```
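The filter step in that log can be read as: count notes per MIDI program, identify the dominant program, then move the remaining notes onto the target program. Below is a minimal sketch of that idea. The `Note` class and both function names are hypothetical, and the real filter may apply further rules (for example, confidence thresholds) that are not shown here.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Note:
    """Hypothetical note record; only `program` matters for this sketch."""
    pitch: int
    onset: float
    offset: float
    program: int


def primary_program(notes: List[Note]) -> Tuple[int, float]:
    """Return the most common program and its share of all notes."""
    counts = Counter(n.program for n in notes)
    program, count = counts.most_common(1)[0]
    return program, count / len(notes)


def remap_to_program(notes: List[Note], target: int) -> Tuple[List[Note], int]:
    """Reassign every note to `target`; also return how many notes changed."""
    converted = sum(1 for n in notes if n.program != target)
    return [replace(n, program=target) for n in notes], converted
```

With the counts from the log (45, 123, and 12 notes on programs 0, 100, and 40), remapping to program 100 changes 45 + 12 = 57 notes, which matches the last log line.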
### Common Issues
**1. Import Errors**
```bash
pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
```
**2. Model Not Found**
- Download model weights to `amt/logs/2024/`
- Check filename matches exactly
**3. No Audio Examples**
- Place test audio files in `examples/` folder
- Supported formats: .wav, .mp3
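To see which files in `examples/` would be picked up, a quick listing like the one below can help. It is only a sketch: `list_example_audio` is a hypothetical helper, and the case-insensitive extension match is an assumption about how the app treats file names.

```python
from pathlib import Path
from typing import List

SUPPORTED_SUFFIXES = {".wav", ".mp3"}  # formats listed in this guide


def list_example_audio(folder: str = "examples") -> List[Path]:
    """Hypothetical helper: return supported audio files in `folder`, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    files = list_example_audio()
    print(f"{len(files)} example file(s) found")
    for path in files:
        print(f"  {path}")
```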
**4. Port Already in Use**
- Web interface runs on port 7860
- If busy, it will try 7861, 7862, etc.
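If you want to know in advance which port will be used, the probing behaviour described above can be approximated with the standard library. This is a sketch of the general technique, not the web framework's own code; `find_free_port` and its `max_tries` limit are assumptions made for illustration.

```python
import socket


def find_free_port(start: int = 7860, max_tries: int = 10, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + max_tries) that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    raise RuntimeError(f"No free port in range {start}-{start + max_tries - 1}")


if __name__ == "__main__":
    print(f"First free port: {find_free_port()}")
```

Note that the port is released again when the function returns, so another process could still claim it before the app starts.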
## Expected Results
### Before (Original YourMT3+)
- Vocals file → outputs: vocals + violin + guitar tracks
- Saxophone solo → incomplete transcription
- Flute solo → single note only
### After (With Instrument Conditioning)
- Select "Vocals/Singing" → clean vocal transcription only
- Select "Saxophone" → complete saxophone solo
- Select "Flute" → full flute transcription
## 🛠️ Advanced Usage
### Command Line
```bash
python transcribe_cli.py audio.wav --instrument vocals --verbose
```
### Python API
```python
from model_helper import transcribe, load_model_checkpoint

# Load model (model_args must already be defined by the caller)
model = load_model_checkpoint(args=model_args, device="cuda")

# Transcribe with instrument conditioning
# (audio_info must already describe the input audio)
midifile = transcribe(model, audio_info, instrument_hint="vocals")
```
### Confidence Tuning
- High confidence (0.8): Strict instrument filtering
- Low confidence (0.4): Allows more mixed content
- Auto-adjusts based on task token availability
## Files Modified
- `app.py` - Added instrument dropdown to web interface
- `model_helper.py` - Enhanced transcription with conditioning
- `transcribe_cli.py` - New command-line interface
- `setup_local.py` - Local setup checker
- `test_local.py` - Quick functionality test
## 🎵 Enjoy Better Transcriptions!
No more instrument confusion - you now have full control over what gets transcribed!