Spaces:

asdd12e2ad
/

yourmt3

Runtime error

File size: 3,732 Bytes

c207bc4

# YourMT3+ Local Setup Guide

## 🚀 Quick Start (Local Installation)

### 1. Install Dependencies
```bash
pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
```

### 2. Setup Model Weights
- Download YourMT3 model weights
- Place them in: `amt/logs/2024/`
- Default expected: `mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt`

### 3. Run Setup Check
```bash
cd /path/to/YourMT3
python setup_local.py
```

### 4. Quick Test
```bash
python test_local.py
```

### 5. Launch Web Interface
```bash
python app.py
```
Then open: http://127.0.0.1:7860

## 🎯 New Features

### Instrument Conditioning
- **Problem**: YourMT3+ switches instruments mid-track (vocals → violin → guitar)
- **Solution**: Select target instrument from dropdown
- **Options**: Auto, Vocals, Guitar, Piano, Violin, Drums, Bass, Saxophone, Flute

### How It Works
1. **Upload audio** or paste YouTube URL
2. **Select instrument** from dropdown menu  
3. **Click Transcribe**
4. **Get focused transcription** without instrument confusion

## 🔧 Troubleshooting

### "Unknown event type: transcribe_singing"
**This is expected!** The error indicates your model doesn't have special task tokens, which is normal. The system will:
1. Try task tokens (may fail - that's OK)
2. Fall back to post-processing filtering
3. Still give you better results

### Debug Output
Look for these messages in console:
```
=== TRANSCRIBE FUNCTION CALLED ===
Audio file: /path/to/audio.wav
Instrument hint: vocals

=== INSTRUMENT CONDITIONING ACTIVATED ===
Model Task Configuration Debug:
✓ Model has task_manager
  Task name: mc13_full_plus_256
  Available subtask prefixes: ['default']

=== APPLYING INSTRUMENT FILTER ===
Found instruments in transcription: {0: 45, 100: 123, 40: 12}
Primary instrument: 100 (73% of notes)
Target program for vocals: 100
Converted 57 notes to primary instrument 100
```

### Common Issues

**1. Import Errors**
```bash
pip install torch torchaudio transformers gradio pytorch-lightning
```

**2. Model Not Found**
- Download model weights to `amt/logs/2024/`
- Check filename matches exactly

**3. No Audio Examples**
- Place test audio files in `examples/` folder
- Supported formats: .wav, .mp3

**4. Port Already in Use**
- Web interface runs on port 7860
- If busy, it will try 7861, 7862, etc.

## 📊 Expected Results

### Before (Original YourMT3+)
- Vocals file → outputs: vocals + violin + guitar tracks
- Saxophone solo → incomplete transcription
- Flute solo → single note only

### After (With Instrument Conditioning)
- Select "Vocals/Singing" → clean vocal transcription only
- Select "Saxophone" → complete saxophone solo
- Select "Flute" → full flute transcription

## 🛠️ Advanced Usage

### Command Line
```bash
python transcribe_cli.py audio.wav --instrument vocals --verbose
```

### Python API
```python
from model_helper import transcribe, load_model_checkpoint

# Load model
model = load_model_checkpoint(args=model_args, device="cuda")

# Transcribe with instrument conditioning
midifile = transcribe(model, audio_info, instrument_hint="vocals")
```

### Confidence Tuning
- High confidence (0.8): Strict instrument filtering
- Low confidence (0.4): Allows more mixed content
- Auto-adjusts based on task token availability

## 📝 Files Modified

- `app.py` - Added instrument dropdown to web interface
- `model_helper.py` - Enhanced transcription with conditioning
- `transcribe_cli.py` - New command-line interface
- `setup_local.py` - Local setup checker
- `test_local.py` - Quick functionality test

## 🎵 Enjoy Better Transcriptions!

No more instrument confusion - you now have full control over what gets transcribed! 🎉