# YourMT3+ Local Setup Guide

## 🚀 Quick Start (Local Installation)

### 1. Install Dependencies
```bash
pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
```

### 2. Set Up Model Weights
- Download the YourMT3 model weights
- Place them in `amt/logs/2024/`
- Default expected filename: `mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt`

### 3. Run Setup Check
```bash
cd /path/to/YourMT3
python setup_local.py
```
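
`setup_local.py` is not reproduced in this guide. The snippet below is a hypothetical, minimal version of such a check, assuming it should verify the packages from step 1 and the checkpoint from step 2; `missing_modules` and `checkpoint_present` are illustrative names, not functions from the repository.

```python
import importlib.util
from pathlib import Path

# Import names for the packages installed in step 1
# (pip's "pytorch-lightning" is imported as "pytorch_lightning").
REQUIRED_MODULES = ["torch", "torchaudio", "transformers", "gradio",
                    "pytorch_lightning", "einops", "numpy", "librosa"]

# Folder and default checkpoint filename from step 2.
CHECKPOINT_DIR = Path("amt/logs/2024")
CHECKPOINT_NAME = "mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt"


def missing_modules(names):
    """Hypothetical helper: return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def checkpoint_present(directory, filename):
    """Hypothetical helper: True if a file with exactly this name exists under directory."""
    directory = Path(directory)
    return directory.is_dir() and any(p.is_file() for p in directory.rglob(filename))


if __name__ == "__main__":
    print("Missing packages:", missing_modules(REQUIRED_MODULES) or "none")
    print("Checkpoint found:", checkpoint_present(CHECKPOINT_DIR, CHECKPOINT_NAME))
```

Run it from the repository root so that the relative `amt/logs/2024` path resolves.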

### 4. Quick Test
```bash
python test_local.py
```

### 5. Launch Web Interface
```bash
python app.py
```
Then open: http://127.0.0.1:7860

## 🎯 New Features

### Instrument Conditioning
- **Problem**: YourMT3+ can switch instruments mid-track (e.g., vocals → violin → guitar)
- **Solution**: Select the target instrument from a dropdown
- **Options**: Auto, Vocals, Guitar, Piano, Violin, Drums, Bass, Saxophone, Flute

### How It Works
1. **Upload audio** or paste a YouTube URL
2. **Select an instrument** from the dropdown menu
3. **Click Transcribe**
4. **Get a focused transcription** without instrument switching

## 🔧 Troubleshooting

### "Unknown event type: transcribe_singing"
**This is expected.** The message means the loaded model has no dedicated task tokens (such as `transcribe_singing`), which is normal. The system will:
1. Try the task tokens (this may fail; that's OK)
2. Fall back to post-processing filtering
3. Still return a transcription focused on the selected instrument
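
The try-then-fall-back order above can be sketched as follows. The three callables are hypothetical stand-ins for the model calls in `model_helper.py`, notes are assumed to be dicts with a `"program"` key, and because the exact exception raised for an unknown task token isn't documented here, the sketch catches any `Exception`.

```python
from typing import Callable, List

Note = dict  # assumed note record with at least a "program" key


def transcribe_with_fallback(
    run_with_task_tokens: Callable[[], List[Note]],
    run_plain: Callable[[], List[Note]],
    post_filter: Callable[[List[Note]], List[Note]],
) -> List[Note]:
    """Try task-token conditioning first; if it fails, filter an unconditioned run."""
    try:
        return run_with_task_tokens()
    except Exception as err:  # e.g. "Unknown event type: transcribe_singing"
        print(f"Task tokens unavailable ({err}); falling back to post-processing")
    return post_filter(run_plain())
```

Once you know which exception the real code raises, catching that narrower type would be preferable.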

### Debug Output
Look for these messages in the console:
```
=== TRANSCRIBE FUNCTION CALLED ===
Audio file: /path/to/audio.wav
Instrument hint: vocals

=== INSTRUMENT CONDITIONING ACTIVATED ===
Model Task Configuration Debug:
✓ Model has task_manager
  Task name: mc13_full_plus_256
  Available subtask prefixes: ['default']

=== APPLYING INSTRUMENT FILTER ===
Found instruments in transcription: {0: 45, 100: 123, 40: 12}
Primary instrument: 100 (68% of notes)
Target program for vocals: 100
Converted 57 notes to primary instrument 100
```
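
The `=== APPLYING INSTRUMENT FILTER ===` step can be approximated with a small sketch: count notes per program, pick a target program, and reassign every other note to it. This is an assumption about how the filter works, not the code in `model_helper.py`; notes are modeled as dicts with a `"program"` key, and only the vocals mapping (program 100) comes from the log above. Fed the note counts from the log, it reports 57 converted notes.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Only the vocals entry is taken from the debug log; any other hint (or no hint)
# falls back to the most common program in the transcription.
TARGET_PROGRAMS: Dict[str, int] = {"vocals": 100}


def apply_instrument_filter(
    notes: List[dict], instrument_hint: Optional[str] = None
) -> Tuple[List[dict], int]:
    """Move every note to a single program; return (new_notes, number_converted)."""
    if not notes:
        return [], 0
    counts = Counter(note["program"] for note in notes)
    primary, primary_count = counts.most_common(1)[0]
    print(f"Found instruments in transcription: {dict(counts)}")
    print(f"Primary instrument: {primary} ({primary_count / len(notes):.0%} of notes)")
    target = TARGET_PROGRAMS.get(instrument_hint, primary)
    converted = sum(1 for note in notes if note["program"] != target)
    print(f"Converted {converted} notes to program {target}")
    return [dict(note, program=target) for note in notes], converted


if __name__ == "__main__":
    # Same per-program note counts as in the debug log.
    example = [{"program": 0}] * 45 + [{"program": 100}] * 123 + [{"program": 40}] * 12
    apply_instrument_filter(example, "vocals")
```

Collapsing everything onto one program is the simplest possible rule; it cannot separate a genuine second instrument from a misassigned one, which is why a confidence setting (see Confidence Tuning) is useful.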

### Common Issues

**1. Import Errors**
```bash
pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
```

**2. Model Not Found**
- Download model weights to `amt/logs/2024/`
- Check that the filename matches exactly

**3. No Audio Examples**
- Place test audio files in the `examples/` folder
- Supported formats: .wav, .mp3
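
How the interface discovers example files isn't documented here. Assuming it simply looks for `.wav` and `.mp3` files directly inside `examples/`, a hypothetical helper like `list_example_audio` shows which files would qualify:

```python
from pathlib import Path
from typing import List

# Supported formats listed above.
SUPPORTED_SUFFIXES = {".wav", ".mp3"}


def list_example_audio(folder: str = "examples") -> List[Path]:
    """Return supported audio files in folder, sorted by path; [] if the folder is missing."""
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for audio_file in list_example_audio():
        print(audio_file)
```

This sketch matches extensions case-insensitively; the real app may be stricter, so lowercase extensions are the safer choice.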

**4. Port Already in Use**
- The web interface runs on port 7860 by default
- If that port is busy, it tries 7861, 7862, and so on
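
Gradio performs this port scan itself when no port is specified. If you would rather choose the port explicitly, a minimal sketch using only the standard library could look like this; `find_free_port` is a hypothetical helper, and the 7860 to 7959 range is an arbitrary choice.

```python
import socket


def find_free_port(start: int = 7860, stop: int = 7960, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, stop) that can currently be bound on host."""
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use; try the next one
            return port  # socket is closed on leaving the with-block
    raise RuntimeError(f"No free port in range {start}-{stop - 1}")
```

The result could be passed to Gradio as `demo.launch(server_port=find_free_port())`, where `demo` stands for the interface object built in `app.py`. Another process can still take the port between the check and the launch, so treat this as a best-effort choice.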

## 📊 Expected Results

### Before (Original YourMT3+)
- Vocals file → outputs: vocals + violin + guitar tracks
- Saxophone solo → incomplete transcription
- Flute solo → single note only

### After (With Instrument Conditioning)
- Select "Vocals/Singing" β†’ clean vocal transcription only
- Select "Saxophone" β†’ complete saxophone solo
- Select "Flute" β†’ full flute transcription

## πŸ› οΈ Advanced Usage

### Command Line
```bash
python transcribe_cli.py audio.wav --instrument vocals --verbose
```
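
`transcribe_cli.py` isn't shown in this guide. As an illustration only, an `argparse` parser accepting the invocation above might be built like this; the `--instrument` choices mirror the web dropdown, and the `auto` default is an assumption.

```python
import argparse

# Dropdown options from the web interface, lower-cased (assumed to be the CLI values).
INSTRUMENTS = ["auto", "vocals", "guitar", "piano", "violin",
               "drums", "bass", "saxophone", "flute"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for: transcribe_cli.py AUDIO [--instrument NAME] [--verbose]."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file with YourMT3+.")
    parser.add_argument("audio", help="path to the input audio file")
    parser.add_argument("--instrument", choices=INSTRUMENTS, default="auto",
                        help="target instrument; 'auto' means no instrument hint")
    parser.add_argument("--verbose", action="store_true",
                        help="print debug output")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

A script built on this parser could then pass `instrument_hint=None if args.instrument == "auto" else args.instrument` to `transcribe`, treating `auto` as "no hint", which is again an assumption.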

### Python API
```python
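from model_helper import transcribe, load_model_checkpoint

# `model_args` (checkpoint name plus model options) and `audio_info` (metadata
# describing the input audio file) are not defined here; prepare them the same
# way app.py does before calling these functions.

# Load model
model = load_model_checkpoint(args=model_args, device="cuda")

# Transcribe with instrument conditioning
midifile = transcribe(model, audio_info, instrument_hint="vocals")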
```

### Confidence Tuning
- High confidence (0.8): Strict instrument filtering
- Low confidence (0.4): Allows more mixed content
- Auto-adjusts based on task token availability
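
The guide doesn't specify how the confidence value is applied. One hypothetical rule consistent with the description (higher value, stricter filtering) is to keep a non-target program only when its share of all notes reaches the confidence value, and to move every other note to the target program. Notes are again assumed to be dicts with a `"program"` key.

```python
from collections import Counter
from typing import List


def filter_by_confidence(notes: List[dict], target: int, confidence: float) -> List[dict]:
    """Hypothetical rule: keep a non-target program only if its share of all notes
    is at least `confidence`; reassign every other note to `target`."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if not notes:
        return []
    counts = Counter(note["program"] for note in notes)
    keep = {
        program for program, count in counts.items()
        if program != target and count / len(notes) >= confidence
    }
    return [
        note if note["program"] in keep else dict(note, program=target)
        for note in notes
    ]
```

With 55 notes on program 100 and 45 on program 40, a confidence of 0.8 moves everything to program 100, while 0.4 keeps the 45 program-40 notes because their 45% share clears the threshold.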

## πŸ“ Files Modified

- `app.py` - Added instrument dropdown to web interface
- `model_helper.py` - Enhanced transcription with conditioning
- `transcribe_cli.py` - New command-line interface
- `setup_local.py` - Local setup checker
- `test_local.py` - Quick functionality test

## 🎡 Enjoy Better Transcriptions!

No more instrument confusion: you now have full control over what gets transcribed! 🎉