YourMT3+ Local Setup Guide

🚀 Quick Start (Local Installation)

1. Install Dependencies

pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
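
To confirm everything installed correctly, a quick import check helps before going further (a standalone sanity test, not a script from the repo):

# Quick sanity check: all core dependencies should import cleanly.
import torch, torchaudio, transformers, gradio, pytorch_lightning, einops, numpy, librosa

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("gradio:", gradio.__version__)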

2. Setup Model Weights

  • Download YourMT3 model weights
  • Place them in: amt/logs/2024/
  • Default expected: mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt
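
You can verify the checkpoint is where the app expects it with a few lines of Python (a minimal check, assuming the default path and filename above):

from pathlib import Path

# Default checkpoint location; adjust if you renamed the file.
ckpt = Path("amt/logs/2024") / (
    "mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt"
)
print("checkpoint found" if ckpt.is_file() else f"missing: {ckpt}")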

3. Run Setup Check

cd /path/to/YourMT3
python setup_local.py

4. Quick Test

python test_local.py

5. Launch Web Interface

python app.py

Then open: http://127.0.0.1:7860

🎯 New Features

Instrument Conditioning

  • Problem: YourMT3+ switches instruments mid-track (vocals → violin → guitar)
  • Solution: Select target instrument from dropdown
  • Options: Auto, Vocals, Guitar, Piano, Violin, Drums, Bass, Saxophone, Flute
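
Each dropdown choice ultimately maps to a MIDI program number used for filtering. The table below is a hypothetical sketch using General MIDI programs (the real mapping lives in model_helper.py); vocals is shown as program 100 to match the debug output later in this guide:

# Hypothetical instrument-to-program table (illustrative only).
# Numbers follow General MIDI, except vocals, which uses 100
# as seen in the debug output below.
INSTRUMENT_PROGRAMS = {
    "vocals": 100,    # singing voice (non-GM extension)
    "guitar": 24,     # acoustic guitar (nylon)
    "piano": 0,       # acoustic grand piano
    "violin": 40,
    "bass": 32,       # acoustic bass
    "saxophone": 65,  # alto sax
    "flute": 73,
    "drums": None,    # drums are flagged separately, not a melodic program
}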

How It Works

  1. Upload audio or paste YouTube URL
  2. Select instrument from dropdown menu
  3. Click Transcribe
  4. Get focused transcription without instrument confusion

🔧 Troubleshooting

"Unknown event type: transcribe_singing"

This is expected. It means your checkpoint was trained without special task tokens, which is normal. The system will:

  1. Try task-token conditioning first (this may fail, and that's fine)
  2. Fall back to post-processing filtering
  3. Return an instrument-filtered transcription either way (see the sketch below)
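
In code, that fallback is a simple try/except around the token-conditioned path. A minimal sketch of the pattern (the helper names here are illustrative, not the actual model_helper.py API):

def transcribe_with_hint(model, audio, hint):
    """Illustrative fallback pattern; not the real implementation."""
    try:
        # Preferred path: condition decoding with a task token.
        return decode_with_task_token(model, audio, task=f"transcribe_{hint}")
    except ValueError:
        # Checkpoint lacks the token ("Unknown event type: ..."),
        # so decode normally and filter afterwards.
        notes = decode_default(model, audio)
        return filter_notes_by_instrument(notes, hint)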

Debug Output

Look for these messages in console:

=== TRANSCRIBE FUNCTION CALLED ===
Audio file: /path/to/audio.wav
Instrument hint: vocals

=== INSTRUMENT CONDITIONING ACTIVATED ===
Model Task Configuration Debug:
✓ Model has task_manager
  Task name: mc13_full_plus_256
  Available subtask prefixes: ['default']

=== APPLYING INSTRUMENT FILTER ===
Found instruments in transcription: {0: 45, 100: 123, 40: 12}
Primary instrument: 100 (68% of notes)
Target program for vocals: 100
Converted 57 notes to primary instrument 100
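
The filter behind those messages is essentially a majority vote over MIDI programs. A rough sketch of the idea (hypothetical note representation; the actual logic is in model_helper.py):

from collections import Counter

def filter_to_primary_instrument(notes, target_program=None):
    """Fold every note into one program: the target if given, else the majority.

    Assumes each note has a `.program` attribute; the real note
    structure in model_helper.py may differ.
    """
    counts = Counter(n.program for n in notes)  # e.g. {0: 45, 100: 123, 40: 12}
    primary = target_program if target_program is not None else counts.most_common(1)[0][0]
    converted = sum(1 for n in notes if n.program != primary)
    for n in notes:
        n.program = primary  # remap minority notes onto the primary instrument
    print(f"Converted {converted} notes to primary instrument {primary}")
    return notes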

Common Issues

1. Import Errors

pip install torch torchaudio transformers gradio pytorch-lightning

2. Model Not Found

  • Download model weights to amt/logs/2024/
  • Check filename matches exactly

3. No Audio Examples

  • Place test audio files in examples/ folder
  • Supported formats: .wav, .mp3
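
To check that an example file decodes before launching the app, librosa handles both formats (a standalone check; the filename and the 16 kHz rate are assumptions, the latter matching common MT3-style models):

import librosa

# Load a test file; sr=16000 resamples to 16 kHz (assumed model rate).
y, sr = librosa.load("examples/test.wav", sr=16000)
print(f"{len(y) / sr:.1f}s of audio at {sr} Hz")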

4. Port Already in Use

  • Web interface runs on port 7860
  • If busy, it will try 7861, 7862, etc.
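
If you need to pin a specific port instead, Gradio accepts one at launch time. A minimal sketch (the actual launch call lives in app.py and may pass different options):

import gradio as gr

demo = gr.Interface(fn=lambda x: x, inputs="text", outputs="text")
# Pin a port explicitly; leave server_port unset to let Gradio probe 7860, 7861, ...
demo.launch(server_name="127.0.0.1", server_port=7861)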

📊 Expected Results

Before (Original YourMT3+)

  • Vocals file → outputs: vocals + violin + guitar tracks
  • Saxophone solo → incomplete transcription
  • Flute solo → single note only

After (With Instrument Conditioning)

  • Select "Vocals/Singing" → clean vocal transcription only
  • Select "Saxophone" → complete saxophone solo
  • Select "Flute" → full flute transcription

🛠️ Advanced Usage

Command Line

python transcribe_cli.py audio.wav --instrument vocals --verbose

Python API

from model_helper import transcribe, load_model_checkpoint

# Load the model (model_args holds the checkpoint/config arguments)
model = load_model_checkpoint(args=model_args, device="cuda")

# Transcribe with instrument conditioning (audio_info is the prepared audio input)
midifile = transcribe(model, audio_info, instrument_hint="vocals")
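
The same two calls extend naturally to a folder of files. A sketch that reuses model_args from above and assumes transcribe() accepts a file path as its audio argument (the real audio_info structure may differ):

from pathlib import Path
from model_helper import transcribe, load_model_checkpoint

model = load_model_checkpoint(args=model_args, device="cuda")

# Batch over every WAV in examples/; assumes a path is a valid audio input,
# which may not match the actual audio_info structure.
for wav in sorted(Path("examples").glob("*.wav")):
    midifile = transcribe(model, str(wav), instrument_hint="vocals")
    print(f"{wav.name} -> {midifile}")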

Confidence Tuning

  • High confidence (0.8): Strict instrument filtering
  • Low confidence (0.4): Allows more mixed content
  • Auto-adjusts based on task token availability

📁 Files Modified

  • app.py - Added instrument dropdown to web interface
  • model_helper.py - Enhanced transcription with conditioning
  • transcribe_cli.py - New command-line interface
  • setup_local.py - Local setup checker
  • test_local.py - Quick functionality test

🎵 Enjoy Better Transcriptions!

No more instrument confusion - you now have full control over what gets transcribed! 🎉