YourMT3+ Enhanced Music Transcription

This is an enhanced version of YourMT3+ that adds instrument conditioning to keep the model from switching instruments mid-track.

Features

  • Instrument Conditioning: Choose your target instrument to maintain consistency throughout transcription
  • Multi-track Support: Transcribe multiple instruments from polyphonic audio
  • Format Options: Output as MIDI, MusicXML, ABC notation, or audio
  • Free CPU Inference: Optimized to run on the Hugging Face Spaces free tier (CPU-only, 16GB RAM)

How to Use

  1. Upload Your Audio: Drag and drop or select an audio file
  2. Select Target Instrument: Choose from the dropdown (vocals, piano, guitar, drums, etc.)
  3. Choose Output Format: MIDI, MusicXML, ABC, or audio
  4. Transcribe: Click the transcribe button and wait for results

Instrument Conditioning System

This enhanced version addresses the common issue where YourMT3+ switches instruments mid-track (e.g., vocals → violin → guitar). The system uses:

  • Task Tokens: Special conditioning tokens passed to the model, used when the loaded model supports them
  • Post-processing Filtering: Transcribed notes are filtered by MIDI program number so the output stays on the selected instrument
  • Debug Output: Console logs showing which instruments were detected and what the filter kept
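
The post-processing filter described above can be sketched roughly as follows. This is a minimal illustration, not the code used by this Space: the `Note` class, the `filter_to_instrument` function, and the `GM_PROGRAM_RANGES` table are hypothetical names, and the program ranges are taken from the General MIDI numbering (0-indexed) rather than from the model's own vocabulary. Vocals are left out because the program number the model assigns to singing is not stated here.

```python
from dataclasses import dataclass

# Assumption: General MIDI program ranges (0-indexed) per instrument family.
# The mapping actually used by the Space may differ.
GM_PROGRAM_RANGES = {
    "piano": range(0, 8),
    "guitar": range(24, 32),
    "bass": range(32, 40),
    "violin": range(40, 41),
    "trumpet": range(56, 57),
    "saxophone": range(64, 68),
}


@dataclass
class Note:
    """Hypothetical note event as it might look after transcription."""
    onset: float      # start time in seconds
    offset: float     # end time in seconds
    pitch: int        # MIDI pitch, 0-127
    program: int      # MIDI program number, 0-127
    is_drum: bool = False


def filter_to_instrument(notes, target, verbose=True):
    """Keep only the notes that belong to the target instrument family."""
    if target == "drums":
        kept = [n for n in notes if n.is_drum]
    else:
        allowed = GM_PROGRAM_RANGES[target]
        kept = [n for n in notes if not n.is_drum and n.program in allowed]
    if verbose:
        # Debug output: which programs were detected and how many notes survived.
        detected = sorted({n.program for n in notes if not n.is_drum})
        print(f"detected programs: {detected}")
        print(f"kept {len(kept)} of {len(notes)} notes for '{target}'")
    return kept


if __name__ == "__main__":
    transcribed = [
        Note(0.0, 0.5, 60, program=0),
        Note(0.5, 1.0, 62, program=40),   # stray violin note
        Note(1.0, 1.5, 64, program=4),
        Note(1.5, 2.0, 36, program=0, is_drum=True),
    ]
    piano_only = filter_to_instrument(transcribed, "piano")
```

A filter like this drops notes the model assigned to another family. Relabelling those notes to the target program instead would preserve them, at the risk of keeping notes that really do belong to a different instrument.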

Supported Instruments

  • Vocals/Singing
  • Piano
  • Guitar (Electric/Acoustic)
  • Bass
  • Drums
  • Violin
  • Trumpet
  • Saxophone
  • And many more...

Technical Details

  • Model: YourMT3+ (Multi-channel T5 decoder with Perceiver-TF encoder)
  • Framework: PyTorch Lightning + Gradio
  • Inference: CPU-only for free tier compatibility
  • Memory: Optimized for 16GB RAM constraint

Credits

Based on the original YourMT3 by the MT3 team, enhanced with instrument conditioning capabilities.