Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT

Model Details

Model Description

Model type: HuBERT-base

Model Sources

Repository: https://github.com/ryota-komatsu/speaker_disentangled_hubert
Paper: arXiv:2409.10103
Demo: Project page

How to Get Started with the Model

Use the code below to get started with the model. First, clone the repository and set up the environment from a shell:

git clone https://github.com/ryota-komatsu/speaker_disentangled_hubert.git
cd speaker_disentangled_hubert

sudo apt install git-lfs  # for UTMOS

conda create -y -n py310 -c pytorch -c nvidia -c conda-forge python=3.10.18 pip=24.0 faiss-gpu=1.11.0
conda activate py310
pip install -r requirements/requirements.txt

sh scripts/setup.sh

Then, from the repository root, run the following Python code to encode a waveform into syllabic units and resynthesize speech from them:

import torchaudio

from src.flow_matching import FlowMatchingWithBigVGan
from src.s5hubert import S5HubertForSyllableDiscovery

wav_path = "/path/to/wav"

# Download the pretrained encoder and decoder from the Hugging Face Hub
encoder = S5HubertForSyllableDiscovery.from_pretrained("ryota-komatsu/s5-hubert", device_map="cuda")
decoder = FlowMatchingWithBigVGan.from_pretrained("ryota-komatsu/s5-hubert-decoder", device_map="cuda")

# Load a waveform and resample it to 16 kHz
waveform, sr = torchaudio.load(wav_path)
waveform = torchaudio.functional.resample(waveform, sr, 16000)

# Encode the waveform into syllabic units
outputs = encoder(waveform.cuda())

# Syllabic units for the first waveform
units = outputs[0]["units"]  # [3950, 67, ..., 503]
units = units.unsqueeze(0)  # add a leading batch dimension

# Unit-to-speech synthesis
audio_values = decoder(units)
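
Before running the snippet above on your own recordings, it can help to check what preprocessing a file needs. The following is a minimal sketch using only the Python standard library, not part of the repository: the helper name `inspect_wav` is hypothetical, the 16 kHz target is taken from the resampling call above, and the mono-input check is an assumption rather than a documented requirement of the encoder. The `wave` module reads PCM WAV files only.

```python
import wave

TARGET_SR = 16000  # rate the snippet above resamples to


def inspect_wav(path, target_sr=TARGET_SR):
    """Read a PCM WAV header and report what preprocessing the file may need.

    Hypothetical helper for illustration; not part of the repository.
    """
    with wave.open(path, "rb") as f:
        sample_rate = f.getframerate()
        channels = f.getnchannels()
        frames = f.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_sec": frames / sample_rate,
        # True when torchaudio.functional.resample would change the rate
        "needs_resample": sample_rate != target_sr,
        # Assumption: the encoder is meant to receive single-channel audio
        "needs_downmix": channels > 1,
    }
```

Calling `inspect_wav(wav_path)` returns a small dictionary, so a multi-channel or non-16 kHz file can be spotted before it is loaded onto the GPU.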

Training Hyperparameters

Training regime: bf16 mixed precision

Hardware

1 x NVIDIA RTX A6000

Citation

BibTeX:

@inproceedings{Komatsu_Self-Supervised_Syllable_Discovery_2024,
  author    = {Komatsu, Ryota and Shinozaki, Takahiro},
  title     = {Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT},
  year      = {2024},
  month     = {Dec.},
  booktitle = {IEEE Spoken Language Technology Workshop},
  pages     = {1131--1136},
  doi       = {10.1109/SLT61566.2024.10832325},
}

Model Card Authors

Ryota Komatsu
