MERaLiON/MERaLiON-SpeechEncoder-2

Feature Extraction

meralion_bestrq


huzy0 committed on Jul 30

Commit a8e5393 · verified · 1 Parent(s): c3b680d

Update README.md

Files changed (1)

README.md +2 -2


@@ -48,7 +48,7 @@ Building on [MERaLiON-SpeechEncoder-v1](https://huggingface.co/MERaLiON/MERaLiON
 The model retains near state-of-the-art results on the SUPERB benchmark for English, and showcases strong multilingual capabilities demonstrated through its integration into a [high-performance ASR system](#automatic-speech-recognition-asr).
 #### Innovative pre-training techniques
-MERaLiON-SpeechEncoder-2 was trained from scratch with a **novel extension of the BEST-RQ** self-supervised objective, by using more informative latent targets. We also adopted the **Muon optimizer**, which has previously only been shown to outperform the popular AdamW for LLM training. We find its advantages also carry over to speech-based models.
+MERaLiON-SpeechEncoder-2 was trained from scratch with a **novel extension of the BEST-RQ** self-supervised objective, by using more informative latent targets. We also adopted the **Muon optimizer**, which has previously only been shown to outperform the wide-used AdamW optimizer for LLM training. We find its advantages also carry over to speech-based models.
 ## Model Summary
@@ -69,7 +69,7 @@ For details on background, pre-training, tuning experiments and evaluation, plea
 | MERaLiON-SpeechEncoder-v1       | 82.62         | 3.14 | 4.16 | 97.63 | 0.0590 | 91.09 | 5.18 | 5.06 | 68.02 | 98.60 | 88.99 / 23.89        |
 | MERaLiON-SpeechEncoder-2        | 82.72         | 3.40 | 4.96 | 97.57 | 0.0575 | 88.96 | 3.93 | 3.90 | 68.80 | 98.95 | 89.50 / 23.46        |
-[SUPERB](https://superbbenchmark.org/) is an English-based benchmark for speech encoders covering a wide range of downstream speech tasks across domains such as recognition, detection, semantics, speaker, and paralinguistics, where each task is finetuned separately with a frozen encoder.
+[SUPERB](https://superbbenchmark.github.io/#/) is an English-based benchmark for speech encoders covering a wide range of downstream speech tasks across domains such as recognition, detection, semantics, speaker, and paralinguistics, where each task is finetuned separately with a frozen encoder.
 MERaLiON-SpeechEncoder-2 is competitive to state-of-the-art, improving slightly against our own v1 model on speaker and paralinguistic tasks.