Update README.md
README.md
CHANGED
@@ -48,7 +48,7 @@ Building on [MERaLiON-SpeechEncoder-v1](https://huggingface.co/MERaLiON/MERaLiON
 The model retains near state-of-the-art results on the SUPERB benchmark for English, and showcases strong multilingual capabilities demonstrated through its integration into a [high-performance ASR system](#automatic-speech-recognition-asr).
 
 #### Innovative pre-training techniques
-MERaLiON-SpeechEncoder-2 was trained from scratch with a **novel extension of the BEST-RQ** self-supervised objective, by using more informative latent targets. We also adopted the **Muon optimizer**, which has previously only been shown to outperform the
+MERaLiON-SpeechEncoder-2 was trained from scratch with a **novel extension of the BEST-RQ** self-supervised objective, by using more informative latent targets. We also adopted the **Muon optimizer**, which has previously only been shown to outperform the wide-used AdamW optimizer for LLM training. We find its advantages also carry over to speech-based models.
 
 ## Model Summary
 
@@ -69,7 +69,7 @@ For details on background, pre-training, tuning experiments and evaluation, plea
 | MERaLiON-SpeechEncoder-v1 | 82.62 | 3.14 | 4.16 | 97.63 | 0.0590 | 91.09 | 5.18 | 5.06 | 68.02 | 98.60 | 88.99 / 23.89 |
 | MERaLiON-SpeechEncoder-2 | 82.72 | 3.40 | 4.96 | 97.57 | 0.0575 | 88.96 | 3.93 | 3.90 | 68.80 | 98.95 | 89.50 / 23.46 |
 
-[SUPERB](https://superbbenchmark.
+[SUPERB](https://superbbenchmark.github.io/#/) is an English-based benchmark for speech encoders covering a wide range of downstream speech tasks across domains such as recognition, detection, semantics, speaker, and paralinguistics, where each task is finetuned separately with a frozen encoder.
 
 MERaLiON-SpeechEncoder-2 is competitive to state-of-the-art, improving slightly against our own v1 model on speaker and paralinguistic tasks.
 