MERaLiON/MERaLiON-SpeechEncoder-2

@@ -22,7 +22,7 @@ language:
 <h1 align="center">🎧 MERaLiON-SpeechEncoder-2 🎧</h1>
 <p align="center">
-  <a href="https://meralion.org/demo/">💻 ASR Web Demo (Coming Soon!)</a> |
 </p>
@@ -31,6 +31,8 @@ We introduce **MERaLiON-SpeechEncoder-2**, an update of [MERaLiON-SpeechEncoder-
 Unlike many existing models optimized for high-resource, Western languages, MERaLiON-SpeechEncoder-2 is designed from the ground up to reflect the linguistic diversity and complexity of Southeast Asia. The model can be finetuned on custom datasets, allowing developers to build speech systems tailored to their specific needs.
 ## Model Highlights
 ### Small model size
@@ -52,18 +54,13 @@ MERaLiON-SpeechEncoder-2 was trained from scratch with a novel extension of the
 - **Language(s):** Primarily English (Global & Singapore), Chinese, Malay, Tamil, Thai, Indonesian, and Vietnamese. See [pre-training data](#language-coverage-of-pre-training-data) for a full breakdown of language coverage.
 - **License:** [MERaLiON Public License](https://huggingface.co/MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION/blob/main/MERaLiON-Public-Licence-v1.pdf)
-The following Hugging Face-compatible models are implemented:
--   **`MeralionBestRqModel`**: The base BEST-RQ Conformer encoder. It outputs the final hidden states and is suitable for feature extraction or as a base for other heads.
--   **`MeralionBestRqModelForCTC`**: The Conformer model with a linear CTC head for ASR.
--   **`MeralionBestRqModelForLSTMCTC`**: The Conformer model with a more powerful CTC head that includes two LSTM layers before the final projection layer. This version can also be configured to use a weighted sum of all encoder hidden states.
 For details on background, pre-training, tuning experiments and evaluation, please refer to our [technical report](https://arxiv.org/abs/2412.11538).
 ## Language coverage of pre-training data
 ### Model Sources [optional]
 <!-- Provide the basic links for the model. -->

 <h1 align="center">🎧 MERaLiON-SpeechEncoder-2 🎧</h1>
 <p align="center">
+  <a href="https://meralion.org/demo/">💻 ASR Web Demo (Coming Soon!)</a>
 </p>
 Unlike many existing models optimized for high-resource, Western languages, MERaLiON-SpeechEncoder-2 is designed from the ground up to reflect the linguistic diversity and complexity of Southeast Asia. The model can be finetuned on custom datasets, allowing developers to build speech systems tailored to their specific needs.
+<img src="data1.svg" width="425"/> <img src="data2.svg" width="425"/>
 ## Model Highlights
 ### Small model size
 - **Language(s):** Primarily English (Global & Singapore), Chinese, Malay, Tamil, Thai, Indonesian, and Vietnamese. See [pre-training data](#language-coverage-of-pre-training-data) for a full breakdown of language coverage.
 - **License:** [MERaLiON Public License](https://huggingface.co/MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION/blob/main/MERaLiON-Public-Licence-v1.pdf)
 For details on background, pre-training, tuning experiments and evaluation, please refer to our [technical report](https://arxiv.org/abs/2412.11538).
 ## Language coverage of pre-training data
 ### Model Sources [optional]
 <!-- Provide the basic links for the model. -->