Improve model card (#1)
- Improve model card (6447db44107b2ae5248dfb268179d6b93751dea3)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md
CHANGED
@@ -1,15 +1,19 @@
 ---
-license: mit
 language:
 - en
+license: mit
+pipeline_tag: text-generation
 context_length:
 - 4k
-
+library_name: transformers
 ---
+
 ## Model Summary
 
 Phi-mini-MoE is a lightweight Mixture of Experts (MoE) model with 7.6B total parameters and 2.4B activated parameters. It is compressed and distilled from the base model shared by [Phi-3.5-MoE](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) and [GRIN-MoE](https://huggingface.co/microsoft/GRIN-MoE) using the [SlimMoE](https://arxiv.org/pdf/2506.18349) approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a smaller variant, [Phi-tiny-MoE](https://huggingface.co/microsoft/Phi-tiny-MoE-instruct), with 3.8B total and 1.1B activated parameters.
 
+Project Page: https://huggingface.co/microsoft/Phi-mini-MoE-instruct
+Code: https://github.com/microsoft/LMOps/tree/main/src/moe
 
 References: <br>
 📖 [SlimMoE](https://arxiv.org/pdf/2506.18349) <br>
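
The metadata part of this change can be checked mechanically. The following is a minimal sketch, not part of the commit: it compares the front matter before and after and lists the keys that were added. `parse_front_matter` is a hypothetical helper written for this illustration; it handles only flat `key: value` pairs and simple `- item` lists, and is not a full YAML parser. The two README strings are abbreviated copies of the front matter from the diff.

```python
def parse_front_matter(text: str) -> dict:
    """Hypothetical helper: read a flat front-matter block delimited by '---'.

    Supports 'key: value' scalars and 'key:' followed by '- item' lines.
    This is an illustration only, not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None  # key whose list items we are currently collecting
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:  # blank lines inside the block are ignored
            continue
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_key = None
        else:
            meta[key] = []
            current_key = key
    return meta


# Front matter before the commit (abbreviated from the diff).
OLD_README = """---
license: mit
language:
- en
context_length:
- 4k

---
## Model Summary
"""

# Front matter after the commit (abbreviated from the diff).
NEW_README = """---
language:
- en
license: mit
pipeline_tag: text-generation
context_length:
- 4k
library_name: transformers
---

## Model Summary
"""

old_meta = parse_front_matter(OLD_README)
new_meta = parse_front_matter(NEW_README)
added_keys = sorted(set(new_meta) - set(old_meta))
print(added_keys)
```

Under these assumptions, the only new keys are `library_name` and `pipeline_tag`; `license` moves position but keeps its value.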