Text Generation · Transformers · Safetensors · English · phimoe · conversational · custom_code
cliang1453 and nielsr (HF Staff) committed on
Commit 577939f · verified · 1 parent: e8b3fcb

Improve model card (#1)


- Improve model card (6447db44107b2ae5248dfb268179d6b93751dea3)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
```diff
@@ -1,15 +1,19 @@
 ---
-license: mit
 language:
 - en
+license: mit
+pipeline_tag: text-generation
 context_length:
 - 4k
-pipeline_tag: text-generation
+library_name: transformers
 ---
+
 ## Model Summary
 
 Phi-mini-MoE is a lightweight Mixture of Experts (MoE) model with 7.6B total parameters and 2.4B activated parameters. It is compressed and distilled from the base model shared by [Phi-3.5-MoE](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) and [GRIN-MoE](https://huggingface.co/microsoft/GRIN-MoE) using the [SlimMoE](https://arxiv.org/pdf/2506.18349) approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a smaller variant, [Phi-tiny-MoE](https://huggingface.co/microsoft/Phi-tiny-MoE-instruct), with 3.8B total and 1.1B activated parameters.
 
+Project Page: https://huggingface.co/microsoft/Phi-mini-MoE-instruct
+Code: https://github.com/microsoft/LMOps/tree/main/src/moe
 
 References: <br>
 📖 [SlimMoE](https://arxiv.org/pdf/2506.18349) <br>
```
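After this commit, the README's YAML front matter declares `license`, `pipeline_tag`, and `library_name` alongside `language` and `context_length`. As a rough illustration of what that metadata block now contains, here is a minimal stdlib-only sketch that reads the updated front matter into a dict. It assumes the block uses only flat `key: value` pairs and `- item` lists, as this card does; `parse_front_matter` and `NEW_README` are hypothetical names for illustration, and real tooling would use a full YAML parser such as PyYAML or `huggingface_hub.ModelCard`.

```python
# Minimal sketch: read the YAML front matter of the updated README.md.
# Assumption: only flat "key: value" pairs and "- item" lists appear,
# as in this model card. This is NOT a general YAML parser.

NEW_README = """---
language:
- en
license: mit
pipeline_tag: text-generation
context_length:
- 4k
library_name: transformers
---

## Model Summary
"""


def parse_front_matter(text: str) -> dict:
    """Return the metadata between the leading '---' fences as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing fence ends the metadata block
            break
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent key (e.g. language)
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means a list of "- item" lines follows
            meta[current_key] = value if value else []
    return meta


meta = parse_front_matter(NEW_README)
print(meta["license"], meta["pipeline_tag"], meta["library_name"])
print(meta["language"], meta["context_length"])
```

With `pipeline_tag` and `library_name` declared in the card itself, the Hub can read the model's task and library directly from the metadata.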
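The model summary gives total and activated parameter counts for both SlimMoE variants. A quick calculation, using only the figures stated in the card (in billions), shows what fraction of each model's weights is activated per token; `MODELS` and `active_fraction` are illustrative names, not part of any released code.

```python
# Activated-parameter fraction for the two SlimMoE models, using the
# total/activated counts (billions of parameters) stated in the model card.
MODELS = {
    "Phi-mini-MoE": {"total": 7.6, "activated": 2.4},
    "Phi-tiny-MoE": {"total": 3.8, "activated": 1.1},
}

# Fraction of all weights that are activated for a given token
active_fraction = {
    name: p["activated"] / p["total"] for name, p in MODELS.items()
}

for name, frac in active_fraction.items():
    print(f"{name}: {frac:.1%} of parameters activated")
```

This prints 31.6% for Phi-mini-MoE and 28.9% for Phi-tiny-MoE: in both models, fewer than a third of the weights are activated for any single token, which is the compute saving the MoE design targets.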