Progressive Growth Transformers (PGT) [pretrain]
Transformers grown layer-by-layer on frozen embeddings. Explores emergent capabilities with depth.
Bochkov/abs-bvv-6 • Text Generation • Updated Jul 14 • 3
Bochkov/abs-bvv-5 • Text Generation • Updated Jul 14 • 3
Bochkov/abs-bvv-4 • Text Generation • Updated Jul 14 • 3
Bochkov/abs-bvv-3 • Text Generation • Updated Jul 14 • 3

Best demo models [pretrain]
Frozen embedding LMs (en/ru/zh) & their MoE fusion. Baselines: frozen vs unfrozen embedding ablation.
Bochkov/best_bvv_moe • Text Generation • Updated Jul 14 • 2
Bochkov/best_bvv_ru • Text Generation • Updated Jul 14 • 4
Bochkov/best_bvv_unfrozen_ru • Text Generation • Updated Jul 14 • 1
Bochkov/best_bvv_zh • Text Generation • Updated Jul 14 • 1