opt-babylm2-rewritten-clean-spacy-earlystop_no-multi-adj-adj-only-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy_no-multi-adj-adj-only dataset. It achieves the following results on the evaluation set:

Loss: 2.6945
Accuracy: 0.4776
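
For readers who prefer perplexity, the reported loss can be converted as sketched below. This assumes the evaluation loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training with the Transformers Trainer but is not stated in this card. The resulting value would be per BPE token, not per word.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 2.6945

# Assumption: eval_loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```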

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
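
The sketch below shows how two of the values above fit together: the total train batch size and the learning rate implied by a linear schedule with warmup. It is a plain-Python illustration, not the training code. The function name `linear_warmup_decay_lr` is hypothetical, `num_devices = 1` is an assumption inferred from 32 x 8 = 256, and `total_steps = 43900` is assumed from the last step logged in the training results.

```python
def linear_warmup_decay_lr(step, peak_lr=1e-3, warmup_steps=32_000, total_steps=43_900):
    """Learning rate at `step` for linear warmup followed by linear decay to zero.

    Hypothetical helper; total_steps is assumed to equal the final logged step.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Effective batch size per optimizer step.
per_device_batch = 32   # train_batch_size
grad_accum = 8          # gradient_accumulation_steps
num_devices = 1         # assumption: single device, consistent with 32 * 8 = 256
effective_batch = per_device_batch * grad_accum * num_devices

print(effective_batch)
for step in (2_195, 16_000, 32_000, 37_950, 43_900):
    print(step, linear_warmup_decay_lr(step))
```

Under these assumptions the learning rate only reaches its peak of 0.001 at step 32000, roughly 73% of the way through training, and decays over the remaining 11900 steps.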

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.063	0.9998	2195	3.8259	0.3603
3.4199	1.9998	4390	3.3194	0.4080
3.1039	2.9998	6585	3.1005	0.4295
2.9458	3.9998	8780	2.9943	0.4401
2.846	4.9998	10975	2.9330	0.4464
2.7706	5.9998	13170	2.8929	0.4504
2.7301	6.9998	15365	2.8659	0.4535
2.6979	7.9998	17560	2.8467	0.4559
2.671	8.9998	19755	2.8326	0.4576
2.6463	9.9998	21950	2.8206	0.4589
2.6435	10.9998	24145	2.8114	0.4600
2.6281	11.9998	26340	2.8049	0.4608
2.6181	12.9998	28535	2.8006	0.4614
2.6056	13.9998	30730	2.7967	0.4618
2.5967	14.9998	32925	2.7827	0.4635
2.5648	15.9998	35120	2.7552	0.4670
2.5149	16.9998	37315	2.7341	0.4703
2.4582	17.9998	39510	2.7130	0.4733
2.3923	18.9998	41705	2.6958	0.4760
2.3178	19.9998	43900	2.6945	0.4776
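
To make the trend in the table easier to read, the following sketch computes the epoch-to-epoch improvement in validation loss. The values are copied from the Validation Loss column above, and the variable names are illustrative only.

```python
# Validation loss per epoch, copied from the table above (epochs 1 through 20).
val_loss = [
    3.8259, 3.3194, 3.1005, 2.9943, 2.9330,
    2.8929, 2.8659, 2.8467, 2.8326, 2.8206,
    2.8114, 2.8049, 2.8006, 2.7967, 2.7827,
    2.7552, 2.7341, 2.7130, 2.6958, 2.6945,
]

# Improvement from each epoch to the next (positive means the loss went down).
deltas = [round(prev - cur, 4) for prev, cur in zip(val_loss, val_loss[1:])]

for epoch, delta in enumerate(deltas, start=2):
    print(f"epoch {epoch - 1} -> {epoch}: {delta:.4f}")

# Index of the lowest validation loss (0-based), converted to an epoch number.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print("best epoch:", best_epoch)
```

In this table the improvement shrinks to 0.0039 between epochs 13 and 14, then grows again to 0.0140 between epochs 14 and 15, the interval that contains step 32000 where the configured warmup ends. The lowest validation loss is at the final epoch.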

Framework versions

Transformers 4.48.0
Pytorch 2.6.0+cu124
Datasets 3.2.0
Tokenizers 0.21.1
