opt-babylm2-rewritten-clean-spacy-earlystop_no-multi-adj-adj-only-bpe_seed-42_1e-3
This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy_no-multi-adj-adj-only dataset. It achieves the following results on the evaluation set:
- Loss: 2.6945
- Accuracy: 0.4776
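For reference, the evaluation loss can be converted into perplexity. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which is what Transformers causal language models return by default. The card does not state this explicitly, and the helper name `perplexity_from_loss` is hypothetical.

```python
import math


def perplexity_from_loss(mean_ce_nats: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity.

    Assumption: the loss is averaged over tokens and measured in nats.
    """
    return math.exp(mean_ce_nats)


eval_loss = 2.6945  # evaluation loss reported in this card
print(f"perplexity: {perplexity_from_loss(eval_loss):.2f}")  # prints "perplexity: 14.80"
```

Under that assumption, the final checkpoint has an evaluation perplexity of roughly 14.8.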
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW, PyTorch implementation (OptimizerNames.ADAMW_TORCH), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
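The batch-size and learning-rate settings above interact in a way that is easy to check by hand. The sketch below assumes a single device (consistent with 32 × 8 = 256) and assumes the linear scheduler behaves like the Transformers "linear" schedule: warmup from 0 to the peak rate, then linear decay to 0 at the final step. The total of 43,900 optimizer steps is taken from the last row of the results table (2,195 steps per epoch × 20 epochs). Function names are illustrative, not part of any library.

```python
# Values copied from the hyperparameter list and results table in this card.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 32_000
STEPS_PER_EPOCH = 2_195  # from the results table
TOTAL_STEPS = 43_900     # 2,195 steps/epoch x 20 epochs
NUM_DEVICES = 1          # assumption: single device


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Sequences contributing to one optimizer update."""
    return per_device * accum * devices


def linear_lr(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then linear decay to 0 at TOTAL_STEPS.

    Assumption: mirrors the Transformers "linear" schedule with warmup.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 256
for step in (0, 16_000, 32_000, 37_950, 43_900):
    print(step, f"{linear_lr(step):.2e}")
print(f"peak LR reached near epoch {WARMUP_STEPS / STEPS_PER_EPOCH:.2f}")  # 14.58
```

With these settings, warmup occupies about 73% of training (32,000 of 43,900 steps), so the peak learning rate is reached only around epoch 14.6 and the decay phase covers the last five or so epochs.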
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
4.063 | 0.9998 | 2195 | 3.8259 | 0.3603 |
3.4199 | 1.9998 | 4390 | 3.3194 | 0.4080 |
3.1039 | 2.9998 | 6585 | 3.1005 | 0.4295 |
2.9458 | 3.9998 | 8780 | 2.9943 | 0.4401 |
2.846 | 4.9998 | 10975 | 2.9330 | 0.4464 |
2.7706 | 5.9998 | 13170 | 2.8929 | 0.4504 |
2.7301 | 6.9998 | 15365 | 2.8659 | 0.4535 |
2.6979 | 7.9998 | 17560 | 2.8467 | 0.4559 |
2.671 | 8.9998 | 19755 | 2.8326 | 0.4576 |
2.6463 | 9.9998 | 21950 | 2.8206 | 0.4589 |
2.6435 | 10.9998 | 24145 | 2.8114 | 0.4600 |
2.6281 | 11.9998 | 26340 | 2.8049 | 0.4608 |
2.6181 | 12.9998 | 28535 | 2.8006 | 0.4614 |
2.6056 | 13.9998 | 30730 | 2.7967 | 0.4618 |
2.5967 | 14.9998 | 32925 | 2.7827 | 0.4635 |
2.5648 | 15.9998 | 35120 | 2.7552 | 0.4670 |
2.5149 | 16.9998 | 37315 | 2.7341 | 0.4703 |
2.4582 | 17.9998 | 39510 | 2.7130 | 0.4733 |
2.3923 | 18.9998 | 41705 | 2.6958 | 0.4760 |
2.3178 | 19.9998 | 43900 | 2.6945 | 0.4776 |
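The table can also be checked programmatically. This sketch copies the step, validation loss, and accuracy columns from the table above and confirms that evaluations happened every 2,195 steps, that validation loss decreased at every evaluation, and that the best checkpoint is the final one (matching the results reported at the top of the card). If the "earlystop" in the model name refers to early stopping on validation loss (the card does not say), this table suggests the criterion never fired. Helper names are hypothetical.

```python
# (step, validation_loss, accuracy) copied from the training-results table.
RESULTS = [
    (2195, 3.8259, 0.3603), (4390, 3.3194, 0.4080), (6585, 3.1005, 0.4295),
    (8780, 2.9943, 0.4401), (10975, 2.9330, 0.4464), (13170, 2.8929, 0.4504),
    (15365, 2.8659, 0.4535), (17560, 2.8467, 0.4559), (19755, 2.8326, 0.4576),
    (21950, 2.8206, 0.4589), (24145, 2.8114, 0.4600), (26340, 2.8049, 0.4608),
    (28535, 2.8006, 0.4614), (30730, 2.7967, 0.4618), (32925, 2.7827, 0.4635),
    (35120, 2.7552, 0.4670), (37315, 2.7341, 0.4703), (39510, 2.7130, 0.4733),
    (41705, 2.6958, 0.4760), (43900, 2.6945, 0.4776),
]


def steps_between_evals(rows):
    """Set of step gaps between consecutive evaluations."""
    return {b[0] - a[0] for a, b in zip(rows, rows[1:])}


def val_loss_always_improves(rows):
    """True if validation loss strictly decreases at every evaluation."""
    return all(b[1] < a[1] for a, b in zip(rows, rows[1:]))


def best_checkpoint(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r[1])


print(steps_between_evals(RESULTS))       # {2195}
print(val_loss_always_improves(RESULTS))  # True
print(best_checkpoint(RESULTS))           # (43900, 2.6945, 0.4776)
```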
Framework versions
- Transformers 4.48.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.1
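To compare a local environment against the versions listed above, the standard-library importlib.metadata module can be used. This is a minimal sketch: it assumes the PyPI distribution names transformers, torch, datasets, and tokenizers, and it uses exact string matching, which is stricter than necessary. In particular, the "+cu124" local suffix only appears on CUDA 12.4 builds of PyTorch, so a CPU-only install will not match even at version 2.6.0.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card.
EXPECTED = {
    "transformers": "4.48.0",
    "torch": "2.6.0+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.21.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_report(expected):
    """Map each package to (expected, installed, exact_match)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        report[name] = (want, have, have == want)
    return report


for name, (want, have, ok) in version_report(EXPECTED).items():
    print(f"{name}: expected {want}, installed {have or 'not installed'}, match={ok}")
```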
Evaluation results
- Accuracy on kanishka/babylm2-rewritten-clean-spacy_no-multi-adj-adj-only (self-reported): 0.478