opt-babylm2-rewritten-clean-spacy-earlystop_no-multi-adj-adj-only-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy_no-multi-adj-adj-only dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6945
  • Accuracy: 0.4776
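
A minimal sketch of how the checkpoint might be loaded to spot-check a loss value locally. It assumes the model is published on the Hugging Face Hub under the repository id shown below with its tokenizer files alongside, and that `transformers` and `torch` are installed. The helper names `load` and `mean_token_loss` are hypothetical, and a loss on a single sentence is not comparable to the evaluation-set figure above.

```python
# Repository id assumed from the model name; adjust if the checkpoint lives elsewhere.
REPO_ID = (
    "kanishka/opt-babylm2-rewritten-clean-spacy-earlystop"
    "_no-multi-adj-adj-only-bpe_seed-42_1e-3"
)


def load(repo_id=REPO_ID):
    """Download (or read from cache) the tokenizer and causal LM."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    model.eval()  # disable dropout for scoring
    return tokenizer, model


def mean_token_loss(text, tokenizer, model):
    """Mean next-token cross-entropy (in nats) of `text` under the model."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the shifted LM loss.
        output = model(**inputs, labels=inputs["input_ids"])
    return output.loss.item()


if __name__ == "__main__":
    tok, lm = load()
    print(mean_token_loss("The small brown dog barked.", tok, lm))
```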

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
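
The relationship between these values can be sketched as follows. The code assumes a single training device (consistent with 32 × 8 = 256) and a total of 43,900 optimizer steps, taken from the last row of the training results table; neither is stated in the list itself. The `linear_lr` function is an illustrative reimplementation of a linear warmup and decay schedule, not the trainer's own code.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 32_000

# Assumptions: one device, and the final step count from the results table.
NUM_DEVICES = 1
TOTAL_STEPS = 43_900


def effective_batch_size(per_device, accum_steps, num_devices):
    """Sequences consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 16_000, 32_000, 38_000, 43_900):
        print(step, linear_lr(step))
```

Under these assumptions the warmup covers roughly 73% of training (32,000 of 43,900 steps), so the peak learning rate is reached during epoch 15 and the decay phase spans only the last five epochs or so.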

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|---------------|---------|-------|-----------------|----------|
| 4.063         | 0.9998  | 2195  | 3.8259          | 0.3603   |
| 3.4199        | 1.9998  | 4390  | 3.3194          | 0.4080   |
| 3.1039        | 2.9998  | 6585  | 3.1005          | 0.4295   |
| 2.9458        | 3.9998  | 8780  | 2.9943          | 0.4401   |
| 2.846         | 4.9998  | 10975 | 2.9330          | 0.4464   |
| 2.7706        | 5.9998  | 13170 | 2.8929          | 0.4504   |
| 2.7301        | 6.9998  | 15365 | 2.8659          | 0.4535   |
| 2.6979        | 7.9998  | 17560 | 2.8467          | 0.4559   |
| 2.671         | 8.9998  | 19755 | 2.8326          | 0.4576   |
| 2.6463        | 9.9998  | 21950 | 2.8206          | 0.4589   |
| 2.6435        | 10.9998 | 24145 | 2.8114          | 0.4600   |
| 2.6281        | 11.9998 | 26340 | 2.8049          | 0.4608   |
| 2.6181        | 12.9998 | 28535 | 2.8006          | 0.4614   |
| 2.6056        | 13.9998 | 30730 | 2.7967          | 0.4618   |
| 2.5967        | 14.9998 | 32925 | 2.7827          | 0.4635   |
| 2.5648        | 15.9998 | 35120 | 2.7552          | 0.4670   |
| 2.5149        | 16.9998 | 37315 | 2.7341          | 0.4703   |
| 2.4582        | 17.9998 | 39510 | 2.7130          | 0.4733   |
| 2.3923        | 18.9998 | 41705 | 2.6958          | 0.4760   |
| 2.3178        | 19.9998 | 43900 | 2.6945          | 0.4776   |
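
As a rough aid to reading these numbers, the sketch below converts the final validation loss to perplexity and estimates how many sequences one epoch covers. It assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language modeling but is not stated here, and that every optimizer step consumes a full batch of 256 sequences.

```python
import math

# Values copied from the last row of the results table and the hyperparameter list.
FINAL_VAL_LOSS = 2.6945
STEPS_PER_EPOCH = 2_195
TOTAL_TRAIN_BATCH_SIZE = 256


def perplexity(loss_nats):
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


def sequences_seen(steps, batch_size):
    """Training sequences consumed after `steps` optimizer steps."""
    return steps * batch_size


if __name__ == "__main__":
    print(f"perplexity: {perplexity(FINAL_VAL_LOSS):.2f}")
    print(f"sequences per epoch: {sequences_seen(STEPS_PER_EPOCH, TOTAL_TRAIN_BATCH_SIZE)}")
```

Under those assumptions the final validation perplexity comes out at roughly 14.8, and one epoch corresponds to about 562,000 training sequences.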

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.1

Safetensors

  • Model size: 97.8M params
  • Tensor type: F32

Dataset used to train kanishka/opt-babylm2-rewritten-clean-spacy-earlystop_no-multi-adj-adj-only-bpe_seed-42_1e-3: kanishka/babylm2-rewritten-clean-spacy_no-multi-adj-adj-only

Evaluation results

  • Accuracy on kanishka/babylm2-rewritten-clean-spacy_no-multi-adj-adj-only: 0.478 (self-reported)