llama3-8b-mypo3_sim-full-beta7.5-lr4e-7

This model is a fine-tuned version of princeton-nlp/Llama-3-Base-8B-SFT on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 1.3631
  • Rewards/chosen: 0.0450
  • Rewards/rejected: -0.3543
  • Rewards/accuracies: 0.7579
  • Rewards/margins: 0.3992
  • Logps/rejected: -1.5366
  • Logps/chosen: -1.2652
  • Logits/rejected: -1.1234
  • Logits/chosen: -1.0985
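
Rewards/margins is the difference between the chosen and rejected rewards: 0.0450 - (-0.3543) ≈ 0.3992. As referenced above, here is a minimal sketch for loading the checkpoint with the transformers library; the prompt and generation settings are illustrative assumptions, and device_map="auto" additionally requires accelerate.

```python
# Minimal loading sketch (assumes a bfloat16-capable GPU; the prompt and
# generation settings below are illustrative, not tuned for this model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aaaalongaa/llama3-8b-mypo3_sim-full-beta7.5-lr4e-7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # requires the accelerate package
)

prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens and print only the completion.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```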

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch mirroring them follows the list):

  • learning_rate: 4e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
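
As noted above, this is a minimal sketch mapping the listed values onto transformers.TrainingArguments. The card does not document the mypo3_sim objective or its trainer, so only the hyperparameter mapping is shown; output_dir is an assumed name. The totals follow from the per-device values: 4 per device × 4 GPUs × 2 accumulation steps = 32 for training, and 8 × 4 GPUs = 32 for evaluation.

```python
# Hyperparameter mapping sketch using transformers.TrainingArguments.
# The actual mypo3_sim training objective/trainer is not documented in this
# card; this only shows how the listed values map onto standard arguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-mypo3_sim-full-beta7.5-lr4e-7",  # assumed name
    learning_rate=4e-7,
    per_device_train_batch_size=4,   # x 4 GPUs x 2 accumulation = 32 total
    per_device_eval_batch_size=8,    # x 4 GPUs = 32 total eval batch
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # weights are stored in BF16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```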

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.3793        | 0.0523 | 100  | 1.3815          | -0.0364        | -0.0923          | 0.6270             | 0.0559          | -1.5017        | -1.2760      | -1.0412         | -1.0098       |
| 1.3797        | 0.1047 | 200  | 1.3835          | -0.0667        | -0.2448          | 0.7103             | 0.1781          | -1.5220        | -1.2800      | -1.0404         | -1.0114       |
| 1.3748        | 0.1570 | 300  | 1.3803          | 0.0532         | -0.2001          | 0.7341             | 0.2534          | -1.5161        | -1.2640      | -1.0513         | -1.0243       |
| 1.3639        | 0.2094 | 400  | 1.3851          | 0.0649         | -0.2251          | 0.7302             | 0.2900          | -1.5194        | -1.2625      | -1.0536         | -1.0279       |
| 1.3736        | 0.2617 | 500  | 1.3799          | 0.0384         | -0.3073          | 0.7282             | 0.3457          | -1.5304        | -1.2660      | -1.0693         | -1.0442       |
| 1.3698        | 0.3141 | 600  | 1.3888          | -0.0230        | -0.3563          | 0.7361             | 0.3333          | -1.5369        | -1.2742      | -1.0838         | -1.0584       |
| 1.3417        | 0.3664 | 700  | 1.3778          | 0.0230         | -0.3367          | 0.7302             | 0.3597          | -1.5343        | -1.2681      | -1.0931         | -1.0674       |
| 1.413         | 0.4187 | 800  | 1.3758          | -0.0158        | -0.3821          | 0.7401             | 0.3663          | -1.5404        | -1.2732      | -1.1025         | -1.0780       |
| 1.3989        | 0.4711 | 900  | 1.3793          | 0.0086         | -0.3610          | 0.7460             | 0.3696          | -1.5375        | -1.2700      | -1.1075         | -1.0822       |
| 1.3566        | 0.5234 | 1000 | 1.3717          | 0.1015         | -0.2903          | 0.7440             | 0.3917          | -1.5281        | -1.2576      | -1.1044         | -1.0813       |
| 1.39          | 0.5758 | 1100 | 1.3751          | 0.1356         | -0.2313          | 0.7341             | 0.3669          | -1.5202        | -1.2531      | -1.1474         | -1.1210       |
| 1.3829        | 0.6281 | 1200 | 1.3682          | 0.0202         | -0.3839          | 0.7619             | 0.4041          | -1.5406        | -1.2684      | -1.1289         | -1.1032       |
| 1.3495        | 0.6805 | 1300 | 1.3676          | 0.0081         | -0.3882          | 0.7540             | 0.3963          | -1.5412        | -1.2701      | -1.0956         | -1.0722       |
| 1.349         | 0.7328 | 1400 | 1.3697          | 0.1064         | -0.2902          | 0.7421             | 0.3966          | -1.5281        | -1.2570      | -1.1047         | -1.0809       |
| 1.3702        | 0.7851 | 1500 | 1.3645          | 0.0567         | -0.3406          | 0.7560             | 0.3973          | -1.5348        | -1.2636      | -1.1163         | -1.0916       |
| 1.3753        | 0.8375 | 1600 | 1.3645          | 0.0555         | -0.3434          | 0.7520             | 0.3988          | -1.5352        | -1.2638      | -1.1144         | -1.0900       |
| 1.3577        | 0.8898 | 1700 | 1.3632          | 0.0357         | -0.3637          | 0.7540             | 0.3994          | -1.5379        | -1.2664      | -1.1254         | -1.1003       |
| 1.3568        | 0.9422 | 1800 | 1.3634          | 0.0453         | -0.3518          | 0.7520             | 0.3971          | -1.5363        | -1.2651      | -1.1305         | -1.1050       |
| 1.3632        | 0.9945 | 1900 | 1.3634          | 0.0445         | -0.3542          | 0.7540             | 0.3986          | -1.5366        | -1.2652      | -1.1235         | -1.0986       |
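
Evaluation runs every 100 optimizer steps. With a total train batch size of 32, the final row (step 1900 at epoch 0.9945) implies roughly 1,910 steps per epoch, i.e. about 32 × 1,910 ≈ 61k training pairs, consistent with the size of the ultrafeedback_binarized preference-training split.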

Framework versions

  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1