llama3-8b-mypo3_sim-full-beta7.5-lr4e-7
This model is a fine-tuned version of princeton-nlp/Llama-3-Base-8B-SFT on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 1.3631
- Rewards/chosen: 0.0450
- Rewards/rejected: -0.3543
- Rewards/accuracies: 0.7579
- Rewards/margins: 0.3992
- Logps/rejected: -1.5366
- Logps/chosen: -1.2652
- Logits/rejected: -1.1234
- Logits/chosen: -1.0985
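The reward metrics above are related by simple arithmetic. As a minimal sketch, assuming the metric names follow the usual preference-trainer convention in which the margin is the mean of (chosen reward minus rejected reward), the reported margin can be cross-checked from the two reward means. The variable names below are illustrative and not part of the training code.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = 0.0450
rewards_rejected = -0.3543
reported_margin = 0.3992

# Assumption: margin = mean(chosen reward - rejected reward), which equals the
# difference of the two means up to rounding of the logged values.
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")
print(f"difference:        {abs(margin - reported_margin):.4f}")
```

The recomputed value differs from the reported one only in the fourth decimal place, which is consistent with each metric having been rounded independently.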
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
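The total batch sizes follow from the per-device values, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates the shape of a cosine schedule with 10% linear warmup. The function `cosine_lr_with_warmup` and its `total_steps` argument are hypothetical helpers written for this illustration; the exact warmup-step rounding used in the actual run is an assumption, and the total number of optimizer steps is not stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 4
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2
peak_learning_rate = 4e-07
warmup_ratio = 0.1

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def cosine_lr_with_warmup(step, total_steps, peak_lr=peak_learning_rate,
                          ratio=warmup_ratio):
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to 0."""
    warmup_steps = round(total_steps * ratio)  # rounding rule is an assumption
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
```

Calling `cosine_lr_with_warmup(step, total_steps)` with an estimated step count shows how the learning rate rises during the first tenth of training and then decays toward zero.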
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
1.3793 | 0.0523 | 100 | 1.3815 | -0.0364 | -0.0923 | 0.6270 | 0.0559 | -1.5017 | -1.2760 | -1.0412 | -1.0098 |
1.3797 | 0.1047 | 200 | 1.3835 | -0.0667 | -0.2448 | 0.7103 | 0.1781 | -1.5220 | -1.2800 | -1.0404 | -1.0114 |
1.3748 | 0.1570 | 300 | 1.3803 | 0.0532 | -0.2001 | 0.7341 | 0.2534 | -1.5161 | -1.2640 | -1.0513 | -1.0243 |
1.3639 | 0.2094 | 400 | 1.3851 | 0.0649 | -0.2251 | 0.7302 | 0.2900 | -1.5194 | -1.2625 | -1.0536 | -1.0279 |
1.3736 | 0.2617 | 500 | 1.3799 | 0.0384 | -0.3073 | 0.7282 | 0.3457 | -1.5304 | -1.2660 | -1.0693 | -1.0442 |
1.3698 | 0.3141 | 600 | 1.3888 | -0.0230 | -0.3563 | 0.7361 | 0.3333 | -1.5369 | -1.2742 | -1.0838 | -1.0584 |
1.3417 | 0.3664 | 700 | 1.3778 | 0.0230 | -0.3367 | 0.7302 | 0.3597 | -1.5343 | -1.2681 | -1.0931 | -1.0674 |
1.413 | 0.4187 | 800 | 1.3758 | -0.0158 | -0.3821 | 0.7401 | 0.3663 | -1.5404 | -1.2732 | -1.1025 | -1.0780 |
1.3989 | 0.4711 | 900 | 1.3793 | 0.0086 | -0.3610 | 0.7460 | 0.3696 | -1.5375 | -1.2700 | -1.1075 | -1.0822 |
1.3566 | 0.5234 | 1000 | 1.3717 | 0.1015 | -0.2903 | 0.7440 | 0.3917 | -1.5281 | -1.2576 | -1.1044 | -1.0813 |
1.39 | 0.5758 | 1100 | 1.3751 | 0.1356 | -0.2313 | 0.7341 | 0.3669 | -1.5202 | -1.2531 | -1.1474 | -1.1210 |
1.3829 | 0.6281 | 1200 | 1.3682 | 0.0202 | -0.3839 | 0.7619 | 0.4041 | -1.5406 | -1.2684 | -1.1289 | -1.1032 |
1.3495 | 0.6805 | 1300 | 1.3676 | 0.0081 | -0.3882 | 0.7540 | 0.3963 | -1.5412 | -1.2701 | -1.0956 | -1.0722 |
1.349 | 0.7328 | 1400 | 1.3697 | 0.1064 | -0.2902 | 0.7421 | 0.3966 | -1.5281 | -1.2570 | -1.1047 | -1.0809 |
1.3702 | 0.7851 | 1500 | 1.3645 | 0.0567 | -0.3406 | 0.7560 | 0.3973 | -1.5348 | -1.2636 | -1.1163 | -1.0916 |
1.3753 | 0.8375 | 1600 | 1.3645 | 0.0555 | -0.3434 | 0.7520 | 0.3988 | -1.5352 | -1.2638 | -1.1144 | -1.0900 |
1.3577 | 0.8898 | 1700 | 1.3632 | 0.0357 | -0.3637 | 0.7540 | 0.3994 | -1.5379 | -1.2664 | -1.1254 | -1.1003 |
1.3568 | 0.9422 | 1800 | 1.3634 | 0.0453 | -0.3518 | 0.7520 | 0.3971 | -1.5363 | -1.2651 | -1.1305 | -1.1050 |
1.3632 | 0.9945 | 1900 | 1.3634 | 0.0445 | -0.3542 | 0.7540 | 0.3986 | -1.5366 | -1.2652 | -1.1235 | -1.0986 |
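To compare the logged checkpoints, the evaluation columns can be scanned programmatically. The sketch below copies four columns from the table above (step, validation loss, rewards/accuracies, rewards/margins) and picks out the steps with the lowest validation loss and the highest accuracy. The name `EVAL_LOG` is illustrative; whether intermediate checkpoints were actually saved at these steps is not stated in this card.

```python
# (step, validation loss, rewards/accuracies, rewards/margins), copied from the table.
EVAL_LOG = [
    (100, 1.3815, 0.6270, 0.0559),
    (200, 1.3835, 0.7103, 0.1781),
    (300, 1.3803, 0.7341, 0.2534),
    (400, 1.3851, 0.7302, 0.2900),
    (500, 1.3799, 0.7282, 0.3457),
    (600, 1.3888, 0.7361, 0.3333),
    (700, 1.3778, 0.7302, 0.3597),
    (800, 1.3758, 0.7401, 0.3663),
    (900, 1.3793, 0.7460, 0.3696),
    (1000, 1.3717, 0.7440, 0.3917),
    (1100, 1.3751, 0.7341, 0.3669),
    (1200, 1.3682, 0.7619, 0.4041),
    (1300, 1.3676, 0.7540, 0.3963),
    (1400, 1.3697, 0.7421, 0.3966),
    (1500, 1.3645, 0.7560, 0.3973),
    (1600, 1.3645, 0.7520, 0.3988),
    (1700, 1.3632, 0.7540, 0.3994),
    (1800, 1.3634, 0.7520, 0.3971),
    (1900, 1.3634, 0.7540, 0.3986),
]

# Lowest validation loss and highest preference accuracy among logged steps.
best_loss = min(EVAL_LOG, key=lambda row: row[1])
best_accuracy = max(EVAL_LOG, key=lambda row: row[2])

print(f"lowest validation loss: {best_loss[1]} at step {best_loss[0]}")
print(f"highest accuracy:       {best_accuracy[2]} at step {best_accuracy[0]}")
```

In this log the validation loss is lowest late in training, while accuracy and margin peak earlier and then plateau.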
Framework versions
- Transformers 4.43.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
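With a Transformers and PyTorch setup similar to the versions listed above, the checkpoint can probably be loaded like any causal language model on the Hub. The following is a minimal sketch, not an official usage recipe: the repository id is taken from this page, the helper `generate_text` is hypothetical, `device_map="auto"` assumes the `accelerate` package is installed, and the dtype and generation settings are illustrative choices. No prompt format is documented in this card, so a plain-text prompt is assumed.

```python
MODEL_ID = "aaaalongaa/llama3-8b-mypo3_sim-full-beta7.5-lr4e-7"


def generate_text(prompt, max_new_tokens=128):
    """Hypothetical helper: load the checkpoint and greedily continue a prompt."""
    # Imported lazily so that defining this helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: hardware supports bfloat16
        device_map="auto",           # assumption: `accelerate` is installed
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the 8B checkpoint, so it is left commented out):
# print(generate_text("Explain gradient accumulation in one sentence."))
```

Loading the model inside the helper keeps the example short; for repeated calls, loading the tokenizer and model once and reusing them would be more efficient.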
Model tree for aaaalongaa/llama3-8b-mypo3_sim-full-beta7.5-lr4e-7
- Base model: princeton-nlp/Llama-3-Base-8B-SFT