# zephyr-7b-mypo3_sim-qlora-lr5e6-beta0.30
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:
- Loss: 1.3540
- Rewards/chosen: -0.0639
- Rewards/rejected: -0.3680
- Rewards/accuracies: 0.7050
- Rewards/margins: 0.3041
- Logps/rejected: -2.2845
- Logps/chosen: -1.1386
- Logits/rejected: -1.9700
- Logits/chosen: -2.0510
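
For reference (this relation is not stated in the original card, but it is consistent with the numbers above), the reward margin is simply the chosen reward minus the rejected reward:

$$
\text{rewards/margins} = \text{rewards/chosen} - \text{rewards/rejected} = -0.0639 - (-0.3680) \approx 0.3041
$$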
## Model description
More information needed
## Intended uses & limitations
More information needed
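
Pending a fuller description, here is a minimal inference sketch (not part of the original card). It assumes the adapter repo ships tokenizer files, that `peft` can resolve the base model from the adapter config, and that a GPU with enough memory is available; adjust dtype and device placement as needed.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "Kimory-X/zephyr-7b-mypo3_sim-qlora-lr5e6-beta0.30"

# Load the QLoRA adapter together with its base model
# (alignment-handbook/zephyr-7b-sft-qlora, itself built on mistralai/Mistral-7B-v0.1).
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Zephyr-style models expect a chat template; apply it before generating.
messages = [{"role": "user", "content": "Explain QLoRA in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```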
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 5
- gradient_accumulation_steps: 4
- total_train_batch_size: 40
- total_eval_batch_size: 20
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
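
These map onto `transformers.TrainingArguments` roughly as sketched below. This is a hedged reconstruction, not the exact training script: the run presumably used the alignment-handbook recipe with a preference-optimization trainer and beta = 0.30, and the launcher, QLoRA, and quantization settings are omitted. Note that total_train_batch_size = 2 × 5 × 4 = 40 (per-device batch × devices × accumulation steps).

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; field names follow
# transformers.TrainingArguments. Anything not listed in the card
# (output_dir, bf16) is a hypothetical placeholder.
training_args = TrainingArguments(
    output_dir="zephyr-7b-mypo3_sim-qlora-lr5e6-beta0.30",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    seed=42,
    gradient_accumulation_steps=4,   # with 5 GPUs: 2 * 5 * 4 = 40 effective
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,                       # assumption: precision not stated in the card
)
```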
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 1.3799 | 0.0654 | 100 | 1.3804 | -0.0062 | -0.0408 | 0.6600 | 0.0346 | -1.1940 | -0.9462 | -2.1974 | -2.2810 |
| 1.3728 | 0.1308 | 200 | 1.3734 | -0.0308 | -0.1119 | 0.6900 | 0.0811 | -1.4310 | -1.0283 | -2.2618 | -2.3330 |
| 1.3605 | 0.1963 | 300 | 1.3670 | -0.0656 | -0.2070 | 0.7200 | 0.1414 | -1.7478 | -1.1442 | -2.1971 | -2.2674 |
| 1.3607 | 0.2617 | 400 | 1.3637 | -0.0644 | -0.2551 | 0.6975 | 0.1908 | -1.9084 | -1.1401 | -2.2602 | -2.3277 |
| 1.3642 | 0.3271 | 500 | 1.3625 | -0.0744 | -0.3109 | 0.6875 | 0.2366 | -2.0943 | -1.1734 | -2.1841 | -2.2534 |
| 1.3489 | 0.3925 | 600 | 1.3649 | -0.1095 | -0.4197 | 0.6850 | 0.3101 | -2.4568 | -1.2906 | -2.0263 | -2.1039 |
| 1.3735 | 0.4580 | 700 | 1.3653 | -0.1046 | -0.4143 | 0.7000 | 0.3097 | -2.4389 | -1.2743 | -1.9237 | -2.0155 |
| 1.3606 | 0.5234 | 800 | 1.3592 | -0.0745 | -0.3701 | 0.6950 | 0.2956 | -2.2915 | -1.1739 | -1.9493 | -2.0356 |
| 1.3462 | 0.5888 | 900 | 1.3568 | -0.0854 | -0.3668 | 0.7050 | 0.2815 | -2.2807 | -1.2100 | -1.9785 | -2.0609 |
| 1.3527 | 0.6542 | 1000 | 1.3548 | -0.0626 | -0.3514 | 0.7050 | 0.2888 | -2.2291 | -1.1342 | -1.9978 | -2.0771 |
| 1.3483 | 0.7197 | 1100 | 1.3558 | -0.0665 | -0.3741 | 0.7025 | 0.3076 | -2.3048 | -1.1471 | -1.9802 | -2.0598 |
| 1.3558 | 0.7851 | 1200 | 1.3542 | -0.0628 | -0.3646 | 0.7050 | 0.3018 | -2.2733 | -1.1348 | -1.9719 | -2.0522 |
| 1.3515 | 0.8505 | 1300 | 1.3543 | -0.0644 | -0.3702 | 0.7050 | 0.3058 | -2.2918 | -1.1402 | -1.9694 | -2.0505 |
| 1.3572 | 0.9159 | 1400 | 1.3540 | -0.0639 | -0.3674 | 0.7075 | 0.3035 | -2.2825 | -1.1385 | -1.9716 | -2.0522 |
| 1.3527 | 0.9814 | 1500 | 1.3541 | -0.0637 | -0.3677 | 0.7025 | 0.3039 | -2.2834 | -1.1380 | -1.9704 | -2.0513 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.43.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
## Base model

[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)