
zephyr-7b-mypo3_sim-qlora-lr5e6-beta0.30

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (the metric definitions are sketched after the list):

  • Loss: 1.3540
  • Rewards/chosen: -0.0639
  • Rewards/rejected: -0.3680
  • Rewards/accuracies: 0.7050
  • Rewards/margins: 0.3041
  • Logps/rejected: -2.2845
  • Logps/chosen: -1.1386
  • Logits/rejected: -1.9700
  • Logits/chosen: -2.0510
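
For reference, Rewards/margins is the mean gap between chosen and rejected rewards, and Rewards/accuracies is the fraction of preference pairs where the chosen response scores higher. A minimal sketch of these standard definitions (an assumption; not taken from this repository's training code):

```python
# Standard preference-metric definitions (assumed, not the repo's code):
# margins are chosen minus rejected rewards; accuracy is the chosen win rate.
import torch

def preference_metrics(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor):
    margins = chosen_rewards - rejected_rewards      # -> Rewards/margins
    accuracy = (margins > 0).float().mean()          # -> Rewards/accuracies
    return margins.mean().item(), accuracy.item()
```

The evaluation figures above are consistent with this: -0.0639 - (-0.3680) = 0.3041, matching Rewards/margins.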

Model description

More information needed

Intended uses & limitations

More information needed
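
Since this repository ships a PEFT adapter on top of alignment-handbook/zephyr-7b-sft-qlora, a minimal inference sketch follows. The dtype, chat-template usage, and generation settings are assumptions, not taken from this card:

```python
# Minimal inference sketch, assuming the usual PEFT adapter flow.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-qlora"  # SFT base named above
adapter_id = "Kimory-X/zephyr-7b-mypo3_sim-qlora-lr5e6-beta0.30"

# transformers resolves the SFT QLoRA repo through its PEFT integration;
# this preference-tuned adapter is then stacked on top (an assumed setup).
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain QLoRA in one sentence."}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```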

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 5
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 40
  • total_eval_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
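
These settings give an effective train batch size of 2 × 4 × 5 = 40 (per-device batch × gradient accumulation × GPUs). A minimal sketch of the equivalent transformers.TrainingArguments; output_dir and bf16 are assumptions, and the beta of 0.30 in the model name belongs to the preference loss, which is not a TrainingArguments field:

```python
# Sketch of the listed hyperparameters as transformers.TrainingArguments
# (output_dir and bf16 are assumptions; run on 5 GPUs for a total batch of 40).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-mypo3_sim-qlora-lr5e6-beta0.30",  # hypothetical path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,    # 4 * 5 devices = total eval batch of 20
    gradient_accumulation_steps=4,   # 2 * 4 * 5 devices = total train batch of 40
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: typical for QLoRA training
)
```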

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.3799 | 0.0654 | 100 | 1.3804 | -0.0062 | -0.0408 | 0.6600 | 0.0346 | -1.1940 | -0.9462 | -2.1974 | -2.2810 |
| 1.3728 | 0.1308 | 200 | 1.3734 | -0.0308 | -0.1119 | 0.6900 | 0.0811 | -1.4310 | -1.0283 | -2.2618 | -2.3330 |
| 1.3605 | 0.1963 | 300 | 1.3670 | -0.0656 | -0.2070 | 0.7200 | 0.1414 | -1.7478 | -1.1442 | -2.1971 | -2.2674 |
| 1.3607 | 0.2617 | 400 | 1.3637 | -0.0644 | -0.2551 | 0.6975 | 0.1908 | -1.9084 | -1.1401 | -2.2602 | -2.3277 |
| 1.3642 | 0.3271 | 500 | 1.3625 | -0.0744 | -0.3109 | 0.6875 | 0.2366 | -2.0943 | -1.1734 | -2.1841 | -2.2534 |
| 1.3489 | 0.3925 | 600 | 1.3649 | -0.1095 | -0.4197 | 0.6850 | 0.3101 | -2.4568 | -1.2906 | -2.0263 | -2.1039 |
| 1.3735 | 0.4580 | 700 | 1.3653 | -0.1046 | -0.4143 | 0.7000 | 0.3097 | -2.4389 | -1.2743 | -1.9237 | -2.0155 |
| 1.3606 | 0.5234 | 800 | 1.3592 | -0.0745 | -0.3701 | 0.6950 | 0.2956 | -2.2915 | -1.1739 | -1.9493 | -2.0356 |
| 1.3462 | 0.5888 | 900 | 1.3568 | -0.0854 | -0.3668 | 0.7050 | 0.2815 | -2.2807 | -1.2100 | -1.9785 | -2.0609 |
| 1.3527 | 0.6542 | 1000 | 1.3548 | -0.0626 | -0.3514 | 0.7050 | 0.2888 | -2.2291 | -1.1342 | -1.9978 | -2.0771 |
| 1.3483 | 0.7197 | 1100 | 1.3558 | -0.0665 | -0.3741 | 0.7025 | 0.3076 | -2.3048 | -1.1471 | -1.9802 | -2.0598 |
| 1.3558 | 0.7851 | 1200 | 1.3542 | -0.0628 | -0.3646 | 0.7050 | 0.3018 | -2.2733 | -1.1348 | -1.9719 | -2.0522 |
| 1.3515 | 0.8505 | 1300 | 1.3543 | -0.0644 | -0.3702 | 0.7050 | 0.3058 | -2.2918 | -1.1402 | -1.9694 | -2.0505 |
| 1.3572 | 0.9159 | 1400 | 1.3540 | -0.0639 | -0.3674 | 0.7075 | 0.3035 | -2.2825 | -1.1385 | -1.9716 | -2.0522 |
| 1.3527 | 0.9814 | 1500 | 1.3541 | -0.0637 | -0.3677 | 0.7025 | 0.3039 | -2.2834 | -1.1380 | -1.9704 | -2.0513 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • PyTorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1