Quantization made by Richard Erkhov.
Model presented in Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment.
Code: https://github.com/general-preference/general-preference-model
SPPO-Llama-3-8B-Instruct-GPM-2B - bnb 4bits
- Model creator: https://huggingface.co/general-preference/
- Original model: https://huggingface.co/general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B/
Original model description:
language: - en license: apache-2.0 datasets: - openbmb/UltraFeedback pipeline_tag: text-generation model-index: - name: SPPO-Llama-3-8B-Instruct-GPM-2B results: - task: type: text-generation name: Text Generation dataset: name: IFEval (0-Shot) type: HuggingFaceH4/ifeval args: num_few_shot: 0 metrics: - type: inst_level_strict_acc and prompt_level_strict_acc value: 60.24 name: strict accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: BBH (3-Shot) type: BBH args: num_few_shot: 3 metrics: - type: acc_norm value: 27.89 name: normalized accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MATH Lvl 5 (4-Shot) type: hendrycks/competition_math args: num_few_shot: 4 metrics: - type: exact_match value: 8.01 name: exact match source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GPQA (0-shot) type: Idavidrein/gpqa args: num_few_shot: 0 metrics: - type: acc_norm value: 1.23 name: acc_norm source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MuSR (0-shot) type: TAUR-Lab/MuSR args: num_few_shot: 0 metrics: - type: acc_norm value: 3.19 name: acc_norm source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU-PRO (5-shot) type: TIGER-Lab/MMLU-Pro config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 29.53 name: accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B name: Open LLM Leaderboard
General Preference Modeling with Preference Representations for Aligning Language Models (https://arxiv.org/abs/2410.02197)
SPPO-Llama-3-8B-Instruct-GPM-2B
This model was developed using SPPO at iteration 3 and the General Preference representation Model (GPM) (specifically, using GPM-Gemma-2B), based on the meta-llama/Meta-Llama-3-8B-Instruct architecture as starting point. We utilized the prompt sets from the openbmb/UltraFeedback dataset, splited to 3 parts for 3 iterations by snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. All responses used are synthetic.
Links to Other Models
Model Description
- Model type: A 8B parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: Apache-2.0
- Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct
AlpacaEval Leaderboard Evaluation Results
Model | LC. Win Rate | Win Rate | Avg. Length |
---|---|---|---|
SPPO-Llama-3-8B-Instruct-GPM-2B | 35.30 | 45.44 | 2490 |
Open LLM Leaderboard Evaluation Results
Results are reported by using lm-evaluation-harness v0.4.1
arc_challenge | truthfulqa_mc2 | winogrande | gsm8k | hellaswag | mmlu | average | |
---|---|---|---|---|---|---|---|
SPPO-Llama-3-8B-Instruct-GPM-2B | 62.03 | 52.95 | 76.56 | 75.36 | 78.57 | 65.66 | 68.52 |
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- eta: 1000
- per_device_train_batch_size: 8
- gradient_accumulation_steps: 1
- seed: 42
- distributed_type: deepspeed_zero3
- num_devices: 8
- optimizer: RMSProp
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_train_epochs: 6.0 (stop at epoch=1.0)
Citation
@article{zhang2024general,
title={General Preference Modeling with Preference Representations for Aligning Language Models},
author={Zhang, Yifan and Zhang, Ge and Wu, Yue and Xu, Kangping and Gu, Quanquan},
journal={arXiv preprint arXiv:2410.02197},
year={2024}
}
- Downloads last month
- 4