README.md CHANGED
@@ -3,7 +3,7 @@ library_name: transformers
 tags: []
 ---
 
-This is a process-supervised reward model (PRM) trained on Mistral-generated data from the project [RLHFlow/RLHF-Reward-Modeling](https://github.com/RLHFlow/RLHF-Reward-Modeling)
+This is a process-supervised reward model (PRM) from the project [RLHFlow/RLHF-Reward-Modeling](https://github.com/RLHFlow/RLHF-Reward-Modeling)
 
 The model is trained from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on [RLHFlow/Deepseek-PRM-Data](https://huggingface.co/datasets/RLHFlow/Deepseek-PRM-Data) for 1 epoch. We use a global batch size of 32 and a learning rate of 2e-6, packing the samples and splitting them into chunks of 8192 tokens. See more training details at https://github.com/RLHFlow/Online-RLHF/blob/main/math/llama-3.1-prm.yaml.
 
@@ -55,13 +55,10 @@ The automatic annotation was proposed in the Math-shepherd paper:
 If you find the training recipe useful, please consider citing it as follows.
 
 ```
-@misc{xiong2024rlhflowmath,
-  author={Wei Xiong and Hanning Zhang and Nan Jiang and Tong Zhang},
-  title = {An Implementation of Generative PRM},
-  year = {2024},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/RLHFlow/RLHF-Reward-Modeling}}
+@misc{xiong2024implementation,
+  title={An implementation of generative prm},
+  author={Xiong, Wei and Zhang, Hanning and Jiang, Nan and Zhang, Tong},
+  year={2024}
 }
 ```
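The training paragraph in the README says samples are packed and then split into chunks of 8192 tokens. A minimal sketch of that preprocessing step is shown below; the function name and the lack of special separator tokens are illustrative assumptions, not details taken from the RLHFlow codebase.

```python
def pack_and_chunk(tokenized_samples, block_size=8192):
    """Concatenate token-id lists into one stream, then split the
    stream into fixed-size blocks of block_size tokens.

    Note: this is a simplified sketch; real recipes may insert EOS
    tokens between samples or keep the trailing partial block.
    """
    # Pack: flatten all samples into a single token stream.
    stream = [tok for sample in tokenized_samples for tok in sample]
    # Chunk: drop the trailing remainder so every block is full-size.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

# Toy demo with a small block size instead of 8192:
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
chunks = pack_and_chunk(samples, block_size=4)
print(chunks)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Packing avoids padding waste when training on many short samples, at the cost of occasionally splitting a sample across two blocks.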