README.md CHANGED
@@ -3,7 +3,7 @@ library_name: transformers
 tags: []
 ---
 
-This is a process-supervised reward model (PRM) trained on Mistral-generated data from the project [RLHFlow/RLHF-Reward-Modeling](https://github.com/RLHFlow/RLHF-Reward-Modeling)
+This is a process-supervised reward model (PRM) from the project [RLHFlow/RLHF-Reward-Modeling](https://github.com/RLHFlow/RLHF-Reward-Modeling)
 
 The model is trained from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on [RLHFlow/Deepseek-PRM-Data](https://huggingface.co/datasets/RLHFlow/Deepseek-PRM-Data) for 1 epoch. We use a global batch size of 32 and a learning rate of 2e-6, packing the samples and splitting them into chunks of 8192 tokens. See more training details at https://github.com/RLHFlow/Online-RLHF/blob/main/math/llama-3.1-prm.yaml.
 
@@ -55,13 +55,10 @@ The automatic annotation was proposed in the Math-shepherd paper:
 If you find the training recipe useful, please consider citing it as follows.
 
 ```
-@misc{xiong2024rlhflowmath,
-  author={Wei Xiong and Hanning Zhang and Nan Jiang and Tong Zhang},
-  title = {An Implementation of Generative PRM},
-  year = {2024},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/RLHFlow/RLHF-Reward-Modeling}}
+@misc{xiong2024implementation,
+  title={An implementation of generative prm},
+  author={Xiong, Wei and Zhang, Hanning and Jiang, Nan and Zhang, Tong},
+  year={2024}
 }
 ```
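The training paragraph in the README says samples are packed and then split into chunks of 8192 tokens. A minimal sketch of that preprocessing step is shown below; the function name and the lack of special separator tokens are illustrative assumptions, not details taken from the RLHFlow codebase.

```python
def pack_and_chunk(tokenized_samples, block_size=8192):
    """Concatenate token-id lists into one stream, then split the
    stream into fixed-size blocks of block_size tokens.

    Note: this is a simplified sketch; real recipes may insert EOS
    tokens between samples or keep the trailing partial block.
    """
    # Pack: flatten all samples into a single token stream.
    stream = [tok for sample in tokenized_samples for tok in sample]
    # Chunk: drop the trailing remainder so every block is full-size.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

# Toy demo with a small block size instead of 8192:
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
chunks = pack_and_chunk(samples, block_size=4)
print(chunks)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Packing avoids padding waste when training on many short samples, at the cost of occasionally splitting a sample across two blocks.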