Update README.md

README.md CHANGED
@@ -3,7 +3,7 @@ library_name: transformers
 tags: []
 ---
 
-This is a process-supervised reward (PRM)
+This is a process-supervised reward model (PRM) from the project [RLHFlow/RLHF-Reward-Modeling](https://github.com/RLHFlow/RLHF-Reward-Modeling).
 
 The model is trained from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on [RLHFlow/Deepseek-PRM-Data](https://huggingface.co/datasets/RLHFlow/Deepseek-PRM-Data) for 1 epoch. We use a global batch size of 32 and a learning rate of 2e-6; the samples are packed and split into chunks of 8192 tokens. See more training details at https://github.com/RLHFlow/Online-RLHF/blob/main/math/llama-3.1-prm.yaml.
 
@@ -55,13 +55,10 @@ The automatic annotation was proposed in the Math-shepherd paper:
 If you find the training recipe useful, please consider citing it as follows.
 
 ```
-@misc{
-
-
-year
-publisher = {GitHub},
-journal = {GitHub repository},
-howpublished = {\url{https://github.com/RLHFlow/RLHF-Reward-Modeling}}
+@misc{xiong2024implementation,
+  title={An implementation of generative prm},
+  author={Xiong, Wei and Zhang, Hanning and Jiang, Nan and Zhang, Tong},
+  year={2024}
 }
 ```
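For readers of the training description in the first hunk: "packing" here is the usual trick of concatenating tokenized samples into one token stream and cutting it into fixed-length chunks. Below is a minimal sketch of that scheme, under the stated 8192-token chunk length; `pack_and_chunk` and the surrounding names are illustrative, not taken from the linked training config.

```python
# Illustrative sketch of sample packing (not the project's actual code):
# tokenized samples are concatenated into one stream, then the stream is
# split into fixed-length chunks of 8192 tokens, as the README describes.
from typing import List

CHUNK_LEN = 8192  # context length used for training, per the README


def pack_and_chunk(
    tokenized_samples: List[List[int]], chunk_len: int = CHUNK_LEN
) -> List[List[int]]:
    """Concatenate token-id lists and split the stream into fixed-size chunks."""
    stream: List[int] = []
    for ids in tokenized_samples:
        stream.extend(ids)
    # Drop the trailing remainder so every chunk is exactly chunk_len tokens.
    n_full = len(stream) // chunk_len
    return [stream[i * chunk_len : (i + 1) * chunk_len] for i in range(n_full)]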
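Since the model card describes a generative PRM trained on Math-Shepherd-style step annotations, a rough usage sketch may help. Everything below is an assumption to verify against RLHFlow's own examples: the repo id, the layout in which each reasoning step is fed as a user turn, the model judging each step with a "+" or "-" assistant token, and the step reward being read off as the probability of "+".

```python
# Hypothetical usage sketch for a Math-Shepherd-style generative PRM.
# Assumptions (check against RLHFlow's examples): the PRM scores each
# reasoning step by predicting "+" (correct) or "-" (incorrect) as the
# next assistant token, and the step reward is P("+") vs. P("-").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "RLHFlow/Llama3.1-8B-PRM-Deepseek-Data"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

# Token ids for the single-character judgments "+" and "-".
plus_id = tokenizer.encode("+", add_special_tokens=False)[-1]
minus_id = tokenizer.encode("-", add_special_tokens=False)[-1]

question = "What is 2 + 3 * 4?"
steps = ["Step 1: Multiply first: 3 * 4 = 12.", "Step 2: Add: 2 + 12 = 14."]

conversation = []
for i, step in enumerate(steps):
    # The first user turn carries the question together with the first step.
    content = f"{question} {step}" if i == 0 else step
    conversation.append({"role": "user", "content": content})
    input_ids = tokenizer.apply_chat_template(
        conversation, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        next_logits = model(input_ids).logits[0, -1]  # next-token logits
    # Step reward: probability mass on "+" relative to "-".
    prob_plus = torch.softmax(next_logits[[plus_id, minus_id]], dim=-1)[0].item()
    print(f"step {i + 1}: P(+) = {prob_plus:.3f}")
    # Feed the "+" judgment back in before scoring the next step.
    conversation.append({"role": "assistant", "content": "+"})
```

In best-of-N or beam-style inference, per-step probabilities from such a PRM are typically aggregated (e.g., by taking the minimum or the product over steps) to rank candidate solutions.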