RLCR
Collection of models and datasets for Beyond Binary Rewards: Training LMs to Reason about their Uncertainty
This model is a fine-tuned version of Qwen/Qwen2.5-7B on an unknown dataset. Validation loss and accuracy on the evaluation set over the course of training are reported in the table below.

Model description: more information needed.
Intended uses & limitations: more information needed.
Training and evaluation data: more information needed.
Training hyperparameters: more information needed.

Training results:
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 0.5622        | 0.08  | 25   | 0.5628          | 0.717    |
| 0.4747        | 0.16  | 50   | 0.5106          | 0.747    |
| 0.4651        | 0.24  | 75   | 0.5284          | 0.730    |
| 0.4415        | 0.32  | 100  | 0.5070          | 0.755    |
| 0.4442        | 0.40  | 125  | 0.4878          | 0.760    |
| 0.4253        | 0.48  | 150  | 0.4866          | 0.757    |
| 0.4829        | 0.56  | 175  | 0.5158          | 0.741    |
| 0.4456        | 0.64  | 200  | 0.4799          | 0.760    |
| 0.4249        | 0.72  | 225  | 0.4830          | 0.766    |
| 0.4527        | 0.80  | 250  | 0.4816          | 0.764    |
| 0.4169        | 0.88  | 275  | 0.4833          | 0.763    |
| 0.4743        | 0.96  | 300  | 0.4828          | 0.768    |
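As a quick sanity check on the results above, a short script can scan the logged metrics for the checkpoints with the lowest validation loss and the highest accuracy. The values are transcribed from the table; the script is purely illustrative and is not part of the original training pipeline:

```python
# Validation metrics transcribed from the training-results table:
# (step, validation_loss, accuracy)
eval_log = [
    (25, 0.5628, 0.717), (50, 0.5106, 0.747), (75, 0.5284, 0.730),
    (100, 0.5070, 0.755), (125, 0.4878, 0.760), (150, 0.4866, 0.757),
    (175, 0.5158, 0.741), (200, 0.4799, 0.760), (225, 0.4830, 0.766),
    (250, 0.4816, 0.764), (275, 0.4833, 0.763), (300, 0.4828, 0.768),
]

# Checkpoint with the lowest validation loss.
best_loss_step, best_loss, _ = min(eval_log, key=lambda r: r[1])
# Checkpoint with the highest accuracy.
best_acc_step, _, best_acc = max(eval_log, key=lambda r: r[2])

print(f"lowest validation loss {best_loss:.4f} at step {best_loss_step}")
print(f"highest accuracy {best_acc:.3f} at step {best_acc_step}")
```

Note that the two criteria disagree: validation loss bottoms out at step 200 (0.4799), while accuracy peaks at the final step 300 (0.768), so which checkpoint counts as "best" depends on the metric you care about.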