wh-zhu
/

DeepSeek-R1-TrRa-iter2-1.5B-lambda_2

Model card Files Files and versions

DeepSeek-R1-TrRa-iter2-1.5B-lambda_2 / README.md

wh-zhu's picture

Upload folder using huggingface_hub

a16cc13 verified 2 months ago

|

history blame contribute delete

964 Bytes


	<h1 align="center">🛠️ ReAligner</h1>
	<p align="center">
	<a href="https://arxiv.org/pdf/2506.12704"><img src="https://img.shields.io/badge/arXiv-arXiv%20Preprint-B31B1B?style=flat&logo=arxiv&logoColor=white" alt="arXiv Paper"></a>

	<a href="https://github.com/zwhong714/ReAligner"><img src="https://img.shields.io/badge/Homepage-Project%20Page-brightgreen?style=flat&logo=github" alt="Homepage"></a>

	<a href="https://huggingface.co/wh-zhu"><img src="https://img.shields.io/badge/Huggingface-Models-yellow?style=flat&logo=huggingface" alt="Models"></a>
	</p>



	<div>
	A flexible realignment framework is proposed to quantitatively control alignment during training and inference, combining Training-time Realignment (TrRa) and Inference-time Realignment (InRa).

	- We realign DeepScaleR-1.5B model and reduce token usage without performance loss and even enhance reasoning capabilities.


	</div>

	</div>

	<div>
	<br>



	![img](./exp1.png)