TrandeLik/aug_rt-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs16-rsstatic Updated about 3 hours ago
TrandeLik/aug_rt-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs16-rsnone Updated 21 days ago
TrandeLik/augmentedrewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs16 Updated about 1 month ago
TrandeLik/vanilarewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs32 Updated Aug 10
TrandeLik/vanilarewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs8 Updated Aug 9
TrandeLik/augmentedrewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs8 Updated Aug 9