TrandeLik/aug_rt-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs16-rsstatic Updated about 6 hours ago
TrandeLik/aug_rt-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs16-rsnone Updated 12 days ago
TrandeLik/augmentedrewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs16 Updated 22 days ago
TrandeLik/vanilarewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs32 Updated 23 days ago
TrandeLik/vanilarewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs8 Updated 24 days ago
TrandeLik/augmentedrewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs8 Updated 24 days ago