gupta-tanish/llama-off-policy-qwq-10k-perturbation-iter1 Text Generation • 8B • Updated 29 days ago • 129
gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-1.0-iteration2 Text Generation • 8B • Updated Jun 9 • 3
gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-20.0-iteration1 Text Generation • 8B • Updated Jun 8 • 3
gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-1.0-eos-increase-iteration2-lamda-0.1 Text Generation • 8B • Updated Jun 7 • 3
gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-0.1-eos-increase-iteration2 Text Generation • 8B • Updated Jun 7 • 3
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.001-lr-1e-6-iteration1 Text Generation • 8B • Updated Jun 7 • 3
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.01-lr-1e-6-iteration1 Text Generation • 8B • Updated Jun 7 • 3
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.1-lr-1e-6-iteration1 Text Generation • 8B • Updated Jun 6 • 2
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-1.0-lr-1e-6-iteration1 Text Generation • 8B • Updated Jun 6 • 3
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-8-binned-data Viewer • Updated 11 days ago • 60.8k • 96
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-4-binned-data Viewer • Updated 11 days ago • 60.8k • 101
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-2-binned-data Viewer • Updated 11 days ago • 60.8k • 126
gupta-tanish/QwQ-Long-CoT-30k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-logp-10 Viewer • Updated 17 days ago • 59k • 104