Taneesh Gupta

gupta-tanish

https://tanish-g.github.io/

AI & ML interests

Post-Training @MicrosoftResearch

Recent Activity

updated a dataset 6 days ago

gupta-tanish/Qwen2.5-14B-Instruct-top-vs-bottom

updated a dataset 6 days ago

gupta-tanish/Qwen2.5-32B-Instruct-top-vs-bottom

published a dataset 6 days ago

gupta-tanish/Qwen2.5-32B-Instruct-top-vs-bottom

Organizations

None yet

Papers 4

arxiv:2502.18293

arxiv:2412.04628

arxiv:2410.21545

arxiv:2407.08726

Models 26

gupta-tanish/llama-off-policy-qwq-10k-perturbation-iter1

Text Generation • 8B • Updated 29 days ago • 129

gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-1.0-iteration2

Text Generation • 8B • Updated Jun 9 • 3

gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-20.0-iteration1

Text Generation • 8B • Updated Jun 8 • 3

gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-1.0-eos-increase-iteration2-lamda-0.1

Text Generation • 8B • Updated Jun 7 • 3

gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-0.1-eos-increase-iteration2

Text Generation • 8B • Updated Jun 7 • 3

gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.001-lr-1e-6-iteration1

Text Generation • 8B • Updated Jun 7 • 3

gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.01-lr-1e-6-iteration1

Text Generation • 8B • Updated Jun 7 • 3

gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.1-lr-1e-6-iteration1

Text Generation • 8B • Updated Jun 6 • 2

gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-1.0-lr-1e-6-iteration1

Text Generation • 8B • Updated Jun 6 • 3

gupta-tanish/mistral-7b-instruct-refa-iteration2

Text Generation • 7B • Updated Jun 1 • 4

Datasets 156

gupta-tanish/Qwen2.5-14B-Instruct-top-vs-bottom

Viewer • Updated 6 days ago • 62.1k • 122

gupta-tanish/Qwen2.5-32B-Instruct-top-vs-bottom

Viewer • Updated 6 days ago • 62.1k • 55

gupta-tanish/Llama3-8B-Instruct-16-Responses-16-bins-MPO

Viewer • Updated 7 days ago • 60.8k • 112

gupta-tanish/Qwen2.5-32B-Instruct-top2vsbottom2-selection

Viewer • Updated 11 days ago • 62.1k • 96

gupta-tanish/Qwen2.5-14B-Instruct-top2vsbottom2-selection

Viewer • Updated 11 days ago • 62.1k • 127

gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-8-binned-data

Viewer • Updated 11 days ago • 60.8k • 96

gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-4-binned-data

Viewer • Updated 11 days ago • 60.8k • 101

gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-2-binned-data

Viewer • Updated 11 days ago • 60.8k • 126

gupta-tanish/Ultrafeedback-Binarized-max-score-diff

Viewer • Updated 11 days ago • 219k • 99

gupta-tanish/QwQ-Long-CoT-30k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-logp-10

Viewer • Updated 17 days ago • 59k • 104
