ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated Jul 6 • 4
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated Jul 6 • 16
tensorblock/Nellyw888_VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb-GGUF Reinforcement Learning • 7B • Updated 9 days ago • 134
mradermacher/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct-GGUF Reinforcement Learning • 0.6B • Updated 8 days ago • 167
mradermacher/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct-GGUF Reinforcement Learning • 2B • Updated 8 days ago • 382