Models: 17

Active filters: PPO

fb700/chatglm-fitness-RLHF

Updated Mar 6, 2024 • 268

fb700/Bofan-chatglm-Best-lora

Updated Aug 24, 2023 • 3 • 11

sehyun66/Tiny-lama-1.3B-chat-ppo

Question Answering • Updated Jan 13, 2024

Lichang-Chen/ODIN-ppo-L230-best

Text Generation • Updated Feb 14, 2024 • 5

vibhorg/rl4llm_uofm_nlpo_super_t5_arxiv

Updated Mar 20, 2024 • 3

vibhorg/rl4llm_uofm_nlpo_unsuper_t5_arxiv

Updated Mar 20, 2024 • 8

Fizzarolli/sapphia-410m-RM

Updated Apr 2, 2024 • 4

pt-sk/GPT2-IMDB-Sentiment-FineTuned-with-PPO

Text Generation • 0.1B • Updated Jun 25, 2024 • 2

pt-sk/GPT2_NonToxic

Text Generation • 0.1B • Updated Jul 15, 2024 • 6

Kwaai/GPT2_NonToxic

Text Generation • 0.1B • Updated Jul 20, 2024 • 5

Nagi-ovo/Llama-3-8B-PPO

Text Generation • 8B • Updated Jan 21 • 7

sthenno/tempesthenno-ppo-ckpt40

15B • Updated Feb 19 • 7 • 4

xi0v/tempesthenno-ppo-ckpt40-archive

15B • Updated Mar 4

TEEN-D/RxRovers_Roaming_for_Rapid_Relief

Reinforcement Learning • Updated Mar 30

estnafinema0/smolLM-variation-ppo

Text Generation • 0.1B • Updated Mar 30 • 8

FlameF0X/CanoPy

Reinforcement Learning • Updated 8 days ago

AntonDergunov/LunarLander_PPO

Reinforcement Learning • Updated about 4 hours ago