yanan

yananchen

AI & ML interests

None yet

Recent Activity

upvoted an article 27 days ago

Visualize and understand GPU memory in PyTorch

upvoted an article about 2 months ago

Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)

commented on an article about 2 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge


Organizations

None yet

upvoted an article 27 days ago

Article

Visualize and understand GPU memory in PyTorch

Dec 24, 2024 • 243

upvoted an article about 2 months ago

Article

Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)

Jan 19 • 28

commented on DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge about 2 months ago

Hi there.
I think there is an error in your PPO description: PPO does not explicitly penalize the KL divergence from the initial (reference) policy.

upvoted 2 articles about 2 months ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7 • 211

Article

Proximal Policy Optimization (PPO)

Aug 5, 2022 • 53

updated a dataset 4 months ago

yananchen/agentbank_mixture

Viewer • Updated May 8 • 53.2k • 5

published a dataset 4 months ago

yananchen/agentbank_mixture

Viewer • Updated May 8 • 53.2k • 5

updated 2 datasets 9 months ago

yananchen/natural_plan__calendar_scheduling

Viewer • Updated Dec 19, 2024 • 1k • 21

yananchen/natural_plan__trip_planning

Viewer • Updated Dec 19, 2024 • 1.6k • 19

updated 11 datasets 10 months ago
