PPO
Model Details
This repository contains a language model fine-tuned with Proximal Policy Optimization (PPO) as part of VK's LLM course.
Model Description
PPO is a well-established reinforcement learning (RL) algorithm that has been applied to a wide range of tasks. Its versatility has made it one of the most widely used RL methods, and many production models trained with reinforcement learning rely on PPO during training.
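The core of PPO is the clipped surrogate objective from the original paper. The sketch below computes it in plain Python for a batch of actions (for an LLM, generated tokens). It takes log-probabilities under the current policy and under the policy that collected the data, along with advantage estimates. The function name and the default `clip_eps = 0.2` are illustrative assumptions. This card does not describe how advantages were estimated or what training code was used for this model.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed value; a common choice, not taken from this card
) -> float:
    """Negative mean of the PPO clipped surrogate objective (a loss to minimize)."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # PPO maximizes the surrogate, so the loss is its negative mean.
    return -total / len(advantages)
```

Once the ratio moves outside [1 - eps, 1 + eps] in the direction the advantage favors, the clipped term is selected. That term is constant, so its gradient is zero. This is what keeps each policy update "proximal" to the previous one.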
Unlike DPO (Direct Preference Optimization), PPO comes with several challenges inherent to reinforcement learning. One key difficulty is its strong dependence on the quality of the reward model and on the training hyperparameters. In this case, HuggingFaceTB/SmolLM-135M-Instruct is used as the reward model. The model trained with PPO performed slightly worse than the model trained with DPO.
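The dependence on the reward model can be made concrete with a common RLHF setup. There, the reward PPO optimizes is the reward model's score for the full response minus a KL penalty that keeps the policy close to a frozen reference model. The sketch below assumes that per-token formulation. `shaped_rewards` and `kl_coef` are hypothetical names, and this card does not state which reward shaping or coefficient was used for this model.

```python
from typing import List, Sequence


def shaped_rewards(
    reward_model_score: float,
    logp_policy: Sequence[float],
    logp_ref: Sequence[float],
    kl_coef: float = 0.1,  # hypothetical KL coefficient, a sensitive hyperparameter
) -> List[float]:
    """Per-token rewards: KL penalty on every token, reward model score on the last."""
    if not logp_policy or len(logp_policy) != len(logp_ref):
        raise ValueError("log-prob sequences must be non-empty and of equal length")
    # Per-token KL estimate log pi(token) - log pi_ref(token), scaled by -kl_coef.
    rewards = [-kl_coef * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
    # The reward model scores the whole response, so its score goes on the final token.
    rewards[-1] += reward_model_score
    return rewards
```

Any bias in the reward model's score flows directly into these rewards. A `kl_coef` that is too small lets the policy exploit that bias, while one that is too large prevents it from improving.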
Examples
Before fine-tuning
user
What's your morning routine like?
assistant
I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:
Morning Routine (10-15 minutes)
- Hydrate: Drink a full glass of water or a herbal tea (e.g., chamomile, peppermint) to start the day.
- Eat a nutritious breakfast: Prepare a healthy breakfast, such as oatmeal with fruit, scrambled eggs with spinach, or Greek yogurt with berries.
- Get some morning sunlight: Spend 10-15 minutes outside in natural light to help regulate your circadian rhythms.
- Take a few deep breaths: Inhale for 1-2 minutes, hold for 2-3 minutes, and exhale for 2-3 minutes.
- Stretch or move: Engage in some light stretching or movement to get your blood flowing and your muscles moving.
After fine-tuning
user
What's your morning routine like?
assistant
I'm excited to share my morning routine with you! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what I typically do in my morning routine. Here's a sample routine:
Morning (10:00 am - 1:00 pm)
- Wake up and stretch
- Stretching or yoga to get your body moving
- Taking a few deep breaths to calm your mind and body
- Hydrate and fuel your body with a nutritious breakfast
- Eating a balanced breakfast to give you energy and focus
- Avoiding sugary or caffeinated drinks
- Exercise or engage in some light physical activity (e.g., yoga, walking, or jogging)
- This can help increase energy levels and improve focus
Evening (1:00 pm - 4:00 pm)
- Prepare for the day ahead by:
- Reviewing my schedule and tasks
- Planning out my day and activities
- Writing down my thoughts and goals
- Reviewing my notes and readings
- Review my notes and readings to refresh my mind and stay focused
Analysis
References
- Model: HuggingFaceTB/SmolLM-135M-Instruct
- Dataset: HumanLLMs/Human-Like-DPO-Dataset
- Original paper: Schulman et al., "Proximal Policy Optimization Algorithms" (2017), https://arxiv.org/abs/1707.06347