---
title: RLHF Pairwise Annotation Demo
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
datasets:
- openbmb/UltraFeedback
---

# 🎯 AI Alignment: Binary Preference Annotation

This app simulates the data annotation process used in RLHF (Reinforcement Learning from Human Feedback) training. Users compare two AI completions and select which one is better.

## How it works

1. The app loads random examples from the UltraFeedback dataset
2. Users see a prompt and two AI completions
3. Users select which completion is better, or skip if unsure
4. All annotations are saved to a public dataset for research purposes (see the sketch below)
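
Here is a minimal sketch of this annotation loop. It assumes each UltraFeedback record exposes an `instruction` field and a list of `completions` whose items carry a `response` text (check the dataset card for the exact schema), and it keeps annotations in memory rather than pushing them to a Hub dataset:

```python
import random

import gradio as gr
from datasets import load_dataset

# Load the prompt/completion records once at startup.
# Assumption: a "train" split with "instruction" and "completions" fields.
dataset = load_dataset("openbmb/UltraFeedback", split="train")
annotations = []  # in-memory store; the real app would persist these


def sample_pair():
    """Pick a random record and two of its completions."""
    record = dataset[random.randrange(len(dataset))]
    a, b = random.sample(record["completions"], 2)
    return record["instruction"], a["response"], b["response"]


def record_choice(prompt, completion_a, completion_b, choice):
    """Save the annotation (unless skipped), then serve a fresh pair."""
    if choice != "skip":
        chosen, rejected = (
            (completion_a, completion_b) if choice == "a"
            else (completion_b, completion_a)
        )
        annotations.append(
            {"prompt": prompt, "chosen": chosen, "rejected": rejected}
        )
    return sample_pair()


with gr.Blocks() as demo:
    prompt_box = gr.Textbox(label="Prompt", interactive=False)
    with gr.Row():
        box_a = gr.Textbox(label="Completion A", interactive=False)
        box_b = gr.Textbox(label="Completion B", interactive=False)
    with gr.Row():
        btn_a = gr.Button("A is better")
        btn_b = gr.Button("B is better")
        btn_skip = gr.Button("Skip")

    boxes = [prompt_box, box_a, box_b]
    btn_a.click(lambda p, a, b: record_choice(p, a, b, "a"), boxes, boxes)
    btn_b.click(lambda p, a, b: record_choice(p, a, b, "b"), boxes, boxes)
    btn_skip.click(lambda p, a, b: record_choice(p, a, b, "skip"), boxes, boxes)
    demo.load(sample_pair, None, boxes)

demo.launch()
```

For persisting annotations to a public dataset as described above, one common pattern is batching writes with `huggingface_hub.CommitScheduler`; the in-memory list here is only a stand-in for that step.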