RLHF

Help Train an AI Assistant with RLHF (Reinforcement Learning from Human Feedback)

Join our open research project and help shape the future of AI. By comparing and ranking chatbot responses, you provide the human feedback used to fine-tune large language models (LLMs) after pretraining. This process, called Reinforcement Learning from Human Feedback (RLHF), trains a reward model on human preference rankings and then uses that model as a training signal, making AI assistants more accurate, helpful, and aligned with real user needs. Pick the better response in each pair to contribute.
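The rankings collected here are commonly turned into a training signal with a pairwise (Bradley-Terry) loss on a learned reward model: the loss is small when the model scores the human-preferred response higher. A minimal sketch in plain Python, using hypothetical scalar scores (the real pipeline scores full responses with a neural reward model):

```python
import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).
    Low when the reward model already ranks the preferred response higher."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# One human ranking: the annotator preferred response A over response B.
# Hypothetical scores a reward model might assign to each response:
loss_agree = pairwise_preference_loss(r_chosen=2.0, r_rejected=0.5)
loss_disagree = pairwise_preference_loss(r_chosen=0.5, r_rejected=2.0)

# Gradient descent on this loss pushes the model to agree with annotators.
print(loss_agree < loss_disagree)
```

Minimizing this loss over many rankings teaches the reward model to mimic human preferences; a policy is then optimized against it (e.g., with PPO) in the final RLHF stage.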
