Reinforcement learning from human feedback

This note has no content.