Pick the right post-training algorithm for preference optimization, reasoning RL, or agent RL without drowning in research papers.