Post-Training: DPO, GRPO & RL for LLMs
Mastery Course


Pick the right post-training algorithm (preference optimization, reasoning RL, or agent RL) without drowning in research papers.

6 modules · ~17 min