Mastery course

Post-Training: DPO, GRPO & RL for LLMs

Pick the right post-training algorithm -- preference optimization, reasoning RL, and agent RL -- without drowning in research papers.

Part of: AI Engineering Fundamentals

6 modules · ~17 min