Course · 5 chapters
LLM Evaluation
Build runnable LLM evals you can trust: golden datasets, deterministic scorers, calibrated LLM judges, Inspect AI suites, and CI gates. 5 chapters, advanced, for engineers.
What you'll be able to do
- Build a runnable LLM eval from scratch
- Design judge rubrics that resist bias
- Calibrate LLM judges against humans
- Run eval suites with Inspect AI
- Gate bad merges with CI evals
- Set thresholds that survive flaky judges
What's inside
- 1. LLM Evaluation: Start Here
A 12-minute orientation to the LLM Evaluation skill path: the gateway chapter, then the three layers (judges, suites, gates) that turn eval-by-gut-feeling into a discipline that ships.
- 2. Eval Fundamentals: Your First LLM Eval in 30 Minutes
No more gut-feeling checks. Build a runnable eval (golden dataset, deterministic scorer, LLM judge) and read the result like an engineer.
- 3. LLM-as-Judge: Rubrics, Bias, and Reliability
Design judges that survive CALM biases, calibrate them against humans, and earn them a place in your CI gate.
- 4. Inspect AI: Production-Ready Eval Suites at Scale
Build, run, and visualize frontier-grade eval suites with the open-source framework from UK AISI.
- 5. Eval Gating in CI: Blocking Bad Merges
Wire per-PR evals into GitHub Actions, choose thresholds that survive flakiness, and decide when a gate belongs on main.
Frequently asked questions
- What will I learn in this LLM evaluation course?
- You build evaluation skills across three layers: a first runnable eval with a golden dataset and scorer, reliable LLM-as-judge rubrics calibrated against human ratings, and eval suites wired into CI as a merge gate. The path uses UK AISI's open-source Inspect AI framework and GitHub Actions.
- Who is this course for?
- It is for engineers building production AI features who need to test LLM outputs rigorously instead of checking them by vibes. The level is advanced, with a focus on software engineering and AI reliability.
- Do I need to code to take this course?
- Yes. This is a hands-on engineering path that involves writing eval scripts, configuring the Inspect AI framework, and setting up GitHub Actions workflows, so comfort with code and CI is expected.
- How long is the course and is there a certificate?
- The path has 5 chapters totaling about 100 minutes, starting with a 12-minute orientation. On completion you earn an AI Academy by Anthropos certificate.
- Is this course free?
- No, this is a paid skill path included with an AI Academy by Anthropos subscription.
Earn a certificate
Complete all chapters to receive your certificate of completion.