Question 1

What will I learn in this LLM evaluation course?

You build LLM evaluation in three layers: a first runnable eval with a golden dataset and a scorer; reliable LLM-as-judge rubrics calibrated against human ratings; and eval suites wired into CI as a merge gate. The path uses UK AISI's open-source Inspect AI framework together with GitHub Actions.

Question 2

Who is this course for?

It is aimed at engineers building production AI features who need to test LLM outputs rigorously rather than relying on vibe checks. The level is advanced, and the focus is on software engineering and AI reliability.

Question 3

Do I need to code to take this course?

Yes. This is a hands-on engineering path that involves writing eval scripts, configuring the Inspect AI framework, and setting up GitHub Actions workflows, so comfort with code and CI is expected.

Question 4

How long is the course, and is there a certificate?

The path has 5 chapters totaling about 100 minutes, starting with a 12-minute orientation. On completion, you earn an AI Academy by Anthropos certificate.

Question 5

Is this course free?

No, this is a paid skill path included with an AI Academy by Anthropos subscription.

LLM Evaluation