Academy
LLM Evaluation
LLM-as-Judge: Rubrics, Bias, and Reliability
LLM Evaluation · Lesson 3 of 5
Mastery Course
Design judges that survive CALM biases, calibrate against humans, and earn a place in your CI gate.
6 modules · ~13 min
MODULE 1
Why Naive Judges Fail
The reliability problem with LLM-as-judge · ~2 min
MODULE 2
The CALM Threat Landscape
The 12 biases you have to defend against · ~3 min
MODULE 3
Rubric Design That Survives
From vague scores to criteria that carry the signal · ~3 min
MODULE 4
Calibrating Against Humans
Inter-rater agreement is the only real validation · ~2 min
MODULE 5
Judge Model Selection
Size, self-consistency, and ensemble voting · ~2 min
MODULE 6
Shipping a Judge You Trust
The production checklist · ~2 min