Build eval suites, catch prompt regressions, and stop shipping on vibes. Practical evaluation patterns for AI engineers.