Ship agents you can verify: mock tools, trace decisions, build golden datasets, and gate deploys on eval results.
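
As a rough illustration of how the four practices named above could fit together, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: `mock_weather`, `Trace`, `toy_agent`, the `GOLDEN` cases, and the 0.9 threshold are assumptions, not part of any particular framework. The toy agent stands in for a real LLM-driven agent.

```python
from dataclasses import dataclass, field


def mock_weather(city: str) -> str:
    """Hypothetical mock tool: returns canned data instead of calling a real API."""
    canned = {"Paris": "sunny", "Oslo": "snow"}
    return canned.get(city, "unknown")


@dataclass
class Trace:
    """Records each tool call the agent makes so decisions can be inspected later."""
    steps: list = field(default_factory=list)

    def record(self, tool: str, arg: str, result: str) -> None:
        self.steps.append({"tool": tool, "arg": arg, "result": result})


def toy_agent(question: str, tools: dict, trace: Trace) -> str:
    """Stand-in for a real agent: treats the last word of the question as a city."""
    city = question.rstrip("?").split()[-1]
    result = tools["weather"](city)
    trace.record("weather", city, result)
    return result


# Hypothetical golden dataset: inputs paired with the answers we expect.
GOLDEN = [
    {"question": "What is the weather in Paris?", "expected": "sunny"},
    {"question": "What is the weather in Oslo?", "expected": "snow"},
]


def run_evals(agent, tools: dict, dataset: list):
    """Run every golden case against the agent; return the pass rate and the traces."""
    passed = 0
    traces = []
    for case in dataset:
        trace = Trace()
        answer = agent(case["question"], tools, trace)
        traces.append(trace)
        passed += answer == case["expected"]
    return passed / len(dataset), traces


def gate(pass_rate: float, threshold: float = 0.9) -> bool:
    """Deploy gate: allow the release only if the pass rate meets the threshold."""
    return pass_rate >= threshold


pass_rate, traces = run_evals(toy_agent, {"weather": mock_weather}, GOLDEN)
deploy_ok = gate(pass_rate)
```

In a CI job, a script like this might end with `sys.exit(0 if deploy_ok else 1)` so that a failing eval run blocks the deploy step, and the collected `traces` could be written out as an artifact for debugging failed cases.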