Evals

Unified reasoner evaluation science

Ada replaced its Modular Reasoner with the Unified Reasoner, alongside a re-baselined eval harness and a Legitimacy Classifier. The adversarial pass rate rose from 88% to 97%.

Delta evaluation: Production replay pipeline

Delta evaluation (DE) replays production conversations through modified prompts and models and scores the results with an LLM-as-judge. Verdicts aggregate into win-rate metrics; traces are grouped into themes.
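
As a rough illustration of how judge verdicts might roll up into a win-rate metric, here is a minimal sketch. The verdict labels (`"win"`, `"loss"`, `"tie"`), the `win_rate` function, and the choice to count a tie as half a win are all assumptions made for this example, not details of the actual pipeline.

```python
from collections import Counter

# Hypothetical verdict labels; the real pipeline's schema is not described here.
VALID_VERDICTS = {"win", "loss", "tie"}


def win_rate(verdicts, count_ties_as_half=True):
    """Aggregate per-conversation judge verdicts into a single win rate.

    Each verdict says whether the modified prompt or model beat ("win"),
    lost to ("loss"), or matched ("tie") the baseline on one replayed
    conversation. Counting a tie as half a win is an assumed convention.
    """
    unknown = set(verdicts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unknown verdict labels: {sorted(unknown)}")
    if not verdicts:
        raise ValueError("no verdicts to aggregate")

    counts = Counter(verdicts)
    wins = counts["win"]
    if count_ties_as_half:
        wins += 0.5 * counts["tie"]
    return wins / len(verdicts)


if __name__ == "__main__":
    sample = ["win", "win", "loss", "tie"]
    print(win_rate(sample))                            # 0.625
    print(win_rate(sample, count_ties_as_half=False))  # 0.5
```

How ties are treated changes the headline number, so whichever convention is used should be reported with the metric.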