The first piece of validation: a baseline vs candidate diff at named points
Every previous optimisation post has said 'we'll measure this when validation lands'. The first piece of that pipeline shipped this month: a head-to-head diff between two cycle runs at named pilot sites. Here is what it does, what it does not do, and why it was the right first step.
