
Sounding and XContest validation: the comparison we plan to build

We do not have automated validation against radiosonde soundings or XContest flight activity yet. Here is what we plan to build, and why it is non-trivial.

A soaring forecast can be verified against two independent things. The first is the atmosphere itself, via radiosonde soundings: temperature, humidity, and wind as a function of height, twice a day from a handful of UK launch sites. The second is the pilot experience, via flight activity: where people flew, how far, and whether they flew on the days the forecast rated well. The validation pipeline will be built around those two streams, plus a third, more speculative one: instrument feedback. Each has its own complications. The validation post covers the current overall state of validation; this post is the specific version, setting out what each stream actually involves.

Stream one: 12Z radiosonde soundings against the model. The UK and nearby upper-air network has a small set of useful stations, with varying schedule reliability. The conceptual operation is straightforward: every cycle, extract the model's forecast sounding at the radiosonde site location, time-matched to the observation, and score boundary layer height and lifting condensation level (LCL) against the observed profile. The non-trivial parts are: (a) automated ingest of the observation files in their published format, (b) a consistent definition of boundary layer height between model and observation (the model's diagnosis and a height derived from an observed profile generally rest on different criteria, so an apples-to-apples comparison needs care), and (c) storing the results so that they feed a public dashboard rather than a stack of one-off plots.
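Point (b) can be made concrete with a minimal sketch: apply one boundary layer height criterion to both the model profile and the observed profile, so that any difference reflects the atmosphere rather than the definition. The simple parcel criterion, the 0.5 K excess, the `Level` tuple layout, and every function name below are illustrative assumptions, not the method the pipeline will necessarily use. The LCL estimate uses the common rule of thumb of roughly 125 m per degree of surface dewpoint depression.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical profile layout: (height_m, pressure_hPa, temperature_K) per level,
# ordered from the surface upward. Heights are returned in the same datum.
Level = Tuple[float, float, float]

KAPPA = 0.2857  # R_d / c_p for dry air


def potential_temperature(pressure_hpa: float, temp_k: float) -> float:
    """Dry potential temperature referenced to 1000 hPa."""
    return temp_k * (1000.0 / pressure_hpa) ** KAPPA


def parcel_bl_height(profile: Sequence[Level], excess_k: float = 0.5) -> Optional[float]:
    """Height where environmental theta first reaches surface theta + excess_k.

    Returns None if the profile never reaches that value.
    """
    if excess_k <= 0:
        raise ValueError("excess_k must be positive")
    heights = [level[0] for level in profile]
    thetas = [potential_temperature(level[1], level[2]) for level in profile]
    target = thetas[0] + excess_k
    for i in range(1, len(profile)):
        if thetas[i] >= target:
            # Linear interpolation between the levels bracketing the crossing.
            frac = (target - thetas[i - 1]) / (thetas[i] - thetas[i - 1])
            return heights[i - 1] + frac * (heights[i] - heights[i - 1])
    return None


def lcl_height_estimate(temp_c: float, dewpoint_c: float) -> float:
    """Rule-of-thumb LCL height above the surface, in metres."""
    return 125.0 * (temp_c - dewpoint_c)


def bl_height_error(
    model_profile: Sequence[Level],
    observed_profile: Sequence[Level],
    excess_k: float = 0.5,
) -> Optional[float]:
    """Model minus observed height, with the same criterion applied to both."""
    model_h = parcel_bl_height(model_profile, excess_k)
    observed_h = parcel_bl_height(observed_profile, excess_k)
    if model_h is None or observed_h is None:
        return None
    return model_h - observed_h
```

The design point is that `bl_height_error` never consumes a boundary layer height reported by the model itself; both heights come from the same function, which is one way to keep the comparison like-for-like.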

Stream two: XContest correlation with day rating. XContest publishes daily flight activity that can be correlated against `day_rating` at known launch sites. The non-trivial parts are: (a) the terms of service for automated access to flight data, (b) handling the obvious confounders (weekday versus weekend flying, school holidays, and visibility, which affects whether people fly independently of soaring quality), and (c) the long sample window required: a meaningful correlation needs months of summer data, not weeks.
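One hedged way to handle the weekday versus weekend confounder in (b) is to compute a rank correlation within each stratum rather than over all days pooled together. The sketch below assumes `day_rating` can be treated as an ordinal number and that daily flight counts are already in hand; the row layout and function names are hypothetical, and nothing here touches XContest itself.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None when it is undefined."""
    n = len(x)
    if n < 2 or n != len(y):
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_by_stratum(
    rows: Iterable[Tuple[str, float, float]],
) -> Dict[str, Optional[float]]:
    """rows are (stratum, day_rating, flight_count), e.g. stratum = 'weekend'."""
    ratings: Dict[str, List[float]] = defaultdict(list)
    flights: Dict[str, List[float]] = defaultdict(list)
    for stratum, rating, count in rows:
        ratings[stratum].append(rating)
        flights[stratum].append(count)
    return {s: spearman(ratings[s], flights[s]) for s in ratings}
```

A rank correlation is used because flight counts are unlikely to scale linearly with any rating. Stratifying does not remove the other confounders, and a stratum with only a few days will produce a number that means very little.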

Stream three, more speculative: structured pilot or instrument feedback on observed cloudbase and observed climb rates. Some integrators may be willing to pipe back anonymised summary data, such as observed cloudbase at known sites and observed average climb rates by hour, which would give us direct verification of the fields the API actually serves. This is the highest-value verification source if it can be set up, because it speaks directly to what pilots care about rather than to atmospheric quantities they care about only indirectly.

What is in the way of just building it. Engineering capacity. Each stream is plumbing work that does not directly improve the forecast but is a precondition for any quantitative claim about forecast quality. We have prioritised getting the live forecast pipeline solid, the API endpoints out the door, and the site usable for early integrators ahead of the verification work that would let us quote bias numbers in marketing material. That is an explicit choice. It is also a choice that has to flip soon.

What we plan to ship first. The 12Z sounding stream, against selected UK upper-air sites, with a public dashboard. That is the smallest meaningful verification deliverable: it gives us defensible numbers on boundary layer height and cloudbase against an established observation source, on a daily cadence, with a public artifact that anyone can check. Whatever it shows when it lands gets published here, alongside the broader validation roadmap.
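As a rough sketch of what "defensible numbers" could mean in practice, the daily model and observed values can be reduced to a bias, a mean absolute error, and a root mean square error. The `SoundingPair` record and `score_pairs` function are hypothetical names for illustration; the real dashboard may well report different statistics.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass(frozen=True)
class SoundingPair:
    """One time-matched model/observation pair (hypothetical record layout)."""
    site: str
    date: str          # ISO date of the 12Z sounding
    model_m: float     # model value in metres, e.g. boundary layer height
    observed_m: float  # same quantity derived from the radiosonde profile


def score_pairs(pairs: Sequence[SoundingPair]) -> Dict[str, float]:
    """Summary statistics of model minus observed, in metres."""
    if not pairs:
        raise ValueError("need at least one pair to score")
    errors = [p.model_m - p.observed_m for p in pairs]
    return {
        "n": len(errors),
        "bias_m": mean(errors),                             # signed mean error
        "mae_m": mean(abs(e) for e in errors),              # typical miss size
        "rmse_m": math.sqrt(mean(e * e for e in errors)),   # penalises large misses
    }
```

Reporting bias alongside MAE matters because errors of opposite sign cancel in the bias: a forecast can show near-zero bias while still missing by a wide margin on most days.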