
Validation: where the forecast stands

We do not yet have an automated verification pipeline. Here is what we currently rely on to know whether the forecast is working, what we plan to build, and how to spot a bad day in the meantime.

Every field in the API carries an implicit claim of accuracy. If `day_rating` says `good`, pilots who fly on the strength of it should get a genuinely good day. If `hglider_agl_m` says 1800 m, climbs should top out near 1800 m. None of this is automatic - a forecast can be perfectly self-consistent and entirely wrong. Validation is the pipeline that checks the model against reality, and right now we do not have one in production.
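That claim-checking step can be made concrete. Here is a minimal sketch of what "checks the model against reality" means for a single field, using `hglider_agl_m` as the example; the 15% tolerance is a hypothetical choice for illustration, not a published Convek threshold:

```python
# Hypothetical sketch: check one forecast claim against one observation.
# The 15% relative tolerance is an assumption, not a published threshold.

def claim_holds(forecast_agl_m: float, observed_top_agl_m: float,
                rel_tol: float = 0.15) -> bool:
    """True if observed climb tops land within rel_tol of the forecast height."""
    if forecast_agl_m <= 0:
        return False
    return abs(observed_top_agl_m - forecast_agl_m) / forecast_agl_m <= rel_tol

# Forecast said 1800 m AGL; pilots reported climbs topping out near 1650 m.
print(claim_holds(1800, 1650))  # within the assumed 15% tolerance
```

A real pipeline does exactly this, many thousands of times, across fields and days - the logic is trivial, the plumbing is not.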

What we do have today. End-to-end runs that complete cleanly four times a day, artifacts published to R2 on a heartbeat, and a published configuration whose physics choices are all defensible defaults from the operational WRF literature. That gets you a forecast that is shaped right - the diurnal cycle behaves correctly, cloudbase moves with synoptic moisture, day rating tracks the obvious good-and-bad day pattern. It does not get you quantitative bias estimates against any independent observation.

What a proper validation pipeline looks like, and what it costs. The standard play is two streams: radiosonde soundings (Larkhill and Herstmonceux are the closest UK stations that launch and publish on schedule) compared against the model's forecast sounding at the same point and time, and pilot-facing verification (XContest activity, club logbooks, instrument vario data if anyone wants to share) compared against `day_rating` and `wstar_ms`. Neither is conceptually hard. Both involve plumbing - automatic ingest of observations, time-matching to the right forecast cycle, and persisting the results in a way that drives a public dashboard rather than a one-off PDF. That is build work that does not yet exist.
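The time-matching step is the fiddly part of that plumbing. Here is a sketch of it in isolation, under two stated assumptions: cycles at 00/06/12/18Z (the post only says the model runs four times a day, not at which hours) and a hypothetical three-hour lag before a cycle's output is published:

```python
from datetime import datetime, timedelta, timezone

# Assumed cycle hours and publication lag - illustrative, not Convek's
# actual schedule.
CYCLE_HOURS = (0, 6, 12, 18)
AVAILABILITY_LAG = timedelta(hours=3)

def matching_cycle(obs_time: datetime) -> datetime:
    """Latest forecast cycle whose output was published before obs_time."""
    day = obs_time.replace(hour=0, minute=0, second=0, microsecond=0)
    candidates = [day - timedelta(days=1) + timedelta(hours=h) for h in CYCLE_HOURS]
    candidates += [day + timedelta(hours=h) for h in CYCLE_HOURS]
    usable = [c for c in candidates if c + AVAILABILITY_LAG <= obs_time]
    return max(usable)

# Under these assumptions, a 12Z sounding is matched against the 06Z run,
# because the 12Z run is not out yet when the balloon goes up.
obs = datetime(2025, 6, 1, 12, tzinfo=timezone.utc)
print(matching_cycle(obs).hour)
```

Getting this wrong quietly corrupts every downstream statistic - comparing an observation against a cycle that had already "seen" it flatters the scores - which is why it belongs in tested code rather than a one-off script.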

What you can do today to spot a bad forecast. Three tactics. First, watch the synoptic context. If a fast-moving wet front cleared through last night and the morning forecast still shows a low cloudbase and a warm trigger temperature, the soil is probably wetter than the model thinks - treat the day's numbers as optimistic. Second, cross-check against any other available source - the Met Office UK forecast, a club's local pressure pattern read, your own pilot intuition for the airmass. Wide disagreement is a warning signal. Third, treat the first usable cycle of any day as more credible than the previous evening's run for that same day - a fresher initialisation usually beats a longer lead time.
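The second tactic can be automated crudely. A sketch, where the 500 m spread threshold and the list of sources are illustrative choices on made-up numbers, not Convek policy:

```python
# Sketch of the cross-check tactic: flag wide disagreement between
# independent cloudbase estimates. Threshold and sources are illustrative.

def disagreement_flag(cloudbase_estimates_m: dict[str, float],
                      spread_threshold_m: float = 500.0) -> bool:
    """True if the estimates span more than the threshold."""
    values = list(cloudbase_estimates_m.values())
    return max(values) - min(values) > spread_threshold_m

estimates = {
    "convek": 1800.0,      # model forecast
    "met_office": 1200.0,  # hypothetical second source
    "club_read": 1400.0,   # local pattern read
}
print(disagreement_flag(estimates))  # 600 m spread exceeds the 500 m threshold
```

The point is not the arithmetic; it is that disagreement is cheap to detect and worth detecting before committing to a flight.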

What you should not assume. Do not read a specific quantitative accuracy claim into any Convek number that lacks a published source behind it. We have not published bias or RMSE numbers because the validation infrastructure to produce them is still being built. We will publish them when we have measured them.
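For what it is worth, bias and RMSE are themselves trivial once matched forecast/observation pairs exist - the missing piece is the pipeline that produces the pairs, not the statistics. A sketch on made-up cloudbase pairs:

```python
import math

# Illustrative bias and RMSE over matched (forecast, observed) pairs.
# The pairs below are invented for demonstration, not real verification data.

def bias_and_rmse(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean error (bias) and root-mean-square error over the pairs."""
    errors = [f - o for f, o in pairs]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse

pairs = [(1800.0, 1650.0), (1500.0, 1480.0), (2100.0, 1900.0)]
b, r = bias_and_rmse(pairs)
print(round(b), round(r))
```

A positive bias here would mean the model systematically over-promises cloudbase; RMSE bounds the typical miss. These are exactly the numbers the dashboard on the roadmap is meant to surface.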

What is on the roadmap. The first piece is automated 12Z sounding comparison against Larkhill and Herstmonceux, with the results landing in a public dashboard rather than a slide deck. The second is a structured way for pilots and instrument makers to feed back observed cloudbase and observed climb rates against forecasts at known launch sites. Both are 2026 deliverables. Both will be flagged on this blog when they land.

If you are an instrument maker, club, or app developer with structured flight data (varios, traces, club logs) and would be willing to share anonymised samples to feed verification, that is the single most useful thing anyone outside Convek can do for forecast quality. Contact details on the contact page.