Why the live UK forecast runs on a trimmed domain (and is roughly 5x faster as a result)

Comparing a full-UK 4 km box against a tighter trimmed domain on the same cycle, the trimmed run finished WRF in about 13 minutes instead of nearly an hour. Here is the trade and why we made it.

The most expensive thing you can buy in a WRF pipeline is grid cells. Halve the grid spacing and the run gets roughly 8 times slower: four times as many horizontal cells, and a time step half as long, so twice as many steps. Double the area covered at the same spacing and the run gets twice as slow. There is no way around this: WRF's compute scales roughly linearly with cell count and inversely with the time step. So the very first lever you have, before any physics tuning, is how big a box you decide to simulate.
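
The scaling rule above can be written down as a back-of-envelope cost model. This is a sketch under the assumptions stated in the paragraph (cost proportional to horizontal cell count times number of time steps, time step proportional to grid spacing, vertical levels held fixed). The function name `relative_cost` is illustrative and not part of WRF or of our pipeline.

```python
def relative_cost(area_ratio: float, spacing_ratio: float) -> float:
    """Estimate run cost relative to a baseline domain.

    area_ratio:    new area / baseline area
    spacing_ratio: new grid spacing / baseline grid spacing
    """
    if area_ratio <= 0 or spacing_ratio <= 0:
        raise ValueError("ratios must be positive")
    # Horizontal cell count grows with area and with 1 / spacing^2.
    cell_ratio = area_ratio / spacing_ratio**2
    # The stable time step shrinks in proportion to the spacing,
    # so the number of steps grows with 1 / spacing.
    step_ratio = 1.0 / spacing_ratio
    return cell_ratio * step_ratio


print(relative_cost(area_ratio=1.0, spacing_ratio=0.5))  # halve the spacing: 8.0
print(relative_cost(area_ratio=2.0, spacing_ratio=1.0))  # double the area: 2.0
```

Real runs will deviate from this (I/O, MPI overhead, and physics calls do not all scale the same way), so treat it as a first estimate rather than a prediction.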

We benchmarked this directly on the live host: two runs on the same `2026-04-10 12z` cycle, with the same physics, vertical levels, and provider, differing only in domain extent. The full-UK box (`uk_4km_full`, covering most of the British Isles plus a working buffer) took ~58 minutes of WRF wall-clock and ~75 minutes end-to-end including ingest and post-processing. The trimmed box (the live `uk_4km` configuration: 6 W to 2 E, 50 N to 56.5 N) took ~13 minutes of WRF wall-clock and ~16 minutes end-to-end. That is roughly five times faster (about 4.5x on WRF wall-clock, 4.7x end-to-end), on the same machine, on the same source data, with the same physics.
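
For anyone who wants to check the arithmetic, here is a small sketch that computes the speed-up from the rounded timings above and estimates the size of the trimmed grid from its lat/lon bounds. The grid estimate assumes a spherical Earth and a simple lat/lon box; the real domain sits on a map projection, so the actual dimensions will differ somewhat. The helper name `approx_grid` is made up for this example.

```python
import math

# Rounded wall-clock minutes quoted above for the 2026-04-10 12z cycle.
full = {"wrf": 58.0, "end_to_end": 75.0}
trimmed = {"wrf": 13.0, "end_to_end": 16.0}

speedup = {stage: full[stage] / trimmed[stage] for stage in full}
for stage, ratio in speedup.items():
    print(f"{stage}: {ratio:.1f}x faster")

KM_PER_DEG_LAT = 111.2  # spherical-Earth approximation


def approx_grid(west, east, south, north, dx_km):
    """Rough (columns, rows) for a lat/lon box at a given grid spacing."""
    mid_lat = math.radians((south + north) / 2)
    width_km = (east - west) * KM_PER_DEG_LAT * math.cos(mid_lat)
    height_km = (north - south) * KM_PER_DEG_LAT
    return round(width_km / dx_km), round(height_km / dx_km)


# Trimmed uk_4km bounds: 6 W to 2 E, 50 N to 56.5 N, 4 km spacing.
nx, ny = approx_grid(-6.0, 2.0, 50.0, 56.5, 4.0)
print(f"trimmed box: roughly {nx} x {ny} cells")
```

If cost really does scale linearly with cell count, the measured ratio suggests the full box carries something like four to five times as many cells as the trimmed one.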

What you trade. The trimmed domain cuts off most of Ireland, the north of Scotland, and the western approaches. Anyone soaring in those areas does not get useful forecasts from the live `uk_4km` artefact. They will when we add a separate domain for them, but they do not today. The coverage page shows what the trimmed box covers and what it does not.

Also: the western and southern edges of the trimmed box sit closer to UK soaring areas than is academically ideal. The outermost grid cells of any WRF run are contaminated by boundary forcing from the driving global model (GFS in our case), and a tighter domain pushes that contaminated rim closer to popular soaring areas like Cornwall and Pembrokeshire. On south-westerly flow days it is the first thing to look at if a forecast feels wrong, and quantifying it is part of the validation work already in the queue.
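
To get a feel for how close the rim is, the sketch below estimates how many 4 km cells separate a site from the nearest lateral edge of the trimmed box. The site coordinates are approximate, illustrative points chosen for this example, not pipeline data, and the calculation treats the domain as a plain lat/lon rectangle on a spherical Earth, which the projected WRF grid only approximates.

```python
import math

KM_PER_DEG = 111.2  # km per degree of latitude, spherical-Earth approximation
DX_KM = 4.0         # grid spacing of the live uk_4km domain

# Trimmed-domain bounds quoted in the post (degrees; west is negative).
WEST, EAST, SOUTH, NORTH = -6.0, 2.0, 50.0, 56.5


def cells_from_edges(lat: float, lon: float) -> dict:
    """Approximate distance from a point to each lateral edge, in grid cells."""
    if not (SOUTH <= lat <= NORTH and WEST <= lon <= EAST):
        raise ValueError("point lies outside the trimmed box")
    km_per_deg_lon = KM_PER_DEG * math.cos(math.radians(lat))
    return {
        "west": (lon - WEST) * km_per_deg_lon / DX_KM,
        "east": (EAST - lon) * km_per_deg_lon / DX_KM,
        "south": (lat - SOUTH) * KM_PER_DEG / DX_KM,
        "north": (NORTH - lat) * KM_PER_DEG / DX_KM,
    }


# Illustrative, approximate site coordinates (assumptions for this example).
sites = {"west Cornwall": (50.2, -5.3), "Pembrokeshire": (51.8, -5.0)}
for name, (lat, lon) in sites.items():
    dist = cells_from_edges(lat, lon)
    edge = min(dist, key=dist.get)
    print(f"{name}: about {dist[edge]:.0f} cells from the {edge} edge")
```

For context, WRF's default lateral boundary zone (`spec_bdy_width`) is 5 rows wide, though boundary influence on the solution typically reaches further in than that. The value used in the live configuration is not stated here.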

What you buy. Three things. First, the obvious: four cycles a day fit comfortably on a single Hetzner box, with headroom left for occasional re-runs and experiments. The full-UK box would not. Second, faster turnaround means the morning 06Z forecast lands by mid-morning, which is when pilots are actually checking it. The lag between cycle initialisation and forecast availability is what users feel, not the academic correctness of the domain. Third, all the post-processing and artefact publishing downstream of WRF runs in roughly half a minute, so the end-to-end pipeline feels snappy.

One finding from the benchmark is worth flagging. We had been running the trimmed domain with `cu_physics = 0` (cumulus parameterisation off, on the theory that 4 km is on the boundary of resolved convection). The benchmark showed that switching the scheme off gave no runtime win on this hardware once the domain and provider were held constant, and the cu-on configuration produced cleaner outputs in the boundary-layer fields. So the live `uk_4km` configuration today runs `cu_physics = 1` (Kain-Fritsch on) with I/O quilting disabled. Sometimes the obvious optimisation does not actually help.
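
As a rough illustration of what those two settings look like in `namelist.input`, the sketch below renders the relevant groups from a Python dictionary. Only `cu_physics = 1` and "quilting disabled" come from this post. Expressing the latter as `nio_tasks_per_group = 0` (WRF's standard way of switching quilting off) is an assumption about how the live config does it, and the `overrides` dictionary and `render_namelist` helper are hypothetical, not part of our pipeline.

```python
# Hypothetical overrides; see the caveats in the paragraph above.
overrides = {
    # 1 selects the Kain-Fritsch cumulus scheme in WRF.
    "physics": {"cu_physics": 1},
    # Zero I/O tasks per group means no dedicated quilt servers.
    "namelist_quilt": {"nio_tasks_per_group": 0, "nio_groups": 1},
}


def render_namelist(groups: dict) -> str:
    """Render a dict of {group: {key: value}} as Fortran namelist text."""
    lines = []
    for group, settings in groups.items():
        lines.append(f"&{group}")
        for key, value in settings.items():
            lines.append(f" {key} = {value},")
        lines.append("/")
        lines.append("")
    return "\n".join(lines)


print(render_namelist(overrides))
```

A real `namelist.input` carries many more entries per group; this fragment shows only the two switches discussed here.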

What would change the trade. Two things would push us toward more coverage: compute getting cheaper at this size (a multi-day horizon at 1 km, say, might justify a chunkier box for the spin-up region), or pilot demand for north Scotland or Ireland reaching the point where a separate trimmed domain for those areas is worth standing up. Both are on the roadmap, neither is imminent.

The shipping configuration ran the validated `uk_4km` cycle in 13 minutes 47 seconds of WRF wall-clock and 16 minutes 20 seconds end-to-end on the upgraded worker. Those numbers are kept up to date in the operations notes; current status lives at the top of the model page.