
Cutting GFS ingest from minutes to two minutes with subset GRIB

The first step of any WRF cycle is downloading the global model fields that drive it. Pulling full GFS files takes six minutes or more, and far longer when the server is busy. Pulling only the subset we need takes about two minutes. Here is how the live ingest works.

Before WRF runs, it has to be told what the atmosphere looks like at the cycle start time and what the global driving model thinks the boundaries will do over the forecast window. Both come from GFS in our setup. That ingest step is invisible to anyone using the API but it is the first place where wall-clock time gets spent on every cycle, and it used to be the slowest part of the pipeline that was not WRF itself.

GFS publishes its forecast on the NOAA NOMADS server as GRIB2 files, one per forecast hour. A single full GFS file at the 0.25 degree resolution we use is roughly 500 MB and contains hundreds of fields covering the whole globe from the surface to the stratosphere. The naive approach is to download every file across the forecast horizon and let WPS pick out what it needs. For a 48 hour forecast at 3 hour intervals, that is 17 files, multiple gigabytes, and a download time that varies wildly depending on how busy NOMADS is.
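To make the naive approach concrete, here is a minimal sketch that enumerates the full-file downloads for one cycle. The file-name pattern follows the public GFS layout on NOMADS; the base URL and the cycle date are illustrative, not the exact shipping configuration.

```python
# Enumerate the full-file GFS downloads for a single cycle.
BASE = "https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod"
cycle_date, cycle_hour = "20260410", "12"  # illustrative cycle

urls = [
    f"{BASE}/gfs.{cycle_date}/{cycle_hour}/atmos/"
    f"gfs.t{cycle_hour}z.pgrb2.0p25.f{fhr:03d}"
    for fhr in range(0, 49, 3)  # 0..48 h at 3 h steps -> 17 files
]

assert len(urls) == 17  # ~500 MB each: multiple gigabytes per cycle
```

At roughly 500 MB per file, that list is the multiple-gigabyte download the rest of this post is about avoiding.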

The trick is that NOMADS supports server-side subsetting. You can hit the same files with a query that asks for only the fields and the geographic window you actually need. WRF only needs a specific set of variables (winds, temperature, humidity, surface fields, soil state) at a specific set of pressure levels, and only over the geographic area covered by the boundary forcing. For a UK domain, that is a tiny slice of a global dataset. Asking the server to do the slicing means we download tens of megabytes per cycle instead of multiple gigabytes.
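A subset request is just a query against a filter CGI on the same server. The sketch below builds one such URL; the endpoint and parameter names follow the public `filter_gfs_0p25.pl` interface, but the variable list, pressure levels, and the UK bounding box here are illustrative rather than the exact shipping set.

```python
from urllib.parse import urlencode

# Public grib-filter endpoint for 0.25 degree GFS on NOMADS.
FILTER = "https://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_0p25.pl"

def subset_url(cycle_date: str, cycle_hour: str, fhr: int) -> str:
    """Build a server-side subset request for one forecast hour."""
    params = {
        "dir": f"/gfs.{cycle_date}/{cycle_hour}/atmos",
        "file": f"gfs.t{cycle_hour}z.pgrb2.0p25.f{fhr:03d}",
        # A few of the fields WRF needs (illustrative subset):
        "var_UGRD": "on", "var_VGRD": "on", "var_TMP": "on",
        "var_RH": "on", "var_HGT": "on",
        "var_TSOIL": "on", "var_SOILW": "on",
        # A few of the levels (illustrative subset):
        "lev_500_mb": "on", "lev_850_mb": "on", "lev_surface": "on",
        # Geographic window: a box around the UK.
        "subregion": "",
        "leftlon": -15, "rightlon": 5,
        "toplat": 62, "bottomlat": 47,
    }
    return f"{FILTER}?{urlencode(params)}"

url = subset_url("20260410", "12", 0)
```

The server reads the GRIB2 index, extracts only the matching messages over the requested window, and returns a small, valid GRIB2 file: tens of megabytes per cycle instead of gigabytes.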

The shipping ingest provider is `gfs_nomads_subset_experiment`, which has been the live default since the trimmed-domain rollout. On the validated `2026-04-10 12z` cycle, the subset ingest pulled the full 48 hour horizon of GFS in about two minutes. The same cycle on the full-file ingest path took roughly six and a half minutes - and that was on a quiet day for the NOMADS server. On a busy day the full-file path can be much worse, with no obvious way to predict it.

Why this matters for forecast availability. The total wall-clock from cycle initialisation to API artifacts being published is the number that defines how fresh the forecast is when a pilot looks at it. WRF itself takes around 13 minutes on the trimmed domain. Post-processing takes about 30 seconds. Ingest used to take 6 to 10 minutes, sitting on the critical path before WRF could even start. Cutting it to 2 minutes brings the end-to-end pipeline down to 16 minutes 20 seconds for the validated cycle, with the saving showing up directly in how soon the morning forecast lands.

Why the experiment label is still there. NOMADS subset queries occasionally fail (the server returns an error rather than a partial file), and we still want to be able to fall back to the full-file path when that happens. The provider plumbing supports both, and the live scheduler picks subset by default but degrades gracefully if subset is unhealthy. Once we have logged enough cycles to be confident the subset path is reliable in steady state, the experiment label comes off and the full-file path becomes the explicit fallback rather than the implied default.
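The fallback behaviour can be sketched as a try-subset-then-full loop. This is a minimal illustration of the pattern, not the scheduler's actual interfaces; the function name and error handling are assumptions.

```python
import urllib.error
import urllib.request

def fetch_with_fallback(subset_url: str, full_url: str,
                        timeout: float = 60.0) -> bytes:
    """Try the server-side subset first; fall back to the full file
    if the subset request errors out rather than returning data."""
    for url in (subset_url, full_url):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.URLError:
            continue  # subset path unhealthy -> try the full-file path
    raise RuntimeError("both subset and full-file ingest failed")
```

Because NOMADS returns an error rather than a truncated file when a subset query fails, a plain exception check is enough to know the subset path is unhealthy for that request.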

What is on the next-step list. Concurrent downloads across forecast hours - currently the ingest pulls files one at a time, and bounded parallelism is implemented but not yet validated as the live path. Better caching of geogrid output so the static parts of WPS do not re-run every cycle. And a longer-term move to other providers (HRRR for short-range over North America when we add coverage there, ICON for European domains) where the subset story is different again.
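The bounded-parallelism item on that list can be sketched in a few lines. The worker count and the injected `fetch` callable are illustrative; as the text says, this is implemented but not yet the validated live path.

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, max_workers=4):
    """Fetch every forecast-hour URL with at most max_workers requests
    in flight, preserving input order in the returned list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Downloads are I/O-bound, so threads are a reasonable fit, and capping the pool size keeps the ingest from hammering NOMADS with 17 simultaneous requests.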

For users of the API, this all happens before any artifact you see. But it is the reason the morning 06Z run lands roughly when the morning weather check is happening, rather than 30 minutes later. The shipping pipeline is documented on the model page.