Getting cloudbase right: the LCL formulations that didn't work
We tried four different ways of computing cloudbase from raw WRF output before settling on one that matched pilot reports. Here is what failed, and why.
`cloudbase_agl_ft` is the field pilots look at more than any other. It decides whether you can fly at all (no cloudbase, no soaring) and how high the day will go. The forecast needs to be within a few hundred feet of what pilots actually see, otherwise the API has no credibility. Getting there took four separate formulations of cloudbase and about six weeks of validation debt.
Cloudbase in WRF is not a directly-output field. You have to post-process it from the temperature and moisture profiles. Formally it is the lifting condensation level (LCL), the altitude at which a parcel lifted dry-adiabatically from the surface reaches saturation. The trick is: which parcel? There are at least four reasonable choices.
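To make the lifting step concrete, here is a minimal sketch using Espy's rule of thumb, under which the LCL sits roughly 125 m above the parcel's starting level for every degree Celsius of dewpoint depression. This is an assumption for illustration only: the choice of formula here is not taken from the pipeline, and the function names are hypothetical.

```python
M_PER_DEG_C = 125.0   # Espy's approximation: metres of LCL height per degC of spread
FT_PER_M = 3.28084


def lcl_height_m(temp_c: float, dewpoint_c: float) -> float:
    """Approximate LCL height above the parcel's starting level, in metres."""
    # A parcel that is already saturated has its LCL at the starting level.
    spread_c = max(temp_c - dewpoint_c, 0.0)
    return M_PER_DEG_C * spread_c


def lcl_height_ft(temp_c: float, dewpoint_c: float) -> float:
    """Same as lcl_height_m, converted to feet."""
    return lcl_height_m(temp_c, dewpoint_c) * FT_PER_M
```

Under this rule each degree of spread is worth about 410 ft of cloudbase, so the choice of parcel only has to shift the temperature or dewpoint by a degree or two to move the answer by several hundred feet.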
Formulation 1 was the surface parcel LCL: take the 2 m temperature and dewpoint at the grid cell and lift that parcel. Easy, fast, and wrong. On most UK days it put cloudbase 400 to 1000 ft too low. The reason: the surface parcel sees the boundary layer at its coldest, moistest point. Actual cumulus cloudbase is driven by parcels mixed up through the PBL, which have lost some moisture to entrainment and warmed by contact with drier air aloft. A pure surface parcel systematically undersells cloudbase.
Formulation 2 was the PBL-mean parcel: average temperature and dewpoint through the lowest 500 m, lift that. Closer, but it undershot by 200 to 500 ft on days with strong surface heating and overshot on days with shallow boundary layers. The fixed 500 m averaging depth is the problem: it does not track the actual PBL depth, so on big days it averages over too little of the mixed layer (parcel too moist, cloudbase too low) and on small days it reaches into the air above the PBL (parcel too dry, cloudbase too high).
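A layer-mean parcel of this kind can be sketched as below. The details are assumptions, not the pipeline's code: each level is first brought down to the surface using a dry-adiabatic lapse rate for temperature and an approximate dewpoint lapse rate for an unsaturated parcel, the levels are averaged without thickness weighting, and the LCL comes from Espy's 125 m per degree rule. The function name `layer_mean_lcl_m` is hypothetical.

```python
DRY_LAPSE_C_PER_M = 0.0098   # dry-adiabatic lapse rate
DEW_LAPSE_C_PER_M = 0.0018   # approximate dewpoint lapse rate of a lifted unsaturated parcel
M_PER_DEG_C = 125.0          # Espy's rule; equals 1 / (0.0098 - 0.0018)


def layer_mean_lcl_m(heights_m, temps_c, dewpoints_c, depth_m=500.0):
    """LCL height (m AGL) of a parcel averaged over the lowest depth_m of a profile."""
    t_sfc, td_sfc = [], []
    for z, t, td in zip(heights_m, temps_c, dewpoints_c):
        if z <= depth_m:
            # Bring each level down to the surface so the levels are comparable.
            t_sfc.append(t + DRY_LAPSE_C_PER_M * z)
            td_sfc.append(td + DEW_LAPSE_C_PER_M * z)
    if not t_sfc:
        raise ValueError("no model levels inside the averaging layer")
    # Simple unweighted mean over levels (an assumption; a real pipeline
    # would more likely weight by layer thickness or pressure).
    mean_t = sum(t_sfc) / len(t_sfc)
    mean_td = sum(td_sfc) / len(td_sfc)
    return M_PER_DEG_C * max(mean_t - mean_td, 0.0)
```

With a profile whose moisture falls off with height, increasing `depth_m` pulls drier air into the average and raises the computed LCL, which is why a depth that ignores the real PBL height gives errors that change from day to day.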
Formulation 3 was the mixed-layer LCL: average over the actual PBL depth from the PBL scheme, lift that. Better still, but we hit a new problem. On days with residual moisture layers above the mixed layer, the LCL calculation was sensitive to where you put the top of the averaging region. A 50 m difference in boundary layer height estimate could move cloudbase by 300 ft. Reproducibility mattered to us more than peak accuracy on edge cases, and this formulation was noisy.
Formulation 4, the one we ship, is a hybrid. Start with a mixed-layer LCL over the lowest 70 percent of the PBL (this avoids the sensitivity to the exact PBL top), then correct for entrainment using a simple closure that accounts for dry-air mixing at the top of the PBL. The correction typically raises cloudbase by 150 to 300 ft relative to the pure mixed-layer LCL, and matches observed cloudbase across our validation set to about 200 ft RMS, which is roughly the spread you get from different pilots' altimeter settings anyway.
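The shape of the hybrid can be sketched as follows. Only the 70 percent fraction comes from the text above. The entrainment closure itself is not described here, so the sketch stands in for it with a constant offset, which is a loud placeholder: a real closure would depend on the profile. The Espy-style LCL, the lapse-rate constants, and all function names are likewise assumptions.

```python
FT_PER_M = 3.28084
PBL_FRACTION = 0.7            # from the post: lowest 70 percent of the PBL
ENTRAINMENT_OFFSET_M = 70.0   # PLACEHOLDER (about 230 ft); not the real closure


def mixed_layer_lcl_m(levels, top_m):
    """Espy-style LCL (m AGL) of the mean parcel below top_m.

    levels: iterable of (height_m_agl, temp_c, dewpoint_c) tuples.
    """
    spreads = [
        # Dewpoint depression after bringing the level down to the surface.
        (t + 0.0098 * z) - (td + 0.0018 * z)
        for z, t, td in levels
        if z <= top_m
    ]
    if not spreads:
        raise ValueError("no model levels below the averaging top")
    return 125.0 * max(sum(spreads) / len(spreads), 0.0)


def hybrid_cloudbase_ft(levels, pbl_height_m, offset_m=ENTRAINMENT_OFFSET_M):
    """Mixed-layer LCL over the lower PBL plus an entrainment offset, in feet AGL."""
    base_m = mixed_layer_lcl_m(levels, PBL_FRACTION * pbl_height_m)
    return (base_m + offset_m) * FT_PER_M
```

Keeping the averaging top at a fraction of the PBL height means the uppermost part of the boundary layer, where the height estimate is least certain, never enters the mean.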
Three things helped once we committed to formulation 4. First, switching microphysics to Thompson (see the WSM6 detour post) made the model's own cloudbase agree better with the post-processed LCL, which is a cross-check that you are doing the right thing. Second, we added a bias correction step that nudges the final value by a small constant trained against pilot reports, which removes a systematic low bias in very humid airmasses. Third, we added a sanity check that rejects LCL values more than 200 m above the PBL top, which catches post-processing glitches where the moisture profile goes weird.
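The last two steps are simple enough to sketch. The 200 m threshold is from the text above. Everything else is an assumption: the function name is hypothetical, the bias constant is left as a parameter because its trained value is not given, the check is applied before the bias correction, and a rejected value is returned as `None`.

```python
from typing import Optional

MAX_LCL_ABOVE_PBL_M = 200.0   # from the post: reject LCLs more than 200 m above the PBL top
FT_PER_M = 3.28084


def finalize_cloudbase_ft(
    lcl_m: float, pbl_height_m: float, bias_ft: float = 0.0
) -> Optional[float]:
    """Return bias-corrected cloudbase in feet AGL, or None if the LCL is implausible."""
    # Sanity check: an LCL far above the PBL top suggests a bad moisture profile.
    if lcl_m > pbl_height_m + MAX_LCL_ABOVE_PBL_M:
        return None
    # Constant bias correction (value would be trained against pilot reports).
    return lcl_m * FT_PER_M + bias_ft
```

A caller can then treat `None` as "no usable cloudbase for this cell" instead of publishing a value that came from a glitch.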
What did not help: the model's own cloud fraction field. None of our attempts to use it gave a more reliable cloudbase than the post-processed LCL. The cloud fraction field is too grid-dependent at 4 km, and it saturates to 0 or 1 too easily.
The LCL code lives in the post-processing pipeline, not in WRF itself. This was a deliberate decision: it lets us iterate on the formulation without re-running the model, and it means a validation failure is fixable in hours instead of days. Any future post-processing work (per-region tuning, integration of observed cloudbase as a correction) lands in the same place.
Cloudbase is the field we spent the most time on and the one I trust most as a result. `cloudbase_agl_ft` is live in the API, returned alongside `wstar_ms` and `hglider_agl_m` on every site query. The LCL glossary entry has the formula detail.