Headline variables: WM-2 predicts core forecasting metrics across the Earth’s surface and atmosphere about 8% to 24% more accurately than GFS, HRES, and Google DeepMind’s GraphCast over equivalent time horizons. WM-2 also outperforms or matches Microsoft’s Aurora model on 79% of the targets evaluated while being one-quarter the size by parameter count and nearly 30x faster at inference.
Hurricanes: WM-2 predicts hurricane ground tracks better than GFS at almost all forecast lead times up to seven days.
Cold snaps: WM-2 outperforms IFS HRES on daily low-temperature prediction for almost the entire duration of the December 2022 “North America cold air outbreak.”
2m Temperature: WM-2 outperforms IFS HRES in nearly all domestic and international cities studied, across 1- to 14-day lead times. WM-2’s 14-day forecasts also nearly always outperform IFS’s 10-day forecasts, underscoring WM-2’s superior accuracy even at longer lead times.
This month, we announced the latest version of our record-breaking global, medium-range AI-based weather model: WeatherMesh-2 (WM-2).
We also shared some exciting accuracy gains: our model predicts core forecasting metrics across the Earth’s surface and atmosphere, including geopotential, wind speed and direction, temperature, precipitation, cloud cover, solar radiation, pressure, and humidity, about 8% to 24% more accurately than the Global Forecast System (GFS) and HRES, the high-resolution forecast from the European Centre for Medium-Range Weather Forecasts (ECMWF).
Among AI-based models, WM-2 achieves accuracy gains of the same range over Google DeepMind’s GraphCast over equivalent time horizons. WM-2 also outperforms or matches Microsoft’s Aurora on 79% of the targets evaluated while being one-quarter the size by parameter count and nearly 30x faster at inference. In many cases, WM-2 maintains this accuracy lead over other gold-standard models even when predicting further into the future.
WM-2 Performance Across Surface and Atmospheric Variables
Performance call-outs:
For 2 meter temperature, WM-2 is 14% more accurate than GraphCast at 14 days, 19% more accurate than HRES at 10 days, and 23% more accurate than GFS at 14 days.
For 500 mb geopotential, WM-2 is 8% more accurate than GraphCast at 14 days, 13% more accurate than ECMWF’s HRES at 10 days, and 19% more accurate than GFS at 14 days.
For 10 meter winds, WM-2 is 8% more accurate than GraphCast, 18% more accurate than HRES at 10 days, and 21% more accurate than GFS at 14 days.
Across headline variables and lead times, WM-2 outperforms or matches Microsoft’s Aurora on 79% of the targets evaluated, while being one-quarter the size by parameter count and nearly 30x faster at inference.
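The post reports gains as percentages but does not state the formula behind them. A common convention in forecast verification, assumed here rather than confirmed by the post, is the relative reduction in RMSE against the baseline model. The sketch below computes that, along with a win-or-tie rate of the kind quoted for Aurora. The 1% tie tolerance, the function names, the target keys, and every number in the example are hypothetical.

```python
"""Sketch: relative RMSE reduction and win-or-tie rate (assumed conventions)."""
from __future__ import annotations


def percent_more_accurate(rmse_model: float, rmse_baseline: float) -> float:
    """Relative RMSE reduction of a model versus a baseline, in percent.

    Positive values mean the model's error is lower than the baseline's.
    """
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


def win_or_tie_rate(
    scores: dict[tuple[str, str], tuple[float, float]], rel_tol: float = 0.01
) -> float:
    """Fraction of targets where model A's RMSE is lower than model B's,
    or within `rel_tol` (relative) of it, which is counted as a tie.

    `scores` maps a (variable, lead time) target to an (rmse_a, rmse_b) pair.
    """
    if not scores:
        raise ValueError("no targets to compare")
    wins_or_ties = sum(
        1 for rmse_a, rmse_b in scores.values() if rmse_a <= rmse_b * (1 + rel_tol)
    )
    return wins_or_ties / len(scores)


# Hypothetical RMSE values, not WM-2 results.
print(percent_more_accurate(1.5, 2.0))  # 25.0

example = {
    ("t2m", "day 1"): (0.90, 0.95),   # A lower: win
    ("t2m", "day 5"): (1.50, 1.51),   # A lower: win
    ("z500", "day 5"): (20.1, 20.0),  # A within 1% of B: tie
    ("u10", "day 5"): (1.30, 1.20),   # A more than 1% higher: loss
}
print(win_or_tie_rate(example))  # 0.75
```

Counting ties needs an explicit tolerance because two RMSE values are almost never exactly equal; a different tolerance would change the reported rate.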
WM-2 Case Study Overview
We tested our model in a range of retrospective case studies, including:
The 2024 Atlantic Hurricane Season, which featured eight major storms, including Hurricanes Helene, Milton, and Beryl;
2022 Winter Storm Cold Snaps, capturing the particularly severe December 2022 event known as the “North America cold air outbreak of 2022”; and
Surface Temperature in Global Cities from January through February 2024, validated against observations collected across domestic and international cities.
I. Atlantic Hurricane Study 2024 - Ground Track Error
What it covers: The chart shows the average ground track error of hurricane forecasts from WM-2 and the National Weather Service’s GFS model across eight major storms in the 2024 Atlantic hurricane season.
Notable findings
WM-2 predicts hurricane ground tracks better than GFS at almost all forecast lead times up to seven days, even though major operational models already perform well on track forecasting overall.
WM-2 also outperforms GFS by a larger margin at longer lead times.
Why it matters: Accurate hurricane track predictions are crucial for emergency preparedness. Even small improvements in accuracy can help save lives and improve how resources are managed before and during severe storms.
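Track error for a single forecast is conventionally the great-circle distance between the forecast storm center and the observed (best-track) center valid at the same time, averaged over all forecasts at a given lead time. The post does not detail WindBorne’s exact procedure, so the sketch below is an assumption: it uses the haversine formula on a spherical Earth with a mean radius of 6,371 km, and its function names and input format are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_track_error_km(pairs):
    """Average forecast-vs-observed storm-center distance at one lead time.

    `pairs` is a list of ((fcst_lat, fcst_lon), (obs_lat, obs_lon)) tuples,
    a hypothetical input format.
    """
    if not pairs:
        raise ValueError("no forecast/observation pairs")
    total = sum(
        great_circle_km(f_lat, f_lon, o_lat, o_lon)
        for (f_lat, f_lon), (o_lat, o_lon) in pairs
    )
    return total / len(pairs)


# Hypothetical positions: one forecast 1 degree of latitude off, one exact.
pairs = [((26.0, -80.0), (25.0, -80.0)), ((30.0, -85.0), (30.0, -85.0))]
print(round(mean_track_error_km(pairs), 1))  # 55.6
```

Running this separately for each lead time (12 h, 24 h, and so on) and for each model gives the kind of per-lead-time curves the chart compares.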
II. Winter Storm Cold Snaps 2022
What it covers: The chart compares the root mean square error (RMSE) of 2 meter temperature forecasts for North America from WindBorne’s WM-2 and ECMWF’s IFS HRES during the December 2022 “cold air outbreak.” RMSE is a measure of forecast accuracy: the lower the RMSE, the higher the accuracy. Stronger WM-2 performance is shown in blue, while stronger HRES performance is shown in red.
Notable findings
WM-2 outperforms IFS HRES on temperature prediction for almost the entire duration of the December 2022 North America cold air outbreak.
WM-2 is particularly strong at forecasting the start and end of the cold air outbreak, anticipating the cold snap’s transition points more accurately.
Why it matters: Accurate temperature forecasting during extreme cold events is essential for utilities managing power grid demand, municipalities planning emergency services, and members of the public protecting themselves from dangerous conditions.
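In formula terms, RMSE is the square root of the average squared difference between forecast and observed values, so large misses count for more than small ones. For a gridded region, verification work such as the WeatherBench benchmark commonly weights each grid row by the cosine of its latitude, because cells on a regular latitude-longitude grid cover less area toward the poles. The post does not say whether WM-2’s evaluation applies this weighting; the sketch below assumes it, and its function name and nested-list grid format are hypothetical.

```python
import math


def lat_weighted_rmse(forecast, observed, lats):
    """RMSE over a regular lat/lon grid, weighting each row by cos(latitude).

    forecast, observed: 2-D lists indexed [row][column], one row per latitude.
    lats: latitude in degrees of each row.
    """
    if not (len(forecast) == len(observed) == len(lats)):
        raise ValueError("forecast, observed and lats need one entry per row")
    weighted_sq_err = 0.0
    total_weight = 0.0
    for lat, f_row, o_row in zip(lats, forecast, observed):
        if len(f_row) != len(o_row):
            raise ValueError("row length mismatch")
        w = math.cos(math.radians(lat))  # smaller weight for smaller cells
        for f, o in zip(f_row, o_row):
            weighted_sq_err += w * (f - o) ** 2
            total_weight += w
    if total_weight <= 0:
        raise ValueError("empty grid or zero total weight")
    return math.sqrt(weighted_sq_err / total_weight)


# Hypothetical 2 m temperatures (°C): a 1 °C miss at 0°N, a 3 °C miss at 60°N.
forecast = [[1.0], [3.0]]
observed = [[0.0], [0.0]]
print(round(lat_weighted_rmse(forecast, observed, [0.0, 60.0]), 3))  # 1.915
```

Without weighting, the same two errors give sqrt(5) ≈ 2.236; the cosine weight halves the influence of the 60°N cell.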
Warm Bias Comparison
Our analysis found that the IFS HRES model has a “warm bias,” meaning it tends to predict warmer temperatures than actually occur. HRES consistently underestimated how cold it would get throughout the extreme cold event.
WM-2 demonstrates substantially better accuracy than HRES over the same event, indicating stronger performance during extreme weather.
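Quantitatively, a warm bias means the mean signed error (forecast minus observed) is positive. Bias also feeds directly into RMSE: mean squared error equals the squared bias plus the variance of the errors, so a model that runs persistently warm through a cold outbreak carries that offset into its error every day. The sketch below illustrates this decomposition on hypothetical daily lows; the function name and data are illustrative and this is not WindBorne’s analysis code.

```python
import math
from statistics import fmean, pvariance


def bias_and_rmse(forecast, observed):
    """Return (bias, rmse, error_std) for paired forecasts and observations.

    bias = mean(forecast - observed); for temperature, a positive value is
    a warm bias. MSE decomposes exactly as bias**2 + (population) error variance.
    """
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two equal-length, non-empty sequences")
    errors = [f - o for f, o in zip(forecast, observed)]
    bias = fmean(errors)
    rmse = math.sqrt(fmean(e * e for e in errors))
    error_std = math.sqrt(pvariance(errors))
    return bias, rmse, error_std


# Hypothetical daily lows (°C): the forecast is 2 to 3 °C too warm every day.
forecast = [-10.0, -12.0, -8.0]
observed = [-13.0, -15.0, -10.0]
bias, rmse, error_std = bias_and_rmse(forecast, observed)
print(round(bias, 2), round(rmse, 2))  # 2.67 2.71
```

Here almost all of the RMSE comes from the bias term (2.67² ≈ 7.11 of an MSE of about 7.33), which is the pattern a persistent warm bias produces.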
III. Surface Temperature Across Global Cities
What it covers: The table compares surface temperature forecasts from WM-2 and IFS HRES across a range of domestic and international cities, validated against real-world observations from weather stations. Blue indicates that WM-2 performed better, while red indicates that IFS HRES performed better.
Notable findings
WM-2 outperforms IFS in nearly all cities across 1- to 14-day lead times.
WM-2’s performance gains hold globally and are pronounced in both domestic and international cities.
WM-2’s particularly strong performance in Toronto, Canada, and Boston, Massachusetts, during the winter period highlights its strength at predicting extremes.
Why it matters: Accurate temperature forecasts across diverse global locations enable better planning for everything from energy consumption to agricultural operations.
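Comparing a gridded forecast with a weather-station reading requires sampling the forecast at the station’s coordinates. The post does not say how the WM-2 and IFS HRES values were extracted; bilinear interpolation from the four surrounding grid points is one common choice and is assumed here. The function names, the grid layout (latitudes and longitudes both increasing, no wrap across the dateline), and all example values are hypothetical.

```python
import bisect
import math


def bilinear(grid, lats, lons, lat, lon):
    """Interpolate grid[i][j] (valid at lats[i], lons[j]) to the point (lat, lon).

    Assumes lats and lons are strictly increasing with at least two entries each.
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("point lies outside the grid")
    # Lower-left corner of the enclosing cell, clamped so i + 1 and j + 1 exist.
    i = min(bisect.bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect.bisect_right(lons, lon) - 1, len(lons) - 2)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    south = (1 - tx) * grid[i][j] + tx * grid[i][j + 1]
    north = (1 - tx) * grid[i + 1][j] + tx * grid[i + 1][j + 1]
    return (1 - ty) * south + ty * north


def station_rmse(forecasts, observations):
    """RMSE of one model's station-interpolated forecasts against observations."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two equal-length, non-empty sequences")
    sq = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(sq) / len(sq))


def better_model(wm2, ifs, observations):
    """Label one table cell: which model had the lower RMSE at this station."""
    a, b = station_rmse(wm2, observations), station_rmse(ifs, observations)
    return "WM-2" if a < b else "IFS HRES" if b < a else "tie"


# Hypothetical 2x2 grid of temperatures around one station.
grid = [[0.0, 10.0], [20.0, 30.0]]
lats, lons = [40.0, 41.0], [-80.0, -79.0]
print(bilinear(grid, lats, lons, 40.5, -79.5))  # 15.0
print(better_model([1.0, 2.0], [2.0, 4.0], [1.5, 2.5]))  # WM-2
```

Repeating this per city and per lead time yields a grid of labels like the blue and red cells in the table.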