AI-Driven Weather Forecasting: Transforming Meteorology with Machine Learning

Introduction
Numerical Weather Prediction
FourCastNet
Pangu-Weather
GraphCast
Model Comparison
Ethical Implications
Conclusion
References

1. Introduction: When AI Meets the Atmosphere

Every day, weather forecasts quietly shape our lives. A storm warning can empty beaches and stock supermarket shelves. A temperature spike can shift energy demand across entire continents. In agriculture, aviation, disaster response, and climate science, accurate weather prediction isn’t just useful—it’s vital.

For decades, Numerical Weather Prediction (NWP) has been the gold standard. These physics-based models simulate the Earth’s atmosphere by solving massive systems of partial differential equations. They are scientifically rigorous but also notoriously expensive—requiring supercomputers, hours of computation, and still falling short when it comes to fast-changing or extreme events.

But a new kind of forecast is on the horizon. Fueled by advances in deep learning and access to decades of reanalysis data, AI-based weather models are beginning to rival—and in some cases surpass—traditional systems. These models don’t simulate the laws of physics directly. Instead, they learn the behavior of the atmosphere from data, identifying patterns invisible even to trained meteorologists.

In this blog, we dive into the rising frontier of AI-driven weather forecasting. We’ll explore leading-edge systems like GraphCast, Pangu-Weather, and FourCastNet—each pushing the boundaries of speed, resolution, and skill. Along the way, we’ll also examine the ethical challenges these models raise: fairness across regions, interpretability, public trust, and the role of humans in an increasingly automated forecasting landscape.

What happens when machine learning begins to understand the sky?
Let’s find out.

2. Numerical Weather Prediction: Foundations, Evolution, and Grand Challenges

For over 70 years, Numerical Weather Prediction (NWP) has served as the scientific foundation of operational forecasting. By solving partial differential equations (PDEs) based on the conservation of mass, momentum, energy, and moisture, NWP simulates atmospheric dynamics across the globe.

From Theory to Operational Systems

The idea of predicting weather via physics-based equations was introduced in the early 20th century by Cleveland Abbe and Vilhelm Bjerknes. Practical forecasting became feasible in 1950, when a team based at the Institute for Advanced Study in Princeton ran the first numerical forecasts on the ENIAC computer. Real-time forecasting began in 1954, and in 1979 ECMWF started issuing operational medium-range forecasts. Its Integrated Forecasting System (IFS) has since become a global standard for medium-range weather prediction.

How NWP Works

Global NWP models divide the Earth into a three-dimensional grid, with horizontal resolution in latitude–longitude and vertical levels based on height or pressure. The evolution of atmospheric variables is calculated by numerically solving PDEs at each grid point.
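To give a feel for what "numerically solving PDEs at each grid point" means, here is a deliberately tiny sketch: a single 1D advection equation stepped forward with a first-order upwind scheme on a periodic grid. It is not how IFS or any operational model is implemented; the function name, grid spacing, wind speed, and timestep are all illustrative assumptions.

```python
import numpy as np

def advect_upwind(q, u, dx, dt, n_steps):
    """Advance dq/dt + u * dq/dx = 0 on a periodic 1D grid (assumes u > 0)."""
    c = u * dt / dx  # Courant number; this scheme is stable for 0 <= c <= 1
    assert 0.0 <= c <= 1.0, "time step too large for this grid spacing"
    for _ in range(n_steps):
        # First-order upwind difference: each point looks at its upstream neighbour
        q = q - c * (q - np.roll(q, 1))
    return q

# Toy setup (assumed values): a Gaussian anomaly carried by a 10 m/s wind
n_points = 200
dx = 50_000.0                                  # 50 km grid spacing
x = np.arange(n_points) * dx
q0 = np.exp(-((x - 2.0e6) / 3.0e5) ** 2)       # anomaly centred at grid index 40
q1 = advect_upwind(q0, u=10.0, dx=dx, dt=600.0, n_steps=144)  # 144 x 10 min = 1 day
```

After one simulated day the anomaly has moved roughly 864 km downstream and has been smoothed slightly by the scheme's numerical diffusion. A real model does the same kind of update for many coupled variables in three dimensions, which is where the cost comes from.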

Schematic for Global Atmospheric Model

Structure of a global atmospheric model showing horizontal and vertical grids.
Image source: NOAA GFDL.

Modern systems like ECMWF’s IFS run:

At 0.1° spatial resolution with ~2 million grid points
With ~200 vertical levels
Using 10-minute timesteps, twice daily, for 10-day forecasts

Because the atmosphere is chaotic and sensitive to initial conditions, NWP employs ensemble forecasting. By running multiple simulations with slight variations in initial inputs, NWP systems generate a spread of possible outcomes, providing probabilistic information for decision-makers.
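The idea can be sketched with the Lorenz-63 system, a classic three-variable toy model of chaotic dynamics. This is only an analogy for an operational ensemble: the member count, perturbation size, and the "event" being counted are assumptions chosen for illustration.

```python
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One fourth-order Runge-Kutta step of Lorenz-63 for states of shape (..., 3)."""
    def tendency(s):
        x, y, z = s[..., 0], s[..., 1], s[..., 2]
        return np.stack([sigma * (y - x), x * (rho - z) - y, x * y - beta * z], axis=-1)
    k1 = tendency(state)
    k2 = tendency(state + 0.5 * dt * k1)
    k3 = tendency(state + 0.5 * dt * k2)
    k4 = tendency(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def run_ensemble(analysis, n_members=50, noise=1e-3, n_steps=2000, seed=0):
    """Perturb the initial state slightly and integrate every member forward."""
    rng = np.random.default_rng(seed)
    members = analysis + noise * rng.standard_normal((n_members, 3))
    for _ in range(n_steps):
        members = lorenz63_step(members)
    return members

members = run_ensemble(np.array([1.0, 1.0, 1.0]))
spread = members.std(axis=0)                          # ensemble spread per variable
prob_x_positive = float((members[:, 0] > 0).mean())   # fraction of members with x > 0
```

Even though the members start within about 0.001 of each other, they end up scattered across the attractor. The fraction of members in which some event occurs (here, x > 0) plays the role of a forecast probability, in the same way an ensemble yields a probability of precipitation.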

Ensemble forecasting and precipitation probability — Ensemble forecasting accounts for uncertainty by producing a range of forecasts. The resulting ensemble distribution can be used to calculate probabilities (e.g., for precipitation). Image source: Nature, Palmer & Stevens (2015).

NWP accuracy has improved significantly in recent decades, driven by better models, increased computing power, and improved data assimilation. The range over which forecasts remain useful has grown accordingly, from about 3 to 5 days to 7 or even 10 days.

Forecast skill improvement over time — Forecast skill from 1981–2013 for 3-, 5-, 7-, and 10-day lead times. Northern Hemisphere (NH) forecasts are consistently more accurate than those in the Southern Hemisphere (SH) due to denser observational coverage. Source: ECMWF.

Challenges in NWP

Despite its success, NWP is reaching computational and structural limits. Major challenges include:

Complexity – Earth system processes like turbulence, cloud microphysics, and ocean–atmosphere coupling are difficult to model accurately.
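Resolution – Higher resolution improves accuracy but raises compute demands steeply: halving the horizontal grid spacing typically multiplies the cost by about eight (four times as many grid columns and roughly twice as many timesteps).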
Dimensionality – Trillions of variables make simulations memory- and compute-intensive.
Ensemble Size – More ensemble members improve reliability but amplify cost.
Scenario Diversity – Climate models must simulate a wide range of possible futures (e.g., emissions paths).
Throughput – Real-time forecasting requires high-speed computing pipelines.
Scalability – As resolution increases, data movement, not computation, becomes the bottleneck.
Interactivity – Policymakers demand real-time, interactive “digital twins”—which traditional NWP systems struggle to support.
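To put a rough number on the resolution challenge listed above, the following back-of-the-envelope helper estimates how cost grows as the grid is refined. It rests on a simplifying assumption that is ours, not a measured property of any model: cost scales with the number of grid columns times the number of timesteps, and the timestep must shrink in proportion to the grid spacing.

```python
def relative_cost(dx_old_km, dx_new_km, refine_vertical=False):
    """Rough cost multiplier when the horizontal grid spacing changes.

    Assumes cost ~ (grid columns) x (time steps), with two horizontal
    dimensions and a time step proportional to the grid spacing.
    Optionally refines the vertical grid by the same factor.
    """
    ratio = dx_old_km / dx_new_km
    exponent = 4 if refine_vertical else 3
    return ratio ** exponent

print(relative_cost(18.0, 9.0))   # halving the spacing -> 8.0
print(relative_cost(9.0, 1.0))    # 9 km -> 1 km -> 729.0
```

Under these assumptions, going from a 9 km grid to a 1 km grid costs several hundred times more compute, before accounting for memory, I/O, or ensemble size.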

A Paradigm Shift: From Equations to Data

NWP’s equation-based rigor is unmatched, but its scalability is increasingly constrained by computational cost. In response, a new wave of AI-based weather models is emerging, trained not to simulate physics explicitly but to learn directly from decades of atmospheric data.

Prominent examples include:

FourCastNet (2022) – by NVIDIA and Lawrence Berkeley National Lab
Pangu-Weather (2023) – by Huawei
GraphCast (2023) – by Google DeepMind

These models are trained on ERA5 reanalysis and generate high-resolution global forecasts in seconds, using a fraction of the energy and hardware required by traditional numerical models. Their performance increasingly rivals—and sometimes exceeds—state-of-the-art NWP systems on key metrics like RMSE and anomaly correlation.

What’s more, outputs from these models can now be viewed and evaluated alongside operational forecasts through the ECMWF AI Forecast Viewer.

Together, these breakthroughs mark the emergence of a data-driven forecasting paradigm—faster, more efficient, and increasingly critical in a world shaped by climate extremes and the need for timely, actionable information.

3. State-of-the-Art AI Models for Weather Forecasting

3.1 FourCastNet: A DL model with comparable accuracy to IFS

FourCastNet, developed by NVIDIA, is a pioneering AI-driven global weather forecasting model that leverages deep learning and spectral methods to deliver high-resolution, medium-range forecasts. By integrating the Adaptive Fourier Neural Operator (AFNO) into its architecture, FourCastNet efficiently captures complex atmospheric patterns, offering rapid and accurate predictions that rival traditional numerical weather prediction models.

Model Architecture

At the heart of FourCastNet is the Adaptive Fourier Neural Operator (AFNO), a novel architecture designed to efficiently capture global weather patterns across space and time. AFNO extends the Fourier Neural Operator (FNO) by incorporating learnable adaptive filters in the frequency domain, enabling efficient and scalable modeling of complex geophysical phenomena.

The diagram below (from FourCastNet: arXiv:2202.11214) illustrates the complete architecture—from input preprocessing to spectral processing and output generation—along with extensions for fine-tuning and precipitation modeling.

FourCastNet AFNO architecture — Figure: (a) AFNO architecture with encoder–processor–decoder design. (b) Fine-tuning setup; (c) Precipitation model head using a frozen backbone; (d) Inference mode with autoregressive steps. FFT: Fast Fourier Transform, MLP: Multi-Layer Perceptron, IFFT: Inverse FFT.

Traditional models struggle to capture long-range spatial dependencies efficiently: convolutions only see a local neighborhood per layer, and full self-attention scales quadratically with the number of tokens. In contrast, AFNO operates in the frequency domain via the Fast Fourier Transform (FFT), which mixes information across the whole grid at O(N log N) cost.

The key steps in AFNO (refer to panel (a) in the figure) are:

1. Patch & Positional Embedding: The model ingests 2D meteorological fields (e.g., U₁₀, V₁₀, T₈₅₀, Z₅₀₀) as multi-channel tensors, splits them into patches, and adds a positional encoding to each patch token.
2. Fourier Transform (FFT): The token grid is projected into the frequency domain using a 2D FFT, yielding complex-valued spectral features.
3. Block Diagonalization: The channel dimension of the spectral features is split into k blocks, so channel mixing uses a block-diagonal weight matrix instead of a dense one. This reduces both the parameter count and the cost of mixing.
4. Block-wise MLP Filtering: Each block is passed through a small MLP whose weights are shared across all frequency modes, performing adaptive frequency filtering. This is where AFNO differs from standard FNOs, which learn separate weights for each retained frequency mode.
5. Soft Shrinkage: A non-linear regularization step shrinks small spectral coefficients toward zero, reducing the influence of insignificant frequency components and improving robustness.
6. Inverse FFT (IFFT): The filtered spectral representation is transformed back to the spatial domain.
7. Residual Connection: The output is added to the input to preserve low-frequency structure and ensure stable gradients.

To make the process more concrete, the following pseudocode summarizes the AFNO computation:

def AFNO(x):                              # x: (b, h, w, d) = batch, height, width, channels (output of Step 1)
    residual = x                          # Saved for the residual connection
    x = RFFT2(x)                          # Step 2: Convert to frequency domain
    x = x.reshape(b, h, w//2+1, k, d//k)  # Step 3: Block diagonalization (k channel blocks)
    x = BlockMLP(x)                       # Step 4: Apply shared MLP
    x = x.reshape(b, h, w//2+1, d)        # Restore shape
    x = SoftShrink(x)                     # Step 5: Regularize
    x = IRFFT2(x)                         # Step 6: Back to spatial domain
    return x + residual                   # Step 7: Residual connection

def BlockMLP(x):                          # W_1, b_1, W_2, b_2: learned weights and biases
    x = MatMul(x, W_1) + b_1
    x = ReLU(x)
    return MatMul(x, W_2) + b_2
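For readers who want something they can actually run, here is a minimal NumPy sketch of the same computation. It is not the FourCastNet implementation: the weights are random, the tensor sizes are tiny, and two details are our own assumptions, namely that ReLU and soft shrinkage are applied separately to the real and imaginary parts of the spectrum.

```python
import numpy as np

def soft_shrink(z, lam):
    """Shrink real and imaginary parts toward zero by lam (assumed per-part handling)."""
    shrink = lambda a: np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)
    return shrink(z.real) + 1j * shrink(z.imag)

def block_mlp(z, w1, b1, w2, b2):
    """Two-layer MLP per channel block; the same weights are used at every frequency.

    z: (b, h, w//2+1, k, d//k) complex; w1, w2: (k, d//k, d//k); b1, b2: (k, d//k).
    """
    z = np.einsum("...ki,kio->...ko", z, w1) + b1
    z = np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)  # ReLU on each part
    return np.einsum("...ki,kio->...ko", z, w2) + b2

def afno_layer(x, w1, b1, w2, b2, lam=0.01):
    b, h, w, d = x.shape
    k = w1.shape[0]                                  # number of channel blocks
    z = np.fft.rfft2(x, axes=(1, 2))                 # (b, h, w//2+1, d), complex
    z = z.reshape(b, h, w // 2 + 1, k, d // k)       # split channels into k blocks
    z = block_mlp(z, w1, b1, w2, b2)
    z = soft_shrink(z.reshape(b, h, w // 2 + 1, d), lam)
    return np.fft.irfft2(z, s=(h, w), axes=(1, 2)) + x   # back to space, add residual

# Toy sizes and random weights (illustrative only)
rng = np.random.default_rng(0)
b, h, w, d, k = 2, 8, 16, 12, 4
x = rng.standard_normal((b, h, w, d))
w1 = 0.1 * rng.standard_normal((k, d // k, d // k))
w2 = 0.1 * rng.standard_normal((k, d // k, d // k))
b1 = np.zeros((k, d // k))
b2 = np.zeros((k, d // k))
y = afno_layer(x, w1, b1, w2, b2)
```

The output has the same shape as the input, so layers like this can be stacked. Note that the MLP weights have no height or width dimension: that is the sense in which they are shared across frequency modes.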

Training and Performance

FourCastNet is trained in two stages—pretraining and fine-tuning—followed by the addition of a precipitation prediction module.

Pretraining: The model first learns to forecast the immediate next atmospheric state based on current inputs. It is trained to minimize the difference (using mean squared error) between the predicted and actual next time step, focusing on single-step accuracy.
Fine-tuning: Next, the model is trained to make forecasts across multiple future steps in sequence. For example, it predicts the next two time steps, compares them to the actual values, and updates its weights to reduce the error across both. This helps the model reduce the buildup of errors over longer forecast sequences.
Precipitation Head: To improve rainfall forecasting, a separate component is added after the main model has been trained. This lightweight module takes the predicted atmospheric state and estimates rainfall, improving precipitation accuracy without changing the main model.
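The three objectives can be written down compactly. The sketch below uses a toy linear map in place of the real network, so every name in it (`linear_model`, `relu_head`, the array sizes) is a hypothetical stand-in; only the structure of the losses mirrors the description above.

```python
import numpy as np

def one_step_loss(model, params, x_t, x_next):
    """Pretraining objective: MSE between the predicted and true next state."""
    return np.mean((model(params, x_t) - x_next) ** 2)

def two_step_loss(model, params, x_t, x_next, x_next2):
    """Fine-tuning objective: roll the model out twice and penalise both steps."""
    pred1 = model(params, x_t)
    pred2 = model(params, pred1)          # the model consumes its own output
    return np.mean((pred1 - x_next) ** 2) + np.mean((pred2 - x_next2) ** 2)

def precip_head_loss(head, head_params, model, params, x_t, precip_next):
    """Precipitation head: the backbone is only evaluated here, never updated."""
    state = model(params, x_t)            # frozen backbone output
    return np.mean((head(head_params, state) - precip_next) ** 2)

# Toy stand-ins: a linear "backbone" and a non-negative linear "head"
linear_model = lambda A, x: x @ A
relu_head = lambda w, s: np.maximum(s @ w, 0.0)

rng = np.random.default_rng(0)
A_true = 0.9 * np.eye(3)                  # pretend these are the true dynamics
A_guess = 0.8 * np.eye(3)                 # an imperfect model
x0 = rng.standard_normal((16, 3))
x1 = x0 @ A_true
x2 = x1 @ A_true
w_head = rng.standard_normal((3, 1))
precip1 = relu_head(w_head, x1)

loss_single = one_step_loss(linear_model, A_guess, x0, x1)
loss_double = two_step_loss(linear_model, A_guess, x0, x1, x2)
loss_precip = precip_head_loss(relu_head, w_head, linear_model, A_true, x0, precip1)
```

With the imperfect model, the two-step loss is larger than the one-step loss because the second prediction inherits the error of the first. That compounding is exactly what the fine-tuning stage is meant to penalise.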

During inference, FourCastNet generates forecasts by recursively feeding each output back as input for the next prediction, starting from an initial observed state and producing a sequence of predictions several days into the future. This autoregressive design lets the model produce coherent forecasts 7 to 10 days ahead at low computational cost.
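In code, an autoregressive rollout is just a loop. The sketch below assumes a 6-hour model step and uses a small damped rotation as a hypothetical stand-in for the trained network.

```python
import numpy as np

def rollout(step_fn, initial_state, n_steps):
    """Apply step_fn repeatedly, feeding each prediction back in as the next input."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(step_fn(states[-1]))
    return np.stack(states)               # shape: (n_steps + 1, *state_shape)

# Toy stand-in for a trained model: a damped rotation of a 2-variable state
theta = 0.1
A = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

step_hours = 6                            # assumed model time step
n_steps = 7 * 24 // step_hours            # 28 steps for a 7-day forecast
trajectory = rollout(lambda s: A @ s, np.array([1.0, 0.0]), n_steps)
lead_times = np.arange(n_steps + 1) * step_hours
```

Because every step starts from the previous prediction rather than from observations, any error made early in the rollout is carried into all later steps.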

FourCastNet was benchmarked against ECMWF’s Integrated Forecasting System (IFS) using:

Anomaly Correlation Coefficient (ACC) – measures how well predicted anomaly patterns match observations.
Root Mean Squared Error (RMSE) – measures average forecast error.
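Both metrics are simple to compute on a latitude-longitude grid. The sketch below weights each grid row by the cosine of its latitude, a common convention that stops the densely sampled polar rows from dominating the score. The function names and the toy fields are our own, and the exact normalisation used in any given paper may differ.

```python
import numpy as np

def latitude_weights(lat_deg):
    """Weights proportional to cos(latitude), normalised to have mean 1."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()

def weighted_rmse(forecast, truth, lat_deg):
    """Latitude-weighted RMSE for fields of shape (n_lat, n_lon)."""
    w = latitude_weights(lat_deg)[:, None]            # broadcast over longitude
    return float(np.sqrt(np.mean(w * (forecast - truth) ** 2)))

def weighted_acc(forecast, truth, climatology, lat_deg):
    """Latitude-weighted correlation between forecast and observed anomalies."""
    w = latitude_weights(lat_deg)[:, None]
    fa = forecast - climatology                       # forecast anomaly
    ta = truth - climatology                          # observed anomaly
    num = np.sum(w * fa * ta)
    return float(num / np.sqrt(np.sum(w * fa ** 2) * np.sum(w * ta ** 2)))

# Toy fields on a 5-degree grid (illustrative values only)
rng = np.random.default_rng(0)
lat = np.linspace(-87.5, 87.5, 36)
climatology = np.zeros((36, 72))
truth = rng.standard_normal((36, 72))
forecast = truth + 0.5 * rng.standard_normal((36, 72))

rmse = weighted_rmse(forecast, truth, lat)
acc = weighted_acc(forecast, truth, climatology, lat)
```

A perfect forecast gives an RMSE of 0 and an ACC of 1, while a forecast with the right magnitudes but the opposite anomaly pattern gives an ACC of -1.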

FourCastNet vs IFS Performance. Figure: Forecast skill comparison across IFS (blue), FourCastNet deterministic (red), and FourCastNet ensemble mean (purple). Top row: 500 hPa geopotential height (Z500); Bottom row: 10-meter zonal wind component (U10). Left: ACC; Right: RMSE. Higher ACC and lower RMSE indicate better performance.

Even though IFS slightly outperforms FourCastNet at long horizons, the ensemble mean version of FourCastNet significantly narrows the gap — highlighting its strong potential for probabilistic forecasting with a small computational footprint.

Extreme Weather Prediction

FourCastNet has demonstrated strong performance in forecasting extreme weather events such as tropical cyclones and heavy rainfall. Its AFNO-based architecture, combined with autoregressive inference, enables the model to capture both large-scale atmospheric flow and localized severe phenomena.

A compelling example is its multi-day forecast of Typhoon Mangkhut (山竹) in 2018. Using open-source tools from HFAI Lab, researchers successfully reproduced the storm’s track and associated moisture fields. The model not only followed the cyclone trajectory with high accuracy, but also captured spiral structures in Total Column Water Vapour (TCWV)—a key indicator of cyclone intensity and rainfall potential.

These results highlight FourCastNet’s potential for fast, high-resolution forecasting in early warning systems and disaster preparedness.

Typhoon Mangkhut wind prediction — *Predicted wind field showing spiral structure of Typhoon Mangkhut*

Typhoon Mangkhut precipitation prediction — *Forecasted precipitation and moisture distribution*

3.2 Pangu-Weather: A ViT-based 3D Weather Foundation Model

Huawei Cloud’s Pangu-Weather marks a major milestone in AI-powered weather forecasting. It is the first AI-based model reported to outperform a traditional numerical weather prediction (NWP) system, ECMWF’s operational high-resolution forecast (HRES), across lead times from 1 hour to 7 days, while running over 10,000× faster.

Model Design and Architecture

The strength of Pangu-Weather stems from a dual architectural innovation:

A 3D Earth-Specific Transformer (3DEST) optimized for atmospheric geospatial data.
A Hierarchical Temporal Aggregation strategy that reduces error accumulation in multi-step forecasting.

Combined, these elements allow Pangu-Weather to deliver accurate, stable, high-resolution forecasts at an unmatched speed—outpacing both traditional NWP systems and prior AI baselines.

Earth-Specific Transformer (3DEST)

The 3DEST architecture is built on the insight that atmospheric data, while sharing structural similarities with image data (e.g., multi-channel, spatial continuity), exhibits domain-specific physical properties that demand specialized modeling.

While earlier models like FourCastNet used 2D neural architectures and struggled to model the Earth’s complex, non-uniform 3D atmospheric structure, Pangu-Weather introduces a 3D Vision Transformer tailored for meteorological data.

*3D Earth-Specific Transformer (3DEST) architecture diagram*

Key architectural features:

Compact Encoder–Decoder Design: A lightweight 2-stage architecture with just 8 transformer blocks, balancing accuracy with efficiency.
Sliding Window Attention: Based on the Swin Transformer, this mechanism captures local dependencies while reducing memory usage and FLOPs.
High-Resolution 3D Inputs: Despite these optimizations, the model still requires over 3000 GFLOPs of computation, underscoring the computational demands of fine-grained global forecasts.
Earth-Specific Positional Encoding: Unlike standard image models, atmospheric data is tied to physical geolocation (latitude, longitude, altitude). Pangu introduces spatially-aware encodings that reflect:
- Irregular grid spacing in latitude–longitude coordinates
- Latitude-dependent forces (e.g., Coriolis effect)
- Vertical variations in variables like pressure, temperature, and wind
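The window mechanism listed above is easy to illustrate in isolation. The sketch below partitions a (levels, latitude, longitude, channels) tensor into non-overlapping 3D windows, inside which attention would be computed. The tensor sizes and the window shape (2, 6, 12) are assumptions for illustration, and the positional bias and attention itself are left out.

```python
import numpy as np

def window_partition_3d(x, window):
    """Split (levels, lat, lon, channels) into non-overlapping 3D windows.

    Returns an array of shape (n_windows, tokens_per_window, channels);
    attention is then computed independently inside each window.
    """
    Z, H, W, C = x.shape
    wz, wh, ww = window
    assert Z % wz == 0 and H % wh == 0 and W % ww == 0, "pad the grid first"
    x = x.reshape(Z // wz, wz, H // wh, wh, W // ww, ww, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)     # bring the window indices to the front
    return x.reshape(-1, wz * wh * ww, C)

def shift_grid(x, window):
    """Swin-style shift by half a window so neighbouring windows exchange information.

    np.roll wraps around on every axis; that is only physically meaningful for
    longitude, so a real model would mask the wrapped levels and latitudes.
    """
    wz, wh, ww = window
    return np.roll(x, shift=(-(wz // 2), -(wh // 2), -(ww // 2)), axis=(0, 1, 2))

# Toy tensor: 8 levels x 12 latitudes x 24 longitudes x 4 channels
x = np.arange(8 * 12 * 24 * 4, dtype=float).reshape(8, 12, 24, 4)
window = (2, 6, 12)
windows = window_partition_3d(x, window)
shifted_windows = window_partition_3d(shift_grid(x, window), window)

n_tokens = 8 * 12 * 24
global_pairs = n_tokens ** 2                              # full self-attention
window_pairs = windows.shape[0] * windows.shape[1] ** 2   # windowed attention
```

In this toy case, windowed attention evaluates 16 times fewer token pairs than full attention, which is where the memory and FLOP savings come from.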

Earth-specific positional bias — Why Earth-specific positional encoding is needed — Left: The latitude–longitude grid on Earth is spatially non-uniform; Center: Geopotential height strongly depends on both latitude and altitude; Right: Wind speed and temperature correlate closely with vertical height.

Hierarchical Temporal Aggregation

Medium-range forecasts (up to 7 days) often require dozens or even hundreds of recursive prediction steps. Traditional autoregressive models suffer from cumulative error amplification in these settings. For example:

FourCastNet uses 6-hour steps, requiring 28 recursive calls for a 7-day forecast.
A 1-hour model would require 168 calls, increasing both training complexity and inference instability.

To resolve this, Pangu-Weather introduces a Hierarchical Multi-Step Forecasting approach:

Trains four specialized models for predicting intervals of 1h, 3h, 6h, and 24h.
During inference, a greedy scheduling algorithm selects the most efficient sequence of steps to minimize recursive depth.
Example:
- 24h forecast → 1 call to 24h model
- 23h forecast → 3 × 6h + 1 × 3h + 2 × 1h
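The greedy rule can be sketched in a few lines: always take the largest step that still fits into the remaining lead time. The function name and error handling are our own; only the four step sizes come from the description above.

```python
def greedy_schedule(lead_hours, model_steps=(24, 6, 3, 1)):
    """Decompose a lead time into model calls, largest available step first."""
    schedule = []
    remaining = lead_hours
    for step in sorted(model_steps, reverse=True):
        n_calls, remaining = divmod(remaining, step)
        schedule.extend([step] * n_calls)
    if remaining:
        raise ValueError("lead time cannot be reached with the available steps")
    return schedule

print(greedy_schedule(23))         # [6, 6, 6, 3, 1, 1]
print(len(greedy_schedule(168)))   # 7 calls of the 24h model for a 7-day forecast
```

A 7-day forecast therefore needs only 7 recursive calls, compared with 28 for a pure 6-hour model and 168 for a pure 1-hour model.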

7-day forecast accuracy comparison — *Comparison of 7-day forecast accuracy across different baseline models.*

Benefits of this design:

Minimizes error accumulation across time steps.
Allows for efficient training via single-timestep supervision.
Reduces GPU memory requirements, improving training stability versus multi-timestep strategies used in models like FourCastNet.

This design makes medium-range AI forecasting more scalable, accurate, and efficient.