SPDE Application · Part 2

Spatial Temperature Modeling with SPDE

Build a spatial temperature model using SPDE (Stochastic Partial Differential Equations) with INLA. Fit the model with elevation as a covariate, visualize spatial effects, and generate predictions with uncertainty quantification.

By Esmail Abdul Fattah and Elias Krainski

← Part 1: Collecting Temperature Data

The Statistical Model

We fit a spatial model for January 2024 average temperature:

$$T(s) = \beta_0 + \beta_1 \cdot \frac{\text{elevation}(s)}{1000} + u(s) + \varepsilon$$

Where:

1. Setup and Imports

import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.tri import Triangulation
from scipy import stats
from shapely.geometry import Point
from shapely.ops import unary_union

# pyINLA imports
from pyinla import pyinla
from pyinla.fmesher import (
    fm_mesh_2d,
    fm_hexagon_lattice,
    spde2_pcmatern,
    spde_make_A,
    spde_grid_projector
)

2. Coordinate Reference System

For spatial modeling, we need projected coordinates (kilometers) rather than geographic coordinates (degrees):

# Coordinate Reference Systems
CRS_LONGLAT = "EPSG:4326"  # Standard WGS84 (degrees)
CRS_KM = "+proj=utm +zone=40 +ellps=intl +towgs84=-143,-236,7,0,0,0,0 +units=km +no_defs +type=crs"

# Study region
COUNTRIES = ["Lebanon", "Syria", "Jordan", "Saudi Arabia"]

This is essential because:

3. Load Temperature Data

We load the monthly temperature data from Part 1:

# Load temperature data from Part 1
tavg = pd.read_csv('tavg_month.csv')
print(f"Temperature data loaded: {tavg.shape[0]} stations")

# First 2 rows:
#        station  latitude  longitude  elevation  X202401  X202402  ...
# 0  BA000041150   26.2670      50.65        2.0    20.71    19.93  ...
# 1  IS000004640   32.9667      35.50      934.0     9.05     8.74  ...
Temperature data loaded: 55 stations

Load country boundaries and project to the km-based CRS:

# Load country boundaries
NE_URL = "https://naciscdn.org/naturalearth/50m/cultural/ne_50m_admin_0_countries.zip"
worldMapll = gpd.read_file(NE_URL)
map_ll = worldMapll[worldMapll["ADMIN"].isin(COUNTRIES)].copy()
map2km = map_ll.to_crs(CRS_KM)
# Create station locations in km CRS
geometry = [Point(lon, lat) for lon, lat in zip(tavg["longitude"], tavg["latitude"])]
tavg_gdf = gpd.GeoDataFrame(tavg, geometry=geometry, crs=CRS_LONGLAT)
tavg_km = tavg_gdf.to_crs(CRS_KM)
coords_km = np.column_stack([tavg_km.geometry.x, tavg_km.geometry.y])

# Plot stations: color = temperature, size = elevation
fig, ax = plt.subplots(figsize=(10, 8))
map2km.boundary.plot(ax=ax, color='black', linewidth=1.5)
sc = ax.scatter(
    coords_km[:, 0], coords_km[:, 1],
    c=tavg['X202401'], cmap='RdYlBu_r',
    s=tavg['elevation'] / 20 + 20,
    edgecolors='black', linewidth=0.5, zorder=5
)
plt.colorbar(sc, ax=ax, shrink=0.7, label='January 2024 mean temperature (°C)')
ax.set_title('Weather stations: color = temperature, size = elevation')
ax.set_xlabel('Easting (km)')
ax.set_ylabel('Northing (km)')
ax.set_aspect('equal')
plt.tight_layout()
plt.show()
Weather stations colored by temperature with size by elevation
Weather stations: color indicates January temperature, size indicates elevation

4. Check Elevation Effect

Before fitting the full spatial model, we verify that elevation affects temperature using simple linear regression:

valid = ~np.isnan(tavg['X202401']) & ~np.isnan(tavg['elevation'])
slope, intercept, r, p, se = stats.linregress(
    tavg.loc[valid, 'elevation'], tavg.loc[valid, 'X202401']
)

print(f"Slope (per 1000m): {slope*1000:.2f} °C/km")
print(f"R-squared: {r**2:.4f}")
Slope (per 1000m): -2.28 °C/km
R-squared: 0.0518

What This Tells Us

5. Create the Triangular Mesh

The SPDE approach requires a triangular mesh that covers the study region. The mesh serves as the computational domain for the spatial random field.

Mesh Construction Steps

  1. Buffer the boundary by 100 km (to avoid edge effects)
  2. Generate hexagonal lattice points (edge length = 50 km)
  3. Build triangular mesh with specified resolution
# Create unified boundary and buffer by 100 km
boundary = unary_union(map2km.geometry)
boundary_buffered = boundary.buffer(100)

# Extract boundary coordinates for hexagonal lattice
if boundary_buffered.geom_type == 'MultiPolygon':
    largest = max(boundary_buffered.geoms, key=lambda p: p.area)
    boundary_coords = np.array(largest.exterior.coords)
else:
    boundary_coords = np.array(boundary_buffered.exterior.coords)

# Create hexagonal lattice points
hexpoints = fm_hexagon_lattice(boundary_coords, edge_len=50)
print(f"Hexagonal lattice: {len(hexpoints)} points")

# Build the triangular mesh
mesh = fm_mesh_2d(
    loc=hexpoints,
    offset=500,     # Extend 500 km beyond boundary
    max_edge=200    # Maximum triangle edge: 200 km
)

print(f"Mesh: {mesh.n} vertices, {mesh.n_triangle} triangles")
Mesh: 1,792 vertices, 3,499 triangles
# Plot mesh using built-in method
fig, ax = mesh.plot(locs=coords_km, title='SPDE Mesh', show=False)
map2km.boundary.plot(ax=ax, color='black', linewidth=2)
plt.tight_layout()
plt.show()
Triangular mesh for SPDE spatial modeling
Triangular mesh (1,792 vertices, 3,499 triangles) for the SPDE spatial model

6. Define the SPDE Model

The SPDE model uses Penalized Complexity (PC) priors that shrink toward a base model:

PriorSpecificationInterpretation
RangeP(range < 30 km) = 0.05We believe spatial correlation extends beyond 30 km
SigmaP(sigma > 1 degrees C) = 0.05We believe spatial variation is typically less than 1 degrees C
# Define SPDE model with PC priors
spdeModel = spde2_pcmatern(
    mesh=mesh,
    prior_range=(30, 0.05),   # P(range < 30 km) = 0.05
    prior_sigma=(1, 0.05)     # P(sigma > 1°C) = 0.05
)

# Create projector matrix from mesh to observation locations
Amap = spde_make_A(mesh=mesh, loc=coords_km)
print(f"Projector matrix A: {Amap.shape}")
Projector matrix A: (55, 1792)

7. Fit the Model with pyINLA

# Prepare data for INLA
n_obs = len(tavg)
dataf = pd.DataFrame({
    'X202401': tavg['X202401'].values,
    'elevation_km': tavg['elevation'].values / 1000,
    'spatial': np.arange(n_obs)
})

# Define the model
model = {
    'response': 'X202401',
    'fixed': ['1', 'elevation_km'],
    'random': [{'id': 'spatial', 'model': spdeModel, 'A.local': Amap}]
}

# Fit the model
sfit = pyinla(model=model, family='gaussian', data=dataf, verbose=False)

8. Model Results

print(sfit.summary_fixed)
print(sfit.summary_hyperpar)
                   mean        sd  0.025quant   0.5quant  0.975quant       mode
(Intercept)   20.124107  3.872591   12.099176  20.123795   28.143617  20.123180
elevation_km  -5.738761  0.370071   -6.466339  -5.739090   -5.009235  -5.739068

                                                mean          sd   0.025quant     0.5quant   0.975quant         mode
Precision for the Gaussian observations     1.262697    0.335082     0.729446     1.220725     2.038849     1.141178
Range for spatial                        2957.504758  901.562679  1570.932977  2829.684896  5085.210525  2590.809299
Stdev for spatial                           2.801823    0.479452     1.981045     2.760177     3.863091     2.676366

Fixed Effects

ParameterMeanSD95% CIInterpretation
Intercept ($\beta_0$)20.123.87[12.10, 28.14]Baseline temperature at sea level (degrees C)
Elevation ($\beta_1$)-5.740.37[-6.47, -5.01]Temperature drops ~5.7 degrees C per 1000m

The elevation coefficient is close to the environmental lapse rate (~6.5 degrees C/km).

Hyperparameters

ParameterMeanSD95% CIDescription
Precision1.260.34[0.73, 2.04]Observation noise (higher = less noise)
Range2,958 km902 km[1,571, 5,085]Practical correlation range
Stdev2.800.48[1.98, 3.86]Marginal SD of spatial field (degrees C)

Interpreting the Results

Range. The range represents the distance beyond which spatial correlation is essentially zero. Our estimated range is ~2,958 km, meaning stations farther apart than this are essentially independent.

Marginal standard deviation. The marginal standard deviation ($\sigma = 2.80$ degrees C) represents the spatially structured variation remaining after accounting for the fixed effects. Two locations at the same elevation could still differ in temperature by roughly $\pm 2.80$ degrees C due to their spatial position alone, driven by factors not captured by elevation (proximity to the coast, local climate patterns, desert conditions).

Precision for Gaussian observations. The precision ($\approx 1.26$) is the inverse of the variance of the residual noise. Converting gives a noise standard deviation of approximately $1/\sqrt{1.26} \approx 0.89$ degrees C, representing the unexplained variation remaining after accounting for both the fixed effects and the spatial random field (measurement noise or micro-scale local variation).

Spatial effect vs. noise. The spatial effect (2.80 degrees C) is much larger than the noise (0.89 degrees C), meaning the model captures most of the spatial structure in the data, leaving only a small residual. This is consistent with the high $R^2 \approx 0.97$.

PC priors. We control the tails of the prior distributions. For the range, we set $P(\text{range} < 30\text{ km}) = 0.05$, encoding our belief that the range is much larger than 30 km. For $\sigma$, we control the upper tail to prevent unrealistically large variance.

Mesh edge length must be smaller than the expected range. If the range is smaller than the triangle edge, neighboring mesh nodes become independent, meaning the mesh is too coarse to capture the spatial structure. The PC prior value for the range (30 km) is chosen to be slightly smaller than the mesh edge length (50 km) as a reference point. With the mesh edge at 50 km and the estimated range at ~3,000 km, we have a very fine mesh relative to the spatial correlation structure. We could increase the edge length to 100 km for faster computation without expecting much effect on the results, since the range is still far larger than the mesh resolution.

Fixed effect coefficients change with a spatial random effect. In our case, the elevation coefficient shifted from $-2.2$ to $-5.74$ degrees C per km, much closer to the known physical environmental lapse rate of ~6.5 degrees C/km. The spatial effect absorbed confounded variation and allowed the fixed effect to better reflect the true physical relationship.

Adapting mesh resolution to the variable. Temperature has a long range, so a coarser mesh may suffice. However, for rainfall we need to keep a fine mesh (50 km or even finer) because rainfall exhibits much more localized spatial variation: it can rain at only one station while surrounding stations remain dry, so the spatial correlation is much shorter and the mesh must be detailed enough to capture these rapid changes.

9. Model Fit

# Plot fitted vs observed
fitted_mean = sfit.summary_fitted_values['mean'].values
observed = tavg['X202401'].values
valid = ~np.isnan(observed)
r_sq = np.corrcoef(fitted_mean[valid], observed[valid])[0, 1] ** 2
print(f"R-squared: {r_sq:.4f}")
R-squared: 0.9788

The $R^2 = 0.9788$ indicates the model (intercept + elevation + spatial random effect) explains nearly all of the observed temperature variation.

fig, ax = plt.subplots(figsize=(7, 7))
ax.scatter(fitted_mean[valid], observed[valid], marker='o', s=60,
           c='#3498db', alpha=0.7)
lims = [min(fitted_mean[valid].min(), observed[valid].min()) - 1,
        max(fitted_mean[valid].max(), observed[valid].max()) + 1]
ax.plot(lims, lims, 'r--', linewidth=2, label='Perfect fit')
ax.set_xlabel('Fitted (posterior mean, °C)')
ax.set_ylabel('Observed (°C)')
ax.set_title(f'Model fit: R² = {r_sq:.4f}')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Observed vs fitted temperature values
Model fit: Observed vs. fitted values ($R^2 = 0.9788$)

10. Create the Grid Projector

To visualize spatial fields on a regular grid, we create a grid projector that maps from mesh vertices to a 200 x 200 pixel grid covering the study region:

# Create grid projector for visualization
bb = map2km.total_bounds  # [minx, miny, maxx, maxy]
bb0_x = (bb[0], bb[2])
bb0_y = (bb[1], bb[3])

projGrid = spde_grid_projector(
    mesh=mesh,
    xlim=bb0_x,
    ylim=bb0_y,
    dims=(200, 200)
)
print(f"Grid projector: {projGrid.dims[0]}x{projGrid.dims[1]} = {projGrid.n_grid} points")
print(f"X range: {bb0_x[0]:.0f} to {bb0_x[1]:.0f} km")
print(f"Y range: {bb0_y[0]:.0f} to {bb0_y[1]:.0f} km")
Grid projector: 200x200 = 40000 points
X range: -1730 to 360 km
Y range: 1864 to 4272 km

11. Visualize the Spatial Effect

The spatial random effect $u(s)$ captures geographic patterns not explained by elevation:

# Find grid points outside the country boundary
bnd = unary_union(map2km.geometry)
sfGrid = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(projGrid.lattice[:, 0], projGrid.lattice[:, 1]),
    crs=CRS_KM
)
igrid_out = ~sfGrid.within(bnd)

# Project spatial effect to grid
spatial_mean = sfit.summary_random['spatial']['mean'].values
gridSmean = projGrid.project(spatial_mean, mask_outside=False)
gridSmean[igrid_out.values.reshape(projGrid.dims[1], projGrid.dims[0])] = np.nan

# Plot spatial effect
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(
    gridSmean, origin='lower',
    extent=[bb0_x[0], bb0_x[1], bb0_y[0], bb0_y[1]],
    cmap='RdBu_r', aspect='equal'
)
map2km.boundary.plot(ax=ax, color='black', linewidth=1.5)
plt.colorbar(im, ax=ax, shrink=0.7, label='Spatial effect (°C)')
ax.set_title('Spatial random effect (posterior mean)')
ax.set_xlabel('Easting (km)')
ax.set_ylabel('Northing (km)')
plt.tight_layout()
plt.show()

What This Tells Us

This captures regional climate patterns like coastal effects, continental effects, and local topographic effects.

Spatial random effect showing temperature anomalies
Spatial random effect (posterior mean): temperature anomalies after accounting for elevation

12. Load Elevation Data (ETOPO)

To make predictions across the entire region, we need elevation at every grid point. We use ETOPO 2022 from NOAA via OpenDAP:

import xarray as xr

# Transform grid points back to lon/lat for ETOPO lookup
sfGrid_ll = sfGrid.to_crs(CRS_LONGLAT)
glocs = np.column_stack([sfGrid_ll.geometry.x, sfGrid_ll.geometry.y])

# Get bounding box in lon/lat (with some padding)
lon_min, lon_max = glocs[:, 0].min() - 1, glocs[:, 0].max() + 1
lat_min, lat_max = glocs[:, 1].min() - 1, glocs[:, 1].max() + 1

# Load ETOPO elevation data via OpenDAP
ETOPO_URL = "https://www.ngdc.noaa.gov/thredds/dodsC/global/ETOPO2022/60s/60s_surface_elev_netcdf/ETOPO_2022_v1_60s_N90W180_surface.nc"

ds = xr.open_dataset(ETOPO_URL, engine='netcdf4')
etopo = ds['z'].sel(
    lon=slice(lon_min, lon_max),
    lat=slice(lat_min, lat_max)
)

# Interpolate elevation at each grid point location
etopoll = etopo.interp(
    lon=xr.DataArray(glocs[:, 0], dims='points'),
    lat=xr.DataArray(glocs[:, 1], dims='points'),
    method='nearest'
).values

# Mask grid points outside the boundary
etopoll[igrid_out.values] = np.nan

ETOPO provides:

13. Predictions with Uncertainty

A key advantage of Bayesian spatial models is uncertainty quantification. We append NA-response rows for each grid point so INLA predicts at those locations, then re-evaluate at the fitted mode (fast, no re-optimization):

from scipy import sparse

n_grid = projGrid.n_grid

# Prediction data: NA response so it will be predicted
preddf = pd.DataFrame({
    'X202401': np.full(n_grid, np.nan),
    'elevation_km': etopoll / 1000,
    'spatial': np.arange(n_obs, n_obs + n_grid)
})

# Combine observation + prediction data
data_combined = pd.concat([dataf, preddf], ignore_index=True)

# Combined projector matrix (observations + grid)
A_combined = sparse.vstack([Amap, projGrid.proj])

# Model with combined A matrix
model_pred = {
    'response': 'X202401',
    'fixed': ['1', 'elevation_km'],
    'random': [{'id': 'spatial', 'model': spdeModel, 'A.local': A_combined}]
}

# Re-fit at the fitted mode (fast, no optimization)
sfit_pred = pyinla(
    model=model_pred,
    family='gaussian',
    data=data_combined,
    control={'mode': {'theta': sfit.mode['theta'], 'restart': False}},
    verbose=False
)

# Extract predictions for the grid points
i_pred = slice(n_obs, n_obs + n_grid)
pred_mean = sfit_pred.summary_fitted_values['mean'].values[i_pred]
pred_sd = sfit_pred.summary_fitted_values['sd'].values[i_pred]

# Reshape to grid and mask outside boundary
igrid_out_2d = igrid_out.values.reshape(projGrid.dims[1], projGrid.dims[0])

pred_grid = pred_mean.reshape(projGrid.dims[1], projGrid.dims[0])
pred_grid[igrid_out_2d] = np.nan

sd_pred_grid = pred_sd.reshape(projGrid.dims[1], projGrid.dims[0])
sd_pred_grid[igrid_out_2d] = np.nan

14. Final Results

The model produces two key outputs:

Predicted Temperature (Posterior Mean)

Prediction Uncertainty (Posterior SD)

This uncertainty map is valuable for planning where additional monitoring would be most beneficial.

# Plot prediction and uncertainty side by side
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Left: Predicted temperature
ax = axes[0]
im1 = ax.imshow(
    pred_grid, origin='lower',
    extent=[bb0_x[0], bb0_x[1], bb0_y[0], bb0_y[1]],
    cmap='RdYlBu_r', aspect='equal'
)
map2km.boundary.plot(ax=ax, color='black', linewidth=1.5)
plt.colorbar(im1, ax=ax, shrink=0.7, label='Temperature (°C)')
ax.set_title('(a) Predicted temperature (posterior mean)')
ax.set_xlabel('Easting (km)')
ax.set_ylabel('Northing (km)')

# Right: Uncertainty with stations
ax = axes[1]
im2 = ax.imshow(
    sd_pred_grid, origin='lower',
    extent=[bb0_x[0], bb0_x[1], bb0_y[0], bb0_y[1]],
    cmap='YlOrRd', aspect='equal'
)
map2km.boundary.plot(ax=ax, color='black', linewidth=1.5)
ax.scatter(coords_km[:, 0], coords_km[:, 1],
           marker='o', s=40, c='black', zorder=5, label='Weather stations')
plt.colorbar(im2, ax=ax, shrink=0.7, label='SD (°C)')
ax.set_title('(b) Prediction uncertainty')
ax.set_xlabel('Easting (km)')
ax.set_ylabel('Northing (km)')
ax.legend(loc='upper right', fontsize=9)

plt.tight_layout()
plt.show()
Temperature prediction with uncertainty quantification
SPDE Spatial Temperature Model: Predicted temperature (left) and prediction uncertainty (right)

A Note on High Uncertainty in the Southeast

Notice the elevated prediction uncertainty (SD > 1 degrees C) in the southeastern part of Saudi Arabia. This is a classic boundary effect in spatial modeling: predictions become less reliable at the edges of the study region where:

  1. No nearby stations exist to anchor the spatial field
  2. The model extrapolates rather than interpolates
  3. The mesh boundary influences the spatial random effect

This high uncertainty is not a bug. It is the model honestly reporting that predictions in this area are less reliable.

Challenges for the Reader

Challenge 1: Model Summer Temperatures

This application modeled January (X202401), a winter month. How do temperature patterns change in summer?

Try modeling July (X202407):

  1. Change the response variable from 'X202401' to 'X202407'
  2. Re-fit the model and compare results
  3. Questions to explore:
    • How does the intercept ($\beta_0$) change? (Expect much higher!)
    • Does the elevation effect ($\beta_1$) change?
    • How do spatial patterns differ between winter and summer?
    • Where are the hottest predicted temperatures?

Challenge 2: Reduce Boundary Uncertainty

Can you reduce the prediction uncertainty in southeastern Saudi Arabia?

Hint: Add weather stations from Oman to your dataset. Oman borders Saudi Arabia to the southeast and has GHCN stations with TAVG data.

Steps:

  1. Go back to Part 1 and modify COUNTRIES to include "Oman"
  2. Re-run the data collection to get stations from Oman
  3. Re-fit the spatial model with the expanded dataset
  4. Compare the uncertainty maps: the southeastern boundary should improve!

This exercise demonstrates a key principle: spatial models benefit from data beyond the prediction boundary.

Summary

ComponentValue
Study RegionLebanon, Syria, Jordan, Saudi Arabia
Stations55
ResponseX202401 (January 2024 mean temperature)
Mesh1,792 vertices, 3,499 triangles
Intercept ($\beta_0$)20.12 degrees C (sea level temperature)
Elevation ($\beta_1$)-5.74 degrees C per 1000m
Model R-squared0.9788
Prediction Grid200 x 200 = 40,000 points
Temperature Range-2.5 to 27.7 degrees C
Uncertainty Range0.38 to 1.46 degrees C

Key Takeaways

  1. SPDE provides efficient spatial modeling: Matern fields via triangular mesh
  2. PC priors encode prior beliefs: Range and sigma specifications
  3. The projector matrix A connects mesh to observations
  4. Elevation explains much variation: $\beta_1 \approx$ -5.7 degrees C/km matches lapse rate
  5. Spatial effect captures residual patterns: Coastal vs. continental effects
  6. Uncertainty quantification is automatic: Lower near stations, higher elsewhere
  7. pyINLA syntax is straightforward: Dict-based model specification

References