The Big Idea
The SPDE approach lets you model continuous spatial processes efficiently by connecting the Matérn covariance to a sparse precision matrix.
Traditional approach: Dense covariance matrix \(\Sigma\) → \(O(n^3)\) computation
SPDE approach: Sparse precision matrix \(\mathbf{Q}\) → \(O(n^{3/2})\) computation
This is possible because a Gaussian field with Matérn covariance is the solution to a specific SPDE, and the Finite Element Method (FEM) gives us a GMRF approximation with a sparse precision matrix.
The Matérn Covariance
The Matérn covariance is the most widely used covariance function in spatial statistics. It describes how the similarity between observations decreases with distance.
Correlation Function
For two locations \(\mathbf{s}_i\) and \(\mathbf{s}_j\), the Matérn correlation is:
Covariance Function
The covariance is the correlation scaled by the marginal variance \(\sigma^2\):
The Parameters
| Parameter | Symbol | Interpretation |
|---|---|---|
| Marginal variance | \(\sigma^2\) | The variance of the field at any single location. Controls the "amplitude" of spatial variation. |
| Scale | \(\kappa > 0\) | Controls how quickly correlation decays with distance. Larger \(\kappa\) means faster decay (shorter range). |
| Smoothness | \(\nu > 0\) | Controls the differentiability of the field. The field is \(\lceil \nu \rceil - 1\) times differentiable. Larger \(\nu\) gives smoother surfaces. |
When distance is zero, the Bessel function term equals 1, so \(\text{Cor} = 1\) and \(\text{Cov} = \sigma^2\).
Understanding Smoothness (\(\nu\))
The smoothness parameter has a direct physical interpretation:
- \(\nu = 0.5\): Exponential covariance. Rough, non-differentiable surfaces (like Brownian motion).
- \(\nu = 1\): Once differentiable. Moderately smooth surfaces. Default in INLA.
- \(\nu = 1.5\): 1.5 times differentiable. Smoother surfaces.
- \(\nu = 2\): Twice differentiable. Very smooth surfaces.
- \(\nu \to \infty\): Gaussian (squared exponential) covariance. Infinitely smooth.
Practical Range
The scale parameter \(\kappa\) is hard to interpret directly. Instead, we use the practical range, which is the distance at which correlation drops to approximately 0.14 (i.e., \(e^{-1} \approx 0.13\)):
Interpretation: If the practical range is 50 km, then two locations 50 km apart have correlation ~0.14. Beyond this distance, observations are nearly uncorrelated.
Marginal Standard Deviation (\(\sigma\))
The marginal standard deviation \(\sigma\) controls how much the spatial field varies. For the SPDE parameterization:
where \(\tau\) is an internal precision parameter. In 2D with \(\nu=1\), this simplifies to \(\sigma = 1/(\tau \sqrt{4\pi} \kappa)\).
Interpretation: If \(\sigma = 2\), then at any location, the spatial effect has standard deviation 2 (in the units of your response variable).
The SPDE-Matérn Link
Why This Matters
Computing Matérn covariance directly requires building and inverting large dense matrices, which becomes computationally infeasible for thousands of locations. The SPDE approach provides a shortcut: instead of working with the covariance matrix, we work with a sparse precision matrix that is much faster to compute.
The Key Insight (No Differential Equations Required)
Think of a Matérn random field as a smooth, spatially correlated surface. Lindgren et al. (2011) discovered that this surface can be created through a simple recipe:
- Start with random noise: Imagine sprinkling random values across your spatial domain, each location getting an independent random value.
- Smooth the noise: Apply a smoothing operation that blends nearby values together. The amount of smoothing determines how correlated nearby locations become.
- The result: After smoothing, you get exactly a Matérn covariance structure.
The Mathematical Statement
For those interested, the formal result is that a Gaussian field with Matérn covariance is the solution to this stochastic partial differential equation (SPDE):
The goal: We want to find \(u(\mathbf{s})\), the spatial random field. This is the unknown we're solving for. The SPDE tells us that \(u(\mathbf{s})\) (the result of smoothing white noise) has exactly Matérn covariance structure.
Understanding Each Part
| Symbol | What It Means | Intuition |
|---|---|---|
| \(\mathcal{W}(\mathbf{s})\) | Spatial white noise | The random starting values at each location (completely uncorrelated) |
| \(\kappa^2 - \Delta\) | Smoothing operator | Blends values with neighbors. Higher \(\kappa\) means less smoothing (shorter range) |
| \(\alpha/2\) | Power of smoothing | How many times to apply the smoothing. More smoothing = smoother result |
| \(u(\mathbf{s})\) | The spatial field | The smooth, correlated surface we want (has Matérn covariance) |
Connection to Matérn Parameters
The smoothing parameters connect directly to the Matérn covariance parameters:
- \(\alpha = \nu + d/2\) where \(d\) is the spatial dimension (usually 2)
- Higher \(\alpha\) means more smoothing passes, giving smoother fields (higher \(\nu\))
- The scale parameter \(\kappa\) controls the correlation range
Summary: What You Need to Remember
You don't need to understand the differential equation. Just remember:
- The SPDE approach is a computational trick to get Matérn covariance efficiently
- It uses the mesh to represent spatial smoothing with sparse matrices
- This makes fitting models with thousands of locations fast and practical
- The results are statistically equivalent to fitting Matérn covariance directly
From SPDE to FEM: The Bridge
The Problem
The SPDE defines \(u(\mathbf{s})\), the spatial field we want to find. But it's defined continuously over all of space: we'd need to know \(u(\mathbf{s})\) at infinitely many points. That's impossible to compute.
The Solution: Discretize with a Mesh
This is where the triangular mesh comes in. Instead of finding \(u(\mathbf{s})\) everywhere, we approximate it using values only at mesh vertices:
- Cover the domain with triangles: The mesh divides your spatial region into triangles.
- Estimate \(u\) at vertices only: We find \(u\) at each of the \(m\) mesh vertices (call these values \(w_1, w_2, \ldots, w_m\)).
- Interpolate between vertices: Within each triangle, \(u(\mathbf{s})\) is linearly interpolated from the three corner values (like stretching a rubber sheet over the vertices).
Why This Works
The SPDE says "smooth this noise to get \(u(\mathbf{s})\)" continuously. The FEM approximation says "smooth this noise, but only track values at mesh vertices, and interpolate in between." As the mesh gets finer (more vertices), the approximation gets closer to the true continuous solution.
FEM Matrices and Precision
Approximating the Spatial Field with Basis Functions
Remember: our goal is to find \(u(\mathbf{s})\), the spatial field from the SPDE. FEM approximates this continuous field as a weighted sum of basis functions defined on the triangular mesh:
Where:
- \(\psi_k(\mathbf{s})\): Basis functions (one per mesh vertex), defined by the triangles
- \(w_k\): Random weights at each vertex. These are what we estimate.
- \(m\): Number of mesh vertices
So instead of finding \(u(\mathbf{s})\) at every point in space, we only need to estimate the \(m\) weights \(\{w_1, w_2, \ldots, w_m\}\). The basis functions handle interpolation everywhere else.
What Are Tent Functions?
Think of the mesh as lying flat on the ground (the x-y plane). Each basis function \(\psi_k\) adds a third dimension (height) that rises above the mesh:
For each vertex \(k\), the tent function \(\psi_k\) has these properties:
- Height = 1 directly above vertex \(k\) (the peak of the tent)
- Height = 0 at all neighboring vertices (the tent touches the ground there)
- Slopes linearly from height 1 down to height 0 within the triangles touching vertex \(k\)
- Height = 0 everywhere outside those triangles
Side View: How the Tent Rises Above the Mesh
The mesh is flat (on the x-axis). The tent function ψ₂ rises to height 1 at vertex v₂, and slopes down to height 0 at neighboring vertices v₁ and v₃.
3D View: Single Tent on a Triangle:
The tent function equals 1 at its vertex and decreases linearly to 0 at neighboring vertices.
All Three Basis Functions:
Each vertex has its own tent. Inside the triangle, all three overlap (sum to 1).
- At the center: ψ₁ ≈ 0.33, ψ₂ ≈ 0.33, ψ₃ ≈ 0.33 (equidistant from all vertices)
- Near vertex 1: ψ₁ ≈ 0.8, ψ₂ ≈ 0.1, ψ₃ ≈ 0.1 (closer to vertex 1)
- On edge 1-2: ψ₃ = 0, and ψ₁ + ψ₂ = 1 (on the edge opposite to vertex 3)
- Always: ψ₁ + ψ₂ + ψ₃ = 1, no matter where!
Why do they always sum to 1?
This is not a coincidence - it's guaranteed by construction! The basis functions are defined using barycentric coordinates.
The idea: Any point P inside a triangle can be written as a weighted average of the three vertices:
The weights \(\lambda_1, \lambda_2, \lambda_3\) are called barycentric coordinates. They tell you "how much" of each vertex contributes to point P.
The key: The basis functions ARE the barycentric coordinates!
So by definition: \(\psi_1(P) + \psi_2(P) + \psi_3(P) = \lambda_1 + \lambda_2 + \lambda_3 = 1\). This property is called "partition of unity".
Check it at different locations:
| Location | ψ₁ | ψ₂ | ψ₃ | Sum |
|---|---|---|---|---|
| At vertex 1 | 1 | 0 | 0 | 1 |
| At vertex 2 | 0 | 1 | 0 | 1 |
| At vertex 3 | 0 | 0 | 1 | 1 |
| Midpoint of edge 1-2 | 0.5 | 0.5 | 0 | 1 |
| Center of triangle | 0.33 | 0.33 | 0.33 | 1 |
| Any point inside | λ₁ | λ₂ | λ₃ | 1 |
Weighted Sum Example:
Why different sizes? The basis functions ψ_k all have the same shape (peak at 1), but we multiply by weights w_k. So w₁·ψ₁ peaks at w₁, not at 1. Bigger weight = taller tent!
Result: \(u(\mathbf{s}) = 2 \cdot \psi_1 + 1 \cdot \psi_2 + 3 \cdot \psi_3\). At each vertex, \(u\) equals the weight. Inside the triangle, values are linearly interpolated.
When you sum all these tent functions (each multiplied by its weight \(w_k\)), you get a piecewise linear surface that passes through the weight values at each vertex.
Plugging Back into the SPDE
Here's the key step: we substitute our approximation \(u(\mathbf{s}) \approx \sum_k \psi_k(\mathbf{s}) w_k\) back into the SPDE equation. This converts the continuous differential equation into a discrete system of equations for the weights \(\{w_1, \ldots, w_m\}\).
The process (called the Galerkin method) involves:
- Substitute the basis function approximation into the SPDE
- Multiply by each basis function \(\psi_i\) and integrate over the domain
- This produces integrals of products of basis functions
The Connection: How u(s) leads to C, G, and Q
Step 1: Start with the SPDE (simplified for α=1):
Step 2: Substitute \(u(\mathbf{s}) \approx \sum_k w_k \psi_k(\mathbf{s})\):
Step 3: Multiply by \(\psi_i\) and integrate (Galerkin method):
Step 4: In matrix form, this becomes:
Result: The weights follow \(\mathbf{w} \sim N(\mathbf{0}, \mathbf{Q}^{-1})\) where \(\mathbf{Q} = \kappa^2\mathbf{C} + \mathbf{G}\). The FEM matrices C and G come directly from integrating products of basis functions!
The FEM Matrices
From plugging the basis functions into the SPDE, we get two matrices:
| Matrix | Definition | Intuition |
|---|---|---|
| C (Mass) | \(C_{ij} = \int \psi_i(\mathbf{s}) \psi_j(\mathbf{s}) \, d\mathbf{s}\) | How much the basis functions overlap. Mostly diagonal (each vertex has an "area of influence"). |
| G (Stiffness) | \(G_{ij} = \int \nabla\psi_i \cdot \nabla\psi_j \, d\mathbf{s}\) | How the gradients (slopes) of neighboring basis functions interact. Encodes mesh connectivity. |
Key point: Both C and G are sparse because each basis function \(\psi_k\) is only non-zero near vertex \(k\). Two basis functions only interact if their vertices share a triangle.
The Precision Matrix
The precision matrix \(\mathbf{Q}\) is built from C and G. The exact formula depends on the smoothness parameter α you choose:
| If you choose... | Formula for Q | Field smoothness |
|---|---|---|
| α = 1 (rare) | \(\mathbf{Q} = \kappa^2 \mathbf{C} + \mathbf{G}\) | Rough (ν = 0) |
| α = 2 (default) | \(\mathbf{Q} = \kappa^4 \mathbf{C} + 2\kappa^2 \mathbf{G} + \mathbf{G}\mathbf{C}^{-1}\mathbf{G}\) | Smooth (ν = 1) |
Why different formulas? Higher α means more smoothing, which requires a more complex formula. Think of it like applying a blur filter: α = 1 is one pass, α = 2 is two passes (smoother result). The formula for α = 2 is essentially "apply the α = 1 formula twice."
Worked Example: A Single Triangle
Let's work through a complete example with the simplest possible mesh: 3 vertices forming one triangle.
Step 1: The Mesh
Consider a triangle with vertices at:
- Vertex 1: \((0, 0)\)
- Vertex 2: \((1, 0)\)
- Vertex 3: \((0, 1)\)
This is a right triangle with area = 0.5.
Step 2: The Basis Functions
Each vertex gets a tent function. For this triangle:
- \(\psi_1(\mathbf{s})\): equals 1 at vertex 1, linearly decreases to 0 at vertices 2 and 3
- \(\psi_2(\mathbf{s})\): equals 1 at vertex 2, linearly decreases to 0 at vertices 1 and 3
- \(\psi_3(\mathbf{s})\): equals 1 at vertex 3, linearly decreases to 0 at vertices 1 and 2
At any point inside the triangle: \(\psi_1 + \psi_2 + \psi_3 = 1\) (they form a partition of unity).
Step 3: The Spatial Field Approximation
We approximate \(u(\mathbf{s})\) as:
Where \(w_1, w_2, w_3\) are the random weights we estimate. At vertex 1, \(u = w_1\). At vertex 2, \(u = w_2\). Inside the triangle, \(u\) is linearly interpolated.
Step 4: Computing the Mass Matrix C
Deriving the Matrix Equations (Galerkin Method)
Step 1: Start with the SPDE (for α=1):
Step 2: Substitute our approximation \(u(\mathbf{s}) \approx w_1\psi_1 + w_2\psi_2 + w_3\psi_3\):
Step 3: Multiply both sides by \(\psi_1\) and integrate over the domain:
Step 4: Expand (using integration by parts for the Δ term):
Step 5: Define the matrix entries:
Step 6: The equation becomes:
The Complete System: 3 Equations
We repeat this process for \(\psi_2\) and \(\psi_3\) to get 3 equations:
Equation 1 (multiply by ψ₁): \(w_1(\kappa^2 C_{11} + G_{11}) + w_2(\kappa^2 C_{12} + G_{12}) + w_3(\kappa^2 C_{13} + G_{13}) = \epsilon_1\)
Equation 2 (multiply by ψ₂): \(w_1(\kappa^2 C_{21} + G_{21}) + w_2(\kappa^2 C_{22} + G_{22}) + w_3(\kappa^2 C_{23} + G_{23}) = \epsilon_2\)
Equation 3 (multiply by ψ₃): \(w_1(\kappa^2 C_{31} + G_{31}) + w_2(\kappa^2 C_{32} + G_{32}) + w_3(\kappa^2 C_{33} + G_{33}) = \epsilon_3\)
In matrix form:
Showing C and G separately:
Where: \(C_{ik} = \int \psi_i \psi_k \, d\mathbf{s}\) and \(G_{ik} = \int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s}\)
Why is C a 3×3 matrix?
Now you can see it directly from the equations above:
- 3 unknowns (w₁, w₂, w₃) → 3 columns in the matrix
- 3 equations (one per ψ_i) → 3 rows in the matrix
- Each entry \(C_{ik}\) is the coefficient of \(w_k\) in equation \(i\)
In general: m vertices → m unknowns → m equations → m×m matrices.
Now let's compute the actual values of \(C_{ik}\).
What does ψ_i × ψ_j look like?
ψ₁ × ψ₁
Peak of 1 at vertex 1, fades to 0
Large integral (more "volume")
ψ₁ × ψ₂
Zero at all vertices! Small bump in middle
Smaller integral (less "volume")
Let's compute step by step:
| Entry | Calculation | At vertex 1 | At vertex 2 | In middle | Result |
|---|---|---|---|---|---|
| \(C_{11}\) | \(\int \psi_1 \times \psi_1\) | 1×1 = 1 | 0×0 = 0 | 0.33×0.33 ≈ 0.11 | 2 (larger) |
| \(C_{12}\) | \(\int \psi_1 \times \psi_2\) | 1×0 = 0 | 0×1 = 0 | 0.33×0.33 ≈ 0.11 | 1 (smaller) |
| \(C_{22}\) | \(\int \psi_2 \times \psi_2\) | 0×0 = 0 | 1×1 = 1 | 0.33×0.33 ≈ 0.11 | 2 (larger) |
The Rule for Mass Matrix C
| Same basis function (i = k): | \(C_{ii} = \frac{2 \cdot \text{Area}}{12}\) → coefficient 2 |
| Different basis functions (i ≠ k): | \(C_{ik} = \frac{1 \cdot \text{Area}}{12}\) → coefficient 1 |
Where does Area/12 come from?
There's a standard formula for integrating products of barycentric coordinates over a triangle:
where \(\lambda_1, \lambda_2, \lambda_3\) are barycentric coordinates (which ARE our basis functions: \(\psi_i = \lambda_i\)).
Computing \(C_{11} = \int \psi_1^2 \, dA = \int \lambda_1^2 \, dA\):
Here \(a=2, b=0, c=0\):
Computing \(C_{12} = \int \psi_1 \psi_2 \, dA = \int \lambda_1 \lambda_2 \, dA\):
Here \(a=1, b=1, c=0\):
The formula (standard result for linear basis functions on triangles):
Step 5: Computing the Stiffness Matrix G
Why are we computing gradient products?
Remember from the Galerkin derivation, the matrix entries \(G_{ik}\) come from the Laplacian term \(-\Delta\) in the SPDE. After integration by parts:
The gradient integrals appear naturally - we're not computing them arbitrarily!
The stiffness matrix measures how the slopes (gradients) of the tent functions interact.
Understanding the Gradient ∇ψ
Each tent function slopes downhill from its peak (vertex) toward the other vertices. The gradient points in the direction of steepest descent.
∇ψ₁: slopes away from v₁
∇ψ₂: slopes away from v₂
- Same direction (or same tent): dot product is positive → G_ii > 0
- Opposite directions: dot product is negative → G_ij < 0 for neighbors
For our right triangle:
| Entry | What it measures | Value | Why? |
|---|---|---|---|
| \(G_{11}\) | \(\int \nabla\psi_1 \cdot \nabla\psi_1\) | Positive (1) | ∇ψ₁ dotted with itself is always positive (magnitude squared) |
| \(G_{12}\) | \(\int \nabla\psi_1 \cdot \nabla\psi_2\) | Negative (-0.5) | ∇ψ₁ and ∇ψ₂ point in opposite directions (tent 1 slopes away from v₁, tent 2 slopes away from v₂) |
| \(G_{23}\) | \(\int \nabla\psi_2 \cdot \nabla\psi_3\) | Zero (0) | For this specific triangle, ∇ψ₂ and ∇ψ₃ happen to be perpendicular (dot product = 0) |
How to compute ∇ψ for a triangle
For a triangle with vertices at \((x_1, y_1), (x_2, y_2), (x_3, y_3)\), the gradient of each basis function is:
where \((i, j, k)\) is a cyclic permutation of \((1, 2, 3)\).
For our right triangle with vertices \(v_1=(0,0), v_2=(1,0), v_3=(0,1)\):
Area = 0.5, so \(2 \cdot \text{Area} = 1\)
∇ψ₁ (i=1, j=2, k=3)
∇ψ₂ (i=2, j=3, k=1)
∇ψ₃ (i=3, j=1, k=2)
Let's compute G step by step
Formula: \(G_{ik} = (\nabla\psi_i \cdot \nabla\psi_k) \times \text{Area}\)
Diagonal entries (same basis function):
\(G_{22} = \nabla\psi_2 \cdot \nabla\psi_2 \times \text{Area} = (1 \cdot 1 + 0 \cdot 0) \times 0.5 = 1 \times 0.5 = \mathbf{0.5}\)
\(G_{33} = \nabla\psi_3 \cdot \nabla\psi_3 \times \text{Area} = (0 \cdot 0 + 1 \cdot 1) \times 0.5 = 1 \times 0.5 = \mathbf{0.5}\)
Off-diagonal entries (different basis functions):
\(G_{13} = \nabla\psi_1 \cdot \nabla\psi_3 \times \text{Area} = ((-1) \cdot 0 + (-1) \cdot 1) \times 0.5 = -1 \times 0.5 = \mathbf{-0.5}\)
\(G_{23} = \nabla\psi_2 \cdot \nabla\psi_3 \times \text{Area} = (1 \cdot 0 + 0 \cdot 1) \times 0.5 = 0 \times 0.5 = \mathbf{0}\)
Assembling G: The matrix is symmetric (\(G_{ik} = G_{ki}\)), so:
The Rule for Stiffness Matrix G
Diagonal (i = k): Always positive
The squared magnitude of a gradient is always positive.
Off-diagonal (i ≠ k): Usually negative
Gradients of adjacent tent functions typically point in opposite directions.
Sanity check: Row sums of G should be zero
This is a fundamental property of the stiffness matrix (since constant functions have zero Laplacian):
Row 2: \((-0.5) + 0.5 + 0 = 0\) ✓
Row 3: \((-0.5) + 0 + 0.5 = 0\) ✓
Step 6: Building the Precision Matrix Q
Where does Q = κ²C + G come from?
Remember from the Galerkin derivation (Step 4), each matrix entry of Q combines C and G:
This is exactly what emerged from multiplying by basis functions and integrating the SPDE!
For \(\alpha = 1\) (simplest case), with scale parameter \(\kappa\):
What does κ control?
κ²C term (Mass matrix)
Controls the amplitude/variance of the field. Larger κ² → more precision → smaller variance.
G term (Stiffness matrix)
Controls smoothness/correlation. Penalizes differences between neighboring vertices.
Let's compute Q step by step for κ = 2:
Computing Q for κ = 2
Step 1: Compute κ²C = 4 × C
Step 2: Write out the numerical values
\(\mathbf{G} = \begin{pmatrix} 1.0 & -0.5 & -0.5 \\ -0.5 & 0.5 & 0 \\ -0.5 & 0 & 0.5 \end{pmatrix}\)
Step 3: Add them: Q = κ²C + G
\(Q_{12} = 0.167 + (-0.5) = \mathbf{-0.333}\)
\(Q_{13} = 0.167 + (-0.5) = \mathbf{-0.333}\)
\(Q_{22} = 0.333 + 0.5 = \mathbf{0.833}\)
\(Q_{23} = 0.167 + 0 = \mathbf{0.167}\)
\(Q_{33} = 0.333 + 0.5 = \mathbf{0.833}\)
The Final Precision Matrix Q (for κ = 2)
Notice: The diagonal entries are positive (precision), and off-diagonal entries can be positive or negative depending on the balance of C and G contributions.
Sanity check: Q should be positive definite
For Q to define a valid Gaussian distribution, it must be symmetric positive definite. You can verify this by checking that all eigenvalues are positive, or that the determinant and all leading principal minors are positive.
Step 7: What Q Tells Us
The precision matrix \(\mathbf{Q}\) is the inverse of the covariance matrix for the weights \(\mathbf{w} = (w_1, w_2, w_3)\):
Reading the Precision Matrix
Diagonal entries Q_ii
Precision (inverse marginal variance) of weight \(w_i\).
From our Q: \(Q_{11} = 1.333\) means vertex 1 has precision 1.333
Off-diagonal entries Q_ij
Conditional dependence between \(w_i\) and \(w_j\).
\(Q_{12} = -0.333\) means vertices 1 and 2 are conditionally dependent (neighbors)
The Markov Property (Conditional Independence)
Key insight: If \(Q_{ij} = 0\), then \(w_i\) and \(w_j\) are conditionally independent given all other weights. In our example, \(Q_{23} = 0.167 \neq 0\) because vertices 2 and 3 share an edge.
In a large mesh, most vertex pairs are not neighbors → most \(Q_{ij} = 0\) → Q is sparse!
From Precision to Covariance
The covariance matrix is \(\Sigma = Q^{-1}\). For our 3×3 example:
Notice: The covariance matrix is dense (all entries non-zero) even though Q is sparse! This is why we work with Q directly in INLA - it's much more efficient.
Connection to Matérn Parameters
The SPDE parameter κ relates to the practical correlation range ρ:
For α = 2 in 2D (ν = 1), this simplifies to: \(\rho \approx \frac{2.83}{\kappa}\)
Large κ (e.g., κ = 10)
Small range ρ ≈ 0.28
Field varies rapidly, weak spatial correlation
Small κ (e.g., κ = 0.5)
Large range ρ ≈ 5.66
Field is smooth, strong spatial correlation
Summary of the Full Chain
| Step | What We Do | Result |
|---|---|---|
| 1 | Create mesh | Vertices + triangles |
| 2 | Define basis functions | Tent function \(\psi_k\) at each vertex |
| 3 | Approximate \(u(\mathbf{s})\) | \(u(\mathbf{s}) \approx \sum_k w_k \psi_k(\mathbf{s})\) |
| 4 | Plug into SPDE + integrate | FEM matrices C, G |
| 5 | Combine with \(\kappa\) | Precision matrix \(\mathbf{Q}\) |
| 6 | Use in INLA | \(\mathbf{w} \sim N(\mathbf{0}, \mathbf{Q}^{-1})\) as spatial random effect |
Deriving the Random Walk 1 (RW1) Precision Matrix
The same FEM/Galerkin approach works in 1D! Let's derive the Random Walk of order 1 precision matrix using the exact same steps we used for the 2D Matérn field.
Why a Different Differential Equation?
The 2D Matérn SPDE was: \((\kappa^2 - \Delta)u = \mathcal{W}\)
For a Random Walk, we want a process where increments (first differences) are independent. This corresponds to setting κ = 0 and working in 1D:
Physical interpretation: The second derivative of position equals white noise → velocity is Brownian motion → position is integrated Brownian motion (random walk).
| Model | SPDE | Dimension | Result |
|---|---|---|---|
| Matérn (2D) | \((\kappa^2 - \Delta)u = \mathcal{W}\) | 2D spatial | Q = κ²C + G |
| RW1 | \(-\frac{d^2 u}{dx^2} = \mathcal{W}\) | 1D (time/space) | Q = G only |
Size and Rank of RW1
For a random walk with n points:
- Precision matrix Q is n × n
- There are n - 1 increments (differences between consecutive points)
- Q has rank n - 1 (rank deficient by 1)
- This is an intrinsic model - no absolute level, only relative differences matter
In practice: We add a sum-to-zero constraint or use it as a random effect (which handles the rank deficiency).
The 1D Differential Equation
The RW1 Stochastic Differential Equation
This is the 1D Laplacian (second derivative) driven by white noise. Unlike the Matérn SPDE, there is no κ²u term, so there will be no mass matrix C contribution.
Step 1: The 1D Mesh
Consider n = 4 equally-spaced points on a line with spacing h = 1:
Step 2: The 1D Basis Functions (Hat Functions)
Each point gets a "hat function" - the 1D equivalent of tent functions:
Hat function ψ₂: equals 1 at x₂, linearly decreases to 0 at neighbors x₁ and x₃
Step 3: The Galerkin Derivation (Full Detail)
Deriving the Matrix Equations (Galerkin Method for RW1)
Step 1: Start with the 1D differential equation:
Step 2: Substitute our approximation \(u(x) \approx w_1\psi_1 + w_2\psi_2 + w_3\psi_3 + w_4\psi_4\):
Step 3: Multiply both sides by \(\psi_1\) and integrate over the domain:
Step 4: Apply integration by parts: \(-\int \psi_i \frac{d^2\psi_k}{dx^2} dx = +\int \frac{d\psi_i}{dx} \frac{d\psi_k}{dx} dx\)
Step 5: Define the stiffness matrix entries:
Step 6: The equation becomes:
The Complete System: 4 Equations for 4 Unknowns
We repeat this process for \(\psi_2\), \(\psi_3\), and \(\psi_4\) to get 4 equations:
Eq 1 (×ψ₁): \(w_1 G_{11} + w_2 G_{12} + w_3 G_{13} + w_4 G_{14} = \epsilon_1\)
Eq 2 (×ψ₂): \(w_1 G_{21} + w_2 G_{22} + w_3 G_{23} + w_4 G_{24} = \epsilon_2\)
Eq 3 (×ψ₃): \(w_1 G_{31} + w_2 G_{32} + w_3 G_{33} + w_4 G_{34} = \epsilon_3\)
Eq 4 (×ψ₄): \(w_1 G_{41} + w_2 G_{42} + w_3 G_{43} + w_4 G_{44} = \epsilon_4\)
In matrix form: \(\mathbf{G}\mathbf{w} = \boldsymbol{\epsilon}\), where \(\mathbf{Q}_{\text{RW1}} = \mathbf{G}\)
Key Difference from Matérn SPDE
In the Matérn case, we had \(\mathbf{Q} = \kappa^2\mathbf{C} + \mathbf{G}\) (both mass and stiffness matrices). For RW1, there's no κ²u term in the differential equation, so no mass matrix C appears. The precision matrix is only the stiffness matrix: \(\mathbf{Q}_{\text{RW1}} = \mathbf{G}\).
Step 4: Computing the Stiffness Matrix G
Gradients of 1D hat functions
For a hat function at xᵢ with spacing h:
Left segment [xᵢ₋₁, xᵢ]
Slope = \(\frac{d\psi_i}{dx} = +\frac{1}{h}\)
Right segment [xᵢ, xᵢ₊₁]
Slope = \(\frac{d\psi_i}{dx} = -\frac{1}{h}\)
Let's compute G step by step (h = 1)
Diagonal entries G_ii (interior point, e.g., i = 2):
\(G_{22} = \frac{1}{h^2} \cdot h + \frac{1}{h^2} \cdot h = \frac{1}{h} + \frac{1}{h} = \frac{2}{h}\)
With h = 1: \(G_{22} = \mathbf{2}\)
Diagonal entries G_ii (boundary point, e.g., i = 1):
With h = 1: \(G_{11} = \mathbf{1}\) (only one segment!)
Off-diagonal entries G_ik (neighbors, e.g., i = 1, k = 2):
With h = 1: \(G_{12} = \mathbf{-1}\)
Off-diagonal entries G_ik (non-neighbors, e.g., i = 1, k = 3):
The RW1 Precision Matrix (4 points)
This is exactly the standard RW1 structure matrix!
The Rule for RW1 Precision Matrix
Interior points
Q_ii = 2
(two neighbors)
Boundary points
Q_ii = 1
(one neighbor)
Neighbors
Q_ij = -1
(adjacent points)
Sanity check: Row sums should be zero
Row 2: \((-1) + 2 + (-1) + 0 = 0\) ✓
Row 3: \(0 + (-1) + 2 + (-1) = 0\) ✓
Row 4: \(0 + 0 + (-1) + 1 = 0\) ✓
Row sums = 0 means Q is rank-deficient (as expected for RW1 - there's no "anchor").
Why is this a "Random Walk"?
The quadratic form \(\mathbf{x}^T \mathbf{Q} \mathbf{x}\) gives:
This penalizes first differences (increments) - exactly what a random walk does! The increments \(\Delta x_i = x_{i+1} - x_i\) are independent with equal variance.
| Model | Differential Operator | Precision Matrix Structure |
|---|---|---|
| RW1 | \(-\frac{d^2}{dx^2}\) (1D Laplacian) | Tridiagonal: (1, -1) at boundaries, (2, -1, -1) interior |
| RW2 | \(\frac{d^4}{dx^4}\) (biharmonic) | Pentadiagonal: penalizes second differences |
| Matérn 2D | \((\kappa^2 - \Delta)^{\alpha/2}\) | Sparse graph Laplacian structure |
Deriving the Random Walk 2 (RW2) Precision Matrix
Following the same approach, let's derive the Random Walk of order 2 precision matrix. RW2 produces smoother curves than RW1 by penalizing changes in slope rather than changes in level.
Why a Fourth Derivative?
RW1 used the second derivative (Laplacian), which penalizes curvature and gives independent first differences.
RW2 uses the fourth derivative (biharmonic), which penalizes changes in curvature and gives independent second differences:
Physical interpretation: RW1 assumes velocity is white noise (position is rough). RW2 assumes acceleration is white noise (velocity is smooth, position is even smoother).
| Property | RW1 | RW2 |
|---|---|---|
| SPDE | \(-\frac{d^2 u}{dx^2} = \mathcal{W}\) | \(\frac{d^4 u}{dx^4} = \mathcal{W}\) |
| Penalizes | First differences (level changes) | Second differences (slope changes) |
| Smoothness | Continuous but rough | Continuous and smooth |
| Matrix structure | Tridiagonal (3 diagonals) | Pentadiagonal (5 diagonals) |
| Rank deficiency | 1 (needs 1 constraint) | 2 (needs 2 constraints) |
Size and Rank of RW2
For a random walk of order 2 with n points:
- Precision matrix Q is n × n
- There are n - 2 second differences
- Q has rank n - 2 (rank deficient by 2)
- This is doubly intrinsic - no absolute level AND no absolute slope
In practice: Need sum-to-zero constraint AND linear-trend-to-zero constraint, or use as random effect.
The Galerkin Approach for RW2
Starting from the SPDE
Multiply by test function ψᵢ and integrate:
Apply integration by parts twice:
The Challenge with Piecewise Linear Basis Functions
For hat functions (piecewise linear), the second derivative is zero almost everywhere (linear functions have no curvature), except at the kinks where it has delta-function behavior.
Solution: We can construct the RW2 precision matrix directly as \(\mathbf{Q} = \mathbf{D}^T\mathbf{D}\), where \(\mathbf{D}\) is the second-difference matrix.
Step 1: The Second-Difference Matrix D
The second difference at point i is: \(\Delta^2 x_i = x_{i-1} - 2x_i + x_{i+1}\)
For n = 5 points, we have n - 2 = 3 second differences:
Each row computes: (left neighbor) - 2×(center) + (right neighbor)
Step 2: Computing Q = DTD
Why Q = DTD?
The quadratic form should penalize squared second differences:
Therefore: \(\mathbf{Q}_{\text{RW2}} = \mathbf{D}^T\mathbf{D}\)
Step 3: Computing Q Step by Step (n = 5)
Matrix multiplication DTD
DT (5×3):
Q = DTD (5×5):
The RW2 Precision Matrix (5 points)
This is the standard RW2 structure matrix! Note the pentadiagonal structure.
The Rule for RW2 Precision Matrix
Boundary (i=1,n)
Q_ii = 1
Near-boundary (i=2,n-1)
Q_ii = 5
Interior (i=3,...,n-2)
Q_ii = 6
1st neighbors
Q_{i,i±1} = -4 or -2
2nd neighbors
Q_{i,i±2} = 1
Sanity checks
Row 1: \(1 + (-2) + 1 = 0\) ✓
Row 2: \((-2) + 5 + (-4) + 1 = 0\) ✓
Row 3: \(1 + (-4) + 6 + (-4) + 1 = 0\) ✓
Rank = n - 2: rank(Q) = 5 - 2 = 3 ✓
Row sums = 0 and rank deficiency = 2 confirm this is RW2.
Why is this "Random Walk 2"?
The quadratic form \(\mathbf{x}^T \mathbf{Q} \mathbf{x}\) gives:
This penalizes second differences - deviations from a straight line!
RW1
\(\Delta x_i = x_{i+1} - x_i\)
Independent increments
RW2
\(\Delta^2 x_i = x_{i-1} - 2x_i + x_{i+1}\)
Independent second increments
Visual Intuition: RW1 vs RW2
RW1: Penalizes level changes
Jagged, can change direction sharply
RW2: Penalizes slope changes
Smooth, gradual changes in direction
| Property | RW1 | RW2 |
|---|---|---|
| Precision matrix | Tridiagonal | Pentadiagonal |
| Non-zero diagonals | 3 (main, ±1) | 5 (main, ±1, ±2) |
| Rank deficiency | 1 | 2 |
| Null space | Constant (1, 1, 1, ...) | Constant + Linear (1, 2, 3, ...) |
| Typical diagonal (interior) | 2 | 6 |
From u(s) to Weights: What Are We Estimating?
The Continuous Field u(s)
The SPDE defines a continuous spatial field \(u(s)\) at every location \(s\) in the domain. But we can't estimate infinitely many values! We need a finite representation.
The Basis Function Approximation
We approximate \(u(s)\) using basis functions centered at mesh vertices:
Where:
- \(\psi_k(s)\): Basis function for vertex \(k\) (piecewise linear "tent" function)
- \(w_k\): Weight at vertex \(k\) (what INLA estimates)
- \(n\): Number of mesh vertices
Connecting the Notation
The weights \(\mathbf{w}\) are the same as \(\mathbf{u}_{\text{mesh}}\):
Once we have the weights, we can compute \(u(s)\) at any location by evaluating the basis functions.
The Projector Matrix A
To get \(u(s)\) at observation locations, we evaluate the basis function expansion. The projector matrix \(\mathbf{A}\) does this efficiently:
In matrix form:
Where \(A_{ik} = \psi_k(s_i)\) is the value of basis function \(k\) at observation location \(s_i\).
Structure of A
Each row of \(\mathbf{A}\) has at most 3 non-zero values because each point lies in one triangle with 3 vertices. These values are the barycentric coordinates of the point within its triangle, and each row sums to 1.
| Symbol | Dimension | Meaning |
|---|---|---|
| \(\mathbf{w}\) or \(\mathbf{u}_{\text{mesh}}\) | \(n \times 1\) | Weights at mesh vertices (what INLA estimates) |
| \(\mathbf{A}\) | \(m \times n\) | Projector matrix (basis functions at obs locations) |
| \(\mathbf{u}_{\text{obs}}\) | \(m \times 1\) | \(u(s)\) at observation locations |
The Complete Model
Putting it all together, a spatial model with SPDE has three levels:
General Hierarchical Model
Where \(g(\cdot)\) is the link function connecting the mean \(\mu_i\) to the linear predictor \(\eta_i\):
| Family | Link \(g(\mu)\) | Example |
|---|---|---|
| Gaussian | identity: \(\mu\) | Continuous measurements |
| Poisson | log: \(\log(\mu)\) | Counts, disease mapping |
| Binomial | logit: \(\log(\mu/(1-\mu))\) | Presence/absence, proportions |
| Gamma | log: \(\log(\mu)\) | Positive continuous (e.g., rainfall) |
Where Does Q Come From?
The precision matrix \(\mathbf{Q}\) comes from the SPDE discretization. For the Matérn field with α = 2:
Where \(\mathbf{C}\) (mass matrix) and \(\mathbf{G}\) (stiffness matrix) are computed from integrating the basis functions over the mesh triangles. The matrix \(\mathbf{Q}\) encodes the spatial correlation structure and defines the prior on the weights \(\mathbf{w}\).
Notation Summary
- \(\mathbf{y}\): Observed response vector (\(m \times 1\))
- \(\eta_i\): Linear predictor at observation \(i\)
- \(g(\cdot)\): Link function (identity, log, logit, etc.)
- \(\beta_0\): Intercept (and other fixed effects)
- \(\mathbf{A}\): Projector matrix (\(m \times n\)), evaluates basis functions at observation locations
- \(\mathbf{w}\): Weights at mesh vertices (\(n \times 1\)), what INLA estimates
- \(\mathbf{A}\mathbf{w}\): The spatial field \(u(s)\) evaluated at observation locations
- \(\mathbf{Q}\): Precision matrix from SPDE (\(n \times n\)), prior precision of weights \(\mathbf{w}\)
Next Steps
Now that you understand the theory, see how to fit SPDE models in practice:
- Modeling SPDE - Step-by-step workflow for fitting SPDE models
- Creating Meshes - Detailed guide on mesh construction
- Priors - Setting PC priors for range and sigma
References
- Lindgren, F., Rue, H., & Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. JRSS-B, 73(4), 423-498.
- Fuglstad, G.-A., Simpson, D., Lindgren, F., & Rue, H. (2019). Constructing priors that penalize the complexity of Gaussian random fields. JASA, 114(525), 445-452.
- Krainski, E., et al. (2018). Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. Chapman & Hall/CRC.