← Learn

Understanding SPDEs

The theory behind SPDE spatial models: Matérn covariance, FEM, and precision matrices.

The Big Idea

The SPDE approach lets you model continuous spatial processes efficiently by connecting the Matérn covariance to a sparse precision matrix.

Traditional approach: Dense covariance matrix \(\Sigma\) → \(O(n^3)\) computation
SPDE approach: Sparse precision matrix \(\mathbf{Q}\) → \(O(n^{3/2})\) computation

This is possible because a Gaussian field with Matérn covariance is the solution to a specific SPDE, and the Finite Element Method (FEM) gives us a GMRF approximation with a sparse precision matrix.

The Matérn Covariance

The Matérn covariance is the most widely used covariance function in spatial statistics. It describes how the similarity between observations decreases with distance.

Correlation Function

For two locations \(\mathbf{s}_i\) and \(\mathbf{s}_j\), the Matérn correlation is:

\[ \text{Cor}(U(\mathbf{s}_i), U(\mathbf{s}_j)) = \frac{2^{1-\nu}}{\Gamma(\nu)} (\kappa \| \mathbf{s}_i - \mathbf{s}_j \|)^\nu K_\nu(\kappa \| \mathbf{s}_i - \mathbf{s}_j \|) \]

Covariance Function

The covariance is the correlation scaled by the marginal variance \(\sigma^2\):

\[ \text{Cov}(U(\mathbf{s}_i), U(\mathbf{s}_j)) = \sigma^2 \cdot \text{Cor}(U(\mathbf{s}_i), U(\mathbf{s}_j)) \]

The Parameters

ParameterSymbolInterpretation
Marginal variance \(\sigma^2\) The variance of the field at any single location. Controls the "amplitude" of spatial variation.
Scale \(\kappa > 0\) Controls how quickly correlation decays with distance. Larger \(\kappa\) means faster decay (shorter range).
Smoothness \(\nu > 0\) Controls the differentiability of the field. The field is \(\lceil \nu \rceil - 1\) times differentiable. Larger \(\nu\) gives smoother surfaces.

When distance is zero, the Bessel function term equals 1, so \(\text{Cor} = 1\) and \(\text{Cov} = \sigma^2\).

Understanding Smoothness (\(\nu\))

The smoothness parameter has a direct physical interpretation:

  • \(\nu = 0.5\): Exponential covariance. Rough, non-differentiable surfaces (like Brownian motion).
  • \(\nu = 1\): Once differentiable. Moderately smooth surfaces. Default in INLA.
  • \(\nu = 1.5\): 1.5 times differentiable. Smoother surfaces.
  • \(\nu = 2\): Twice differentiable. Very smooth surfaces.
  • \(\nu \to \infty\): Gaussian (squared exponential) covariance. Infinitely smooth.
In practice: \(\nu\) is often fixed (not estimated) because it is poorly identified from data. The default \(\nu = 1\) works well for most environmental and spatial applications.

Practical Range

The scale parameter \(\kappa\) is hard to interpret directly. Instead, we use the practical range, which is the distance at which correlation drops to approximately 0.14 (i.e., \(e^{-1} \approx 0.13\)):

\[ \rho = \frac{\sqrt{8\nu}}{\kappa} \]

Interpretation: If the practical range is 50 km, then two locations 50 km apart have correlation ~0.14. Beyond this distance, observations are nearly uncorrelated.

Marginal Standard Deviation (\(\sigma\))

The marginal standard deviation \(\sigma\) controls how much the spatial field varies. For the SPDE parameterization:

\[ \sigma^2 = \frac{\Gamma(\nu)}{\Gamma(\nu + d/2)(4\pi)^{d/2}\kappa^{2\nu}\tau^2} \]

where \(\tau\) is an internal precision parameter. In 2D with \(\nu=1\), this simplifies to \(\sigma = 1/(\tau \sqrt{4\pi} \kappa)\).

Interpretation: If \(\sigma = 2\), then at any location, the spatial effect has standard deviation 2 (in the units of your response variable).

The SPDE-Matérn Link

Why This Matters

Computing Matérn covariance directly requires building and inverting large dense matrices, which becomes computationally infeasible for thousands of locations. The SPDE approach provides a shortcut: instead of working with the covariance matrix, we work with a sparse precision matrix that is much faster to compute.

The Key Insight (No Differential Equations Required)

Think of a Matérn random field as a smooth, spatially correlated surface. Lindgren et al. (2011) discovered that this surface can be created through a simple recipe:

  1. Start with random noise: Imagine sprinkling random values across your spatial domain, each location getting an independent random value.
  2. Smooth the noise: Apply a smoothing operation that blends nearby values together. The amount of smoothing determines how correlated nearby locations become.
  3. The result: After smoothing, you get exactly a Matérn covariance structure.
The practical benefit: This "noise + smoothing" recipe can be computed very efficiently using the mesh. Instead of storing correlations between all pairs of points (which grows as n²), we only need to store relationships between neighboring mesh vertices (which grows as n).

The Mathematical Statement

For those interested, the formal result is that a Gaussian field with Matérn covariance is the solution to this stochastic partial differential equation (SPDE):

\[ (\kappa^2 - \Delta)^{\alpha/2} u(\mathbf{s}) = \mathcal{W}(\mathbf{s}) \]

The goal: We want to find \(u(\mathbf{s})\), the spatial random field. This is the unknown we're solving for. The SPDE tells us that \(u(\mathbf{s})\) (the result of smoothing white noise) has exactly Matérn covariance structure.

Understanding Each Part

SymbolWhat It MeansIntuition
\(\mathcal{W}(\mathbf{s})\) Spatial white noise The random starting values at each location (completely uncorrelated)
\(\kappa^2 - \Delta\) Smoothing operator Blends values with neighbors. Higher \(\kappa\) means less smoothing (shorter range)
\(\alpha/2\) Power of smoothing How many times to apply the smoothing. More smoothing = smoother result
\(u(\mathbf{s})\) The spatial field The smooth, correlated surface we want (has Matérn covariance)

Connection to Matérn Parameters

The smoothing parameters connect directly to the Matérn covariance parameters:

  • \(\alpha = \nu + d/2\) where \(d\) is the spatial dimension (usually 2)
  • Higher \(\alpha\) means more smoothing passes, giving smoother fields (higher \(\nu\))
  • The scale parameter \(\kappa\) controls the correlation range
Default in 2D: \(\alpha = 2\) which gives \(\nu = 1\). This means one smoothing pass, producing fields that are continuous but can have sharp changes (like real environmental data).

Summary: What You Need to Remember

You don't need to understand the differential equation. Just remember:

  • The SPDE approach is a computational trick to get Matérn covariance efficiently
  • It uses the mesh to represent spatial smoothing with sparse matrices
  • This makes fitting models with thousands of locations fast and practical
  • The results are statistically equivalent to fitting Matérn covariance directly

From SPDE to FEM: The Bridge

The Problem

The SPDE defines \(u(\mathbf{s})\), the spatial field we want to find. But it's defined continuously over all of space: we'd need to know \(u(\mathbf{s})\) at infinitely many points. That's impossible to compute.

The Solution: Discretize with a Mesh

This is where the triangular mesh comes in. Instead of finding \(u(\mathbf{s})\) everywhere, we approximate it using values only at mesh vertices:

  1. Cover the domain with triangles: The mesh divides your spatial region into triangles.
  2. Estimate \(u\) at vertices only: We find \(u\) at each of the \(m\) mesh vertices (call these values \(w_1, w_2, \ldots, w_m\)).
  3. Interpolate between vertices: Within each triangle, \(u(\mathbf{s})\) is linearly interpolated from the three corner values (like stretching a rubber sheet over the vertices).
Key insight: By discretizing on a mesh, we reduce the problem from "find \(u(\mathbf{s})\) at infinitely many points" to "estimate \(m\) weights \(\{w_1, \ldots, w_m\}\) at mesh vertices." The precision matrix \(\mathbf{Q}\) tells us how these weights are correlated.

Why This Works

The SPDE says "smooth this noise to get \(u(\mathbf{s})\)" continuously. The FEM approximation says "smooth this noise, but only track values at mesh vertices, and interpolate in between." As the mesh gets finer (more vertices), the approximation gets closer to the true continuous solution.

Practical note: You don't need a very fine mesh. Even a relatively coarse mesh (a few hundred vertices) often gives excellent approximations because Matérn fields are smooth.

FEM Matrices and Precision

Approximating the Spatial Field with Basis Functions

Remember: our goal is to find \(u(\mathbf{s})\), the spatial field from the SPDE. FEM approximates this continuous field as a weighted sum of basis functions defined on the triangular mesh:

\[ u(\mathbf{s}) \approx \sum_{k=1}^{m} \psi_k(\mathbf{s}) w_k \]

Where:

  • \(\psi_k(\mathbf{s})\): Basis functions (one per mesh vertex), defined by the triangles
  • \(w_k\): Random weights at each vertex. These are what we estimate.
  • \(m\): Number of mesh vertices

So instead of finding \(u(\mathbf{s})\) at every point in space, we only need to estimate the \(m\) weights \(\{w_1, w_2, \ldots, w_m\}\). The basis functions handle interpolation everywhere else.

What Are Tent Functions?

Think of the mesh as lying flat on the ground (the x-y plane). Each basis function \(\psi_k\) adds a third dimension (height) that rises above the mesh:

The key insight: The mesh lives in 2D (x, y). The basis function value is a height (z) at each point. So \(\psi_k(x, y)\) tells you "how tall is the tent at location (x, y)?"

For each vertex \(k\), the tent function \(\psi_k\) has these properties:

  • Height = 1 directly above vertex \(k\) (the peak of the tent)
  • Height = 0 at all neighboring vertices (the tent touches the ground there)
  • Slopes linearly from height 1 down to height 0 within the triangles touching vertex \(k\)
  • Height = 0 everywhere outside those triangles

Side View: How the Tent Rises Above the Mesh

x height ψ(x) v₁ v₂ v₃ ψ₂ = 1 ψ₂ = 0 ψ₂ = 0 slopes down slopes down

The mesh is flat (on the x-axis). The tent function ψ₂ rises to height 1 at vertex v₂, and slopes down to height 0 at neighboring vertices v₁ and v₃.

3D View: Single Tent on a Triangle:

ψ₁ = 1 ψ₁ = 0 ψ₁ = 0 height

The tent function equals 1 at its vertex and decreases linearly to 0 at neighboring vertices.

All Three Basis Functions:

ψ₁=1 ψ₂=1 ψ₃=1 overlap

Each vertex has its own tent. Inside the triangle, all three overlap (sum to 1).

What does "overlap" mean? At any point inside a triangle, all three tent functions have non-zero height. The values depend on where you are in the triangle:
  • At the center: ψ₁ ≈ 0.33, ψ₂ ≈ 0.33, ψ₃ ≈ 0.33 (equidistant from all vertices)
  • Near vertex 1: ψ₁ ≈ 0.8, ψ₂ ≈ 0.1, ψ₃ ≈ 0.1 (closer to vertex 1)
  • On edge 1-2: ψ₃ = 0, and ψ₁ + ψ₂ = 1 (on the edge opposite to vertex 3)
  • Always: ψ₁ + ψ₂ + ψ₃ = 1, no matter where!

Why do they always sum to 1?

This is not a coincidence - it's guaranteed by construction! The basis functions are defined using barycentric coordinates.

The idea: Any point P inside a triangle can be written as a weighted average of the three vertices:

\[ P = \lambda_1 \cdot v_1 + \lambda_2 \cdot v_2 + \lambda_3 \cdot v_3 \quad \text{where} \quad \lambda_1 + \lambda_2 + \lambda_3 = 1 \]

The weights \(\lambda_1, \lambda_2, \lambda_3\) are called barycentric coordinates. They tell you "how much" of each vertex contributes to point P.

The key: The basis functions ARE the barycentric coordinates!

\[ \psi_k(P) = \lambda_k \]

So by definition: \(\psi_1(P) + \psi_2(P) + \psi_3(P) = \lambda_1 + \lambda_2 + \lambda_3 = 1\). This property is called "partition of unity".

Check it at different locations:

Locationψ₁ψ₂ψ₃Sum
At vertex 11001
At vertex 20101
At vertex 30011
Midpoint of edge 1-20.50.501
Center of triangle0.330.330.331
Any point insideλ₁λ₂λ₃1

Weighted Sum Example:

Why different sizes? The basis functions ψ_k all have the same shape (peak at 1), but we multiply by weights w_k. So w₁·ψ₁ peaks at w₁, not at 1. Bigger weight = taller tent!

w₁=2 ψ₁ + w₂=1 ψ₂ + w₃=3 ψ₃ = u=2 u=1 u=3 u(s)

Result: \(u(\mathbf{s}) = 2 \cdot \psi_1 + 1 \cdot \psi_2 + 3 \cdot \psi_3\). At each vertex, \(u\) equals the weight. Inside the triangle, values are linearly interpolated.

When you sum all these tent functions (each multiplied by its weight \(w_k\)), you get a piecewise linear surface that passes through the weight values at each vertex.

Plugging Back into the SPDE

Here's the key step: we substitute our approximation \(u(\mathbf{s}) \approx \sum_k \psi_k(\mathbf{s}) w_k\) back into the SPDE equation. This converts the continuous differential equation into a discrete system of equations for the weights \(\{w_1, \ldots, w_m\}\).

The process (called the Galerkin method) involves:

  1. Substitute the basis function approximation into the SPDE
  2. Multiply by each basis function \(\psi_i\) and integrate over the domain
  3. This produces integrals of products of basis functions

The Connection: How u(s) leads to C, G, and Q

Step 1: Start with the SPDE (simplified for α=1):

\[ (\kappa^2 - \Delta) u(\mathbf{s}) = \mathcal{W}(\mathbf{s}) \]

Step 2: Substitute \(u(\mathbf{s}) \approx \sum_k w_k \psi_k(\mathbf{s})\):

\[ (\kappa^2 - \Delta) \sum_k w_k \psi_k(\mathbf{s}) = \mathcal{W}(\mathbf{s}) \]

Step 3: Multiply by \(\psi_i\) and integrate (Galerkin method):

\[ \sum_k w_k \left[ \kappa^2 \underbrace{\int \psi_i \psi_k \, d\mathbf{s}}_{C_{ik}} + \underbrace{\int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s}}_{G_{ik}} \right] = \text{noise}_i \]

Step 4: In matrix form, this becomes:

\[ (\kappa^2 \mathbf{C} + \mathbf{G}) \mathbf{w} = \text{noise} \quad \Rightarrow \quad \mathbf{Q} \mathbf{w} = \text{noise} \]

Result: The weights follow \(\mathbf{w} \sim N(\mathbf{0}, \mathbf{Q}^{-1})\) where \(\mathbf{Q} = \kappa^2\mathbf{C} + \mathbf{G}\). The FEM matrices C and G come directly from integrating products of basis functions!

The FEM Matrices

From plugging the basis functions into the SPDE, we get two matrices:

MatrixDefinitionIntuition
C (Mass) \(C_{ij} = \int \psi_i(\mathbf{s}) \psi_j(\mathbf{s}) \, d\mathbf{s}\) How much the basis functions overlap. Mostly diagonal (each vertex has an "area of influence").
G (Stiffness) \(G_{ij} = \int \nabla\psi_i \cdot \nabla\psi_j \, d\mathbf{s}\) How the gradients (slopes) of neighboring basis functions interact. Encodes mesh connectivity.

Key point: Both C and G are sparse because each basis function \(\psi_k\) is only non-zero near vertex \(k\). Two basis functions only interact if their vertices share a triangle.

The Precision Matrix

The precision matrix \(\mathbf{Q}\) is built from C and G. The exact formula depends on the smoothness parameter α you choose:

Simple version: There is just ONE matrix Q. The formula changes based on how smooth you want the field. In practice, α = 2 is the default, so you almost always use the second formula.
If you choose...Formula for QField smoothness
α = 1 (rare) \(\mathbf{Q} = \kappa^2 \mathbf{C} + \mathbf{G}\) Rough (ν = 0)
α = 2 (default) \(\mathbf{Q} = \kappa^4 \mathbf{C} + 2\kappa^2 \mathbf{G} + \mathbf{G}\mathbf{C}^{-1}\mathbf{G}\) Smooth (ν = 1)

Why different formulas? Higher α means more smoothing, which requires a more complex formula. Think of it like applying a blur filter: α = 1 is one pass, α = 2 is two passes (smoother result). The formula for α = 2 is essentially "apply the α = 1 formula twice."

Key insight: \(\mathbf{Q}\) is sparse because \(\mathbf{G}\) only has non-zero entries for connected mesh vertices. This sparsity enables fast computation.

Worked Example: A Single Triangle

Let's work through a complete example with the simplest possible mesh: 3 vertices forming one triangle.

Step 1: The Mesh

Consider a triangle with vertices at:

  • Vertex 1: \((0, 0)\)
  • Vertex 2: \((1, 0)\)
  • Vertex 3: \((0, 1)\)

This is a right triangle with area = 0.5.

Step 2: The Basis Functions

Each vertex gets a tent function. For this triangle:

  • \(\psi_1(\mathbf{s})\): equals 1 at vertex 1, linearly decreases to 0 at vertices 2 and 3
  • \(\psi_2(\mathbf{s})\): equals 1 at vertex 2, linearly decreases to 0 at vertices 1 and 3
  • \(\psi_3(\mathbf{s})\): equals 1 at vertex 3, linearly decreases to 0 at vertices 1 and 2

At any point inside the triangle: \(\psi_1 + \psi_2 + \psi_3 = 1\) (they form a partition of unity).

Step 3: The Spatial Field Approximation

We approximate \(u(\mathbf{s})\) as:

\[ u(\mathbf{s}) \approx w_1 \psi_1(\mathbf{s}) + w_2 \psi_2(\mathbf{s}) + w_3 \psi_3(\mathbf{s}) \]

Where \(w_1, w_2, w_3\) are the random weights we estimate. At vertex 1, \(u = w_1\). At vertex 2, \(u = w_2\). Inside the triangle, \(u\) is linearly interpolated.

Step 4: Computing the Mass Matrix C

Deriving the Matrix Equations (Galerkin Method)

Step 1: Start with the SPDE (for α=1):

\[ (\kappa^2 - \Delta) u(\mathbf{s}) = \mathcal{W}(\mathbf{s}) \]

Step 2: Substitute our approximation \(u(\mathbf{s}) \approx w_1\psi_1 + w_2\psi_2 + w_3\psi_3\):

\[ (\kappa^2 - \Delta)(w_1\psi_1 + w_2\psi_2 + w_3\psi_3) = \mathcal{W}(\mathbf{s}) \]

Step 3: Multiply both sides by \(\psi_1\) and integrate over the domain:

\[ \int \psi_1 \cdot (\kappa^2 - \Delta)(w_1\psi_1 + w_2\psi_2 + w_3\psi_3) \, d\mathbf{s} = \int \psi_1 \cdot \mathcal{W} \, d\mathbf{s} \]

Step 4: Expand (using integration by parts for the Δ term):

\[ w_1 \left[ \kappa^2 \int\psi_1\psi_1 + \int\nabla\psi_1 \cdot \nabla\psi_1 \right] + w_2 \left[ \kappa^2 \int\psi_1\psi_2 + \int\nabla\psi_1 \cdot \nabla\psi_2 \right] + w_3 \left[ \kappa^2 \int\psi_1\psi_3 + \int\nabla\psi_1 \cdot \nabla\psi_3 \right] = \epsilon_1 \]

Step 5: Define the matrix entries:

\[ C_{ik} = \int \psi_i \psi_k \, d\mathbf{s} \qquad G_{ik} = \int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s} \]

Step 6: The equation becomes:

\[ w_1(\kappa^2 C_{11} + G_{11}) + w_2(\kappa^2 C_{12} + G_{12}) + w_3(\kappa^2 C_{13} + G_{13}) = \epsilon_1 \]

The Complete System: 3 Equations

We repeat this process for \(\psi_2\) and \(\psi_3\) to get 3 equations:

Equation 1 (multiply by ψ₁): \(w_1(\kappa^2 C_{11} + G_{11}) + w_2(\kappa^2 C_{12} + G_{12}) + w_3(\kappa^2 C_{13} + G_{13}) = \epsilon_1\)

Equation 2 (multiply by ψ₂): \(w_1(\kappa^2 C_{21} + G_{21}) + w_2(\kappa^2 C_{22} + G_{22}) + w_3(\kappa^2 C_{23} + G_{23}) = \epsilon_2\)

Equation 3 (multiply by ψ₃): \(w_1(\kappa^2 C_{31} + G_{31}) + w_2(\kappa^2 C_{32} + G_{32}) + w_3(\kappa^2 C_{33} + G_{33}) = \epsilon_3\)

In matrix form:

\[ \underbrace{\begin{pmatrix} \kappa^2 C_{11} + G_{11} & \kappa^2 C_{12} + G_{12} & \kappa^2 C_{13} + G_{13} \\ \kappa^2 C_{21} + G_{21} & \kappa^2 C_{22} + G_{22} & \kappa^2 C_{23} + G_{23} \\ \kappa^2 C_{31} + G_{31} & \kappa^2 C_{32} + G_{32} & \kappa^2 C_{33} + G_{33} \end{pmatrix}}_{\mathbf{Q} = \kappa^2\mathbf{C} + \mathbf{G}} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{pmatrix} \]

Showing C and G separately:

\[ \mathbf{Q} = \kappa^2 \underbrace{\begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix}}_{\mathbf{C} \text{ (Mass)}} + \underbrace{\begin{pmatrix} G_{11} & G_{12} & G_{13} \\ G_{21} & G_{22} & G_{23} \\ G_{31} & G_{32} & G_{33} \end{pmatrix}}_{\mathbf{G} \text{ (Stiffness)}} \]

Where:   \(C_{ik} = \int \psi_i \psi_k \, d\mathbf{s}\)  and  \(G_{ik} = \int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s}\)

Why is C a 3×3 matrix?

Now you can see it directly from the equations above:

  • 3 unknowns (w₁, w₂, w₃) → 3 columns in the matrix
  • 3 equations (one per ψ_i) → 3 rows in the matrix
  • Each entry \(C_{ik}\) is the coefficient of \(w_k\) in equation \(i\)

In general: m vertices → m unknowns → m equations → m×m matrices.

Now let's compute the actual values of \(C_{ik}\).

The key idea: \(C_{ik} = \int \psi_i(\mathbf{s}) \times \psi_k(\mathbf{s}) \, d\mathbf{s}\). This is NOT just "do the tents overlap?" (they always do inside the triangle). It's "when we multiply the two tent heights together and add up the result over the whole triangle, what do we get?"

What does ψ_i × ψ_j look like?

1×1=1 0×0=0 0×0=0

ψ₁ × ψ₁

Peak of 1 at vertex 1, fades to 0

Large integral (more "volume")

1×0=0 0×1=0 0×0=0 max≈0.25

ψ₁ × ψ₂

Zero at all vertices! Small bump in middle

Smaller integral (less "volume")

The crucial insight: For \(C_{11} = \int \psi_1 \times \psi_1\), the product is 1 at vertex 1 and positive everywhere in the triangle. For \(C_{12} = \int \psi_1 \times \psi_2\), the product is 0 at every vertex (because where ψ₁=1, ψ₂=0 and vice versa). The product is only non-zero in the middle, and even there it's small. That's why C₁₁ > C₁₂.

Let's compute step by step:

EntryCalculationAt vertex 1At vertex 2In middleResult
\(C_{11}\) \(\int \psi_1 \times \psi_1\) 1×1 = 1 0×0 = 0 0.33×0.33 ≈ 0.11 2 (larger)
\(C_{12}\) \(\int \psi_1 \times \psi_2\) 1×0 = 0 0×1 = 0 0.33×0.33 ≈ 0.11 1 (smaller)
\(C_{22}\) \(\int \psi_2 \times \psi_2\) 0×0 = 0 1×1 = 1 0.33×0.33 ≈ 0.11 2 (larger)

The Rule for Mass Matrix C

Same basis function (i = k): \(C_{ii} = \frac{2 \cdot \text{Area}}{12}\) → coefficient 2
Different basis functions (i ≠ k): \(C_{ik} = \frac{1 \cdot \text{Area}}{12}\) → coefficient 1

Where does Area/12 come from?

There's a standard formula for integrating products of barycentric coordinates over a triangle:

\[ \int_{\triangle} \lambda_1^a \lambda_2^b \lambda_3^c \, dA = \frac{a! \, b! \, c!}{(a+b+c+2)!} \times 2 \times \text{Area} \]

where \(\lambda_1, \lambda_2, \lambda_3\) are barycentric coordinates (which ARE our basis functions: \(\psi_i = \lambda_i\)).

Computing \(C_{11} = \int \psi_1^2 \, dA = \int \lambda_1^2 \, dA\):

Here \(a=2, b=0, c=0\):

\[ C_{11} = \frac{2! \cdot 0! \cdot 0!}{(2+0+0+2)!} \times 2 \times \text{Area} = \frac{2 \cdot 1 \cdot 1}{4!} \times 2 \times \text{Area} = \frac{2}{24} \times 2 \times \text{Area} = \frac{\mathbf{2} \times \text{Area}}{12} \]

Computing \(C_{12} = \int \psi_1 \psi_2 \, dA = \int \lambda_1 \lambda_2 \, dA\):

Here \(a=1, b=1, c=0\):

\[ C_{12} = \frac{1! \cdot 1! \cdot 0!}{(1+1+0+2)!} \times 2 \times \text{Area} = \frac{1 \cdot 1 \cdot 1}{4!} \times 2 \times \text{Area} = \frac{1}{24} \times 2 \times \text{Area} = \frac{\mathbf{1} \times \text{Area}}{12} \]
The 12 comes from 4!/2 = 24/2 = 12. The factorial (4! = 24) appears because we're integrating polynomials over a 2D simplex (triangle). It's similar to how \(\int_0^1 x \, dx = \frac{1}{2}\) and \(\int_0^1 x^2 \, dx = \frac{1}{3}\) - the denominator increases with polynomial degree. For triangles with barycentric coordinates, the pattern gives us 4! in the denominator.

The formula (standard result for linear basis functions on triangles):

\[ \mathbf{C} = \frac{\text{Area}}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} = \frac{0.5}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} \]

Step 5: Computing the Stiffness Matrix G

Why are we computing gradient products?

Remember from the Galerkin derivation, the matrix entries \(G_{ik}\) come from the Laplacian term \(-\Delta\) in the SPDE. After integration by parts:

\[ -\int \psi_i \cdot \Delta\psi_k \, d\mathbf{s} = +\int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s} = G_{ik} \]

The gradient integrals appear naturally - we're not computing them arbitrarily!

The stiffness matrix measures how the slopes (gradients) of the tent functions interact.

The key idea: \(G_{ik} = \int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s}\). Inside a triangle, each tent function has a constant slope (it's a flat ramp). So the integral simplifies to: \(G_{ik} = (\nabla\psi_i \cdot \nabla\psi_k) \times \text{Area}\)

Understanding the Gradient ∇ψ

Each tent function slopes downhill from its peak (vertex) toward the other vertices. The gradient points in the direction of steepest descent.

v₁ ∇ψ₁

∇ψ₁: slopes away from v₁

v₂ ∇ψ₂

∇ψ₂: slopes away from v₂

The dot product ∇ψ_i · ∇ψ_j:
  • Same direction (or same tent): dot product is positive → G_ii > 0
  • Opposite directions: dot product is negative → G_ij < 0 for neighbors

For our right triangle:

\[ \mathbf{G} = \frac{1}{2} \begin{pmatrix} 2 & -1 & -1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \]
EntryWhat it measuresValueWhy?
\(G_{11}\) \(\int \nabla\psi_1 \cdot \nabla\psi_1\) Positive (1) ∇ψ₁ dotted with itself is always positive (magnitude squared)
\(G_{12}\) \(\int \nabla\psi_1 \cdot \nabla\psi_2\) Negative (-0.5) ∇ψ₁ and ∇ψ₂ point in opposite directions (tent 1 slopes away from v₁, tent 2 slopes away from v₂)
\(G_{23}\) \(\int \nabla\psi_2 \cdot \nabla\psi_3\) Zero (0) For this specific triangle, ∇ψ₂ and ∇ψ₃ happen to be perpendicular (dot product = 0)
Key pattern: In G, diagonal entries are always positive (self-interaction). Off-diagonal entries are negative or zero (neighboring tents slope in different directions). This is the opposite pattern from C, where all entries are positive!

How to compute ∇ψ for a triangle

For a triangle with vertices at \((x_1, y_1), (x_2, y_2), (x_3, y_3)\), the gradient of each basis function is:

\[ \nabla\psi_i = \frac{1}{2 \cdot \text{Area}} \begin{pmatrix} y_j - y_k \\ x_k - x_j \end{pmatrix} \]

where \((i, j, k)\) is a cyclic permutation of \((1, 2, 3)\).

For our right triangle with vertices \(v_1=(0,0), v_2=(1,0), v_3=(0,1)\):

Area = 0.5, so \(2 \cdot \text{Area} = 1\)

∇ψ₁ (i=1, j=2, k=3)

\(\nabla\psi_1 = \begin{pmatrix} y_2 - y_3 \\ x_3 - x_2 \end{pmatrix} = \begin{pmatrix} 0-1 \\ 0-1 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \end{pmatrix}\)

∇ψ₂ (i=2, j=3, k=1)

\(\nabla\psi_2 = \begin{pmatrix} y_3 - y_1 \\ x_1 - x_3 \end{pmatrix} = \begin{pmatrix} 1-0 \\ 0-0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\)

∇ψ₃ (i=3, j=1, k=2)

\(\nabla\psi_3 = \begin{pmatrix} y_1 - y_2 \\ x_2 - x_1 \end{pmatrix} = \begin{pmatrix} 0-0 \\ 1-0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\)

Let's compute G step by step

Formula: \(G_{ik} = (\nabla\psi_i \cdot \nabla\psi_k) \times \text{Area}\)

Diagonal entries (same basis function):

\(G_{11} = \nabla\psi_1 \cdot \nabla\psi_1 \times \text{Area} = ((-1)(-1) + (-1)(-1)) \times 0.5 = 2 \times 0.5 = \mathbf{1.0}\)

\(G_{22} = \nabla\psi_2 \cdot \nabla\psi_2 \times \text{Area} = (1 \cdot 1 + 0 \cdot 0) \times 0.5 = 1 \times 0.5 = \mathbf{0.5}\)

\(G_{33} = \nabla\psi_3 \cdot \nabla\psi_3 \times \text{Area} = (0 \cdot 0 + 1 \cdot 1) \times 0.5 = 1 \times 0.5 = \mathbf{0.5}\)

Off-diagonal entries (different basis functions):

\(G_{12} = \nabla\psi_1 \cdot \nabla\psi_2 \times \text{Area} = ((-1) \cdot 1 + (-1) \cdot 0) \times 0.5 = -1 \times 0.5 = \mathbf{-0.5}\)

\(G_{13} = \nabla\psi_1 \cdot \nabla\psi_3 \times \text{Area} = ((-1) \cdot 0 + (-1) \cdot 1) \times 0.5 = -1 \times 0.5 = \mathbf{-0.5}\)

\(G_{23} = \nabla\psi_2 \cdot \nabla\psi_3 \times \text{Area} = (1 \cdot 0 + 0 \cdot 1) \times 0.5 = 0 \times 0.5 = \mathbf{0}\)

Assembling G: The matrix is symmetric (\(G_{ik} = G_{ki}\)), so:

\[ \mathbf{G} = \begin{pmatrix} G_{11} & G_{12} & G_{13} \\ G_{21} & G_{22} & G_{23} \\ G_{31} & G_{32} & G_{33} \end{pmatrix} = \begin{pmatrix} 1.0 & -0.5 & -0.5 \\ -0.5 & 0.5 & 0 \\ -0.5 & 0 & 0.5 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 2 & -1 & -1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \]

The Rule for Stiffness Matrix G

Diagonal (i = k): Always positive

\[ G_{ii} = |\nabla\psi_i|^2 \times \text{Area} > 0 \]

The squared magnitude of a gradient is always positive.

Off-diagonal (i ≠ k): Usually negative

\[ G_{ik} = (\nabla\psi_i \cdot \nabla\psi_k) \times \text{Area} \leq 0 \]

Gradients of adjacent tent functions typically point in opposite directions.

Why the pattern? Tent function ψ_i rises toward vertex i and falls away from it. Meanwhile, ψ_k rises toward vertex k. If i and k are different vertices, their gradients point in different directions, giving a negative (or zero) dot product.

Sanity check: Row sums of G should be zero

This is a fundamental property of the stiffness matrix (since constant functions have zero Laplacian):

Row 1: \(1.0 + (-0.5) + (-0.5) = 0\) ✓
Row 2: \((-0.5) + 0.5 + 0 = 0\) ✓
Row 3: \((-0.5) + 0 + 0.5 = 0\) ✓

Step 6: Building the Precision Matrix Q

Where does Q = κ²C + G come from?

Remember from the Galerkin derivation (Step 4), each matrix entry of Q combines C and G:

\[ Q_{ik} = \kappa^2 C_{ik} + G_{ik} \]

This is exactly what emerged from multiplying by basis functions and integrating the SPDE!

For \(\alpha = 1\) (simplest case), with scale parameter \(\kappa\):

\[ \mathbf{Q} = \kappa^2 \mathbf{C} + \mathbf{G} \]

What does κ control?

κ²C term (Mass matrix)

Controls the amplitude/variance of the field. Larger κ² → more precision → smaller variance.

G term (Stiffness matrix)

Controls smoothness/correlation. Penalizes differences between neighboring vertices.

The balance: Large κ makes the κ²C term dominate → field varies more freely at each vertex. Small κ makes G dominate → field is smoother (neighboring values more similar).

Let's compute Q step by step for κ = 2:

Computing Q for κ = 2

Step 1: Compute κ²C = 4 × C

\[ \kappa^2 \mathbf{C} = 4 \times \frac{0.5}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} = \frac{2}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} = \frac{1}{6} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} \]

Step 2: Write out the numerical values

\(\kappa^2 \mathbf{C} = \begin{pmatrix} 2/6 & 1/6 & 1/6 \\ 1/6 & 2/6 & 1/6 \\ 1/6 & 1/6 & 2/6 \end{pmatrix} = \begin{pmatrix} 0.333 & 0.167 & 0.167 \\ 0.167 & 0.333 & 0.167 \\ 0.167 & 0.167 & 0.333 \end{pmatrix}\)

\(\mathbf{G} = \begin{pmatrix} 1.0 & -0.5 & -0.5 \\ -0.5 & 0.5 & 0 \\ -0.5 & 0 & 0.5 \end{pmatrix}\)

Step 3: Add them: Q = κ²C + G

\(Q_{11} = 0.333 + 1.0 = \mathbf{1.333}\)
\(Q_{12} = 0.167 + (-0.5) = \mathbf{-0.333}\)
\(Q_{13} = 0.167 + (-0.5) = \mathbf{-0.333}\)
\(Q_{22} = 0.333 + 0.5 = \mathbf{0.833}\)
\(Q_{23} = 0.167 + 0 = \mathbf{0.167}\)
\(Q_{33} = 0.333 + 0.5 = \mathbf{0.833}\)

The Final Precision Matrix Q (for κ = 2)

\[ \mathbf{Q} = \begin{pmatrix} 1.333 & -0.333 & -0.333 \\ -0.333 & 0.833 & 0.167 \\ -0.333 & 0.167 & 0.833 \end{pmatrix} \]

Notice: The diagonal entries are positive (precision), and off-diagonal entries can be positive or negative depending on the balance of C and G contributions.

Sanity check: Q should be positive definite

For Q to define a valid Gaussian distribution, it must be symmetric positive definite. You can verify this by checking that all eigenvalues are positive, or that the determinant and all leading principal minors are positive.

Step 7: What Q Tells Us

The precision matrix \(\mathbf{Q}\) is the inverse of the covariance matrix for the weights \(\mathbf{w} = (w_1, w_2, w_3)\):

\[ \mathbf{w} \sim N(\mathbf{0}, \mathbf{Q}^{-1}) \]

Reading the Precision Matrix

Diagonal entries Q_ii

Precision (inverse marginal variance) of weight \(w_i\).

From our Q: \(Q_{11} = 1.333\) means vertex 1 has precision 1.333

Off-diagonal entries Q_ij

Conditional dependence between \(w_i\) and \(w_j\).

\(Q_{12} = -0.333\) means vertices 1 and 2 are conditionally dependent (neighbors)

The Markov Property (Conditional Independence)

Key insight: If \(Q_{ij} = 0\), then \(w_i\) and \(w_j\) are conditionally independent given all other weights. In our example, \(Q_{23} = 0.167 \neq 0\) because vertices 2 and 3 share an edge.

In a large mesh, most vertex pairs are not neighbors → most \(Q_{ij} = 0\) → Q is sparse!

From Precision to Covariance

The covariance matrix is \(\Sigma = Q^{-1}\). For our 3×3 example:

\[ \mathbf{\Sigma} = \mathbf{Q}^{-1} = \begin{pmatrix} 1.333 & -0.333 & -0.333 \\ -0.333 & 0.833 & 0.167 \\ -0.333 & 0.167 & 0.833 \end{pmatrix}^{-1} \approx \begin{pmatrix} 1.5 & 1.0 & 1.0 \\ 1.0 & 2.0 & 1.0 \\ 1.0 & 1.0 & 2.0 \end{pmatrix} \]

Notice: The covariance matrix is dense (all entries non-zero) even though Q is sparse! This is why we work with Q directly in INLA - it's much more efficient.

Connection to Matérn Parameters

The SPDE parameter κ relates to the practical correlation range ρ:

\[ \rho = \frac{\sqrt{8\nu}}{\kappa} \quad \text{where } \nu = \alpha - d/2 \]

For α = 2 in 2D (ν = 1), this simplifies to: \(\rho \approx \frac{2.83}{\kappa}\)

Large κ (e.g., κ = 10)

Small range ρ ≈ 0.28
Field varies rapidly, weak spatial correlation

Small κ (e.g., κ = 0.5)

Large range ρ ≈ 5.66
Field is smooth, strong spatial correlation

The key insight: In a larger mesh, \(\mathbf{Q}\) is sparse because each vertex only connects to a few neighbors. A mesh with 1000 vertices might have \(\mathbf{Q}\) with only ~6000 non-zero entries instead of 1,000,000. This sparsity makes computation fast.

Summary of the Full Chain

StepWhat We DoResult
1 Create mesh Vertices + triangles
2 Define basis functions Tent function \(\psi_k\) at each vertex
3 Approximate \(u(\mathbf{s})\) \(u(\mathbf{s}) \approx \sum_k w_k \psi_k(\mathbf{s})\)
4 Plug into SPDE + integrate FEM matrices C, G
5 Combine with \(\kappa\) Precision matrix \(\mathbf{Q}\)
6 Use in INLA \(\mathbf{w} \sim N(\mathbf{0}, \mathbf{Q}^{-1})\) as spatial random effect

Deriving the Random Walk 1 (RW1) Precision Matrix

The same FEM/Galerkin approach works in 1D! Let's derive the Random Walk of order 1 precision matrix using the exact same steps we used for the 2D Matérn field.

Why a Different Differential Equation?

The 2D Matérn SPDE was: \((\kappa^2 - \Delta)u = \mathcal{W}\)

For a Random Walk, we want a process where increments (first differences) are independent. This corresponds to setting κ = 0 and working in 1D:

\[ -\Delta u = -\frac{d^2 u}{dx^2} = \mathcal{W}(x) \]

Physical interpretation: The second derivative of position equals white noise → velocity is Brownian motion → position is integrated Brownian motion (random walk).

ModelSPDEDimensionResult
Matérn (2D) \((\kappa^2 - \Delta)u = \mathcal{W}\) 2D spatial Q = κ²C + G
RW1 \(-\frac{d^2 u}{dx^2} = \mathcal{W}\) 1D (time/space) Q = G only

Size and Rank of RW1

For a random walk with n points:

  • Precision matrix Q is n × n
  • There are n - 1 increments (differences between consecutive points)
  • Q has rank n - 1 (rank deficient by 1)
  • This is an intrinsic model - no absolute level, only relative differences matter

In practice: We add a sum-to-zero constraint or use it as a random effect (which handles the rank deficiency).

The 1D Differential Equation

The RW1 Stochastic Differential Equation

\[ -\frac{d^2 u}{dx^2} = \mathcal{W}(x) \]

This is the 1D Laplacian (second derivative) driven by white noise. Unlike the Matérn SPDE, there is no κ²u term, so there will be no mass matrix C contribution.

Step 1: The 1D Mesh

Consider n = 4 equally-spaced points on a line with spacing h = 1:

x₁ x₂ x₃ x₄ h=1 h=1 h=1

Step 2: The 1D Basis Functions (Hat Functions)

Each point gets a "hat function" - the 1D equivalent of tent functions:

x₁ x₂ x₃ x₄ ψ₂ = 1 0 0

Hat function ψ₂: equals 1 at x₂, linearly decreases to 0 at neighbors x₁ and x₃

Step 3: The Galerkin Derivation (Full Detail)

Deriving the Matrix Equations (Galerkin Method for RW1)

Step 1: Start with the 1D differential equation:

\[ -\frac{d^2 u}{dx^2} = \mathcal{W}(x) \]

Step 2: Substitute our approximation \(u(x) \approx w_1\psi_1 + w_2\psi_2 + w_3\psi_3 + w_4\psi_4\):

\[ -\frac{d^2}{dx^2}(w_1\psi_1 + w_2\psi_2 + w_3\psi_3 + w_4\psi_4) = \mathcal{W}(x) \]

Step 3: Multiply both sides by \(\psi_1\) and integrate over the domain:

\[ -\int \psi_1 \cdot \frac{d^2}{dx^2}(w_1\psi_1 + w_2\psi_2 + \cdots) \, dx = \int \psi_1 \cdot \mathcal{W} \, dx \]

Step 4: Apply integration by parts: \(-\int \psi_i \frac{d^2\psi_k}{dx^2} dx = +\int \frac{d\psi_i}{dx} \frac{d\psi_k}{dx} dx\)

\[ w_1 \int\frac{d\psi_1}{dx}\frac{d\psi_1}{dx}dx + w_2 \int\frac{d\psi_1}{dx}\frac{d\psi_2}{dx}dx + w_3 \int\frac{d\psi_1}{dx}\frac{d\psi_3}{dx}dx + w_4 \int\frac{d\psi_1}{dx}\frac{d\psi_4}{dx}dx = \epsilon_1 \]

Step 5: Define the stiffness matrix entries:

\[ G_{ik} = \int \frac{d\psi_i}{dx} \cdot \frac{d\psi_k}{dx} \, dx \]

Step 6: The equation becomes:

\[ w_1 G_{11} + w_2 G_{12} + w_3 G_{13} + w_4 G_{14} = \epsilon_1 \]

The Complete System: 4 Equations for 4 Unknowns

We repeat this process for \(\psi_2\), \(\psi_3\), and \(\psi_4\) to get 4 equations:

Eq 1 (×ψ₁): \(w_1 G_{11} + w_2 G_{12} + w_3 G_{13} + w_4 G_{14} = \epsilon_1\)

Eq 2 (×ψ₂): \(w_1 G_{21} + w_2 G_{22} + w_3 G_{23} + w_4 G_{24} = \epsilon_2\)

Eq 3 (×ψ₃): \(w_1 G_{31} + w_2 G_{32} + w_3 G_{33} + w_4 G_{34} = \epsilon_3\)

Eq 4 (×ψ₄): \(w_1 G_{41} + w_2 G_{42} + w_3 G_{43} + w_4 G_{44} = \epsilon_4\)

In matrix form: \(\mathbf{G}\mathbf{w} = \boldsymbol{\epsilon}\), where \(\mathbf{Q}_{\text{RW1}} = \mathbf{G}\)

Key Difference from Matérn SPDE

In the Matérn case, we had \(\mathbf{Q} = \kappa^2\mathbf{C} + \mathbf{G}\) (both mass and stiffness matrices). For RW1, there's no κ²u term in the differential equation, so no mass matrix C appears. The precision matrix is only the stiffness matrix: \(\mathbf{Q}_{\text{RW1}} = \mathbf{G}\).

Step 4: Computing the Stiffness Matrix G

Gradients of 1D hat functions

For a hat function at xᵢ with spacing h:

Left segment [xᵢ₋₁, xᵢ]

Slope = \(\frac{d\psi_i}{dx} = +\frac{1}{h}\)

Right segment [xᵢ, xᵢ₊₁]

Slope = \(\frac{d\psi_i}{dx} = -\frac{1}{h}\)

Let's compute G step by step (h = 1)

Diagonal entries G_ii (interior point, e.g., i = 2):

\(G_{22} = \int_{x_1}^{x_2} \left(\frac{1}{h}\right)^2 dx + \int_{x_2}^{x_3} \left(-\frac{1}{h}\right)^2 dx\)

\(G_{22} = \frac{1}{h^2} \cdot h + \frac{1}{h^2} \cdot h = \frac{1}{h} + \frac{1}{h} = \frac{2}{h}\)

With h = 1: \(G_{22} = \mathbf{2}\)

Diagonal entries G_ii (boundary point, e.g., i = 1):

\(G_{11} = \int_{x_1}^{x_2} \left(-\frac{1}{h}\right)^2 dx = \frac{1}{h^2} \cdot h = \frac{1}{h}\)

With h = 1: \(G_{11} = \mathbf{1}\) (only one segment!)

Off-diagonal entries G_ik (neighbors, e.g., i = 1, k = 2):

\(G_{12} = \int_{x_1}^{x_2} \left(-\frac{1}{h}\right) \cdot \left(+\frac{1}{h}\right) dx = -\frac{1}{h^2} \cdot h = -\frac{1}{h}\)

With h = 1: \(G_{12} = \mathbf{-1}\)

Off-diagonal entries G_ik (non-neighbors, e.g., i = 1, k = 3):

\(G_{13} = 0\) (ψ₁ and ψ₃ don't overlap!)

The RW1 Precision Matrix (4 points)

\[ \mathbf{Q}_{\text{RW1}} = \mathbf{G} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{pmatrix} \]

This is exactly the standard RW1 structure matrix!

The Rule for RW1 Precision Matrix

Interior points

Q_ii = 2
(two neighbors)

Boundary points

Q_ii = 1
(one neighbor)

Neighbors

Q_ij = -1
(adjacent points)

Sanity check: Row sums should be zero

Row 1: \(1 + (-1) + 0 + 0 = 0\) ✓
Row 2: \((-1) + 2 + (-1) + 0 = 0\) ✓
Row 3: \(0 + (-1) + 2 + (-1) = 0\) ✓
Row 4: \(0 + 0 + (-1) + 1 = 0\) ✓

Row sums = 0 means Q is rank-deficient (as expected for RW1 - there's no "anchor").

Why is this a "Random Walk"?

The quadratic form \(\mathbf{x}^T \mathbf{Q} \mathbf{x}\) gives:

\[ \mathbf{x}^T \mathbf{Q}_{\text{RW1}} \mathbf{x} = (x_1 - x_2)^2 + (x_2 - x_3)^2 + (x_3 - x_4)^2 = \sum_{i=1}^{n-1} (x_i - x_{i+1})^2 \]

This penalizes first differences (increments) - exactly what a random walk does! The increments \(\Delta x_i = x_{i+1} - x_i\) are independent with equal variance.

The big picture: RW1, RW2, AR1, and SPDE models are all unified under the same FEM framework. The choice of differential operator (first derivative, second derivative, Laplacian, etc.) determines which precision matrix structure you get.
ModelDifferential OperatorPrecision Matrix Structure
RW1 \(-\frac{d^2}{dx^2}\) (1D Laplacian) Tridiagonal: (1, -1) at boundaries, (2, -1, -1) interior
RW2 \(\frac{d^4}{dx^4}\) (biharmonic) Pentadiagonal: penalizes second differences
Matérn 2D \((\kappa^2 - \Delta)^{\alpha/2}\) Sparse graph Laplacian structure

Deriving the Random Walk 2 (RW2) Precision Matrix

Following the same approach, let's derive the Random Walk of order 2 precision matrix. RW2 produces smoother curves than RW1 by penalizing changes in slope rather than changes in level.

Why a Fourth Derivative?

RW1 used the second derivative (Laplacian), which penalizes curvature and gives independent first differences.

RW2 uses the fourth derivative (biharmonic), which penalizes changes in curvature and gives independent second differences:

\[ \frac{d^4 u}{dx^4} = \mathcal{W}(x) \]

Physical interpretation: RW1 assumes velocity is white noise (position is rough). RW2 assumes acceleration is white noise (velocity is smooth, position is even smoother).

PropertyRW1RW2
SPDE \(-\frac{d^2 u}{dx^2} = \mathcal{W}\) \(\frac{d^4 u}{dx^4} = \mathcal{W}\)
Penalizes First differences (level changes) Second differences (slope changes)
Smoothness Continuous but rough Continuous and smooth
Matrix structure Tridiagonal (3 diagonals) Pentadiagonal (5 diagonals)
Rank deficiency 1 (needs 1 constraint) 2 (needs 2 constraints)

Size and Rank of RW2

For a random walk of order 2 with n points:

  • Precision matrix Q is n × n
  • There are n - 2 second differences
  • Q has rank n - 2 (rank deficient by 2)
  • This is doubly intrinsic - no absolute level AND no absolute slope

In practice: Need sum-to-zero constraint AND linear-trend-to-zero constraint, or use as random effect.

The Galerkin Approach for RW2

Starting from the SPDE

\[ \frac{d^4 u}{dx^4} = \mathcal{W}(x) \]

Multiply by test function ψᵢ and integrate:

\[ \int \psi_i \cdot \frac{d^4 u}{dx^4} \, dx = \int \psi_i \cdot \mathcal{W} \, dx \]

Apply integration by parts twice:

\[ \int \frac{d^2\psi_i}{dx^2} \cdot \frac{d^2 u}{dx^2} \, dx = \epsilon_i \]

The Challenge with Piecewise Linear Basis Functions

For hat functions (piecewise linear), the second derivative is zero almost everywhere (linear functions have no curvature), except at the kinks where it has delta-function behavior.

Solution: We can construct the RW2 precision matrix directly as \(\mathbf{Q} = \mathbf{D}^T\mathbf{D}\), where \(\mathbf{D}\) is the second-difference matrix.

Step 1: The Second-Difference Matrix D

The second difference at point i is: \(\Delta^2 x_i = x_{i-1} - 2x_i + x_{i+1}\)

For n = 5 points, we have n - 2 = 3 second differences:

\[ \mathbf{D} = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 1 & -2 & 1 \end{pmatrix} \quad \text{(3 × 5 matrix)} \]

Each row computes: (left neighbor) - 2×(center) + (right neighbor)

Step 2: Computing Q = DTD

Why Q = DTD?

The quadratic form should penalize squared second differences:

\[ \mathbf{x}^T \mathbf{Q} \mathbf{x} = \sum_{i=2}^{n-1} (x_{i-1} - 2x_i + x_{i+1})^2 = \|\mathbf{D}\mathbf{x}\|^2 = \mathbf{x}^T \mathbf{D}^T\mathbf{D} \mathbf{x} \]

Therefore: \(\mathbf{Q}_{\text{RW2}} = \mathbf{D}^T\mathbf{D}\)

Step 3: Computing Q Step by Step (n = 5)

Matrix multiplication DTD

DT (5×3):

\[ \mathbf{D}^T = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix} \]

Q = DTD (5×5):

\[ \mathbf{Q}_{\text{RW2}} = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 \\ -2 & 5 & -4 & 1 & 0 \\ 1 & -4 & 6 & -4 & 1 \\ 0 & 1 & -4 & 5 & -2 \\ 0 & 0 & 1 & -2 & 1 \end{pmatrix} \]

The RW2 Precision Matrix (5 points)

\[ \mathbf{Q}_{\text{RW2}} = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 \\ -2 & 5 & -4 & 1 & 0 \\ 1 & -4 & 6 & -4 & 1 \\ 0 & 1 & -4 & 5 & -2 \\ 0 & 0 & 1 & -2 & 1 \end{pmatrix} \]

This is the standard RW2 structure matrix! Note the pentadiagonal structure.

The Rule for RW2 Precision Matrix

Boundary (i=1,n)

Q_ii = 1

Near-boundary (i=2,n-1)

Q_ii = 5

Interior (i=3,...,n-2)

Q_ii = 6

1st neighbors

Q_{i,i±1} = -4 or -2

2nd neighbors

Q_{i,i±2} = 1

Sanity checks

Row sums = 0:
Row 1: \(1 + (-2) + 1 = 0\) ✓
Row 2: \((-2) + 5 + (-4) + 1 = 0\) ✓
Row 3: \(1 + (-4) + 6 + (-4) + 1 = 0\) ✓

Rank = n - 2: rank(Q) = 5 - 2 = 3 ✓

Row sums = 0 and rank deficiency = 2 confirm this is RW2.

Why is this "Random Walk 2"?

The quadratic form \(\mathbf{x}^T \mathbf{Q} \mathbf{x}\) gives:

\[ \mathbf{x}^T \mathbf{Q}_{\text{RW2}} \mathbf{x} = \sum_{i=2}^{n-1} (x_{i-1} - 2x_i + x_{i+1})^2 = \sum_{i} (\Delta^2 x_i)^2 \]

This penalizes second differences - deviations from a straight line!

RW1

\(\Delta x_i = x_{i+1} - x_i\)
Independent increments

RW2

\(\Delta^2 x_i = x_{i-1} - 2x_i + x_{i+1}\)
Independent second increments

Visual Intuition: RW1 vs RW2

RW1: Penalizes level changes

Jagged, can change direction sharply

RW2: Penalizes slope changes

Smooth, gradual changes in direction

When to use RW2 over RW1? Use RW2 when you expect the underlying trend to be smooth (e.g., temperature over time, age effects). Use RW1 when you expect more abrupt changes (e.g., step functions, threshold effects).
PropertyRW1RW2
Precision matrix Tridiagonal Pentadiagonal
Non-zero diagonals 3 (main, ±1) 5 (main, ±1, ±2)
Rank deficiency 1 2
Null space Constant (1, 1, 1, ...) Constant + Linear (1, 2, 3, ...)
Typical diagonal (interior) 2 6

From u(s) to Weights: What Are We Estimating?

The Continuous Field u(s)

The SPDE defines a continuous spatial field \(u(s)\) at every location \(s\) in the domain. But we can't estimate infinitely many values! We need a finite representation.

The Basis Function Approximation

We approximate \(u(s)\) using basis functions centered at mesh vertices:

\[ u(s) \approx \sum_{k=1}^{n} w_k \, \psi_k(s) \]

Where:

  • \(\psi_k(s)\): Basis function for vertex \(k\) (piecewise linear "tent" function)
  • \(w_k\): Weight at vertex \(k\) (what INLA estimates)
  • \(n\): Number of mesh vertices
The Target: INLA estimates the weights \(\mathbf{w} = (w_1, w_2, \ldots, w_n)^T\). These weights fully determine the spatial field everywhere via the basis function expansion.

Connecting the Notation

The weights \(\mathbf{w}\) are the same as \(\mathbf{u}_{\text{mesh}}\):

\[ \mathbf{w} = \mathbf{u}_{\text{mesh}} = \text{(values at mesh vertices)} \]

Once we have the weights, we can compute \(u(s)\) at any location by evaluating the basis functions.

The Projector Matrix A

To get \(u(s)\) at observation locations, we evaluate the basis function expansion. The projector matrix \(\mathbf{A}\) does this efficiently:

\[ u(s_i) = \sum_{k=1}^{n} w_k \, \psi_k(s_i) = \sum_{k=1}^{n} A_{ik} \, w_k \]

In matrix form:

\[ \mathbf{u}_{\text{obs}} = \mathbf{A} \, \mathbf{w} \]

Where \(A_{ik} = \psi_k(s_i)\) is the value of basis function \(k\) at observation location \(s_i\).

Structure of A

Each row of \(\mathbf{A}\) has at most 3 non-zero values because each point lies in one triangle with 3 vertices. These values are the barycentric coordinates of the point within its triangle, and each row sums to 1.

SymbolDimensionMeaning
\(\mathbf{w}\) or \(\mathbf{u}_{\text{mesh}}\) \(n \times 1\) Weights at mesh vertices (what INLA estimates)
\(\mathbf{A}\) \(m \times n\) Projector matrix (basis functions at obs locations)
\(\mathbf{u}_{\text{obs}}\) \(m \times 1\) \(u(s)\) at observation locations
# Build projector matrix in pyINLA A = mesh.make_projector(coords) # A is a sparse matrix: m (observations) × n (vertices) # Get u(s) at observation locations from estimated weights u_obs = A @ w # w = result.summary_random['spatial']['mean'] # For prediction at new locations: A_pred = mesh.make_projector(pred_coords) u_pred = A_pred @ w # u(s) at prediction locations

The Complete Model

Putting it all together, a spatial model with SPDE has three levels:

General Hierarchical Model

\[ \begin{align} g(\mu_i) &= \eta_i = \beta_0 + [\mathbf{A}\mathbf{w}]_i & \text{(Link function)}\\[0.3em] \mathbf{w} &\sim N(\mathbf{0}, \mathbf{Q}^{-1}) & \text{(Latent field prior)} \end{align} \]

Where \(g(\cdot)\) is the link function connecting the mean \(\mu_i\) to the linear predictor \(\eta_i\):

FamilyLink \(g(\mu)\)Example
Gaussianidentity: \(\mu\)Continuous measurements
Poissonlog: \(\log(\mu)\)Counts, disease mapping
Binomiallogit: \(\log(\mu/(1-\mu))\)Presence/absence, proportions
Gammalog: \(\log(\mu)\)Positive continuous (e.g., rainfall)

Where Does Q Come From?

The precision matrix \(\mathbf{Q}\) comes from the SPDE discretization. For the Matérn field with α = 2:

\[ \mathbf{Q} = \kappa^2 \mathbf{C} + \mathbf{G} \]

Where \(\mathbf{C}\) (mass matrix) and \(\mathbf{G}\) (stiffness matrix) are computed from integrating the basis functions over the mesh triangles. The matrix \(\mathbf{Q}\) encodes the spatial correlation structure and defines the prior on the weights \(\mathbf{w}\).

Notation Summary

  • \(\mathbf{y}\): Observed response vector (\(m \times 1\))
  • \(\eta_i\): Linear predictor at observation \(i\)
  • \(g(\cdot)\): Link function (identity, log, logit, etc.)
  • \(\beta_0\): Intercept (and other fixed effects)
  • \(\mathbf{A}\): Projector matrix (\(m \times n\)), evaluates basis functions at observation locations
  • \(\mathbf{w}\): Weights at mesh vertices (\(n \times 1\)), what INLA estimates
  • \(\mathbf{A}\mathbf{w}\): The spatial field \(u(s)\) evaluated at observation locations
  • \(\mathbf{Q}\): Precision matrix from SPDE (\(n \times n\)), prior precision of weights \(\mathbf{w}\)
Summary: INLA estimates the weights \(\mathbf{w}\), which have prior precision \(\mathbf{Q}\) from the SPDE. The spatial effect at your data locations is \(\mathbf{A}\mathbf{w}\), which enters the linear predictor \(\eta\). The link function \(g\) connects \(\eta\) to the mean of your chosen likelihood family.

Next Steps

Now that you understand the theory, see how to fit SPDE models in practice:

References

  • Lindgren, F., Rue, H., & Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. JRSS-B, 73(4), 423-498.
  • Fuglstad, G.-A., Simpson, D., Lindgren, F., & Rue, H. (2019). Constructing priors that penalize the complexity of Gaussian random fields. JASA, 114(525), 445-452.
  • Krainski, E., et al. (2018). Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. Chapman & Hall/CRC.