Understanding SPDEs

The Big Idea

The SPDE approach lets you model continuous spatial processes efficiently by connecting the Matérn covariance to a sparse precision matrix.

Traditional approach: Dense covariance matrix \(\Sigma\) → \(O(n^3)\) computation
SPDE approach: Sparse precision matrix \(\mathbf{Q}\) → \(O(n^{3/2})\) computation

This is possible because a Gaussian field with Matérn covariance is the solution to a specific SPDE, and the Finite Element Method (FEM) gives us a GMRF approximation with a sparse precision matrix.

The Matérn Covariance

The Matérn covariance is the most widely used covariance function in spatial statistics. It describes how the similarity between observations decreases with distance.

Correlation Function

For two locations \(\mathbf{s}_i\) and \(\mathbf{s}_j\), the Matérn correlation is:

\text{Cor}(U(\mathbf{s}_i), U(\mathbf{s}_j)) = \frac{2^{1-\nu}}{\Gamma(\nu)} (\kappa \| \mathbf{s}_i - \mathbf{s}_j \|)^\nu K_\nu(\kappa \| \mathbf{s}_i - \mathbf{s}_j \|)

Covariance Function

The covariance is the correlation scaled by the marginal variance \(\sigma^2\):

\text{Cov}(U(\mathbf{s}_i), U(\mathbf{s}_j)) = \sigma^2 \cdot \text{Cor}(U(\mathbf{s}_i), U(\mathbf{s}_j))

The Parameters

Parameter	Symbol	Interpretation
Marginal variance	\(\sigma^2\)	The variance of the field at any single location. Controls the "amplitude" of spatial variation.
Scale	\(\kappa > 0\)	Controls how quickly correlation decays with distance. Larger \(\kappa\) means faster decay (shorter range).
Smoothness	\(\nu > 0\)	Controls the differentiability of the field. The field is \(\lceil \nu \rceil - 1\) times differentiable. Larger \(\nu\) gives smoother surfaces.

When distance is zero, the Bessel function term equals 1, so \(\text{Cor} = 1\) and \(\text{Cov} = \sigma^2\).

Understanding Smoothness (\(\nu\))

The smoothness parameter has a direct physical interpretation:

\(\nu = 0.5\): Exponential covariance. Rough, non-differentiable surfaces (like Brownian motion).
\(\nu = 1\): Once differentiable. Moderately smooth surfaces. Default in INLA.
\(\nu = 1.5\): 1.5 times differentiable. Smoother surfaces.
\(\nu = 2\): Twice differentiable. Very smooth surfaces.
\(\nu \to \infty\): Gaussian (squared exponential) covariance. Infinitely smooth.

In practice: \(\nu\) is often fixed (not estimated) because it is poorly identified from data. The default \(\nu = 1\) works well for most environmental and spatial applications.

Practical Range

The scale parameter \(\kappa\) is hard to interpret directly. Instead, we use the practical range, which is the distance at which correlation drops to approximately 0.14 (i.e., \(e^{-1} \approx 0.13\)):

\rho = \frac{\sqrt{8\nu}}{\kappa}

Interpretation: If the practical range is 50 km, then two locations 50 km apart have correlation ~0.14. Beyond this distance, observations are nearly uncorrelated.

Marginal Standard Deviation (\(\sigma\))

The marginal standard deviation \(\sigma\) controls how much the spatial field varies. For the SPDE parameterization:

\sigma^2 = \frac{\Gamma(\nu)}{\Gamma(\nu + d/2)(4\pi)^{d/2}\kappa^{2\nu}\tau^2}

where \(\tau\) is an internal precision parameter. In 2D with \(\nu=1\), this simplifies to \(\sigma = 1/(\tau \sqrt{4\pi} \kappa)\).

Interpretation: If \(\sigma = 2\), then at any location, the spatial effect has standard deviation 2 (in the units of your response variable).

The SPDE-Matérn Link

Why This Matters

Computing Matérn covariance directly requires building and inverting large dense matrices, which becomes computationally infeasible for thousands of locations. The SPDE approach provides a shortcut: instead of working with the covariance matrix, we work with a sparse precision matrix that is much faster to compute.

The Key Insight (No Differential Equations Required)

Think of a Matérn random field as a smooth, spatially correlated surface. Lindgren et al. (2011) discovered that this surface can be created through a simple recipe:

Start with random noise: Imagine sprinkling random values across your spatial domain, each location getting an independent random value.
Smooth the noise: Apply a smoothing operation that blends nearby values together. The amount of smoothing determines how correlated nearby locations become.
The result: After smoothing, you get exactly a Matérn covariance structure.

          The practical benefit: This "noise + smoothing" recipe can be computed very efficiently
          using the mesh. Instead of storing correlations between all pairs of points (which grows as n²),
          we only need to store relationships between neighboring mesh vertices (which grows as n).
        

The Mathematical Statement

For those interested, the formal result is that a Gaussian field with Matérn covariance is the solution to this stochastic partial differential equation (SPDE):

(\kappa^2 - \Delta)^{\alpha/2} u(\mathbf{s}) = \mathcal{W}(\mathbf{s})

The goal: We want to find \(u(\mathbf{s})\), the spatial random field. This is the unknown we're solving for. The SPDE tells us that \(u(\mathbf{s})\) (the result of smoothing white noise) has exactly Matérn covariance structure.

Understanding Each Part

Symbol	What It Means	Intuition
\(\mathcal{W}(\mathbf{s})\)	Spatial white noise	The random starting values at each location (completely uncorrelated)
\(\kappa^2 - \Delta\)	Smoothing operator	Blends values with neighbors. Higher \(\kappa\) means less smoothing (shorter range)
\(\alpha/2\)	Power of smoothing	How many times to apply the smoothing. More smoothing = smoother result
\(u(\mathbf{s})\)	The spatial field	The smooth, correlated surface we want (has Matérn covariance)

Connection to Matérn Parameters

The smoothing parameters connect directly to the Matérn covariance parameters:

\(\alpha = \nu + d/2\) where \(d\) is the spatial dimension (usually 2)
Higher \(\alpha\) means more smoothing passes, giving smoother fields (higher \(\nu\))
The scale parameter \(\kappa\) controls the correlation range

Default in 2D: \(\alpha = 2\) which gives \(\nu = 1\). This means one smoothing pass, producing fields that are continuous but can have sharp changes (like real environmental data).

Summary: What You Need to Remember

You don't need to understand the differential equation. Just remember:

The SPDE approach is a computational trick to get Matérn covariance efficiently
It uses the mesh to represent spatial smoothing with sparse matrices
This makes fitting models with thousands of locations fast and practical
The results are statistically equivalent to fitting Matérn covariance directly

From SPDE to FEM: The Bridge

The Problem

The SPDE defines \(u(\mathbf{s})\), the spatial field we want to find. But it's defined continuously over all of space: we'd need to know \(u(\mathbf{s})\) at infinitely many points. That's impossible to compute.

The Solution: Discretize with a Mesh

This is where the triangular mesh comes in. Instead of finding \(u(\mathbf{s})\) everywhere, we approximate it using values only at mesh vertices:

Cover the domain with triangles: The mesh divides your spatial region into triangles.
Estimate \(u\) at vertices only: We find \(u\) at each of the \(m\) mesh vertices (call these values \(w_1, w_2, \ldots, w_m\)).
Interpolate between vertices: Within each triangle, \(u(\mathbf{s})\) is linearly interpolated from the three corner values (like stretching a rubber sheet over the vertices).

          Key insight: By discretizing on a mesh, we reduce the problem from
          "find \(u(\mathbf{s})\) at infinitely many points" to "estimate \(m\) weights \(\{w_1, \ldots, w_m\}\)
          at mesh vertices." The precision matrix \(\mathbf{Q}\) tells us how these weights are correlated.
        

Why This Works

The SPDE says "smooth this noise to get \(u(\mathbf{s})\)" continuously. The FEM approximation says "smooth this noise, but only track values at mesh vertices, and interpolate in between." As the mesh gets finer (more vertices), the approximation gets closer to the true continuous solution.

Practical note: You don't need a very fine mesh. Even a relatively coarse mesh (a few hundred vertices) often gives excellent approximations because Matérn fields are smooth.

FEM Matrices and Precision

Approximating the Spatial Field with Basis Functions

Remember: our goal is to find \(u(\mathbf{s})\), the spatial field from the SPDE. FEM approximates this continuous field as a weighted sum of basis functions defined on the triangular mesh:

u(\mathbf{s}) \approx \sum_{k=1}^{m} \psi_k(\mathbf{s}) w_k

Where:

\(\psi_k(\mathbf{s})\): Basis functions (one per mesh vertex), defined by the triangles
\(w_k\): Random weights at each vertex. These are what we estimate.
\(m\): Number of mesh vertices

So instead of finding \(u(\mathbf{s})\) at every point in space, we only need to estimate the \(m\) weights \(\{w_1, w_2, \ldots, w_m\}\). The basis functions handle interpolation everywhere else.

What Are Tent Functions?

Think of the mesh as lying flat on the ground (the x-y plane). Each basis function \(\psi_k\) adds a third dimension (height) that rises above the mesh:

The key insight: The mesh lives in 2D (x, y). The basis function value is a height (z) at each point. So \(\psi_k(x, y)\) tells you "how tall is the tent at location (x, y)?"

For each vertex \(k\), the tent function \(\psi_k\) has these properties:

Height = 1 directly above vertex \(k\) (the peak of the tent)
Height = 0 at all neighboring vertices (the tent touches the ground there)
Slopes linearly from height 1 down to height 0 within the triangles touching vertex \(k\)
Height = 0 everywhere outside those triangles

Side View: How the Tent Rises Above the Mesh

The mesh is flat (on the x-axis). The tent function ψ₂ rises to height 1 at vertex v₂, and slopes down to height 0 at neighboring vertices v₁ and v₃.

3D View: Single Tent on a Triangle:

The tent function equals 1 at its vertex and decreases linearly to 0 at neighboring vertices.

All Three Basis Functions:

Each vertex has its own tent. Inside the triangle, all three overlap (sum to 1).

          What does "overlap" mean? At any point inside a triangle, all three tent
          functions have non-zero height. The values depend on where you are in the triangle:
          At the center: ψ₁ ≈ 0.33, ψ₂ ≈ 0.33, ψ₃ ≈ 0.33 (equidistant from all vertices)
Near vertex 1: ψ₁ ≈ 0.8, ψ₂ ≈ 0.1, ψ₃ ≈ 0.1 (closer to vertex 1)
On edge 1-2: ψ₃ = 0, and ψ₁ + ψ₂ = 1 (on the edge opposite to vertex 3)
Always: ψ₁ + ψ₂ + ψ₃ = 1, no matter where!

        

Why do they always sum to 1?

This is not a coincidence - it's guaranteed by construction! The basis functions are defined using barycentric coordinates.

The idea: Any point P inside a triangle can be written as a weighted average of the three vertices:

P = \lambda_1 \cdot v_1 + \lambda_2 \cdot v_2 + \lambda_3 \cdot v_3 \quad \text{where} \quad \lambda_1 + \lambda_2 + \lambda_3 = 1

The weights \(\lambda_1, \lambda_2, \lambda_3\) are called barycentric coordinates. They tell you "how much" of each vertex contributes to point P.

The key: The basis functions ARE the barycentric coordinates!

\psi_k(P) = \lambda_k

So by definition: \(\psi_1(P) + \psi_2(P) + \psi_3(P) = \lambda_1 + \lambda_2 + \lambda_3 = 1\). This property is called "partition of unity".

Check it at different locations:

Location	ψ₁	ψ₂	ψ₃	Sum
At vertex 1	1	0	0	1
At vertex 2	0	1	0	1
At vertex 3	0	0	1	1
Midpoint of edge 1-2	0.5	0.5	0	1
Center of triangle	0.33	0.33	0.33	1
Any point inside	λ₁	λ₂	λ₃	1

Weighted Sum Example:

Why different sizes? The basis functions ψ_k all have the same shape (peak at 1), but we multiply by weights w_k. So w₁·ψ₁ peaks at w₁, not at 1. Bigger weight = taller tent!

Result: \(u(\mathbf{s}) = 2 \cdot \psi_1 + 1 \cdot \psi_2 + 3 \cdot \psi_3\). At each vertex, \(u\) equals the weight. Inside the triangle, values are linearly interpolated.

When you sum all these tent functions (each multiplied by its weight \(w_k\)), you get a piecewise linear surface that passes through the weight values at each vertex.

Plugging Back into the SPDE

Here's the key step: we substitute our approximation \(u(\mathbf{s}) \approx \sum_k \psi_k(\mathbf{s}) w_k\) back into the SPDE equation. This converts the continuous differential equation into a discrete system of equations for the weights \(\{w_1, \ldots, w_m\}\).

The process (called the Galerkin method) involves:

Substitute the basis function approximation into the SPDE
Multiply by each basis function \(\psi_i\) and integrate over the domain
This produces integrals of products of basis functions

The Connection: How u(s) leads to C, G, and Q

Step 1: Start with the SPDE (simplified for α=1):

(\kappa^2 - \Delta) u(\mathbf{s}) = \mathcal{W}(\mathbf{s})

Step 2: Substitute \(u(\mathbf{s}) \approx \sum_k w_k \psi_k(\mathbf{s})\):

(\kappa^2 - \Delta) \sum_k w_k \psi_k(\mathbf{s}) = \mathcal{W}(\mathbf{s})

Step 3: Multiply by \(\psi_i\) and integrate (Galerkin method):

\sum_k w_k \left[ \kappa^2 \underbrace{\int \psi_i \psi_k \, d\mathbf{s}}_{C_{ik}} + \underbrace{\int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s}}_{G_{ik}} \right] = \text{noise}_i

Step 4: In matrix form, this becomes:

(\kappa^2 \mathbf{C} + \mathbf{G}) \mathbf{w} = \text{noise} \quad \Rightarrow \quad \mathbf{Q} \mathbf{w} = \text{noise}

Result: The weights follow \(\mathbf{w} \sim N(\mathbf{0}, \mathbf{Q}^{-1})\) where \(\mathbf{Q} = \kappa^2\mathbf{C} + \mathbf{G}\). The FEM matrices C and G come directly from integrating products of basis functions!

The FEM Matrices

From plugging the basis functions into the SPDE, we get two matrices:

Matrix	Definition	Intuition
C (Mass)	\(C_{ij} = \int \psi_i(\mathbf{s}) \psi_j(\mathbf{s}) \, d\mathbf{s}\)	How much the basis functions overlap. Mostly diagonal (each vertex has an "area of influence").
G (Stiffness)	\(G_{ij} = \int \nabla\psi_i \cdot \nabla\psi_j \, d\mathbf{s}\)	How the gradients (slopes) of neighboring basis functions interact. Encodes mesh connectivity.

Key point: Both C and G are sparse because each basis function \(\psi_k\) is only non-zero near vertex \(k\). Two basis functions only interact if their vertices share a triangle.

The Precision Matrix

The precision matrix \(\mathbf{Q}\) is built from C and G. The exact formula depends on the smoothness parameter α you choose:

Simple version: There is just ONE matrix Q. The formula changes based on how smooth you want the field. In practice, α = 2 is the default, so you almost always use the second formula.

If you choose...	Formula for Q	Field smoothness
α = 1 (rare)	\(\mathbf{Q} = \kappa^2 \mathbf{C} + \mathbf{G}\)	Rough (ν = 0)
α = 2 (default)	\(\mathbf{Q} = \kappa^4 \mathbf{C} + 2\kappa^2 \mathbf{G} + \mathbf{G}\mathbf{C}^{-1}\mathbf{G}\)	Smooth (ν = 1)

Why different formulas? Higher α means more smoothing, which requires a more complex formula. Think of it like applying a blur filter: α = 1 is one pass, α = 2 is two passes (smoother result). The formula for α = 2 is essentially "apply the α = 1 formula twice."

Key insight: \(\mathbf{Q}\) is sparse because \(\mathbf{G}\) only has non-zero entries for connected mesh vertices. This sparsity enables fast computation.

Worked Example: A Single Triangle

Let's work through a complete example with the simplest possible mesh: 3 vertices forming one triangle.

Step 1: The Mesh

Consider a triangle with vertices at:

Vertex 1: \((0, 0)\)
Vertex 2: \((1, 0)\)
Vertex 3: \((0, 1)\)

This is a right triangle with area = 0.5.

Step 2: The Basis Functions

Each vertex gets a tent function. For this triangle:

\(\psi_1(\mathbf{s})\): equals 1 at vertex 1, linearly decreases to 0 at vertices 2 and 3
\(\psi_2(\mathbf{s})\): equals 1 at vertex 2, linearly decreases to 0 at vertices 1 and 3
\(\psi_3(\mathbf{s})\): equals 1 at vertex 3, linearly decreases to 0 at vertices 1 and 2

At any point inside the triangle: \(\psi_1 + \psi_2 + \psi_3 = 1\) (they form a partition of unity).

Step 3: The Spatial Field Approximation

We approximate \(u(\mathbf{s})\) as:

u(\mathbf{s}) \approx w_1 \psi_1(\mathbf{s}) + w_2 \psi_2(\mathbf{s}) + w_3 \psi_3(\mathbf{s})

Where \(w_1, w_2, w_3\) are the random weights we estimate. At vertex 1, \(u = w_1\). At vertex 2, \(u = w_2\). Inside the triangle, \(u\) is linearly interpolated.

Step 4: Computing the Mass Matrix C

Deriving the Matrix Equations (Galerkin Method)

Step 1: Start with the SPDE (for α=1):

(\kappa^2 - \Delta) u(\mathbf{s}) = \mathcal{W}(\mathbf{s})

Step 2: Substitute our approximation \(u(\mathbf{s}) \approx w_1\psi_1 + w_2\psi_2 + w_3\psi_3\):

(\kappa^2 - \Delta)(w_1\psi_1 + w_2\psi_2 + w_3\psi_3) = \mathcal{W}(\mathbf{s})

Step 3: Multiply both sides by \(\psi_1\) and integrate over the domain:

\int \psi_1 \cdot (\kappa^2 - \Delta)(w_1\psi_1 + w_2\psi_2 + w_3\psi_3) \, d\mathbf{s} = \int \psi_1 \cdot \mathcal{W} \, d\mathbf{s}

Step 4: Expand (using integration by parts for the Δ term):

w_1 \left[ \kappa^2 \int\psi_1\psi_1 + \int\nabla\psi_1 \cdot \nabla\psi_1 \right] + w_2 \left[ \kappa^2 \int\psi_1\psi_2 + \int\nabla\psi_1 \cdot \nabla\psi_2 \right] + w_3 \left[ \kappa^2 \int\psi_1\psi_3 + \int\nabla\psi_1 \cdot \nabla\psi_3 \right] = \epsilon_1

Step 5: Define the matrix entries:

C_{ik} = \int \psi_i \psi_k \, d\mathbf{s} \qquad G_{ik} = \int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s}

Step 6: The equation becomes:

w_1(\kappa^2 C_{11} + G_{11}) + w_2(\kappa^2 C_{12} + G_{12}) + w_3(\kappa^2 C_{13} + G_{13}) = \epsilon_1

The Complete System: 3 Equations

We repeat this process for \(\psi_2\) and \(\psi_3\) to get 3 equations:

Equation 1 (multiply by ψ₁): \(w_1(\kappa^2 C_{11} + G_{11}) + w_2(\kappa^2 C_{12} + G_{12}) + w_3(\kappa^2 C_{13} + G_{13}) = \epsilon_1\)
Equation 2 (multiply by ψ₂): \(w_1(\kappa^2 C_{21} + G_{21}) + w_2(\kappa^2 C_{22} + G_{22}) + w_3(\kappa^2 C_{23} + G_{23}) = \epsilon_2\)
Equation 3 (multiply by ψ₃): \(w_1(\kappa^2 C_{31} + G_{31}) + w_2(\kappa^2 C_{32} + G_{32}) + w_3(\kappa^2 C_{33} + G_{33}) = \epsilon_3\)

In matrix form:

\underbrace{\begin{pmatrix} \kappa^2 C_{11} + G_{11} & \kappa^2 C_{12} + G_{12} & \kappa^2 C_{13} + G_{13} \\ \kappa^2 C_{21} + G_{21} & \kappa^2 C_{22} + G_{22} & \kappa^2 C_{23} + G_{23} \\ \kappa^2 C_{31} + G_{31} & \kappa^2 C_{32} + G_{32} & \kappa^2 C_{33} + G_{33} \end{pmatrix}}_{\mathbf{Q} = \kappa^2\mathbf{C} + \mathbf{G}} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{pmatrix}

Showing C and G separately:

\mathbf{Q} = \kappa^2 \underbrace{\begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix}}_{\mathbf{C} \text{ (Mass)}} + \underbrace{\begin{pmatrix} G_{11} & G_{12} & G_{13} \\ G_{21} & G_{22} & G_{23} \\ G_{31} & G_{32} & G_{33} \end{pmatrix}}_{\mathbf{G} \text{ (Stiffness)}}

Where: \(C_{ik} = \int \psi_i \psi_k \, d\mathbf{s}\) and \(G_{ik} = \int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s}\)

Why is C a 3×3 matrix?

Now you can see it directly from the equations above:

3 unknowns (w₁, w₂, w₃) → 3 columns in the matrix
3 equations (one per ψ_i) → 3 rows in the matrix
Each entry \(C_{ik}\) is the coefficient of \(w_k\) in equation \(i\)

In general: m vertices → m unknowns → m equations → m×m matrices.

Now let's compute the actual values of \(C_{ik}\).

The key idea: \(C_{ik} = \int \psi_i(\mathbf{s}) \times \psi_k(\mathbf{s}) \, d\mathbf{s}\). This is NOT just "do the tents overlap?" (they always do inside the triangle). It's "when we multiply the two tent heights together and add up the result over the whole triangle, what do we get?"

What does ψ_i × ψ_j look like?

ψ₁ × ψ₁

Peak of 1 at vertex 1, fades to 0

Large integral (more "volume")

ψ₁ × ψ₂

Zero at all vertices! Small bump in middle

Smaller integral (less "volume")

            The crucial insight: For \(C_{11} = \int \psi_1 \times \psi_1\), the product is 1 at vertex 1 and positive everywhere in the triangle.
            For \(C_{12} = \int \psi_1 \times \psi_2\), the product is 0 at every vertex (because where ψ₁=1, ψ₂=0 and vice versa).
            The product is only non-zero in the middle, and even there it's small. That's why C₁₁ > C₁₂.
          

Let's compute step by step:

Entry	Calculation	At vertex 1	At vertex 2	In middle	Result
\(C_{11}\)	\(\int \psi_1 \times \psi_1\)	1×1 = 1	0×0 = 0	0.33×0.33 ≈ 0.11	2 (larger)
\(C_{12}\)	\(\int \psi_1 \times \psi_2\)	1×0 = 0	0×1 = 0	0.33×0.33 ≈ 0.11	1 (smaller)
\(C_{22}\)	\(\int \psi_2 \times \psi_2\)	0×0 = 0	1×1 = 1	0.33×0.33 ≈ 0.11	2 (larger)

The Rule for Mass Matrix C

Same basis function (i = k):	\(C_{ii} = \frac{2 \cdot \text{Area}}{12}\) → coefficient 2
Different basis functions (i ≠ k):	\(C_{ik} = \frac{1 \cdot \text{Area}}{12}\) → coefficient 1

Where does Area/12 come from?

There's a standard formula for integrating products of barycentric coordinates over a triangle:

\int_{\triangle} \lambda_1^a \lambda_2^b \lambda_3^c \, dA = \frac{a! \, b! \, c!}{(a+b+c+2)!} \times 2 \times \text{Area}

where \(\lambda_1, \lambda_2, \lambda_3\) are barycentric coordinates (which ARE our basis functions: \(\psi_i = \lambda_i\)).

Computing \(C_{11} = \int \psi_1^2 \, dA = \int \lambda_1^2 \, dA\):

Here \(a=2, b=0, c=0\):

C_{11} = \frac{2! \cdot 0! \cdot 0!}{(2+0+0+2)!} \times 2 \times \text{Area} = \frac{2 \cdot 1 \cdot 1}{4!} \times 2 \times \text{Area} = \frac{2}{24} \times 2 \times \text{Area} = \frac{\mathbf{2} \times \text{Area}}{12}

Computing \(C_{12} = \int \psi_1 \psi_2 \, dA = \int \lambda_1 \lambda_2 \, dA\):

Here \(a=1, b=1, c=0\):

C_{12} = \frac{1! \cdot 1! \cdot 0!}{(1+1+0+2)!} \times 2 \times \text{Area} = \frac{1 \cdot 1 \cdot 1}{4!} \times 2 \times \text{Area} = \frac{1}{24} \times 2 \times \text{Area} = \frac{\mathbf{1} \times \text{Area}}{12}

The 12 comes from 4!/2 = 24/2 = 12. The factorial (4! = 24) appears because we're integrating polynomials over a 2D simplex (triangle). It's similar to how \(\int_0^1 x \, dx = \frac{1}{2}\) and \(\int_0^1 x^2 \, dx = \frac{1}{3}\) - the denominator increases with polynomial degree. For triangles with barycentric coordinates, the pattern gives us 4! in the denominator.

The formula (standard result for linear basis functions on triangles):

\mathbf{C} = \frac{\text{Area}}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} = \frac{0.5}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}

Step 5: Computing the Stiffness Matrix G

Why are we computing gradient products?

Remember from the Galerkin derivation, the matrix entries \(G_{ik}\) come from the Laplacian term \(-\Delta\) in the SPDE. After integration by parts:

-\int \psi_i \cdot \Delta\psi_k \, d\mathbf{s} = +\int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s} = G_{ik}

The gradient integrals appear naturally - we're not computing them arbitrarily!

The stiffness matrix measures how the slopes (gradients) of the tent functions interact.

The key idea: \(G_{ik} = \int \nabla\psi_i \cdot \nabla\psi_k \, d\mathbf{s}\). Inside a triangle, each tent function has a constant slope (it's a flat ramp). So the integral simplifies to: \(G_{ik} = (\nabla\psi_i \cdot \nabla\psi_k) \times \text{Area}\)

Understanding the Gradient ∇ψ

Each tent function slopes downhill from its peak (vertex) toward the other vertices. The gradient points in the direction of steepest descent.

∇ψ₁: slopes away from v₁

∇ψ₂: slopes away from v₂

            The dot product ∇ψ_i · ∇ψ_j:
            Same direction (or same tent): dot product is positive → G_ii > 0
Opposite directions: dot product is negative → G_ij < 0 for neighbors

          

For our right triangle:

\mathbf{G} = \frac{1}{2} \begin{pmatrix} 2 & -1 & -1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}

Entry	What it measures	Value	Why?
\(G_{11}\)	\(\int \nabla\psi_1 \cdot \nabla\psi_1\)	Positive (1)	∇ψ₁ dotted with itself is always positive (magnitude squared)
\(G_{12}\)	\(\int \nabla\psi_1 \cdot \nabla\psi_2\)	Negative (-0.5)	∇ψ₁ and ∇ψ₂ point in opposite directions (tent 1 slopes away from v₁, tent 2 slopes away from v₂)
\(G_{23}\)	\(\int \nabla\psi_2 \cdot \nabla\psi_3\)	Zero (0)	For this specific triangle, ∇ψ₂ and ∇ψ₃ happen to be perpendicular (dot product = 0)

Key pattern: In G, diagonal entries are always positive (self-interaction). Off-diagonal entries are negative or zero (neighboring tents slope in different directions). This is the opposite pattern from C, where all entries are positive!

How to compute ∇ψ for a triangle

For a triangle with vertices at \((x_1, y_1), (x_2, y_2), (x_3, y_3)\), the gradient of each basis function is:

\nabla\psi_i = \frac{1}{2 \cdot \text{Area}} \begin{pmatrix} y_j - y_k \\ x_k - x_j \end{pmatrix}

where \((i, j, k)\) is a cyclic permutation of \((1, 2, 3)\).

For our right triangle with vertices \(v_1=(0,0), v_2=(1,0), v_3=(0,1)\):

Area = 0.5, so \(2 \cdot \text{Area} = 1\)

∇ψ₁ (i=1, j=2, k=3)

                \(\nabla\psi_1 = \begin{pmatrix} y_2 - y_3 \\ x_3 - x_2 \end{pmatrix} = \begin{pmatrix} 0-1 \\ 0-1 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \end{pmatrix}\)
              

∇ψ₂ (i=2, j=3, k=1)

                \(\nabla\psi_2 = \begin{pmatrix} y_3 - y_1 \\ x_1 - x_3 \end{pmatrix} = \begin{pmatrix} 1-0 \\ 0-0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\)
              

∇ψ₃ (i=3, j=1, k=2)

                \(\nabla\psi_3 = \begin{pmatrix} y_1 - y_2 \\ x_2 - x_1 \end{pmatrix} = \begin{pmatrix} 0-0 \\ 1-0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\)
              

Let's compute G step by step

Formula: \(G_{ik} = (\nabla\psi_i \cdot \nabla\psi_k) \times \text{Area}\)

Diagonal entries (same basis function):

G_{11} = \nabla\psi_1 \cdot \nabla\psi_1 \times \text{Area} = ((-1)(-1) + (-1)(-1)) \times 0.5 = 2 \times 0.5 = \mathbf{1.0}\) \(G_{22} = \nabla\psi_2 \cdot \nabla\psi_2 \times \text{Area} = (1 \cdot 1 + 0 \cdot 0) \times 0.5 = 1 \times 0.5 = \mathbf{0.5}\) \(G_{33} = \nabla\psi_3 \cdot \nabla\psi_3 \times \text{Area} = (0 \cdot 0 + 1 \cdot 1) \times 0.5 = 1 \times 0.5 = \mathbf{0.5}

Off-diagonal entries (different basis functions):

G_{12} = \nabla\psi_1 \cdot \nabla\psi_2 \times \text{Area} = ((-1) \cdot 1 + (-1) \cdot 0) \times 0.5 = -1 \times 0.5 = \mathbf{-0.5}\) \(G_{13} = \nabla\psi_1 \cdot \nabla\psi_3 \times \text{Area} = ((-1) \cdot 0 + (-1) \cdot 1) \times 0.5 = -1 \times 0.5 = \mathbf{-0.5}\) \(G_{23} = \nabla\psi_2 \cdot \nabla\psi_3 \times \text{Area} = (1 \cdot 0 + 0 \cdot 1) \times 0.5 = 0 \times 0.5 = \mathbf{0}

Assembling G: The matrix is symmetric (\(G_{ik} = G_{ki}\)), so:

\mathbf{G} = \begin{pmatrix} G_{11} & G_{12} & G_{13} \\ G_{21} & G_{22} & G_{23} \\ G_{31} & G_{32} & G_{33} \end{pmatrix} = \begin{pmatrix} 1.0 & -0.5 & -0.5 \\ -0.5 & 0.5 & 0 \\ -0.5 & 0 & 0.5 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 2 & -1 & -1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}

The Rule for Stiffness Matrix G

Diagonal (i = k): Always positive

G_{ii} = |\nabla\psi_i|^2 \times \text{Area} > 0

The squared magnitude of a gradient is always positive.

Off-diagonal (i ≠ k): Usually negative

G_{ik} = (\nabla\psi_i \cdot \nabla\psi_k) \times \text{Area} \leq 0

Gradients of adjacent tent functions typically point in opposite directions.

            Why the pattern?
            Tent function ψ_i rises toward vertex i and falls away from it. Meanwhile, ψ_k rises toward vertex k. If i and k are different vertices, their gradients point in different directions, giving a negative (or zero) dot product.
          

Sanity check: Row sums of G should be zero

This is a fundamental property of the stiffness matrix (since constant functions have zero Laplacian):

            Row 1: \(1.0 + (-0.5) + (-0.5) = 0\) ✓

            Row 2: \((-0.5) + 0.5 + 0 = 0\) ✓

            Row 3: \((-0.5) + 0 + 0.5 = 0\) ✓

Step 6: Building the Precision Matrix Q

Where does Q = κ²C + G come from?

Remember from the Galerkin derivation (Step 4), each matrix entry of Q combines C and G:

Q_{ik} = \kappa^2 C_{ik} + G_{ik}

This is exactly what emerged from multiplying by basis functions and integrating the SPDE!

For \(\alpha = 1\) (simplest case), with scale parameter \(\kappa\):

\mathbf{Q} = \kappa^2 \mathbf{C} + \mathbf{G}

What does κ control?

κ²C term (Mass matrix)

Controls the amplitude/variance of the field. Larger κ² → more precision → smaller variance.

G term (Stiffness matrix)

Controls smoothness/correlation. Penalizes differences between neighboring vertices.

            The balance: Large κ makes the κ²C term dominate → field varies more freely at each vertex.
            Small κ makes G dominate → field is smoother (neighboring values more similar).
          

Let's compute Q step by step for κ = 2:

Computing Q for κ = 2

Step 1: Compute κ²C = 4 × C

\kappa^2 \mathbf{C} = 4 \times \frac{0.5}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} = \frac{2}{12} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} = \frac{1}{6} \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}

Step 2: Write out the numerical values

\kappa^2 \mathbf{C} = \begin{pmatrix} 2/6 & 1/6 & 1/6 \\ 1/6 & 2/6 & 1/6 \\ 1/6 & 1/6 & 2/6 \end{pmatrix} = \begin{pmatrix} 0.333 & 0.167 & 0.167 \\ 0.167 & 0.333 & 0.167 \\ 0.167 & 0.167 & 0.333 \end{pmatrix}\) \(\mathbf{G} = \begin{pmatrix} 1.0 & -0.5 & -0.5 \\ -0.5 & 0.5 & 0 \\ -0.5 & 0 & 0.5 \end{pmatrix}

Step 3: Add them: Q = κ²C + G

Q_{11} = 0.333 + 1.0 = \mathbf{1.333}\) \(Q_{12} = 0.167 + (-0.5) = \mathbf{-0.333}\) \(Q_{13} = 0.167 + (-0.5) = \mathbf{-0.333}\) \(Q_{22} = 0.333 + 0.5 = \mathbf{0.833}\) \(Q_{23} = 0.167 + 0 = \mathbf{0.167}\) \(Q_{33} = 0.333 + 0.5 = \mathbf{0.833}

The Final Precision Matrix Q (for κ = 2)

\mathbf{Q} = \begin{pmatrix} 1.333 & -0.333 & -0.333 \\ -0.333 & 0.833 & 0.167 \\ -0.333 & 0.167 & 0.833 \end{pmatrix}

Notice: The diagonal entries are positive (precision), and off-diagonal entries can be positive or negative depending on the balance of C and G contributions.

Sanity check: Q should be positive definite

For Q to define a valid Gaussian distribution, it must be symmetric positive definite. You can verify this by checking that all eigenvalues are positive, or that the determinant and all leading principal minors are positive.

Step 7: What Q Tells Us

The precision matrix \(\mathbf{Q}\) is the inverse of the covariance matrix for the weights \(\mathbf{w} = (w_1, w_2, w_3)\):

\mathbf{w} \sim N(\mathbf{0}, \mathbf{Q}^{-1})

Reading the Precision Matrix

Diagonal entries Q_ii

Precision (inverse marginal variance) of weight \(w_i\).

From our Q: \(Q_{11} = 1.333\) means vertex 1 has precision 1.333

Off-diagonal entries Q_ij

Conditional dependence between \(w_i\) and \(w_j\).

\(Q_{12} = -0.333\) means vertices 1 and 2 are conditionally dependent (neighbors)

The Markov Property (Conditional Independence)

Key insight: If \(Q_{ij} = 0\), then \(w_i\) and \(w_j\) are conditionally independent given all other weights. In our example, \(Q_{23} = 0.167 \neq 0\) because vertices 2 and 3 share an edge.

In a large mesh, most vertex pairs are not neighbors → most \(Q_{ij} = 0\) → Q is sparse!

From Precision to Covariance

The covariance matrix is \(\Sigma = Q^{-1}\). For our 3×3 example:

\mathbf{\Sigma} = \mathbf{Q}^{-1} = \begin{pmatrix} 1.333 & -0.333 & -0.333 \\ -0.333 & 0.833 & 0.167 \\ -0.333 & 0.167 & 0.833 \end{pmatrix}^{-1} \approx \begin{pmatrix} 1.5 & 1.0 & 1.0 \\ 1.0 & 2.0 & 1.0 \\ 1.0 & 1.0 & 2.0 \end{pmatrix}

Notice: The covariance matrix is dense (all entries non-zero) even though Q is sparse! This is why we work with Q directly in INLA - it's much more efficient.

Connection to Matérn Parameters

The SPDE parameter κ relates to the practical correlation range ρ:

\rho = \frac{\sqrt{8\nu}}{\kappa} \quad \text{where } \nu = \alpha - d/2

For α = 2 in 2D (ν = 1), this simplifies to: \(\rho \approx \frac{2.83}{\kappa}\)

Large κ (e.g., κ = 10)

Small range ρ ≈ 0.28
Field varies rapidly, weak spatial correlation

Small κ (e.g., κ = 0.5)

Large range ρ ≈ 5.66
Field is smooth, strong spatial correlation

          The key insight: In a larger mesh, \(\mathbf{Q}\) is sparse because each vertex
          only connects to a few neighbors. A mesh with 1000 vertices might have \(\mathbf{Q}\) with only
          ~6000 non-zero entries instead of 1,000,000. This sparsity makes computation fast.
        

Summary of the Full Chain

Step	What We Do	Result
1	Create mesh	Vertices + triangles
2	Define basis functions	Tent function \(\psi_k\) at each vertex
3	Approximate \(u(\mathbf{s})\)	\(u(\mathbf{s}) \approx \sum_k w_k \psi_k(\mathbf{s})\)
4	Plug into SPDE + integrate	FEM matrices C, G
5	Combine with \(\kappa\)	Precision matrix \(\mathbf{Q}\)
6	Use in INLA	\(\mathbf{w} \sim N(\mathbf{0}, \mathbf{Q}^{-1})\) as spatial random effect

Deriving the Random Walk 1 (RW1) Precision Matrix

The same FEM/Galerkin approach works in 1D! Let's derive the Random Walk of order 1 precision matrix using the exact same steps we used for the 2D Matérn field.

Why a Different Differential Equation?

The 2D Matérn SPDE was: \((\kappa^2 - \Delta)u = \mathcal{W}\)

For a Random Walk, we want a process where increments (first differences) are independent. This corresponds to setting κ = 0 and working in 1D:

-\Delta u = -\frac{d^2 u}{dx^2} = \mathcal{W}(x)

Physical interpretation: The second derivative of position equals white noise → velocity is Brownian motion → position is integrated Brownian motion (random walk).

Model	SPDE	Dimension	Result
Matérn (2D)	\((\kappa^2 - \Delta)u = \mathcal{W}\)	2D spatial	Q = κ²C + G
RW1	\(-\frac{d^2 u}{dx^2} = \mathcal{W}\)	1D (time/space)	Q = G only

Size and Rank of RW1

For a random walk with n points:

Precision matrix Q is n × n
There are n - 1 increments (differences between consecutive points)
Q has rank n - 1 (rank deficient by 1)
This is an intrinsic model - no absolute level, only relative differences matter

In practice: We add a sum-to-zero constraint or use it as a random effect (which handles the rank deficiency).

The 1D Differential Equation

The RW1 Stochastic Differential Equation

-\frac{d^2 u}{dx^2} = \mathcal{W}(x)

This is the 1D Laplacian (second derivative) driven by white noise. Unlike the Matérn SPDE, there is no κ²u term, so there will be no mass matrix C contribution.

Step 1: The 1D Mesh

Consider n = 4 equally-spaced points on a line with spacing h = 1:

Step 2: The 1D Basis Functions (Hat Functions)

Each point gets a "hat function" - the 1D equivalent of tent functions:

Hat function ψ₂: equals 1 at x₂, linearly decreases to 0 at neighbors x₁ and x₃

Step 3: The Galerkin Derivation (Full Detail)

Deriving the Matrix Equations (Galerkin Method for RW1)

Step 1: Start with the 1D differential equation:

-\frac{d^2 u}{dx^2} = \mathcal{W}(x)

Step 2: Substitute our approximation \(u(x) \approx w_1\psi_1 + w_2\psi_2 + w_3\psi_3 + w_4\psi_4\):

-\frac{d^2}{dx^2}(w_1\psi_1 + w_2\psi_2 + w_3\psi_3 + w_4\psi_4) = \mathcal{W}(x)

Step 3: Multiply both sides by \(\psi_1\) and integrate over the domain:

-\int \psi_1 \cdot \frac{d^2}{dx^2}(w_1\psi_1 + w_2\psi_2 + \cdots) \, dx = \int \psi_1 \cdot \mathcal{W} \, dx

Step 4: Apply integration by parts: \(-\int \psi_i \frac{d^2\psi_k}{dx^2} dx = +\int \frac{d\psi_i}{dx} \frac{d\psi_k}{dx} dx\)

w_1 \int\frac{d\psi_1}{dx}\frac{d\psi_1}{dx}dx + w_2 \int\frac{d\psi_1}{dx}\frac{d\psi_2}{dx}dx + w_3 \int\frac{d\psi_1}{dx}\frac{d\psi_3}{dx}dx + w_4 \int\frac{d\psi_1}{dx}\frac{d\psi_4}{dx}dx = \epsilon_1

Step 5: Define the stiffness matrix entries:

G_{ik} = \int \frac{d\psi_i}{dx} \cdot \frac{d\psi_k}{dx} \, dx

Step 6: The equation becomes:

w_1 G_{11} + w_2 G_{12} + w_3 G_{13} + w_4 G_{14} = \epsilon_1

The Complete System: 4 Equations for 4 Unknowns

We repeat this process for \(\psi_2\), \(\psi_3\), and \(\psi_4\) to get 4 equations:

Eq 1 (×ψ₁): \(w_1 G_{11} + w_2 G_{12} + w_3 G_{13} + w_4 G_{14} = \epsilon_1\)
Eq 2 (×ψ₂): \(w_1 G_{21} + w_2 G_{22} + w_3 G_{23} + w_4 G_{24} = \epsilon_2\)
Eq 3 (×ψ₃): \(w_1 G_{31} + w_2 G_{32} + w_3 G_{33} + w_4 G_{34} = \epsilon_3\)
Eq 4 (×ψ₄): \(w_1 G_{41} + w_2 G_{42} + w_3 G_{43} + w_4 G_{44} = \epsilon_4\)

In matrix form: \(\mathbf{G}\mathbf{w} = \boldsymbol{\epsilon}\), where \(\mathbf{Q}_{\text{RW1}} = \mathbf{G}\)

Key Difference from Matérn SPDE

In the Matérn case, we had \(\mathbf{Q} = \kappa^2\mathbf{C} + \mathbf{G}\) (both mass and stiffness matrices). For RW1, there's no κ²u term in the differential equation, so no mass matrix C appears. The precision matrix is only the stiffness matrix: \(\mathbf{Q}_{\text{RW1}} = \mathbf{G}\).

Step 4: Computing the Stiffness Matrix G

Gradients of 1D hat functions

For a hat function at xᵢ with spacing h:

Left segment [xᵢ₋₁, xᵢ]

Slope = \(\frac{d\psi_i}{dx} = +\frac{1}{h}\)

Right segment [xᵢ, xᵢ₊₁]

Slope = \(\frac{d\psi_i}{dx} = -\frac{1}{h}\)

Let's compute G step by step (h = 1)

Diagonal entries G_ii (interior point, e.g., i = 2):

G_{22} = \int_{x_1}^{x_2} \left(\frac{1}{h}\right)^2 dx + \int_{x_2}^{x_3} \left(-\frac{1}{h}\right)^2 dx\) \(G_{22} = \frac{1}{h^2} \cdot h + \frac{1}{h^2} \cdot h = \frac{1}{h} + \frac{1}{h} = \frac{2}{h}\) With h = 1: \(G_{22} = \mathbf{2}

Diagonal entries G_ii (boundary point, e.g., i = 1):

\(G_{11} = \int_{x_1}^{x_2} \left(-\frac{1}{h}\right)^2 dx = \frac{1}{h^2} \cdot h = \frac{1}{h}\) With h = 1: \(G_{11} = \mathbf{1}\) (only one segment!)

Off-diagonal entries G_ik (neighbors, e.g., i = 1, k = 2):

G_{12} = \int_{x_1}^{x_2} \left(-\frac{1}{h}\right) \cdot \left(+\frac{1}{h}\right) dx = -\frac{1}{h^2} \cdot h = -\frac{1}{h}\) With h = 1: \(G_{12} = \mathbf{-1}

Off-diagonal entries G_ik (non-neighbors, e.g., i = 1, k = 3):

\(G_{13} = 0\) (ψ₁ and ψ₃ don't overlap!)

The RW1 Precision Matrix (4 points)

\mathbf{Q}_{\text{RW1}} = \mathbf{G} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{pmatrix}

This is exactly the standard RW1 structure matrix!

The Rule for RW1 Precision Matrix

Interior points

Q_ii = 2
(two neighbors)

Boundary points

Q_ii = 1
(one neighbor)

Neighbors

Q_ij = -1
(adjacent points)

Sanity check: Row sums should be zero

            Row 1: \(1 + (-1) + 0 + 0 = 0\) ✓

            Row 2: \((-1) + 2 + (-1) + 0 = 0\) ✓

            Row 3: \(0 + (-1) + 2 + (-1) = 0\) ✓

            Row 4: \(0 + 0 + (-1) + 1 = 0\) ✓

Row sums = 0 means Q is rank-deficient (as expected for RW1 - there's no "anchor").

Why is this a "Random Walk"?

The quadratic form \(\mathbf{x}^T \mathbf{Q} \mathbf{x}\) gives:

\mathbf{x}^T \mathbf{Q}_{\text{RW1}} \mathbf{x} = (x_1 - x_2)^2 + (x_2 - x_3)^2 + (x_3 - x_4)^2 = \sum_{i=1}^{n-1} (x_i - x_{i+1})^2

This penalizes first differences (increments) - exactly what a random walk does! The increments \(\Delta x_i = x_{i+1} - x_i\) are independent with equal variance.

          The big picture: RW1, RW2, AR1, and SPDE models are all unified under the same FEM framework.
          The choice of differential operator (first derivative, second derivative, Laplacian, etc.) determines
          which precision matrix structure you get.
        

Model	Differential Operator	Precision Matrix Structure
RW1	\(-\frac{d^2}{dx^2}\) (1D Laplacian)	Tridiagonal: (1, -1) at boundaries, (2, -1, -1) interior
RW2	\(\frac{d^4}{dx^4}\) (biharmonic)	Pentadiagonal: penalizes second differences
Matérn 2D	\((\kappa^2 - \Delta)^{\alpha/2}\)	Sparse graph Laplacian structure

Deriving the Random Walk 2 (RW2) Precision Matrix

Following the same approach, let's derive the Random Walk of order 2 precision matrix. RW2 produces smoother curves than RW1 by penalizing changes in slope rather than changes in level.

Why a Fourth Derivative?

RW1 used the second derivative (Laplacian), which penalizes curvature and gives independent first differences.

RW2 uses the fourth derivative (biharmonic), which penalizes changes in curvature and gives independent second differences:

\frac{d^4 u}{dx^4} = \mathcal{W}(x)

Physical interpretation: RW1 assumes velocity is white noise (position is rough). RW2 assumes acceleration is white noise (velocity is smooth, position is even smoother).

Property	RW1	RW2
SPDE	\(-\frac{d^2 u}{dx^2} = \mathcal{W}\)	\(\frac{d^4 u}{dx^4} = \mathcal{W}\)
Penalizes	First differences (level changes)	Second differences (slope changes)
Smoothness	Continuous but rough	Continuous and smooth
Matrix structure	Tridiagonal (3 diagonals)	Pentadiagonal (5 diagonals)
Rank deficiency	1 (needs 1 constraint)	2 (needs 2 constraints)

Size and Rank of RW2

For a random walk of order 2 with n points:

Precision matrix Q is n × n
There are n - 2 second differences
Q has rank n - 2 (rank deficient by 2)
This is doubly intrinsic - no absolute level AND no absolute slope

In practice: Need sum-to-zero constraint AND linear-trend-to-zero constraint, or use as random effect.

The Galerkin Approach for RW2

Starting from the SPDE

\frac{d^4 u}{dx^4} = \mathcal{W}(x)

Multiply by test function ψᵢ and integrate:

\int \psi_i \cdot \frac{d^4 u}{dx^4} \, dx = \int \psi_i \cdot \mathcal{W} \, dx

Apply integration by parts twice:

\int \frac{d^2\psi_i}{dx^2} \cdot \frac{d^2 u}{dx^2} \, dx = \epsilon_i

The Challenge with Piecewise Linear Basis Functions

For hat functions (piecewise linear), the second derivative is zero almost everywhere (linear functions have no curvature), except at the kinks where it has delta-function behavior.

Solution: We can construct the RW2 precision matrix directly as \(\mathbf{Q} = \mathbf{D}^T\mathbf{D}\), where \(\mathbf{D}\) is the second-difference matrix.

Step 1: The Second-Difference Matrix D

The second difference at point i is: \(\Delta^2 x_i = x_{i-1} - 2x_i + x_{i+1}\)

For n = 5 points, we have n - 2 = 3 second differences:

\mathbf{D} = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 1 & -2 & 1 \end{pmatrix} \quad \text{(3 × 5 matrix)}

Each row computes: (left neighbor) - 2×(center) + (right neighbor)

Step 2: Computing Q = D^TD

Why Q = D^TD?

The quadratic form should penalize squared second differences:

\mathbf{x}^T \mathbf{Q} \mathbf{x} = \sum_{i=2}^{n-1} (x_{i-1} - 2x_i + x_{i+1})^2 = \|\mathbf{D}\mathbf{x}\|^2 = \mathbf{x}^T \mathbf{D}^T\mathbf{D} \mathbf{x}

Therefore: \(\mathbf{Q}_{\text{RW2}} = \mathbf{D}^T\mathbf{D}\)

Step 3: Computing Q Step by Step (n = 5)

Matrix multiplication D^TD

D^T (5×3):

\mathbf{D}^T = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix}

Q = D^TD (5×5):

\mathbf{Q}_{\text{RW2}} = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 \\ -2 & 5 & -4 & 1 & 0 \\ 1 & -4 & 6 & -4 & 1 \\ 0 & 1 & -4 & 5 & -2 \\ 0 & 0 & 1 & -2 & 1 \end{pmatrix}

The RW2 Precision Matrix (5 points)

\mathbf{Q}_{\text{RW2}} = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 \\ -2 & 5 & -4 & 1 & 0 \\ 1 & -4 & 6 & -4 & 1 \\ 0 & 1 & -4 & 5 & -2 \\ 0 & 0 & 1 & -2 & 1 \end{pmatrix}

This is the standard RW2 structure matrix! Note the pentadiagonal structure.

The Rule for RW2 Precision Matrix

Boundary (i=1,n)

Q_ii = 1

Near-boundary (i=2,n-1)

Q_ii = 5

Interior (i=3,...,n-2)

Q_ii = 6

1st neighbors

Q_{i,i±1} = -4 or -2

2nd neighbors

Q_{i,i±2} = 1

Sanity checks

            Row sums = 0:

            Row 1: \(1 + (-2) + 1 = 0\) ✓

            Row 2: \((-2) + 5 + (-4) + 1 = 0\) ✓

            Row 3: \(1 + (-4) + 6 + (-4) + 1 = 0\) ✓

            Rank = n - 2: rank(Q) = 5 - 2 = 3 ✓

Row sums = 0 and rank deficiency = 2 confirm this is RW2.

Why is this "Random Walk 2"?

The quadratic form \(\mathbf{x}^T \mathbf{Q} \mathbf{x}\) gives:

\mathbf{x}^T \mathbf{Q}_{\text{RW2}} \mathbf{x} = \sum_{i=2}^{n-1} (x_{i-1} - 2x_i + x_{i+1})^2 = \sum_{i} (\Delta^2 x_i)^2

This penalizes second differences - deviations from a straight line!

RW1

\(\Delta x_i = x_{i+1} - x_i\)
Independent increments

RW2

\(\Delta^2 x_i = x_{i-1} - 2x_i + x_{i+1}\)
Independent second increments

Visual Intuition: RW1 vs RW2

RW1: Penalizes level changes

Jagged, can change direction sharply

RW2: Penalizes slope changes

Smooth, gradual changes in direction

          When to use RW2 over RW1? Use RW2 when you expect the underlying trend to be smooth
          (e.g., temperature over time, age effects). Use RW1 when you expect more abrupt changes
          (e.g., step functions, threshold effects).
        

Property	RW1	RW2
Precision matrix	Tridiagonal	Pentadiagonal
Non-zero diagonals	3 (main, ±1)	5 (main, ±1, ±2)
Rank deficiency	1	2
Null space	Constant (1, 1, 1, ...)	Constant + Linear (1, 2, 3, ...)
Typical diagonal (interior)	2	6

From u(s) to Weights: What Are We Estimating?

The Continuous Field u(s)

The SPDE defines a continuous spatial field \(u(s)\) at every location \(s\) in the domain. But we can't estimate infinitely many values! We need a finite representation.

The Basis Function Approximation

We approximate \(u(s)\) using basis functions centered at mesh vertices:

u(s) \approx \sum_{k=1}^{n} w_k \, \psi_k(s)

Where:

\(\psi_k(s)\): Basis function for vertex \(k\) (piecewise linear "tent" function)
\(w_k\): Weight at vertex \(k\) (what INLA estimates)
\(n\): Number of mesh vertices

          The Target: INLA estimates the weights \(\mathbf{w} = (w_1, w_2, \ldots, w_n)^T\).
          These weights fully determine the spatial field everywhere via the basis function expansion.
        

Connecting the Notation

The weights \(\mathbf{w}\) are the same as \(\mathbf{u}_{\text{mesh}}\):

\mathbf{w} = \mathbf{u}_{\text{mesh}} = \text{(values at mesh vertices)}

Once we have the weights, we can compute \(u(s)\) at any location by evaluating the basis functions.

The Projector Matrix A

To get \(u(s)\) at observation locations, we evaluate the basis function expansion. The projector matrix \(\mathbf{A}\) does this efficiently:

u(s_i) = \sum_{k=1}^{n} w_k \, \psi_k(s_i) = \sum_{k=1}^{n} A_{ik} \, w_k

In matrix form:

\mathbf{u}_{\text{obs}} = \mathbf{A} \, \mathbf{w}

Where \(A_{ik} = \psi_k(s_i)\) is the value of basis function \(k\) at observation location \(s_i\).

Structure of A

Each row of \(\mathbf{A}\) has at most 3 non-zero values because each point lies in one triangle with 3 vertices. These values are the barycentric coordinates of the point within its triangle, and each row sums to 1.

Symbol	Dimension	Meaning
\(\mathbf{w}\) or \(\mathbf{u}_{\text{mesh}}\)	\(n \times 1\)	Weights at mesh vertices (what INLA estimates)
\(\mathbf{A}\)	\(m \times n\)	Projector matrix (basis functions at obs locations)
\(\mathbf{u}_{\text{obs}}\)	\(m \times 1\)	\(u(s)\) at observation locations

# Build projector matrix in pyINLA
A = mesh.make_projector(coords)
# A is a sparse matrix: m (observations) × n (vertices)

# Get u(s) at observation locations from estimated weights
u_obs = A @ w  # w = result.summary_random['spatial']['mean']

# For prediction at new locations:
A_pred = mesh.make_projector(pred_coords)
u_pred = A_pred @ w  # u(s) at prediction locations

The Complete Model

Putting it all together, a spatial model with SPDE has three levels:

General Hierarchical Model

\begin{align} g(\mu_i) &= \eta_i = \beta_0 + [\mathbf{A}\mathbf{w}]_i & \text{(Link function)}\\[0.3em] \mathbf{w} &\sim N(\mathbf{0}, \mathbf{Q}^{-1}) & \text{(Latent field prior)} \end{align}

Where \(g(\cdot)\) is the link function connecting the mean \(\mu_i\) to the linear predictor \(\eta_i\):

Family	Link \(g(\mu)\)	Example
Gaussian	identity: \(\mu\)	Continuous measurements
Poisson	log: \(\log(\mu)\)	Counts, disease mapping
Binomial	logit: \(\log(\mu/(1-\mu))\)	Presence/absence, proportions
Gamma	log: \(\log(\mu)\)	Positive continuous (e.g., rainfall)

Where Does Q Come From?

The precision matrix \(\mathbf{Q}\) comes from the SPDE discretization. For the Matérn field with α = 2:

\mathbf{Q} = \kappa^2 \mathbf{C} + \mathbf{G}

Where \(\mathbf{C}\) (mass matrix) and \(\mathbf{G}\) (stiffness matrix) are computed from integrating the basis functions over the mesh triangles. The matrix \(\mathbf{Q}\) encodes the spatial correlation structure and defines the prior on the weights \(\mathbf{w}\).

Notation Summary

\(\mathbf{y}\): Observed response vector (\(m \times 1\))
\(\eta_i\): Linear predictor at observation \(i\)
\(g(\cdot)\): Link function (identity, log, logit, etc.)
\(\beta_0\): Intercept (and other fixed effects)
\(\mathbf{A}\): Projector matrix (\(m \times n\)), evaluates basis functions at observation locations
\(\mathbf{w}\): Weights at mesh vertices (\(n \times 1\)), what INLA estimates
\(\mathbf{A}\mathbf{w}\): The spatial field \(u(s)\) evaluated at observation locations
\(\mathbf{Q}\): Precision matrix from SPDE (\(n \times n\)), prior precision of weights \(\mathbf{w}\)

Summary: INLA estimates the weights \(\mathbf{w}\), which have prior precision \(\mathbf{Q}\) from the SPDE. The spatial effect at your data locations is \(\mathbf{A}\mathbf{w}\), which enters the linear predictor \(\eta\). The link function \(g\) connects \(\eta\) to the mean of your chosen likelihood family.

Next Steps

Now that you understand the theory, see how to fit SPDE models in practice:

Modeling SPDE - Step-by-step workflow for fitting SPDE models
Creating Meshes - Detailed guide on mesh construction
Priors - Setting PC priors for range and sigma

References

Lindgren, F., Rue, H., & Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. JRSS-B, 73(4), 423-498.
Fuglstad, G.-A., Simpson, D., Lindgren, F., & Rue, H. (2019). Constructing priors that penalize the complexity of Gaussian random fields. JASA, 114(525), 445-452.
Krainski, E., et al. (2018). Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. Chapman & Hall/CRC.

The Big Idea

The Matérn Covariance

Correlation Function

Covariance Function

The Parameters

Understanding Smoothness (\(\nu\))

Practical Range

Marginal Standard Deviation (\(\sigma\))

The SPDE-Matérn Link

Why This Matters

The Key Insight (No Differential Equations Required)

The Mathematical Statement

Understanding Each Part

Connection to Matérn Parameters

Summary: What You Need to Remember

From SPDE to FEM: The Bridge

The Problem

The Solution: Discretize with a Mesh

Why This Works

FEM Matrices and Precision

Approximating the Spatial Field with Basis Functions

What Are Tent Functions?

Plugging Back into the SPDE

The FEM Matrices

The Precision Matrix

Worked Example: A Single Triangle

Step 1: The Mesh

Step 2: The Basis Functions

Step 3: The Spatial Field Approximation

Step 4: Computing the Mass Matrix C

Step 5: Computing the Stiffness Matrix G

Step 6: Building the Precision Matrix Q

Step 7: What Q Tells Us

Summary of the Full Chain

Deriving the Random Walk 1 (RW1) Precision Matrix

The 1D Differential Equation

Step 1: The 1D Mesh

Step 2: The 1D Basis Functions (Hat Functions)

Step 3: The Galerkin Derivation (Full Detail)

Step 4: Computing the Stiffness Matrix G

Deriving the Random Walk 2 (RW2) Precision Matrix

The Galerkin Approach for RW2

Step 1: The Second-Difference Matrix D

Step 2: Computing Q = DTD

Step 3: Computing Q Step by Step (n = 5)

From u(s) to Weights: What Are We Estimating?

The Continuous Field u(s)

The Basis Function Approximation

Connecting the Notation

The Projector Matrix A

Structure of A

The Complete Model

General Hierarchical Model

Where Does Q Come From?

Notation Summary

Next Steps

References

Step 2: Computing Q = D^TD