Overview
The Besag model, also known as the Intrinsic Conditional Autoregressive (ICAR) model, is the foundational spatial random effect for areal data. It assumes that:
- Spatial smoothing - each region's value is influenced by its neighbors
- Local dependence - only direct neighbors matter (Markov property)
- Improper prior - the model is intrinsic (rank-deficient), requiring a sum-to-zero constraint
The Besag model is ideal for disease mapping, ecological studies, and any application where you expect nearby areas to have similar characteristics due to shared environmental, social, or demographic factors.
Mathematical Formulation
For \(n\) regions with neighborhood structure defined by an adjacency graph, the Besag model specifies the conditional distribution:
\[x_i \mid x_j, i \neq j, \tau \sim \mathcal{N}\left(\frac{1}{n_i}\sum_{j \sim i} x_j, \frac{1}{n_i \tau}\right)\]
where:
- \(x_i\) is the spatial random effect for region \(i\)
- \(n_i\) is the number of neighbors of region \(i\)
- \(j \sim i\) indicates that regions \(i\) and \(j\) are neighbors
- \(\tau\) is the precision (controls the strength of spatial smoothing)
This specification means each region's value is drawn from a normal distribution centered on the average of its neighbors, with variance inversely proportional to the number of neighbors.
Precision Matrix
The joint distribution has precision matrix:
\[\mathbf{Q} = \tau \mathbf{R}\]
where \(\mathbf{R}\) is the structure matrix with entries:
- \(R_{ii} = n_i\) (number of neighbors of region \(i\))
- \(R_{ij} = -1\) if regions \(i\) and \(j\) are neighbors
- \(R_{ij} = 0\) otherwise
Since \(\mathbf{R}\) is singular (has rank \(n-1\) for a connected graph), a sum-to-zero constraint \(\sum_i x_i = 0\) is typically applied.
When to Use Besag
The Besag model is appropriate for areal (lattice) data where you have discrete spatial units and expect spatial autocorrelation:
Disease Mapping
Mapping disease incidence or mortality rates across administrative regions. Nearby areas often share similar risk factors and health outcomes.
Ecological Studies
Species abundance, biodiversity indices, or environmental measurements across spatial units where neighboring areas have similar conditions.
Election Analysis
Voting patterns across constituencies or counties, where political preferences tend to be spatially clustered.
Socioeconomic Indicators
Income levels, crime rates, or educational attainment across census tracts or neighborhoods with spatial dependence.
When NOT to Use Besag
The Besag model is for areal data (discrete spatial units with adjacency). For point-referenced data (continuous locations), use geostatistical models like SPDE/Matern instead.
Specification in pyINLA
Allowed Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
id |
Yes | - | Column name containing region indices (integers 1 to n) |
model |
Yes | - | Must be 'besag' |
graph |
Yes | - | Neighborhood structure: dense / sparse adjacency matrix, an InlaGraph, or a path to a text-format .graph file |
scale.model |
No | False |
Scale Q so the geometric mean of the marginal variances equals 1 (recommended with PC priors) |
adjust.for.con.comp |
No | True |
Adjust for disconnected components in the graph |
constr |
No | True |
Apply sum-to-zero constraint |
hyper |
No | loggamma | Prior specification for precision: [{prior, param, initial, fixed}] |
n |
No | auto | Number of areas (auto-detected from the graph if omitted) |
values |
No | auto | Explicit list of allowed area-id values (mirrors graph size) |
extraconstr |
No | - | Extra linear constraints: {'A': matrix, 'e': vector} where A @ x = e |
rankdef |
No | auto | Explicit rank-deficiency override (default uses graph's connected-component count) |
diagonal |
No | 0.0 |
Extra constant on the precision diagonal (numerical stability) |
weights |
No | - | Per-observation weighting vector applied to the random-effect contribution. Length must equal the number of observations. |
Per-Parameter Runnable Examples
Click any parameter to expand a minimal runnable snippet. All snippets assume the shared setup block below (8 areas arranged as a 4×2 grid, 80 observations).
# shared setup for every snippet below
import numpy as np
import pandas as pd
from pyinla import pyinla
rng = np.random.default_rng(42)
n_areas = 8 # 4x2 grid of areas
nobs = 80 # observations spread across the 8 areas
df = pd.DataFrame({
'y': rng.standard_normal(nobs),
'x': rng.standard_normal(nobs),
'area': rng.integers(1, n_areas + 1, size=nobs),
})
# Adjacency for the 4x2 grid:
# 1 - 2 - 3 - 4
# | | | |
# 5 - 6 - 7 - 8
adj = np.array([
[0,1,0,0, 1,0,0,0],
[1,0,1,0, 0,1,0,0],
[0,1,0,1, 0,0,1,0],
[0,0,1,0, 0,0,0,1],
[1,0,0,0, 0,1,0,0],
[0,1,0,0, 1,0,1,0],
[0,0,1,0, 0,1,0,1],
[0,0,0,1, 0,0,1,0],
], dtype=int)
id + model + graph required trio (minimal call)
# 'id' is the column holding the area index, 'graph' is the adjacency.
# 'n' auto-detected from the graph; constr=True by default.
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
graph alternative formats (sparse, dense, file path)
# pyinla accepts the adjacency in three forms:
# (1) a dense numpy adjacency matrix (above)
# (2) a scipy.sparse matrix (efficient for large graphs)
# (3) a path to a graph text file (first line n, then "i k nb_1 ... nb_k" per node)
import scipy.sparse as sp
# (2) sparse
adj_sparse = sp.csr_matrix(adj)
# (3) graph file. Text format: first line n; then "i k nb_1 ... nb_k" per node.
graph_file = '/path/to/regions.graph'
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj_sparse}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
hyper override the precision prior
# Default: prior='loggamma', param=[1, 0.00005], initial=4, fixed=False.
# Below: PC prior on the precision (recommended when also using scale.model=True).
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'pc.prec', 'param': [1, 0.01]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
n explicit number of areas (auto-detected by default)
# pyinla auto-derives n from the graph size; set 'n' explicitly only if
# you want a hard sanity-check or your spec needs to enforce a specific count.
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'n': 8}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
constr sum-to-zero constraint (default True)
# besag is intrinsic (rank deficient by 1 per connected component), so
# the default constr=True pins the level. Set constr=False to opt out;
# drop the intercept since the latent field will then absorb the level.
formula = {
'response': 'y',
'fixed': ['0', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'constr': False}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
scale.model geometric-mean variance scaling (recommended with PC priors)
# Scales R so the geometric mean of the marginal variances is 1, making
# the prior's magnitude interpretation independent of the graph structure.
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'scale.model': True}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
adjust.for.con.comp sum-to-zero per connected component (default True)
# When the graph has multiple disconnected components, the default
# True adds one sum-to-zero constraint PER component (rank deficiency =
# #components). Setting it to False uses a single global sum-to-zero.
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'adjust.for.con.comp': False}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
values explicit list of area-id values
# Pin the area-id grid explicitly. Useful for safety against data with
# missing levels: the latent field will be estimated for every listed id.
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'values': list(range(1, 9))}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
extraconstr additional linear constraints \(\mathbf{A}\mathbf{x} = \mathbf{e}\)
# When 'values' is supplied (explicit area-id grid), A is sized (k, n)
# where n is len(values). Below: pin the first and last areas;
# A has shape (2, 8).
n = 8
A = np.zeros((2, n))
A[0, 0] = 1.0
A[1, n - 1] = 1.0
e = np.array([0.0, 0.0])
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'values': list(range(1, n + 1)),
'extraconstr': {'A': A, 'e': e}}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
diagonal extra constant on the precision diagonal
# Small numerical-stability jitter. Rarely needed for besag with a proper prior.
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'diagonal': 1e-4}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
rankdef explicit rank-deficiency override
# By default pyinla computes the rank deficiency from the graph: it equals
# the number of connected components when adjust.for.con.comp=True (one
# null direction per component), or 1 otherwise. Override only when you
# know the true rank deficiency (e.g. supplying a non-standard structure
# matrix where the auto-count would be wrong).
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'rankdef': 1}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
weights per-observation weighting
# Per-observation weight vector applied to the besag contribution.
# Length must equal the number of observations.
w = rng.uniform(0.5, 1.5, size=nobs)
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'weights': w}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
Hyperparameters
The Besag model has one hyperparameter: the precision \(\tau\) controlling the strength of spatial smoothing. Higher precision means stronger smoothing (more similar neighbors).
Internal Representation
The hyperparameter is represented on the log scale for numerical stability:
\[\theta_1 = \log(\tau)\]
Default Configuration
| Parameter | Default Value | Description |
|---|---|---|
prior |
'loggamma' |
Log-gamma prior on precision |
param |
[1, 0.00005] |
Shape and rate parameters for the gamma prior |
initial |
4 |
Starting value for \(\theta = \log(\tau)\); corresponds to \(\tau \approx 54.6\) |
fixed |
False |
Whether to estimate (False) or fix (True) the precision |
Customizing Hyperparameters
Override defaults using the hyper key (list with a single dict matching pyinla's list-of-dicts convention):
'random': [
{
'id': 'area',
'model': 'besag',
'graph': adj,
'hyper': [{
'prior': 'pc.prec',
'param': [1, 0.01], # P(sigma > 1) = 0.01
'initial': 4,
'fixed': False,
}],
}
]
Model Options: Conceptual Notes
The runnable code for each option lives in the per-parameter catalog above. The notes here cover what each option means semantically.
scale.model
With scale.model=True the precision matrix \(\mathbf{R}\) is rescaled so that the geometric mean of the marginal variances equals 1, which makes the interpretation of \(\tau\) (and any prior placed on it) independent of the graph size and connectivity. This is the canonical pairing with pc.prec, where the prior is parameterized via the marginal standard deviation \(\sigma = 1/\sqrt{\tau}\).
adjust.for.con.comp
When the graph has more than one connected component, the default adjust.for.con.comp=True treats each component independently:
constr=Trueis enforced as a separate sum-to-zero on each component of size \(\geq 2\).rankdefis set automatically to the number of components.- Singleton nodes (no neighbors) get a standard Normal prior under
scale.model=Trueand an improper uniform underscale.model=False.
Setting adjust.for.con.comp=False collapses this to a single global sum-to-zero (rank deficiency = 1), even when the graph has multiple components.
constr
The intrinsic Besag precision is rank-deficient, so constr=True is the default to ensure identifiability when an intercept is present. Setting constr=False removes the constraint and shifts the level to the latent field: be sure to drop the intercept ('fixed': ['0', ...]) to keep the model identified.
Implementation Details
The behavior depends on the combination of scale.model and adjust.for.con.comp options. Let \(\mathbf{Q} = \tau \mathbf{R}\) be the precision matrix.
scale.model=False, adjust.for.con.comp=False
The constraint constr=True is interpreted as a single sum-to-zero constraint over the whole graph. Singleton nodes (no neighbors) have a uniform distribution on \((-\infty, \infty)\) before the constraint.
scale.model=True, adjust.for.con.comp=False
The constraint constr=True applies over the whole graph. The structure matrix \(\mathbf{R}\) (excluding singletons) is scaled so the geometric mean of marginal variances equals 1. Singletons are modified to have a standard Normal distribution.
scale.model=False, adjust.for.con.comp=True
The constraint constr=True is interpreted as one sum-to-zero constraint on each connected component of size \(\geq 2\). Singletons have a uniform distribution on \((-\infty, \infty)\).
scale.model=True, adjust.for.con.comp=True (Recommended)
The constraint constr=True applies separately to each connected component of size \(\geq 2\). The structure matrix \(\mathbf{R}\) is scaled so the geometric mean of marginal variances in each connected component equals 1. Singletons are modified to have a standard Normal distribution.
Available Priors
Default and headline priors
The Besag log-precision \(\theta_1 = \log(\tau)\) defaults to a log-gamma prior, with shape and rate parameters:
\[\tau \sim \text{Gamma}(1,\ 0.00005)\]
The PC (Penalised Complexity) prior on the standard deviation \(\sigma = 1/\sqrt{\tau}\) is the recommended alternative whenever scale.model=True is set, parameterized via
\[P(\sigma > u) = \alpha.\]
Prior catalog (one snippet per family; 1:1 with the besag parity scenarios)
Click any prior family to expand a minimal runnable snippet. All snippets reuse the shared setup block from the Specification section (8-area 4×2 grid in adj; nobs = 80).
prec · loggamma classical gamma prior on \(\tau\) (default)
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'loggamma',
'param': [1, 0.01],
'initial': 2,
'fixed': False}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · pc.prec PC prior on \(\sigma = 1/\sqrt{\tau}\)
# P(sigma > 1) = 0.01: "I think the spatial SD is unlikely to exceed 1".
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'pc.prec', 'param': [1, 0.01]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · fixed pin the precision at a known value
import math
# tau is held fixed at 10 (initial value on the log scale).
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'initial': math.log(10),
'fixed': True}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · flat improper flat on \(\theta_1\)
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'flat', 'param': []}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · logtnormal half-normal on \(\tau\) (log scale)
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'logtnormal',
'param': [0.0, 1.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · normal / gaussian normal on \(\theta_1\)
# 'gaussian' is accepted as an alias for 'normal' (same prior).
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'normal', 'param': [0.0, 1.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · laplace double-exponential on \(\theta_1\)
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'laplace',
'param': [0.0, 1.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · pc.mgamma PC prior, modified-gamma variant
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'pc.mgamma', 'param': [1.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · jeffreystdf Jeffreys-type reference prior
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': 'jeffreystdf', 'param': []}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · expression: … custom log-density (R expression)
# A custom log-prior supplied as an R expression string. The body is
# evaluated by the C engine; `theta` is the internal-scale variable.
expr = "expression: log_dens = -0.5 * theta * theta; log_dens;"
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': expr, 'param': []}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
prec · table: … custom tabulated log-density
# Inline tabulated prior: alternating (theta, log_density) pairs.
# The C engine requires at least 4 grid points.
table = "table: -6 -18 -2 -2 2 -2 6 -18"
formula = {
'response': 'y',
'fixed': ['1', 'x'],
'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
'hyper': [{'prior': table, 'param': []}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
Technical Notes
- The term \(\frac{1}{2}\log(|\mathbf{R}|^{*})\) of the normalization constant is not computed. Add this to the log marginal likelihood estimate if needed, where \(\mathbf{R}\) is the precision matrix with unit precision parameter.
- For graphs with disconnected components, use
adjust.for.con.comp=Trueto handle each component separately. - When using
scale.model=True, the interpretation of the precision parameter becomes consistent across different graph structures, making prior specification more intuitive.