Besag Model (ICAR)

Overview

The Besag model, also known as the Intrinsic Conditional Autoregressive (ICAR) model, is the foundational spatial random effect for areal data. It assumes that:

Spatial smoothing - each region's value is influenced by its neighbors
Local dependence - only direct neighbors matter (Markov property)
Improper prior - the model is intrinsic (rank-deficient), requiring a sum-to-zero constraint

The Besag model is ideal for disease mapping, ecological studies, and any application where you expect nearby areas to have similar characteristics due to shared environmental, social, or demographic factors.

Mathematical Formulation

For \(n\) regions with neighborhood structure defined by an adjacency graph, the Besag model specifies the conditional distribution:

\[x_i \mid x_j, i \neq j, \tau \sim \mathcal{N}\left(\frac{1}{n_i}\sum_{j \sim i} x_j, \frac{1}{n_i \tau}\right)\]

where:

\(x_i\) is the spatial random effect for region \(i\)
\(n_i\) is the number of neighbors of region \(i\)
\(j \sim i\) indicates that regions \(i\) and \(j\) are neighbors
\(\tau\) is the precision (controls the strength of spatial smoothing)

This specification means each region's value is drawn from a normal distribution centered on the average of its neighbors, with variance inversely proportional to the number of neighbors.

Precision Matrix

The joint distribution has precision matrix:

\[\mathbf{Q} = \tau \mathbf{R}\]

where \(\mathbf{R}\) is the structure matrix with entries:

\(R_{ii} = n_i\) (number of neighbors of region \(i\))
\(R_{ij} = -1\) if regions \(i\) and \(j\) are neighbors
\(R_{ij} = 0\) otherwise

Since \(\mathbf{R}\) is singular (has rank \(n-1\) for a connected graph), a sum-to-zero constraint \(\sum_i x_i = 0\) is typically applied.

When to Use Besag

The Besag model is appropriate for areal (lattice) data where you have discrete spatial units and expect spatial autocorrelation:

Disease Mapping

Mapping disease incidence or mortality rates across administrative regions. Nearby areas often share similar risk factors and health outcomes.

Ecological Studies

Species abundance, biodiversity indices, or environmental measurements across spatial units where neighboring areas have similar conditions.

Election Analysis

Voting patterns across constituencies or counties, where political preferences tend to be spatially clustered.

Socioeconomic Indicators

Income levels, crime rates, or educational attainment across census tracts or neighborhoods with spatial dependence.

When NOT to Use Besag

The Besag model is for areal data (discrete spatial units with adjacency). For point-referenced data (continuous locations), use geostatistical models like SPDE/Matern instead.

Specification in pyINLA

Allowed Parameters

Parameter	Required	Default	Description
`id`	Yes	-	Column name containing region indices (integers 1 to n)
`model`	Yes	-	Must be `'besag'`
`graph`	Yes	-	Neighborhood structure: dense / sparse adjacency matrix, an `InlaGraph`, or a path to a text-format `.graph` file
`scale.model`	No	`False`	Scale Q so the geometric mean of the marginal variances equals 1 (recommended with PC priors)
`adjust.for.con.comp`	No	`True`	Adjust for disconnected components in the graph
`constr`	No	`True`	Apply sum-to-zero constraint
`hyper`	No	loggamma	Prior specification for precision: `[{prior, param, initial, fixed}]`
`n`	No	auto	Number of areas (auto-detected from the graph if omitted)
`values`	No	auto	Explicit list of allowed area-id values (mirrors `graph` size)
`extraconstr`	No	-	Extra linear constraints: `{'A': matrix, 'e': vector}` where `A @ x = e`
`rankdef`	No	auto	Explicit rank-deficiency override (default uses graph's connected-component count)
`diagonal`	No	`0.0`	Extra constant on the precision diagonal (numerical stability)
`weights`	No	-	Per-observation weighting vector applied to the random-effect contribution. Length must equal the number of observations.

Per-Parameter Runnable Examples

Click any parameter to expand a minimal runnable snippet. All snippets assume the shared setup block below (8 areas arranged as a 4×2 grid, 80 observations).

# shared setup for every snippet below
import numpy as np
import pandas as pd
from pyinla import pyinla

rng = np.random.default_rng(42)
n_areas = 8            # 4x2 grid of areas
nobs    = 80           # observations spread across the 8 areas
df = pd.DataFrame({
    'y':    rng.standard_normal(nobs),
    'x':    rng.standard_normal(nobs),
    'area': rng.integers(1, n_areas + 1, size=nobs),
})

# Adjacency for the 4x2 grid:
#   1 - 2 - 3 - 4
#   |   |   |   |
#   5 - 6 - 7 - 8
adj = np.array([
    [0,1,0,0, 1,0,0,0],
    [1,0,1,0, 0,1,0,0],
    [0,1,0,1, 0,0,1,0],
    [0,0,1,0, 0,0,0,1],
    [1,0,0,0, 0,1,0,0],
    [0,1,0,0, 1,0,1,0],
    [0,0,1,0, 0,1,0,1],
    [0,0,0,1, 0,0,1,0],
], dtype=int)

id + model + graph required trio (minimal call)

# 'id' is the column holding the area index, 'graph' is the adjacency.
# 'n' auto-detected from the graph; constr=True by default.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

graph alternative formats (sparse, dense, file path)

# pyinla accepts the adjacency in three forms:
#   (1) a dense numpy adjacency matrix (above)
#   (2) a scipy.sparse matrix (efficient for large graphs)
#   (3) a path to a graph text file (first line n, then "i k nb_1 ... nb_k" per node)
import scipy.sparse as sp

# (2) sparse
adj_sparse = sp.csr_matrix(adj)

# (3) graph file. Text format: first line n; then "i k nb_1 ... nb_k" per node.
graph_file = '/path/to/regions.graph'

formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj_sparse}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

hyper override the precision prior

# Default: prior='loggamma', param=[1, 0.00005], initial=4, fixed=False.
# Below: PC prior on the precision (recommended when also using scale.model=True).
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': 'pc.prec', 'param': [1, 0.01]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

n explicit number of areas (auto-detected by default)

# pyinla auto-derives n from the graph size; set 'n' explicitly only if
# you want a hard sanity-check or your spec needs to enforce a specific count.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'n': 8}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

constr sum-to-zero constraint (default True)

# besag is intrinsic (rank deficient by 1 per connected component), so
# the default constr=True pins the level. Set constr=False to opt out;
# drop the intercept since the latent field will then absorb the level.
formula = {
    'response': 'y',
    'fixed': ['0', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'constr': False}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

scale.model geometric-mean variance scaling (recommended with PC priors)

# Scales R so the geometric mean of the marginal variances is 1, making
# the prior's magnitude interpretation independent of the graph structure.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'scale.model': True}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

adjust.for.con.comp sum-to-zero per connected component (default True)

# When the graph has multiple disconnected components, the default
# True adds one sum-to-zero constraint PER component (rank deficiency =
# #components). Setting it to False uses a single global sum-to-zero.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'adjust.for.con.comp': False}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

values explicit list of area-id values

# Pin the area-id grid explicitly. Useful for safety against data with
# missing levels: the latent field will be estimated for every listed id.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'values': list(range(1, 9))}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

extraconstr additional linear constraints \(\mathbf{A}\mathbf{x} = \mathbf{e}\)

# When 'values' is supplied (explicit area-id grid), A is sized (k, n)
# where n is len(values). Below: pin the first and last areas;
# A has shape (2, 8).
n = 8
A = np.zeros((2, n))
A[0, 0]     = 1.0
A[1, n - 1] = 1.0
e = np.array([0.0, 0.0])
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'values':      list(range(1, n + 1)),
                 'extraconstr': {'A': A, 'e': e}}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

diagonal extra constant on the precision diagonal

# Small numerical-stability jitter. Rarely needed for besag with a proper prior.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'diagonal': 1e-4}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

rankdef explicit rank-deficiency override

# By default pyinla computes the rank deficiency from the graph: it equals
# the number of connected components when adjust.for.con.comp=True (one
# null direction per component), or 1 otherwise. Override only when you
# know the true rank deficiency (e.g. supplying a non-standard structure
# matrix where the auto-count would be wrong).
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'rankdef': 1}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

weights per-observation weighting

# Per-observation weight vector applied to the besag contribution.
# Length must equal the number of observations.
w = rng.uniform(0.5, 1.5, size=nobs)
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'weights': w}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

Hyperparameters

The Besag model has one hyperparameter: the precision \(\tau\) controlling the strength of spatial smoothing. Higher precision means stronger smoothing (more similar neighbors).

Internal Representation

The hyperparameter is represented on the log scale for numerical stability:

\[\theta_1 = \log(\tau)\]

Default Configuration

Parameter	Default Value	Description
`prior`	`'loggamma'`	Log-gamma prior on precision
`param`	`[1, 0.00005]`	Shape and rate parameters for the gamma prior
`initial`	`4`	Starting value for \(\theta = \log(\tau)\); corresponds to \(\tau \approx 54.6\)
`fixed`	`False`	Whether to estimate (False) or fix (True) the precision

Customizing Hyperparameters

Override defaults using the hyper key (list with a single dict matching pyinla's list-of-dicts convention):

'random': [
    {
        'id': 'area',
        'model': 'besag',
        'graph': adj,
        'hyper': [{
            'prior':   'pc.prec',
            'param':   [1, 0.01],    # P(sigma > 1) = 0.01
            'initial': 4,
            'fixed':   False,
        }],
    }
]

Model Options: Conceptual Notes

The runnable code for each option lives in the per-parameter catalog above. The notes here cover what each option means semantically.

scale.model

With scale.model=True the precision matrix \(\mathbf{R}\) is rescaled so that the geometric mean of the marginal variances equals 1, which makes the interpretation of \(\tau\) (and any prior placed on it) independent of the graph size and connectivity. This is the canonical pairing with pc.prec, where the prior is parameterized via the marginal standard deviation \(\sigma = 1/\sqrt{\tau}\).

adjust.for.con.comp

When the graph has more than one connected component, the default adjust.for.con.comp=True treats each component independently:

constr=True is enforced as a separate sum-to-zero on each component of size \(\geq 2\).
rankdef is set automatically to the number of components.
Singleton nodes (no neighbors) get a standard Normal prior under scale.model=True and an improper uniform under scale.model=False.

Setting adjust.for.con.comp=False collapses this to a single global sum-to-zero (rank deficiency = 1), even when the graph has multiple components.

constr

The intrinsic Besag precision is rank-deficient, so constr=True is the default to ensure identifiability when an intercept is present. Setting constr=False removes the constraint and shifts the level to the latent field: be sure to drop the intercept ('fixed': ['0', ...]) to keep the model identified.

Implementation Details

The behavior depends on the combination of scale.model and adjust.for.con.comp options. Let \(\mathbf{Q} = \tau \mathbf{R}\) be the precision matrix.

scale.model=False, adjust.for.con.comp=False

The constraint constr=True is interpreted as a single sum-to-zero constraint over the whole graph. Singleton nodes (no neighbors) have a uniform distribution on \((-\infty, \infty)\) before the constraint.

scale.model=True, adjust.for.con.comp=False

The constraint constr=True applies over the whole graph. The structure matrix \(\mathbf{R}\) (excluding singletons) is scaled so the geometric mean of marginal variances equals 1. Singletons are modified to have a standard Normal distribution.

scale.model=False, adjust.for.con.comp=True

The constraint constr=True is interpreted as one sum-to-zero constraint on each connected component of size \(\geq 2\). Singletons have a uniform distribution on \((-\infty, \infty)\).

scale.model=True, adjust.for.con.comp=True (Recommended)

The constraint constr=True applies separately to each connected component of size \(\geq 2\). The structure matrix \(\mathbf{R}\) is scaled so the geometric mean of marginal variances in each connected component equals 1. Singletons are modified to have a standard Normal distribution.

Available Priors

Default and headline priors

The Besag log-precision \(\theta_1 = \log(\tau)\) defaults to a log-gamma prior, with shape and rate parameters:

\[\tau \sim \text{Gamma}(1,\ 0.00005)\]

The PC (Penalised Complexity) prior on the standard deviation \(\sigma = 1/\sqrt{\tau}\) is the recommended alternative whenever scale.model=True is set, parameterized via

\[P(\sigma > u) = \alpha.\]

Prior catalog (one snippet per family; 1:1 with the besag parity scenarios)

Click any prior family to expand a minimal runnable snippet. All snippets reuse the shared setup block from the Specification section (8-area 4×2 grid in adj; nobs = 80).

prec · loggamma classical gamma prior on \(\tau\) (default)

formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior':   'loggamma',
                            'param':   [1, 0.01],
                            'initial': 2,
                            'fixed':   False}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · pc.prec PC prior on \(\sigma = 1/\sqrt{\tau}\)

# P(sigma > 1) = 0.01: "I think the spatial SD is unlikely to exceed 1".
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': 'pc.prec', 'param': [1, 0.01]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · fixed pin the precision at a known value

import math
# tau is held fixed at 10 (initial value on the log scale).
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'initial': math.log(10),
                            'fixed':   True}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · flat improper flat on \(\theta_1\)

formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': 'flat', 'param': []}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · logtnormal half-normal on \(\tau\) (log scale)

formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': 'logtnormal',
                            'param': [0.0, 1.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · normal / gaussian normal on \(\theta_1\)

# 'gaussian' is accepted as an alias for 'normal' (same prior).
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': 'normal', 'param': [0.0, 1.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · laplace double-exponential on \(\theta_1\)

formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': 'laplace',
                            'param': [0.0, 1.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · pc.mgamma PC prior, modified-gamma variant

formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': 'pc.mgamma', 'param': [1.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · jeffreystdf Jeffreys-type reference prior

formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': 'jeffreystdf', 'param': []}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · expression: … custom log-density (R expression)

# A custom log-prior supplied as an R expression string. The body is
# evaluated by the C engine; `theta` is the internal-scale variable.
expr = "expression: log_dens = -0.5 * theta * theta; log_dens;"
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': expr, 'param': []}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

prec · table: … custom tabulated log-density

# Inline tabulated prior: alternating (theta, log_density) pairs.
# The C engine requires at least 4 grid points.
table = "table: -6 -18 -2 -2 2 -2 6 -18"
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'area', 'model': 'besag', 'graph': adj,
                 'hyper': [{'prior': table, 'param': []}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

Technical Notes

The term \(\frac{1}{2}\log(|\mathbf{R}|^{*})\) of the normalization constant is not computed. Add this to the log marginal likelihood estimate if needed, where \(\mathbf{R}\) is the precision matrix with unit precision parameter.
For graphs with disconnected components, use adjust.for.con.comp=True to handle each component separately.
When using scale.model=True, the interpretation of the precision parameter becomes consistent across different graph structures, making prior specification more intuitive.

Related Models

IID - Unstructured random effects (often combined with Besag for BYM-style models)
Generic0 - Custom precision structure specification