Random Effects

RW1 Model

Random Walk of order 1 for modeling temporal or spatial dependence where adjacent values are similar, with first differences following a Gaussian distribution.

← Back to Random Effects

Overview

The RW1 model (Random Walk of order 1) is a fundamental model for capturing smooth temporal or spatial variation. It assumes that:

  • First differences are independent - \(x_{i+1} - x_i \sim N(0, 1/\tau)\)
  • Adjacent values are correlated - nearby time points or locations share information
  • Intrinsically Gaussian - forms a Gaussian Markov Random Field (GMRF)

RW1 is ideal for modeling trends and piecewise linear patterns in ordered data, such as time series, age effects, or spatial transects.

Mathematical Formulation

For \(n\) ordered positions, the RW1 model specifies:

\[x_{i+1} - x_i \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \tau^{-1}), \quad i = 1, \ldots, n-1\]

This can be written equivalently as:

\[x_{i+1} \mid x_i \sim \mathcal{N}(x_i, \tau^{-1})\]

where:

  • \(x_i\) is the random effect at position \(i\)
  • \(\tau\) is the precision of the increments (higher = smoother)
  • \(\sigma = 1/\sqrt{\tau}\) is the standard deviation of the increments

The precision matrix for the full vector \(\mathbf{x} = (x_1, \ldots, x_n)^\top\) is:

\[\mathbf{Q} = \tau \begin{pmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & \ddots & \ddots & \ddots \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}\]

This tridiagonal structure makes RW1 computationally efficient, even for large \(n\).

Identifiability Note

RW1 is an improper prior - the precision matrix is rank-deficient (rank \(n-1\)). By default, INLA adds a sum-to-zero constraint (constr=True) to make it identifiable. If you disable this constraint, you should remove the intercept from your model.

When to Use RW1

The RW1 model is appropriate when you have ordered data where:

Time Series Trends

Capture step-wise temporal evolution where the level persists between time points but can shift. Good for trend extraction in epidemiological surveillance or economic data.

Age-Period Effects

Model how effects change with age, time period, or cohort. Adjacent ages/periods are expected to have similar effects.

Spatial Transects

One-dimensional spatial variation along a gradient (e.g., elevation, depth, distance from source). Adjacent positions share information.

Step-like Covariate Effects

Effects of an ordered categorical covariate where neighboring categories should be similar but the function need not be smooth. For genuinely smooth function estimation of a continuous covariate, prefer RW2.

RW1 vs RW2

Use RW1 when you expect locally constant behavior: a random walk whose level can shift between adjacent positions (penalizes first differences). Use RW2 when you expect locally linear behavior: smoother curves where the slope changes gradually (penalizes second differences).

Visual Understanding: How RW1 Works

RW1 creates a correlated random walk by linking adjacent values: each step is a Gaussian increment whose variance is controlled by the precision \(\tau\). Higher precision means smaller increments and a tighter trajectory; lower precision allows the walk to wander further between adjacent positions.

Random Walk Realizations

Each curve below is a single realization of an RW1 process. At high precision the trajectory stays tight; at low precision it can wander, with occasional large excursions.

Effect of Precision on Smoothness

The precision \(\tau\) controls how much the curve can wiggle. Higher precision = smoother trends.

First Differences: The Building Block

RW1 is defined by its first differences \(\Delta x_i = x_{i+1} - x_i\). These differences are IID normal - the model's single hyperparameter controls their variance.

RW1 vs IID: Side-by-Side Comparison

Compare the correlated RW1 (each step depends on the previous) with IID (every value drawn independently). The same number of parameters produces very different patterns.

Correlation Structure: Why RW1 has Local Dependence

The key difference between IID and RW1 is their correlation structure. In RW1, nearby positions share information - earlier positions "remember" their influence on later ones.

IID: Only diagonal is correlated
(each group independent)

RW1: Correlation decays with distance
(nearby positions similar)

Reading the matrix:
  • Diagonal = 1 (self-correlation)
  • Darker = higher correlation
  • RW1: correlation spreads from diagonal

Linear vs Cyclic: Topology of the Neighbor Graph

Setting 'cyclic': True changes the neighbor graph from a line to a circle: position n becomes a neighbor of position 1. This adds one extra increment, x1xn, to the prior, so values at the start and end of the index are tied together. Same single hyperparameter τ, same intrinsic structure, just one extra link in the graph.

Linear (default): each position has at most two neighbors. Positions 1 and 10 are endpoints with only one neighbor each.

12345678910

Precision τ on each of the 9 increments xi+1xi.

Cyclic (cyclic=True): position 10 wraps to position 1. The pink dashed edge is the extra link.

12345678910wrap edge: 10 ↔ 1

Precision τ on all 10 increments, including x1x10.

A sample path of 50 points makes the difference visible: the linear walk drifts (start and end fall at different heights), while the cyclic walk is constrained to come back to its starting value.

Linear sample path (start ≠ end)

i=1i=50startend

End point (red) sits well above / below the start (green). The walk is free to drift.

Cyclic sample path (start = end)

i=1i=50startend

End point lands at the same height as the start (both green). The extra wrap-edge increment x1xn in the prior pulls them together.

Use cyclic=True when the index wraps naturally:
  • hour-of-day (0 and 23 are adjacent)
  • day-of-week (Sunday and Monday)
  • month-of-year (December and January)
  • angular / directional variables (0° and 360°)
Side effect on the rank-deficiency: a linear RW1 on n points has rank n − 1 (one null direction: the overall level). A cyclic RW1 also has rank n − 1 with the same null direction. Both need constr=True (sum-to-zero), which is the pyINLA default for this family.

When Data Looks Like...

Pattern in Data Suggested Model Reason
Step-wise trend over time RW1 Adjacent time points share information; level can persist or shift
Sudden level shifts RW1 (low precision) RW1 allows jumps while maintaining local structure
Smooth curves (no kinks) RW2 RW2 penalizes changes in slope, producing locally linear behavior
Random scatter by group IID No temporal/spatial structure to exploit
Seasonal patterns RW1 + Seasonal Combine trend with periodic component

Specification in pyINLA

Allowed Parameters

The RW1 random-effect spec dict accepts the following keys. The source of truth is model_defs/latent/rw1.py's ALLOWED_KEYS set, mirrored 1:1 here.

Parameter Required Description
Core (model-specific)
model Yes Must be 'rw1'.
id Yes Column name containing time/position indices (integers).
hyper No List with one dict: [{prior, param, initial, fixed}] for the log-precision \(\theta\).
n No Number of positions (auto-detected if omitted).
constr No Sum-to-zero constraint. Default: True.
cyclic No Circular/periodic random walk. Default: False.
scale.model No Scale precision matrix to have generalized variance 1. Default: False.
values No Custom location grid (list of integers); allows prediction at unobserved locations.
extraconstr No Extra linear constraints: {'A': matrix, 'e': vector} where \(\mathbf{A}\mathbf{x} = \mathbf{e}\).
diagonal No Extra constant added to the precision-matrix diagonal for numerical stability. Default: 0.0.
Cross-model common keys
weights No Per-observation weighting vector applied to the random-effect contribution. Length must equal the number of observations.

Per-Parameter Runnable Examples

Click any parameter to expand a minimal runnable snippet. All snippets assume the shared setup block below (\(n = 30\) positions, \(\text{nobs} = 80\) observations on an integer time index 1..30).

# shared setup for every snippet below
import numpy as np
import pandas as pd
from pyinla import pyinla

rng = np.random.default_rng(42)
n    = 30           # number of ordered positions
nobs = 80           # observations
df = pd.DataFrame({
    'y':    rng.standard_normal(nobs),
    'x':    rng.standard_normal(nobs),
    'time': rng.integers(1, n + 1, size=nobs),
})
id + model   required pair (minimal call)
# Bare rw1: 'id' is the column holding the ordered position index.
# 'n' is auto-detected from max(id); constr=True is on by default.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1'}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
hyper   override the loggamma prior on log-precision
# Default: prior='loggamma', param=[1, 0.00005], initial=4, fixed=False.
# Below: a more informative loggamma and a different starting value.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1',
                 'hyper': [{'prior': 'loggamma',
                            'param': [1, 0.01],
                            'initial': 2,
                            'fixed': False}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
n + values   extend the position grid for prediction at unobserved locations
# Set 'n' larger than max(id) AND supply 'values' to pin the explicit
# location grid. Latent positions 1..40 are estimated even though only
# positions 1..30 appear in the data.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1',
                 'n': n + 10,
                 'values': list(range(1, n + 11))}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
constr   disable the sum-to-zero constraint (also drop the intercept)
# Default is constr=True. Disabling it leaves the latent field's level
# unidentifiable, so the intercept ('1') MUST be removed from 'fixed'.
formula = {
    'response': 'y',
    'fixed': ['0', 'x'],          # no intercept
    'random': [{'id': 'time', 'model': 'rw1',
                 'constr': False}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
cyclic   circular random walk (position n wraps to position 1)
# When cyclic=True the latent field wraps: x_n is a neighbor of x_1.
# The sum-to-zero constraint is removed automatically by the engine.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1',
                 'cyclic': True}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
scale.model   scale Q to generalized variance 1
# Rescales the precision so the latent field's generalized variance
# is 1 regardless of n. Useful for making the prior on the
# log-precision comparable across rw1 components of different sizes.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1',
                 'scale.model': True}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
values   explicit location grid
# Pin the full ordered set of positions explicitly. The list may extend
# beyond the observed range to enable prediction at new positions.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1',
                 'values': list(range(1, n + 11))}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
extraconstr   additional linear constraints \(\mathbf{A}\mathbf{x} = \mathbf{e}\)
# The expected number of columns in A depends on how the latent grid is anchored:
#
# (A) Auto-detected grid (no 'values', no 'n'):
#       n_eff = len(unique(time))
#       A has shape (k, n_eff - 1)   # sum-to-zero augmentation reduces dim by 1
#     This is data-dependent: missing positions in 'time' shrink n_eff.
#
# (B) Anchored grid (supply BOTH 'n' AND 'values'):
#       n_eff = n   (length of the values list)
#       A has shape (k, n)           # full field dimension
#     Robust against sparse data; recommended for prediction-style usage.
#
# Below: case (B) pins the first and last positions to zero. With n=30,
# A has shape (2, 30); 'e' has length 2.
A = np.zeros((2, n))
A[0, 0]     = 1.0
A[1, n - 1] = 1.0
e = np.array([0.0, 0.0])
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1',
                 'n':           n,
                 'values':      list(range(1, n + 1)),
                 'extraconstr': {'A': A, 'e': e}}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
diagonal   extra constant on the precision diagonal
# Small numerical-stability jitter. Rarely needed for rw1 with a proper prior.
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1',
                 'diagonal': 1e-4}],
}
result = pyinla(formula=formula, family='gaussian', data=df)
weights   per-observation weighting
# Per-observation weight vector applied to the rw1 contribution.
# Length must equal the number of observations.
w = rng.uniform(0.5, 1.5, size=nobs)
formula = {
    'response': 'y',
    'fixed': ['1', 'x'],
    'random': [{'id': 'time', 'model': 'rw1',
                 'weights': w}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

Index Handling

The id column should contain integer values representing ordered positions. pyINLA maps these to consecutive indices internally:

Input Behavior
[1, 2, 3, 4, 5] Standard consecutive indices - works directly
[2000, 2001, 2002, ...] Year values - mapped to positions 1, 2, 3, ...
[1, 1, 2, 2, 3, 3] Repeated indices - multiple observations per time point
[1, NA, 3, NA, 5] NA/NaN values - observation excluded from this random effect

Key points:

  • Indices define the order - the RW1 structure depends on which positions are adjacent
  • Multiple observations can share the same index (e.g., multiple measurements at the same time point)
  • Gaps in indices are handled automatically (positions are mapped consecutively)

Missing values (NaN)

pyINLA accepts np.nan / NA in two places without error:

  • NaN in the time column: that observation contributes nothing to the rw1 component. The covariates file records a -1 sentinel for those rows. The remaining observations drive the latent field as usual.
  • NaN in the weights vector: also accepted. The affected rows contribute no weighted rw1 effect.

Both NaN positions and NaN weights interact cleanly with the other rw1 keys (constr, cyclic, explicit values). See the nan_in_idx_* and nan_in_weights cases in the rw1 parity suite for the verified combinations.

Hyperparameters

The RW1 model has one hyperparameter: the precision \(\tau\) of the first differences. In pyINLA, we work with the log-transformed precision \(\theta = \log(\tau)\).

Default Configuration

Parameter Default Value Description
prior 'loggamma' Log-gamma prior on precision (gamma prior on \(\tau\))
param [1, 0.00005] Shape and rate parameters for the gamma prior
initial 4 Starting value for \(\theta = \log(\tau)\); corresponds to \(\tau \approx 54.6\)
fixed False Whether to estimate (False) or fix (True) the precision

Customizing Hyperparameters

Override defaults using the hyper key:

'random': [
    {
        'id': 'time',
        'model': 'rw1',
        'hyper': [{
            'prior': 'loggamma',
            'param': [1, 0.01],      # More informative prior
            'initial': 2,             # Different starting point
            'fixed': False
        }]
    }
]

Available Priors

Log-Gamma Prior (Default)

The default loggamma prior places a gamma distribution on the precision \(\tau\):

\[\tau \sim \text{Gamma}(a, b)\]

With default param=[1, 0.00005], this is a weakly informative prior allowing both rough and smooth curves.

# Default loggamma prior
'hyper': [{'prior': 'loggamma', 'param': [1, 0.00005]}]

# More informative: expect moderate smoothing
'hyper': [{'prior': 'loggamma', 'param': [1, 0.01]}]

PC Prior (Penalised Complexity)

The pc.prec prior penalizes deviation from the base model (constant effect). Parameterized via the marginal standard deviation:

\[P(\sigma > u) = \alpha\]

# PC prior: P(sigma > 1) = 0.01
# "I believe the increment SD is unlikely to exceed 1"
'hyper': [{'prior': 'pc.prec', 'param': [1, 0.01]}]

# PC prior: P(sigma > 0.5) = 0.05
# Expect a smooth, slowly-varying trend
'hyper': [{'prior': 'pc.prec', 'param': [0.5, 0.05]}]

Fixed Precision

To fix the precision at a known value:

import math

# Fix precision at tau = 100 (very smooth)
'hyper': [{
    'initial': math.log(100),  # log(tau) = 4.605
    'fixed': True
}]

Sum-to-Zero Constraint

By default, RW1 includes a sum-to-zero constraint (constr=True) for identifiability with an intercept. You can disable this:

# Disable constraint (remove intercept from model)
model = {
    'response': 'y',
    'fixed': ['0', 'x'],  # No intercept
    'random': [
        {
            'id': 'time',
            'model': 'rw1',
            'constr': False
        }
    ]
}

When to Disable Constraint

Only set constr=False if you remove the intercept from your model. Otherwise, the model is not identifiable and results may be unreliable.

Interpreting Results

Hyperparameter Summary

After fitting, access the precision estimate via result.summary_hyperpar:

print(result.summary_hyperpar)
#                      mean        sd  0.025quant  0.5quant  0.975quant     mode
# Precision for time  45.23   12.34      24.56     43.21      74.89     41.02

Higher precision = smoother trend (smaller increments). To convert to the posterior of the standard deviation of increments, transform the precision marginal with pyinla.tmarginal. Plugging 1/sqrt(mean(tau)) in directly is biased: by Jensen's inequality the convex map \(\tau \mapsto 1/\sqrt{\tau}\) gives \(1/\sqrt{E[\tau]} \le E[1/\sqrt{\tau}]\), so the naive expression underestimates \(E[\sigma]\).

tau_marg = result.marginals_hyperpar['Precision for time']
sigma_marg = pyinla.tmarginal(lambda tau: 1 / np.sqrt(tau), tau_marg)
sigma_summary = pyinla.zmarginal(sigma_marg)  # mean, sd, quantiles of the increment SD
print(f"Posterior mean of increment SD: {sigma_summary['mean']:.4f}")

For a quick "central value" of \(\sigma\) only, the median is exact under any monotone transform (no marginal transform needed):

tau_median = result.summary_hyperpar['0.5quant'].values[0]
sigma_median = 1 / np.sqrt(tau_median)
print(f"Posterior median of increment SD: {sigma_median:.4f}")

Random Effect Estimates

The posterior estimates for each time point are in result.summary_random:

# Temporal trend estimates
trend = result.summary_random['time']
print(trend)
#    ID      mean        sd  0.025quant  0.5quant  0.975quant      mode
# 0   1 -0.234567  0.089123   -0.409258 -0.234567   -0.059876 -0.234567
# 1   2 -0.123456  0.081234   -0.282345 -0.123456    0.035433 -0.123456
# ...

# Plot the trend
import matplotlib.pyplot as plt
plt.fill_between(trend['ID'], trend['0.025quant'], trend['0.975quant'], alpha=0.3)
plt.plot(trend['ID'], trend['mean'], 'b-')
plt.xlabel('Time')
plt.ylabel('RW1 Effect')

Related Models

  • RW2 - Random Walk of order 2 (smoother, penalizes second differences)
  • IID - Independent random effects (no temporal/spatial structure)