Advanced

Posterior Sampling

Draw samples from the joint posterior distribution for inference beyond marginals: useful for derived quantities, predictions, and Monte Carlo integration.

Why Sample from the Posterior?

While INLA provides accurate marginal posteriors through numerical integration, sometimes you need samples from the joint posterior distribution. Common use cases include:

How pyINLA Generates Samples

pyINLA computes deterministic approximations to the posterior first, then generates samples from those approximations. This is fast, but means samples are approximate: they come from the Gaussian (or skew-corrected) approximation to the true posterior.

Sampling Functions

pyINLA provides four sampling functions:

pyinla.qsample()

Sample from a Gaussian Markov Random Field (GMRF) given a precision matrix Q. The fundamental building block for all other sampling.

pyinla.posterior_sample()

Generate samples from the full joint posterior of hyperparameters and latent field, using stored configurations from model fitting.

pyinla.posterior_sample_eval()

Evaluate a function on posterior samples. Extract specific elements or compute derived quantities.

pyinla.hyperpar_sample()

Sample only the hyperparameters from their joint posterior, using eigendecomposition of the covariance matrix.

qsample: GMRF Sampling

pyinla.qsample() draws samples from a Gaussian Markov Random Field (GMRF) defined by a precision matrix \(\mathbf{Q}\):

$$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{Q}^{-1})$$

How GMRF Sampling Works

The INLA binary uses a Cholesky factorization approach:

  1. Compute Cholesky factor: \(\mathbf{Q} = \mathbf{L}\mathbf{L}^T\)
  2. Generate standard normal: \(\mathbf{z} \sim \mathcal{N}(0, \mathbf{I})\)
  3. Solve: \(\mathbf{L}^T \mathbf{x} = \mathbf{z}\) via back-substitution
  4. Add mean: \(\mathbf{x} \leftarrow \mathbf{x} + \boldsymbol{\mu}\)

This exploits the sparse structure of Q for efficient sampling.

Parameters

ParameterTypeDescription
nintNumber of samples to generate
Qndarray, sparse, or strPrecision matrix (positive definite). Accepts a dense ndarray, any scipy.sparse format, or a path to a binary fmesher file.
mundarray, optionalMean vector (default: zero)
bndarray, optionalLinear term in the log-density \(-\tfrac{1}{2}(x-\mu)^\top Q (x-\mu) + b^\top x\). Shifts the posterior mean by \(Q^{-1} b\) (canonical / information form).
constrdict, optionalLinear constraints: {'A': matrix, 'e': vector} for \(\mathbf{A}\mathbf{x} = \mathbf{e}\)
samplendarray, optionalIf provided, compute the log-density of these samples instead of generating new ones.
compute_meanbool, optionalIf True, also return the (constrained) posterior mean.
reorderingstrCholesky reordering algorithm: 'auto' (default), 'amd', 'metis', 'band', 'identity'.
seedint, optionalRandom seed for reproducibility. Note: bit-exact reproducibility requires also pinning num_threads (e.g. num_threads='1:1'); the seed alone is not enough if the thread count varies across runs.
logdensboolIf True, also return log-densities
selectionarray, optionalIndices of elements to return (1-based)

Per-Parameter Runnable Examples

Click any parameter to expand a minimal runnable snippet. All snippets share the setup below; each chunk redefines Q as appropriate.

# shared setup for every qsample snippet below
import numpy as np
from scipy import sparse
import pyinla
n + Q   required minimal call
# Diagonal precision matrix (independent components):
# precision = 5 means variance = 0.2 per component.
Q = np.eye(5) * 5
samples = pyinla.qsample(n=1000, Q=Q)
print(f"Shape:    {samples.shape}")        # (5, 1000)
print(f"Variance: {np.var(samples, axis=1)}")  # ~0.2
mu   non-zero mean vector
Q  = np.eye(5) * 10
mu = np.array([1, 2, 3, 4, 5], dtype=float)
samples = pyinla.qsample(n=100, Q=Q, mu=mu, seed=42)
print(samples.mean(axis=1).round(2))  # empirical means ~ mu
constr   linear constraint A x = e
Q = np.eye(5) * 10

# --- Single constraint: A is 1 x 5, e is length 1 ---
# Sum-to-zero: x[0] + x[1] + x[2] + x[3] + x[4] = 0
constr = {'A': np.ones((1, 5)), 'e': np.array([0.0])}
samples = pyinla.qsample(n=100, Q=Q, constr=constr, seed=42)
print(np.sum(samples, axis=0)[:5].round(6))  # all ~0

# --- Multiple constraints: A is k x 5, e is length k ---
# Row 1: sum-to-zero;  Row 2: x[0] - x[4] = 2
A = np.array([
    [1.0,  1.0,  1.0,  1.0,  1.0],
    [1.0,  0.0,  0.0,  0.0, -1.0],
])
e = np.array([0.0, 2.0])
samples = pyinla.qsample(n=100, Q=Q, constr={'A': A, 'e': e}, seed=42)
print("sum:    ", samples.sum(axis=0)[:5].round(6))   # all ~0
print("x0 - x4:", (samples[0] - samples[4])[:5].round(6))  # all ~2
seed   reproducibility (pair with num_threads for bit-exact)
Q = np.eye(3) * 4

# seed alone: reproducible up to floating-point thread-scheduling noise.
a = pyinla.qsample(n=10, Q=Q, seed=7)
b = pyinla.qsample(n=10, Q=Q, seed=7)
print("byte-equal (seed only):    ", np.array_equal(a, b))

# Add num_threads='1:1' to make it byte-exact across runs.
a2 = pyinla.qsample(n=10, Q=Q, seed=7, num_threads="1:1")
b2 = pyinla.qsample(n=10, Q=Q, seed=7, num_threads="1:1")
print("byte-equal (seed + 1:1):   ", np.array_equal(a2, b2))  # True
logdens   also return log-densities (returns a dict)
# Sparse tridiagonal Q: GMRF Cholesky exploits the sparsity.
n        = 100
diag     = np.ones(n) * 4
off_diag = -np.ones(n - 1)
Q = sparse.diags([off_diag, diag, off_diag], [-1, 0, 1], format='csc')

result = pyinla.qsample(n=50, Q=Q, seed=77, logdens=True)
print(list(result.keys()))               # ['sample', 'logdens', 'mean']
print(result['sample'].shape, result['logdens'].shape)
selection   return only a subset of components (1-based)
Q = np.eye(5) * 4
# Only return components 1, 3, 5 (1-based indices).
samples = pyinla.qsample(n=10, Q=Q, seed=1,
                            selection=np.array([1, 3, 5]))
print(samples.shape)  # (3, 10)
b   canonical / information form (linear term)
# Log-density is -1/2 (x - mu)^T Q (x - mu) + b^T x.
# With mu = 0, the posterior mean equals Q^-1 b.
Q = np.eye(5) * 4
b = np.array([1.0, -1.0, 0.5, -0.5, 0.0])
samples = pyinla.qsample(n=200, Q=Q, b=b, seed=1)
print("empirical mean:", samples.mean(axis=1).round(3))
print("expected Q^-1 b:", np.linalg.solve(Q, b).round(3))
compute_mean   also return the (constrained) posterior mean
# compute_mean=True returns a dict with 'sample', 'logdens', 'mean'.
# With a sum-to-zero constraint, the mean is mu projected onto sum=0.
Q      = np.eye(5) * 10
mu     = np.array([1, 2, 3, 4, 5], dtype=float)
constr = {'A': np.ones((1, 5)), 'e': np.array([0.0])}
out = pyinla.qsample(n=50, Q=Q, mu=mu, constr=constr,
                       seed=1, compute_mean=True)
print(list(out.keys()))            # ['sample', 'logdens', 'mean']
print(out['mean'].round(2))         # [-2, -1, 0, 1, 2] (sums to 0)
sample   recompute log-density for given samples (no new draws)
# Pass an existing matrix of samples to evaluate their log-density
# under N(mu, Q^-1). Useful for importance sampling and model comparison.
Q = np.eye(5) * 4
x = np.random.default_rng(0).standard_normal((5, 10))
out = pyinla.qsample(n=10, Q=Q, sample=x, logdens=True)
print(out['logdens'].shape)  # (10,)
print(out['logdens'].round(3))
reordering   pick the Cholesky reordering algorithm
# 'auto' picks the best heuristic; 'band' is optimal for band-structured Q;
# 'amd' / 'metis' minimise fill-in for general sparsity. Same result, different speed.
n = 50
Q = (np.eye(n) * 4
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
s_auto = pyinla.qsample(n=10, Q=Q, seed=1, reordering='auto')
s_band = pyinla.qsample(n=10, Q=Q, seed=1, reordering='band')
print(s_auto.shape, s_band.shape)  # both (50, 10)

Q in different formats

The same call accepts dense ndarrays, any scipy.sparse format, or a path to a binary fmesher file on disk. All three produce identical samples for the same Q.

Q as scipy csr_matrix (any sparse format works)
Q = sparse.eye(5).tocsr() * 5.0  # csc, csr, dia, coo all OK
samples = pyinla.qsample(n=20, Q=Q, seed=1)
print(samples.shape)  # (5, 20)
Q via sparse.diags(...) (band / tridiagonal builders)
# Build a tridiagonal Q without materialising it densely.
n        = 100
diag     = np.ones(n) * 4
off_diag = -np.ones(n - 1)
Q = sparse.diags([off_diag, diag, off_diag], [-1, 0, 1], format='csc')
samples = pyinla.qsample(n=50, Q=Q, seed=1)
print(samples.shape)  # (100, 50)
Q as a file path (binary fmesher format on disk)
# Useful when Q is large and pre-built (e.g. SPDE precision exported once,
# then re-used across many sampling calls without re-serialising).
import tempfile, os
from pyinla.fmesher_io import write_fmesher_file

Q   = sparse.eye(5).tocsc() * 5.0
tmp = tempfile.NamedTemporaryFile(delete=False, suffix='.dat')
tmp.close()
write_fmesher_file(Q, tmp.name)

samples = pyinla.qsample(n=20, Q=tmp.name, seed=1)
print(samples.shape)  # (5, 20)
os.unlink(tmp.name)

posterior_sample: Full Posterior Sampling

pyinla.posterior_sample() generates samples from the joint posterior of hyperparameters \(\boldsymbol{\theta}\) and latent field \(\mathbf{x}\):

$$(\boldsymbol{\theta}, \mathbf{x}) \sim \pi(\boldsymbol{\theta}, \mathbf{x} \mid \mathbf{y})$$
Requirement: You must fit the model with control={'compute': {'config': True}} to enable posterior sampling. This stores the configurations needed for sampling.

The Algorithm

The sampling algorithm follows these steps:

  1. Sample configurations weighted by log-posterior
    Each configuration \(k\) has a weight \(\propto \exp(\log \pi(\boldsymbol{\theta}^{(k)} \mid \mathbf{y}))\). Configurations with higher posterior probability are sampled more frequently.
  2. For each sampled config, call qsample
    Sample the latent field from the GMRF: \(\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_k, \mathbf{Q}_k^{-1})\) where \(\mathbf{Q}_k\) is the precision matrix for config \(k\).
  3. Compute the linear predictor
    The linear predictor is computed as: \(\boldsymbol{\eta} = \mathbf{A} \mathbf{x} + \text{offsets}\) where \(\mathbf{A}\) is the design matrix mapping latent field to observations.
  4. Apply skewness correction
    If the posterior is skewed, apply a correction using the skew-normal distribution to improve the Gaussian approximation.
  5. Transform hyperparameters
    Convert from internal scale (e.g., log-precision) to user scale (e.g., precision), computing the Jacobian for proper density transformation.

What is the Log-Posterior?

Log-Posterior Computation

For each hyperparameter configuration \(\boldsymbol{\theta}_k\), INLA computes:

$$\log \pi(\boldsymbol{\theta}_k \mid \mathbf{y}) \propto \log \tilde{\pi}(\mathbf{y} \mid \boldsymbol{\theta}_k) + \log \pi(\boldsymbol{\theta}_k)$$

The marginal likelihood is approximated using the Laplace approximation:

$$\log \tilde{\pi}(\mathbf{y} \mid \boldsymbol{\theta}) = \log \pi(\mathbf{y}, \mathbf{x}^*) - \frac{1}{2}\log|\mathbf{H}| + \frac{n}{2}\log(2\pi)$$

where \(\mathbf{x}^*\) is the mode and \(\mathbf{H}\) is the Hessian (precision matrix) at the mode.

Parameters

ParameterTypeDescription
nintNumber of samples to generate
resultPyINLAResultFitted model (with config=True)
selectiondict, optionalSelect specific latent elements, e.g., {'(Intercept)': 1, 'x': 1}
seedint, optionalRandom seed for reproducibility. Note: bit-exact reproducibility requires also pinning num_threads (e.g. num_threads='1:1'); the seed alone is not enough if the thread count varies across runs.
internboolIf True, return hyperparameters on internal scale (default: False)
use_improved_meanboolUse VB-corrected mean (default: True)
skew_corrboolApply skewness correction (default: True)

Per-Parameter Runnable Examples

Click any parameter to expand a minimal runnable snippet. All snippets share the setup block below: a small fitted model with config=True. The same res is reused by the posterior_sample_eval and hyperpar_sample sections.

# shared setup for posterior_sample / posterior_sample_eval / hyperpar_sample
import numpy as np
import pandas as pd
import pyinla
from pyinla import pyinla as pyinla_fit

rng  = np.random.default_rng(42)
nobs = 100
x1   = rng.standard_normal(nobs)
x2   = rng.standard_normal(nobs)
y    = 1.5 + 2.0*x1 - 1.0*x2 + 0.5*rng.standard_normal(nobs)
df   = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2})

# config=True stores the configurations needed for posterior sampling.
res = pyinla_fit(
    model={'response': 'y', 'fixed': ['1', 'x1', 'x2']},
    family='gaussian', data=df,
    control={'compute': {'config': True}},
)
n + result   required minimal call
samples = pyinla.posterior_sample(n=50, result=res, seed=42)
print(f"len(samples) = {len(samples)}")
print(f"keys of one sample: {list(samples[0].keys())}")
# Each entry has 'hyperpar', 'latent', 'logdens'.
selection   subset of latent elements (dict form)
# Each key names a latent element; value is the count to keep.
samples = pyinla.posterior_sample(
    n=20, result=res, seed=42,
    selection={'(Intercept)': 1, 'x1': 1, 'x2': 1},
)
# With selection, sample['latent'] is a (k, 1) ndarray of selected values.
print(samples[0]['latent'].shape)         # (3, 1)
print(samples[0]['latent'].ravel())
seed   reproducibility
a = pyinla.posterior_sample(n=10, result=res, seed=99)
b = pyinla.posterior_sample(n=10, result=res, seed=99)
key = list(a[0]['hyperpar'].keys())[0]
print("same seed -> same first hyperpar:",
      a[0]['hyperpar'][key] == b[0]['hyperpar'][key])
intern   user scale vs internal scale
s_user   = pyinla.posterior_sample(n=20, result=res, seed=1, intern=False)
s_intern = pyinla.posterior_sample(n=20, result=res, seed=1, intern=True)
print(list(s_user[0]['hyperpar'].keys()))    # ['Precision for the Gaussian observations']
print(list(s_intern[0]['hyperpar'].keys()))  # ['Log precision for the Gaussian observations']
use_improved_mean   VB-corrected mean (default True)
# Default uses the VB-corrected posterior mean of the latent field as
# the qsample mu. Setting False uses the Laplace mode instead.
samples_default = pyinla.posterior_sample(n=20, result=res, seed=1)
samples_mode    = pyinla.posterior_sample(n=20, result=res, seed=1,
                                            use_improved_mean=False)
print(f"both succeeded: {len(samples_default)}, {len(samples_mode)}")
skew_corr   skew-normal correction (default True)
# skew_corr=False returns pure Gaussian draws (no skewness correction).
s_on  = pyinla.posterior_sample(n=20, result=res, seed=1, skew_corr=True)
s_off = pyinla.posterior_sample(n=20, result=res, seed=1, skew_corr=False)
print(f"both succeeded: {len(s_on)}, {len(s_off)}")

posterior_sample_eval: Evaluate Functions on Samples

pyinla.posterior_sample_eval() applies a function to each posterior sample to compute derived quantities:

$$g(\boldsymbol{\theta}, \mathbf{x})^{(i)} = f(\text{sample}_i)$$

Parameters

ParameterTypeDescription
funstr or callableVariable name to extract, or function taking keyword arguments
sampleslistOutput from posterior_sample()
return_matrixboolReturn as matrix (default: True)

Per-Parameter Runnable Examples

Reuses the same fitted res from the posterior_sample section. Build a sample list once, then evaluate per-parameter.

# Build the sample list once; reuse across the chunks below.
samples = pyinla.posterior_sample(n=200, result=res, seed=7)
fun = string   extract a single named variable
intercepts = pyinla.posterior_sample_eval("(Intercept)", samples)
print(f"shape: {intercepts.shape}")        # (1, 200)
print(f"mean:  {intercepts.mean():.4f}")
print(f"sd:    {intercepts.std():.4f}")
fun = callable   derived quantity across multiple variables
# A callable receives one sample at a time via **kwargs keyed by latent name.
def product_func(**kwargs):
    intercept = kwargs.get('(Intercept)', 0)
    slope     = kwargs.get('x1', 0)
    return intercept * slope

prod = pyinla.posterior_sample_eval(product_func, samples)
print(f"E[Intercept * x1]: {prod.mean():.4f}")
print(f"95% CI: [{np.percentile(prod, 2.5):.4f},",
      f"{np.percentile(prod, 97.5):.4f}]")
return_matrix = False   return a list instead of a matrix
# Default return_matrix=True stacks per-sample outputs into one ndarray.
# Set False to receive a plain Python list of per-sample evaluations.
out = pyinla.posterior_sample_eval("x1", samples, return_matrix=False)
print(f"type: {type(out).__name__}, len: {len(out)}")

hyperpar_sample: Hyperparameter Sampling

pyinla.hyperpar_sample() generates samples only from the hyperparameter posterior \(\pi(\boldsymbol{\theta} \mid \mathbf{y})\), without sampling the latent field:

Algorithm

Hyperparameter sampling uses eigendecomposition of the Hessian at the mode:

  1. Get mode \(\boldsymbol{\theta}^*\) and Hessian \(\mathbf{H}\) from INLA
  2. Compute eigendecomposition: \(\mathbf{H}^{-1} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T\)
  3. Generate: \(\boldsymbol{\theta} = \boldsymbol{\theta}^* + \mathbf{V}\boldsymbol{\Lambda}^{1/2}\mathbf{z}\) where \(\mathbf{z} \sim \mathcal{N}(0, \mathbf{I})\)
  4. Optionally transform from internal to user scale

Parameters

ParameterTypeDescription
nintNumber of samples
resultPyINLAResultFitted model
internboolIf True, return on internal scale (default: False)
improve_marginalsboolIf True, refine the Gaussian marginal with stored skewness/kurtosis info (default: False)

Per-Parameter Runnable Examples

Reuses the same fitted res from the posterior_sample section.

n + result   required minimal call
samples = pyinla.hyperpar_sample(n=500, result=res)
print(f"type: {type(samples).__name__}, shape: {samples.shape}")
# Returns a pandas DataFrame, one row per sample, columns = hyperparameters.
intern   user scale vs internal scale
s_user   = pyinla.hyperpar_sample(n=500, result=res, intern=False)
s_intern = pyinla.hyperpar_sample(n=500, result=res, intern=True)
print(f"user-scale mean:   {s_user.mean().values}")    # precision
print(f"intern-scale mean: {s_intern.mean().values}")  # log-precision
improve_marginals   refine the Gaussian marginal (default False)
# When True, the sampler refines the Gaussian eigendecomposition with
# the stored skewness/kurtosis information from the marginal posterior.
samples = pyinla.hyperpar_sample(n=500, result=res,
                                  improve_marginals=True)
print(f"shape: {samples.shape}")

Comparing with the INLA Marginal

import matplotlib.pyplot as plt

samples = pyinla.hyperpar_sample(n=5000, result=res)
# marginals_hyperpar entries are DataFrames with columns 'x' (support)
# and 'y' (density).
marg = res.marginals_hyperpar['Precision for the Gaussian observations']

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(samples.values.flatten(), bins=40, density=True,
        alpha=0.7, label='Samples', edgecolor='black')
ax.plot(marg['x'], marg['y'], 'r-', linewidth=2, label='INLA Marginal')
ax.set_xlabel('Precision'); ax.set_ylabel('Density'); ax.legend()
plt.show()

Skewness Correction

INLA stores skewness information for each latent field element. The sampling algorithm can correct for non-Gaussianity using a skew-normal transformation:

$$x_{\text{corrected}} = x_{\text{gaussian}} + \sigma \cdot \delta \cdot \left( |z| - \sqrt{\frac{2}{\pi}} \right) / \sqrt{1 - \frac{2\delta^2}{\pi}}$$

where \(\delta\) is the skew-normal parameter derived from the stored skewness.

Skewness Parameter Conversion

The skewness \(\gamma_1\) is converted to the skew-normal delta parameter:

$$\delta = \text{sign}(\gamma_1) \cdot \sqrt{\frac{\pi/2 \cdot |\gamma_1|^{2/3}}{|\gamma_1|^{2/3} + ((4-\pi)/2)^{2/3}}}$$

Skewness is clamped to \([-0.99, 0.99]\) to ensure valid delta values.

Controlling Skewness Correction

import pyinla

# With skewness correction (default)
samples_skew = pyinla.posterior_sample(n=100, result=res, skew_corr=True)

# Without skewness correction (pure Gaussian)
samples_gauss = pyinla.posterior_sample(n=100, result=res, skew_corr=False)

Internal vs User Scale

INLA represents hyperparameters on an internal scale (typically unbounded) during optimization, then transforms to user scale for reporting:

User ScaleInternal ScaleTransformation
Precision (\(\tau > 0\))Log-precision\(\tau = \exp(\theta)\)
Variance (\(\sigma^2 > 0\))Log-variance\(\sigma^2 = \exp(\theta)\)
Correlation (\(\rho \in (-1,1)\))Fisher-z\(\rho = \tanh(\theta)\)
Probability (\(p \in (0,1)\))Logit\(p = 1/(1+\exp(-\theta))\)

Reading the column names

When you switch intern=FalseTrue, the column names in hyperpar_sample's DataFrame (or the keys of posterior_sample()'s 'hyperpar' dict) change in a predictable way. The new name tells you which transformation was applied:

User-scale nameInternal-scale nameTransformation
'Precision for ...''Log precision for ...'exp
'Variance for ...''Log variance for ...'exp
'Rho for ...''Rho_intern for ...'tanh (Fisher-z)
'Probability for ...''Logit ...'sigmoid (inverse logit)

Concrete example: an AR1 random effect contributes two hyperparameters. On the user scale they appear as 'Precision for t' and 'Rho for t'; on the internal scale they become 'Log precision for t' and 'Rho_intern for t'. Reading the prefix is how you tell exp apart from tanh in the same model.

Jensen warning when comparing means across scales

For any nonlinear transformation \(f\) (i.e. all four above), the mean does not commute with the transform:

\[\mathbb{E}[f(\theta)] \;\neq\; f(\mathbb{E}[\theta]).\]

For exp (precision / variance), the exact relationship under a Gaussian-approximated internal posterior \(\theta \sim \mathcal{N}(\mu,\sigma^{2})\) is

\[\mathbb{E}[\tau] \;=\; \mathbb{E}[\exp(\theta)] \;=\; \exp\!\left(\mu + \tfrac{\sigma^{2}}{2}\right),\]

so np.exp(s_intern.mean()) is biased low by the \(\sigma^{2}/2\) term and does not equal s_user.mean(). Same shape of pitfall applies to tanh and the sigmoid: applying the transform to the mean is not the mean of the transformed samples.

Always transform element-wise first:

# good: applies exp to each sample, then averages
np.exp(s_intern['Log precision for t']).mean()   # ~= s_user['Precision for t'].mean()

# bad:  applies exp to the mean of log-precisions
np.exp(s_intern['Log precision for t'].mean())   # biased low
Jacobian correction: When transforming from internal to user scale, the log-density is adjusted by the log-Jacobian of the transformation to ensure proper probability densities.