Posterior Sampling
Draw samples from the joint posterior distribution for inference beyond marginals: useful for derived quantities, predictions, and Monte Carlo integration.
Why Sample from the Posterior?
While INLA provides accurate marginal posteriors through numerical integration, sometimes you need samples from the joint posterior distribution. Common use cases include:
- Derived quantities: functions of multiple parameters (e.g., \(\beta_1 \times \beta_2\))
- Posterior predictions: propagating uncertainty through nonlinear transformations
- Model comparison: computing posterior predictive checks
- Correlation analysis: understanding dependencies between parameters
How pyINLA Generates Samples
pyINLA computes deterministic approximations to the posterior first, then generates samples from those approximations. This is fast, but means samples are approximate: they come from the Gaussian (or skew-corrected) approximation to the true posterior.
Sampling Functions
pyINLA provides four sampling functions:
pyinla.qsample()
Sample from a Gaussian Markov Random Field (GMRF) given a precision matrix Q. The fundamental building block for all other sampling.
pyinla.posterior_sample()
Generate samples from the full joint posterior of hyperparameters and latent field, using stored configurations from model fitting.
pyinla.posterior_sample_eval()
Evaluate a function on posterior samples. Extract specific elements or compute derived quantities.
pyinla.hyperpar_sample()
Sample only the hyperparameters from their joint posterior, using eigendecomposition of the covariance matrix.
qsample: GMRF Sampling
pyinla.qsample() draws samples from a Gaussian Markov Random Field (GMRF) defined by a precision matrix \(\mathbf{Q}\):
How GMRF Sampling Works
The INLA binary uses a Cholesky factorization approach:
- Compute Cholesky factor: \(\mathbf{Q} = \mathbf{L}\mathbf{L}^T\)
- Generate standard normal: \(\mathbf{z} \sim \mathcal{N}(0, \mathbf{I})\)
- Solve: \(\mathbf{L}^T \mathbf{x} = \mathbf{z}\) via back-substitution
- Add mean: \(\mathbf{x} \leftarrow \mathbf{x} + \boldsymbol{\mu}\)
This exploits the sparse structure of Q for efficient sampling.
Parameters
| Parameter | Type | Description |
|---|---|---|
n | int | Number of samples to generate |
Q | ndarray, sparse, or str | Precision matrix (positive definite). Accepts a dense ndarray, any scipy.sparse format, or a path to a binary fmesher file. |
mu | ndarray, optional | Mean vector (default: zero) |
b | ndarray, optional | Linear term in the log-density \(-\tfrac{1}{2}(x-\mu)^\top Q (x-\mu) + b^\top x\). Shifts the posterior mean by \(Q^{-1} b\) (canonical / information form). |
constr | dict, optional | Linear constraints: {'A': matrix, 'e': vector} for \(\mathbf{A}\mathbf{x} = \mathbf{e}\) |
sample | ndarray, optional | If provided, compute the log-density of these samples instead of generating new ones. |
compute_mean | bool, optional | If True, also return the (constrained) posterior mean. |
reordering | str | Cholesky reordering algorithm: 'auto' (default), 'amd', 'metis', 'band', 'identity'. |
seed | int, optional | Random seed for reproducibility. Note: bit-exact reproducibility requires also pinning num_threads (e.g. num_threads='1:1'); the seed alone is not enough if the thread count varies across runs. |
logdens | bool | If True, also return log-densities |
selection | array, optional | Indices of elements to return (1-based) |
Per-Parameter Runnable Examples
Click any parameter to expand a minimal runnable snippet. All snippets share the setup below; each chunk redefines Q as appropriate.
# shared setup for every qsample snippet below
import numpy as np
from scipy import sparse
import pyinla
n + Q required minimal call
# Diagonal precision matrix (independent components):
# precision = 5 means variance = 0.2 per component.
Q = np.eye(5) * 5
samples = pyinla.qsample(n=1000, Q=Q)
print(f"Shape: {samples.shape}") # (5, 1000)
print(f"Variance: {np.var(samples, axis=1)}") # ~0.2
mu non-zero mean vector
Q = np.eye(5) * 10
mu = np.array([1, 2, 3, 4, 5], dtype=float)
samples = pyinla.qsample(n=100, Q=Q, mu=mu, seed=42)
print(samples.mean(axis=1).round(2)) # empirical means ~ mu
constr linear constraint A x = e
Q = np.eye(5) * 10
# --- Single constraint: A is 1 x 5, e is length 1 ---
# Sum-to-zero: x[0] + x[1] + x[2] + x[3] + x[4] = 0
constr = {'A': np.ones((1, 5)), 'e': np.array([0.0])}
samples = pyinla.qsample(n=100, Q=Q, constr=constr, seed=42)
print(np.sum(samples, axis=0)[:5].round(6)) # all ~0
# --- Multiple constraints: A is k x 5, e is length k ---
# Row 1: sum-to-zero; Row 2: x[0] - x[4] = 2
A = np.array([
[1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 0.0, 0.0, 0.0, -1.0],
])
e = np.array([0.0, 2.0])
samples = pyinla.qsample(n=100, Q=Q, constr={'A': A, 'e': e}, seed=42)
print("sum: ", samples.sum(axis=0)[:5].round(6)) # all ~0
print("x0 - x4:", (samples[0] - samples[4])[:5].round(6)) # all ~2
seed reproducibility (pair with num_threads for bit-exact)
Q = np.eye(3) * 4
# seed alone: reproducible up to floating-point thread-scheduling noise.
a = pyinla.qsample(n=10, Q=Q, seed=7)
b = pyinla.qsample(n=10, Q=Q, seed=7)
print("byte-equal (seed only): ", np.array_equal(a, b))
# Add num_threads='1:1' to make it byte-exact across runs.
a2 = pyinla.qsample(n=10, Q=Q, seed=7, num_threads="1:1")
b2 = pyinla.qsample(n=10, Q=Q, seed=7, num_threads="1:1")
print("byte-equal (seed + 1:1): ", np.array_equal(a2, b2)) # True
logdens also return log-densities (returns a dict)
# Sparse tridiagonal Q: GMRF Cholesky exploits the sparsity.
n = 100
diag = np.ones(n) * 4
off_diag = -np.ones(n - 1)
Q = sparse.diags([off_diag, diag, off_diag], [-1, 0, 1], format='csc')
result = pyinla.qsample(n=50, Q=Q, seed=77, logdens=True)
print(list(result.keys())) # ['sample', 'logdens', 'mean']
print(result['sample'].shape, result['logdens'].shape)
selection return only a subset of components (1-based)
Q = np.eye(5) * 4
# Only return components 1, 3, 5 (1-based indices).
samples = pyinla.qsample(n=10, Q=Q, seed=1,
selection=np.array([1, 3, 5]))
print(samples.shape) # (3, 10)
b canonical / information form (linear term)
# Log-density is -1/2 (x - mu)^T Q (x - mu) + b^T x.
# With mu = 0, the posterior mean equals Q^-1 b.
Q = np.eye(5) * 4
b = np.array([1.0, -1.0, 0.5, -0.5, 0.0])
samples = pyinla.qsample(n=200, Q=Q, b=b, seed=1)
print("empirical mean:", samples.mean(axis=1).round(3))
print("expected Q^-1 b:", np.linalg.solve(Q, b).round(3))
compute_mean also return the (constrained) posterior mean
# compute_mean=True returns a dict with 'sample', 'logdens', 'mean'.
# With a sum-to-zero constraint, the mean is mu projected onto sum=0.
Q = np.eye(5) * 10
mu = np.array([1, 2, 3, 4, 5], dtype=float)
constr = {'A': np.ones((1, 5)), 'e': np.array([0.0])}
out = pyinla.qsample(n=50, Q=Q, mu=mu, constr=constr,
seed=1, compute_mean=True)
print(list(out.keys())) # ['sample', 'logdens', 'mean']
print(out['mean'].round(2)) # [-2, -1, 0, 1, 2] (sums to 0)
sample recompute log-density for given samples (no new draws)
# Pass an existing matrix of samples to evaluate their log-density
# under N(mu, Q^-1). Useful for importance sampling and model comparison.
Q = np.eye(5) * 4
x = np.random.default_rng(0).standard_normal((5, 10))
out = pyinla.qsample(n=10, Q=Q, sample=x, logdens=True)
print(out['logdens'].shape) # (10,)
print(out['logdens'].round(3))
reordering pick the Cholesky reordering algorithm
# 'auto' picks the best heuristic; 'band' is optimal for band-structured Q;
# 'amd' / 'metis' minimise fill-in for general sparsity. Same result, different speed.
n = 50
Q = (np.eye(n) * 4
+ np.diag(-np.ones(n - 1), 1)
+ np.diag(-np.ones(n - 1), -1))
s_auto = pyinla.qsample(n=10, Q=Q, seed=1, reordering='auto')
s_band = pyinla.qsample(n=10, Q=Q, seed=1, reordering='band')
print(s_auto.shape, s_band.shape) # both (50, 10)
Q in different formats
The same call accepts dense ndarrays, any scipy.sparse format, or a path to a binary fmesher file on disk. All three produce identical samples for the same Q.
Q as scipy csr_matrix (any sparse format works)
Q = sparse.eye(5).tocsr() * 5.0 # csc, csr, dia, coo all OK
samples = pyinla.qsample(n=20, Q=Q, seed=1)
print(samples.shape) # (5, 20)
Q via sparse.diags(...) (band / tridiagonal builders)
# Build a tridiagonal Q without materialising it densely.
n = 100
diag = np.ones(n) * 4
off_diag = -np.ones(n - 1)
Q = sparse.diags([off_diag, diag, off_diag], [-1, 0, 1], format='csc')
samples = pyinla.qsample(n=50, Q=Q, seed=1)
print(samples.shape) # (100, 50)
Q as a file path (binary fmesher format on disk)
# Useful when Q is large and pre-built (e.g. SPDE precision exported once,
# then re-used across many sampling calls without re-serialising).
import tempfile, os
from pyinla.fmesher_io import write_fmesher_file
Q = sparse.eye(5).tocsc() * 5.0
tmp = tempfile.NamedTemporaryFile(delete=False, suffix='.dat')
tmp.close()
write_fmesher_file(Q, tmp.name)
samples = pyinla.qsample(n=20, Q=tmp.name, seed=1)
print(samples.shape) # (5, 20)
os.unlink(tmp.name)
posterior_sample: Full Posterior Sampling
pyinla.posterior_sample() generates samples from the joint posterior of hyperparameters \(\boldsymbol{\theta}\) and latent field \(\mathbf{x}\):
control={'compute': {'config': True}} to enable posterior sampling. This stores the configurations needed for sampling.
The Algorithm
The sampling algorithm follows these steps:
-
Sample configurations weighted by log-posterior
Each configuration \(k\) has a weight \(\propto \exp(\log \pi(\boldsymbol{\theta}^{(k)} \mid \mathbf{y}))\). Configurations with higher posterior probability are sampled more frequently. -
For each sampled config, call qsample
Sample the latent field from the GMRF: \(\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_k, \mathbf{Q}_k^{-1})\) where \(\mathbf{Q}_k\) is the precision matrix for config \(k\). -
Compute the linear predictor
The linear predictor is computed as: \(\boldsymbol{\eta} = \mathbf{A} \mathbf{x} + \text{offsets}\) where \(\mathbf{A}\) is the design matrix mapping latent field to observations. -
Apply skewness correction
If the posterior is skewed, apply a correction using the skew-normal distribution to improve the Gaussian approximation. -
Transform hyperparameters
Convert from internal scale (e.g., log-precision) to user scale (e.g., precision), computing the Jacobian for proper density transformation.
What is the Log-Posterior?
Log-Posterior Computation
For each hyperparameter configuration \(\boldsymbol{\theta}_k\), INLA computes:
The marginal likelihood is approximated using the Laplace approximation:
where \(\mathbf{x}^*\) is the mode and \(\mathbf{H}\) is the Hessian (precision matrix) at the mode.
Parameters
| Parameter | Type | Description |
|---|---|---|
n | int | Number of samples to generate |
result | PyINLAResult | Fitted model (with config=True) |
selection | dict, optional | Select specific latent elements, e.g., {'(Intercept)': 1, 'x': 1} |
seed | int, optional | Random seed for reproducibility. Note: bit-exact reproducibility requires also pinning num_threads (e.g. num_threads='1:1'); the seed alone is not enough if the thread count varies across runs. |
intern | bool | If True, return hyperparameters on internal scale (default: False) |
use_improved_mean | bool | Use VB-corrected mean (default: True) |
skew_corr | bool | Apply skewness correction (default: True) |
Per-Parameter Runnable Examples
Click any parameter to expand a minimal runnable snippet. All snippets share the setup block below: a small fitted model with config=True. The same res is reused by the posterior_sample_eval and hyperpar_sample sections.
# shared setup for posterior_sample / posterior_sample_eval / hyperpar_sample
import numpy as np
import pandas as pd
import pyinla
from pyinla import pyinla as pyinla_fit
rng = np.random.default_rng(42)
nobs = 100
x1 = rng.standard_normal(nobs)
x2 = rng.standard_normal(nobs)
y = 1.5 + 2.0*x1 - 1.0*x2 + 0.5*rng.standard_normal(nobs)
df = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2})
# config=True stores the configurations needed for posterior sampling.
res = pyinla_fit(
model={'response': 'y', 'fixed': ['1', 'x1', 'x2']},
family='gaussian', data=df,
control={'compute': {'config': True}},
)
n + result required minimal call
samples = pyinla.posterior_sample(n=50, result=res, seed=42)
print(f"len(samples) = {len(samples)}")
print(f"keys of one sample: {list(samples[0].keys())}")
# Each entry has 'hyperpar', 'latent', 'logdens'.
selection subset of latent elements (dict form)
# Each key names a latent element; value is the count to keep.
samples = pyinla.posterior_sample(
n=20, result=res, seed=42,
selection={'(Intercept)': 1, 'x1': 1, 'x2': 1},
)
# With selection, sample['latent'] is a (k, 1) ndarray of selected values.
print(samples[0]['latent'].shape) # (3, 1)
print(samples[0]['latent'].ravel())
seed reproducibility
a = pyinla.posterior_sample(n=10, result=res, seed=99)
b = pyinla.posterior_sample(n=10, result=res, seed=99)
key = list(a[0]['hyperpar'].keys())[0]
print("same seed -> same first hyperpar:",
a[0]['hyperpar'][key] == b[0]['hyperpar'][key])
intern user scale vs internal scale
s_user = pyinla.posterior_sample(n=20, result=res, seed=1, intern=False)
s_intern = pyinla.posterior_sample(n=20, result=res, seed=1, intern=True)
print(list(s_user[0]['hyperpar'].keys())) # ['Precision for the Gaussian observations']
print(list(s_intern[0]['hyperpar'].keys())) # ['Log precision for the Gaussian observations']
use_improved_mean VB-corrected mean (default True)
# Default uses the VB-corrected posterior mean of the latent field as
# the qsample mu. Setting False uses the Laplace mode instead.
samples_default = pyinla.posterior_sample(n=20, result=res, seed=1)
samples_mode = pyinla.posterior_sample(n=20, result=res, seed=1,
use_improved_mean=False)
print(f"both succeeded: {len(samples_default)}, {len(samples_mode)}")
skew_corr skew-normal correction (default True)
# skew_corr=False returns pure Gaussian draws (no skewness correction).
s_on = pyinla.posterior_sample(n=20, result=res, seed=1, skew_corr=True)
s_off = pyinla.posterior_sample(n=20, result=res, seed=1, skew_corr=False)
print(f"both succeeded: {len(s_on)}, {len(s_off)}")
posterior_sample_eval: Evaluate Functions on Samples
pyinla.posterior_sample_eval() applies a function to each posterior sample to compute derived quantities:
Parameters
| Parameter | Type | Description |
|---|---|---|
fun | str or callable | Variable name to extract, or function taking keyword arguments |
samples | list | Output from posterior_sample() |
return_matrix | bool | Return as matrix (default: True) |
Per-Parameter Runnable Examples
Reuses the same fitted res from the posterior_sample section. Build a sample list once, then evaluate per-parameter.
# Build the sample list once; reuse across the chunks below.
samples = pyinla.posterior_sample(n=200, result=res, seed=7)
fun = string extract a single named variable
intercepts = pyinla.posterior_sample_eval("(Intercept)", samples)
print(f"shape: {intercepts.shape}") # (1, 200)
print(f"mean: {intercepts.mean():.4f}")
print(f"sd: {intercepts.std():.4f}")
fun = callable derived quantity across multiple variables
# A callable receives one sample at a time via **kwargs keyed by latent name.
def product_func(**kwargs):
intercept = kwargs.get('(Intercept)', 0)
slope = kwargs.get('x1', 0)
return intercept * slope
prod = pyinla.posterior_sample_eval(product_func, samples)
print(f"E[Intercept * x1]: {prod.mean():.4f}")
print(f"95% CI: [{np.percentile(prod, 2.5):.4f},",
f"{np.percentile(prod, 97.5):.4f}]")
return_matrix = False return a list instead of a matrix
# Default return_matrix=True stacks per-sample outputs into one ndarray.
# Set False to receive a plain Python list of per-sample evaluations.
out = pyinla.posterior_sample_eval("x1", samples, return_matrix=False)
print(f"type: {type(out).__name__}, len: {len(out)}")
hyperpar_sample: Hyperparameter Sampling
pyinla.hyperpar_sample() generates samples only from the hyperparameter posterior \(\pi(\boldsymbol{\theta} \mid \mathbf{y})\), without sampling the latent field:
Algorithm
Hyperparameter sampling uses eigendecomposition of the Hessian at the mode:
- Get mode \(\boldsymbol{\theta}^*\) and Hessian \(\mathbf{H}\) from INLA
- Compute eigendecomposition: \(\mathbf{H}^{-1} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T\)
- Generate: \(\boldsymbol{\theta} = \boldsymbol{\theta}^* + \mathbf{V}\boldsymbol{\Lambda}^{1/2}\mathbf{z}\) where \(\mathbf{z} \sim \mathcal{N}(0, \mathbf{I})\)
- Optionally transform from internal to user scale
Parameters
| Parameter | Type | Description |
|---|---|---|
n | int | Number of samples |
result | PyINLAResult | Fitted model |
intern | bool | If True, return on internal scale (default: False) |
improve_marginals | bool | If True, refine the Gaussian marginal with stored skewness/kurtosis info (default: False) |
Per-Parameter Runnable Examples
Reuses the same fitted res from the posterior_sample section.
n + result required minimal call
samples = pyinla.hyperpar_sample(n=500, result=res)
print(f"type: {type(samples).__name__}, shape: {samples.shape}")
# Returns a pandas DataFrame, one row per sample, columns = hyperparameters.
intern user scale vs internal scale
s_user = pyinla.hyperpar_sample(n=500, result=res, intern=False)
s_intern = pyinla.hyperpar_sample(n=500, result=res, intern=True)
print(f"user-scale mean: {s_user.mean().values}") # precision
print(f"intern-scale mean: {s_intern.mean().values}") # log-precision
improve_marginals refine the Gaussian marginal (default False)
# When True, the sampler refines the Gaussian eigendecomposition with
# the stored skewness/kurtosis information from the marginal posterior.
samples = pyinla.hyperpar_sample(n=500, result=res,
improve_marginals=True)
print(f"shape: {samples.shape}")
Comparing with the INLA Marginal
import matplotlib.pyplot as plt
samples = pyinla.hyperpar_sample(n=5000, result=res)
# marginals_hyperpar entries are DataFrames with columns 'x' (support)
# and 'y' (density).
marg = res.marginals_hyperpar['Precision for the Gaussian observations']
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(samples.values.flatten(), bins=40, density=True,
alpha=0.7, label='Samples', edgecolor='black')
ax.plot(marg['x'], marg['y'], 'r-', linewidth=2, label='INLA Marginal')
ax.set_xlabel('Precision'); ax.set_ylabel('Density'); ax.legend()
plt.show()
Skewness Correction
INLA stores skewness information for each latent field element. The sampling algorithm can correct for non-Gaussianity using a skew-normal transformation:
where \(\delta\) is the skew-normal parameter derived from the stored skewness.
Skewness Parameter Conversion
The skewness \(\gamma_1\) is converted to the skew-normal delta parameter:
Skewness is clamped to \([-0.99, 0.99]\) to ensure valid delta values.
Controlling Skewness Correction
import pyinla
# With skewness correction (default)
samples_skew = pyinla.posterior_sample(n=100, result=res, skew_corr=True)
# Without skewness correction (pure Gaussian)
samples_gauss = pyinla.posterior_sample(n=100, result=res, skew_corr=False)
Internal vs User Scale
INLA represents hyperparameters on an internal scale (typically unbounded) during optimization, then transforms to user scale for reporting:
| User Scale | Internal Scale | Transformation |
|---|---|---|
| Precision (\(\tau > 0\)) | Log-precision | \(\tau = \exp(\theta)\) |
| Variance (\(\sigma^2 > 0\)) | Log-variance | \(\sigma^2 = \exp(\theta)\) |
| Correlation (\(\rho \in (-1,1)\)) | Fisher-z | \(\rho = \tanh(\theta)\) |
| Probability (\(p \in (0,1)\)) | Logit | \(p = 1/(1+\exp(-\theta))\) |
Reading the column names
When you switch intern=False → True, the column names in hyperpar_sample's DataFrame (or the keys of posterior_sample()'s 'hyperpar' dict) change in a predictable way. The new name tells you which transformation was applied:
| User-scale name | Internal-scale name | Transformation |
|---|---|---|
'Precision for ...' | 'Log precision for ...' | exp |
'Variance for ...' | 'Log variance for ...' | exp |
'Rho for ...' | 'Rho_intern for ...' | tanh (Fisher-z) |
'Probability for ...' | 'Logit ...' | sigmoid (inverse logit) |
Concrete example: an AR1 random effect contributes two hyperparameters. On the user scale they appear as 'Precision for t' and 'Rho for t'; on the internal scale they become 'Log precision for t' and 'Rho_intern for t'. Reading the prefix is how you tell exp apart from tanh in the same model.
Jensen warning when comparing means across scales
For any nonlinear transformation \(f\) (i.e. all four above), the mean does not commute with the transform:
\[\mathbb{E}[f(\theta)] \;\neq\; f(\mathbb{E}[\theta]).\]
For exp (precision / variance), the exact relationship under a Gaussian-approximated internal posterior \(\theta \sim \mathcal{N}(\mu,\sigma^{2})\) is
\[\mathbb{E}[\tau] \;=\; \mathbb{E}[\exp(\theta)] \;=\; \exp\!\left(\mu + \tfrac{\sigma^{2}}{2}\right),\]
so np.exp(s_intern.mean()) is biased low by the \(\sigma^{2}/2\) term and does not equal s_user.mean(). Same shape of pitfall applies to tanh and the sigmoid: applying the transform to the mean is not the mean of the transformed samples.
Always transform element-wise first:
# good: applies exp to each sample, then averages np.exp(s_intern['Log precision for t']).mean() # ~= s_user['Precision for t'].mean() # bad: applies exp to the mean of log-precisions np.exp(s_intern['Log precision for t'].mean()) # biased low