Theory

How INLA Works

The Integrated Nested Laplace Approximation provides fast, accurate Bayesian inference for latent Gaussian models without MCMC sampling.

The Bayesian Inference Challenge

Bayesian inference requires computing the posterior distribution of parameters given observed data. Using Bayes' theorem:

$$p(\theta \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \theta) \, p(\theta)}{p(\mathbf{y})}$$

where:

  • \(p(\theta \mid \mathbf{y})\) is the posterior distribution of the parameters given the data,
  • \(p(\mathbf{y} \mid \theta)\) is the likelihood,
  • \(p(\theta)\) is the prior, and
  • \(p(\mathbf{y})\) is the marginal likelihood (evidence).

The denominator \(p(\mathbf{y})\) requires integrating over all possible parameter values, which is analytically intractable for most realistic models. Traditional approaches use Markov chain Monte Carlo (MCMC) to draw samples from the posterior, but this can be computationally expensive and slow to converge for complex hierarchical models.
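A tiny grid-based sketch makes the normalization problem concrete: with a single parameter we can brute-force the integral in the denominator, which is precisely what stops scaling as the parameter dimension grows. All numbers below are illustrative.

```python
import numpy as np

# Toy illustration: posterior for a coin's head-probability theta on a grid.
# The normalizing constant p(y) requires integrating likelihood * prior over
# all theta -- here a simple Riemann sum stands in for that integral.
theta = np.linspace(1e-6, 1 - 1e-6, 1000)
d_theta = theta[1] - theta[0]
y_heads, n_flips = 7, 10

likelihood = theta**y_heads * (1 - theta)**(n_flips - y_heads)  # p(y | theta)
prior = np.ones_like(theta)                                     # flat p(theta)

unnormalized = likelihood * prior
evidence = unnormalized.sum() * d_theta   # p(y), the intractable denominator
posterior = unnormalized / evidence       # p(theta | y), integrates to 1
```

In one dimension this grid integration is trivial; with tens or hundreds of parameters, the number of grid points explodes, which is why MCMC (or approximations such as INLA) is needed.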

INLA's Key Insight

For a broad class of models called latent Gaussian models, we can exploit the Gaussian structure to compute fast, accurate approximations to posterior marginals without sampling. This makes INLA orders of magnitude faster than MCMC for these models.

Why "Integrated Nested Laplace Approximations"?

The name INLA reflects the structured, hierarchical nature of the methodology:

Integrated

Numerical Integration

INLA computes marginal posteriors by integrating over both the latent variables and hyperparameters using a combination of numerical integration and analytical approximations.

Nested

Hierarchical Structure

Approximation steps are nested: each outer approximation relies on results computed at an inner level, with Laplace approximations applied conditionally at each level.

Laplace

Core Technique

The Laplace approximation efficiently replaces computationally expensive integrations with optimization and curvature evaluations at the mode.

This combination of integration, hierarchical structure, and Laplace-based approximations underpins the efficiency and accuracy of INLA for latent Gaussian models.
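The core Laplace idea fits in a few lines: approximate \(\int \exp(g(x)) \, dx\) by maximizing \(g\) and using the curvature at the mode. The target density below is an arbitrary illustrative choice, not anything INLA-specific.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Laplace approximation of Z = integral of exp(g(x)) dx: find the mode x*
# of g, then use the curvature there: Z is approximately
# exp(g(x*)) * sqrt(2*pi / (-g''(x*))).
def g(x):
    # log of an unnormalized Gamma(shape=3, rate=2) density (illustrative)
    return 2 * np.log(x) - 2 * x

res = minimize_scalar(lambda x: -g(x), bounds=(1e-6, 10), method="bounded")
x_star = res.x                   # analytic mode: g'(x) = 2/x - 2 = 0 -> x* = 1
g2 = -2 / x_star**2              # g''(x) = -2/x^2, curvature at the mode

laplace_Z = np.exp(g(x_star)) * np.sqrt(2 * np.pi / (-g2))
# exact value: Gamma(3) / 2**3 = 0.25; the approximation lands close
```

One optimization plus one second-derivative evaluation replaces a full integral; this is the trade INLA makes repeatedly inside its nested scheme.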

Latent Gaussian Models

INLA works with latent Gaussian models (LGMs), which have a natural three-level hierarchical structure:

Level 1

Observations

Data \(y_i\) are conditionally independent given the latent field. The response can follow any supported likelihood (Gaussian, Poisson, Binomial, Gamma, etc.), not just Gaussian.

Level 2

Latent Field

A Gaussian Markov random field (GMRF) \(\mathbf{x}\) with sparse precision matrix \(\mathbf{Q}(\boldsymbol{\theta})\).

Level 3

Hyperparameters

Parameters \(\boldsymbol{\theta}\) controlling variance, correlation, and other structural aspects.

Common misconception: The "Gaussian" in latent Gaussian model refers to the latent field (Level 2), not the data. The observations can follow any likelihood: Poisson, Binomial, Gamma, Beta, Negative Binomial, and many others. INLA handles non-Gaussian data by design.

The latent field \(\mathbf{x}\) contains all unobserved Gaussian variables: regression coefficients, random effects, and smooth functions. The linear predictor is then:

$$\eta_i = \beta_0 + \sum_j \beta_j z_{ij} + \sum_k f^{(k)}(u_{ik}) + \text{offset}_i$$

where:

  • \(\beta_0\) is the intercept,
  • \(\beta_j\) are fixed-effect coefficients for covariates \(z_{ij}\),
  • \(f^{(k)}\) are unknown functions (random effects, smooth terms, spatial or temporal effects) of covariates \(u_{ik}\), and
  • \(\text{offset}_i\) is a known term added to the predictor.

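As a rough sketch, the predictor is just a sum of structured components. All names and values below are made up for illustration; this is not pyINLA API.

```python
import numpy as np

# Illustrative assembly of the linear predictor eta_i.
rng = np.random.default_rng(42)
n = 100

beta0 = 1.5                                      # intercept
beta = np.array([0.8, -0.3])                     # fixed-effect coefficients beta_j
Z = rng.normal(size=(n, 2))                      # covariates z_ij
f_smooth = np.sin(np.linspace(0, 2 * np.pi, n))  # one smooth term f(u_i)
offset = np.zeros(n)

eta = beta0 + Z @ beta + f_smooth + offset       # linear predictor eta_i
mu = np.exp(eta)                                 # e.g. Poisson mean via log link
```

Every term on the right-hand side except the offset lives in the latent Gaussian field \(\mathbf{x}\), so \(\eta_i\) is a linear function of \(\mathbf{x}\).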
Many common models fit the LGM framework:

Generalized Linear Models (GLMs)
Mixed Effects Models (GLMMs)
Generalized Additive Models (GAMs)
Spatial and Temporal Models
Survival Models with Frailties
State-Space Models

The INLA Algorithm

INLA computes posterior marginals through a two-stage nested approximation framework. The goal is to compute marginal posteriors for both hyperparameters \(\theta_i\) and latent field elements \(x_i\):

Stage 1: Hyperparameter Marginals

The marginal posterior of each hyperparameter is obtained by integrating out the latent field:

$$f(\theta_i \mid \mathbf{y}) = \int f(\boldsymbol{\theta} \mid \mathbf{y}) \, d\boldsymbol{\theta}_{-i} \propto \int_{\mathbf{x}} \int_{\boldsymbol{\theta}_{-i}} f(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\theta}) \, f(\mathbf{x} \mid \boldsymbol{\theta}) \, f(\boldsymbol{\theta}) \, d\boldsymbol{\theta}_{-i} \, d\mathbf{x}$$

where:

  • \(f(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\theta})\) is the likelihood,
  • \(f(\mathbf{x} \mid \boldsymbol{\theta})\) is the GMRF prior on the latent field,
  • \(f(\boldsymbol{\theta})\) is the hyperparameter prior, and
  • \(\boldsymbol{\theta}_{-i}\) denotes all hyperparameters except \(\theta_i\).

INLA constructs an approximation \(\tilde{f}(\boldsymbol{\theta} \mid \mathbf{y})\) using the Laplace method, locates its mode \(\boldsymbol{\theta}^*\), and computes marginals via numerical integration over selected evaluation points.

Stage 2: Latent Field Marginals

The marginal posterior for each latent field element is computed by integrating over the hyperparameters:

$$f(x_i \mid \mathbf{y}) = \int f(x_i \mid \boldsymbol{\theta}, \mathbf{y}) \, f(\boldsymbol{\theta} \mid \mathbf{y}) \, d\boldsymbol{\theta}$$

where \(x_i\) is a single element of the latent field, \(f(x_i \mid \boldsymbol{\theta}, \mathbf{y})\) is its conditional posterior given hyperparameters and data, and \(f(\boldsymbol{\theta} \mid \mathbf{y})\) is the hyperparameter posterior from Stage 1.

This is approximated as a weighted sum over selected hyperparameter configurations:

$$\tilde{f}(x_i \mid \mathbf{y}) \approx \sum_{k} \tilde{f}(x_i \mid \boldsymbol{\theta}_k, \mathbf{y}) \cdot \tilde{f}(\boldsymbol{\theta}_k \mid \mathbf{y}) \cdot \Delta_k$$

where \(\tilde{f}\) denotes an approximation (as opposed to the exact density), \(\boldsymbol{\theta}_k\) are selected evaluation points in the hyperparameter space, and \(\Delta_k\) are the corresponding quadrature weights (integration area assigned to each point).

At each integration point \(\boldsymbol{\theta}_k\), the conditional posterior \(f(x_i \mid \boldsymbol{\theta}, \mathbf{y})\) is approximated using one of several strategies (Gaussian, Simplified Laplace, Laplace, or Variational Bayes).
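The weighted sum can be sketched directly. The means, standard deviations, posterior values, and weights below are made-up placeholders, not output of a real fit.

```python
import numpy as np
from scipy.stats import norm

# Sketch of the weighted sum: the marginal of x_i is a mixture of its
# (approximately Gaussian) conditionals at a few evaluation points theta_k.
x = np.linspace(-5, 5, 400)

cond_mean = np.array([-0.2, 0.0, 0.3])   # mean of x_i | theta_k, y
cond_sd = np.array([0.9, 1.0, 1.2])      # sd of x_i | theta_k, y
post_theta = np.array([0.6, 1.0, 0.5])   # approx. f(theta_k | y), unnormalized
delta = np.array([0.3, 0.4, 0.3])        # quadrature weights Delta_k

w = post_theta * delta
w /= w.sum()                             # mixture weights sum to 1

marginal = sum(wk * norm.pdf(x, m, s)
               for wk, m, s in zip(w, cond_mean, cond_sd))
```

The result is a proper density: a convex combination of densities, so it integrates to one and captures hyperparameter uncertainty in the marginal of \(x_i\).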

Step-by-Step Process

  1. Approximate \(\tilde{f}(\boldsymbol{\theta} \mid \mathbf{y})\)
    Use the Laplace approximation evaluated at the mode \(\mathbf{x}^*(\boldsymbol{\theta})\) of the conditional posterior. The approximation is:
    $$\tilde{f}(\boldsymbol{\theta} \mid \mathbf{y}) \propto \frac{f(\boldsymbol{\theta}) \, f(\mathbf{x}^* \mid \boldsymbol{\theta}) \, f(\mathbf{y} \mid \mathbf{x}^*, \boldsymbol{\theta})}{g(\mathbf{x}^* \mid \boldsymbol{\theta}, \mathbf{y})}$$

    where \(\mathbf{x}^* = \mathbf{x}^*(\boldsymbol{\theta})\) is the mode of the conditional posterior \(f(\mathbf{x} \mid \boldsymbol{\theta}, \mathbf{y})\), and \(g(\mathbf{x}^* \mid \boldsymbol{\theta}, \mathbf{y})\) is the Gaussian approximation to \(f(\mathbf{x} \mid \boldsymbol{\theta}, \mathbf{y})\) evaluated at the mode. The ratio replaces an expensive integral with evaluations at a single point.

  2. Select integration points
    Explore the hyperparameter space using grid search or Central Composite Design (CCD). Each point \(\boldsymbol{\theta}_k\) receives a quadrature weight \(\Delta_k\).
  3. Compute conditional marginals \(\tilde{f}(x_i \mid \boldsymbol{\theta}_k, \mathbf{y})\)
    At each integration point, approximate the conditional posterior using the selected strategy: Gaussian, Simplified Laplace, full Laplace, or Variational Bayes.
  4. Integrate
    Combine the conditional marginals with hyperparameter weights to obtain the final posterior marginals.

Key assumption: INLA assumes the hyperparameter dimension is low (typically < 15-20). For higher dimensions, numerical integration becomes less efficient, though INLA often remains faster than MCMC.
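For a minimal end-to-end illustration of the four steps, consider the simplest possible LGM, where every quantity is available in closed form (so the Laplace step is exact). This is a toy sketch, not pyINLA code.

```python
import numpy as np
from scipy.stats import norm

# Toy walk-through of the four steps: y_i ~ N(x, 1) with a scalar latent
# field x ~ N(0, 1/theta) and an Exp(1) prior on the precision theta.
# Everything is conjugate, so the "Laplace" step is exact here.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=20)
n, ybar = len(y), y.mean()

# Steps 1-2: evaluate f(theta | y) on a grid of integration points theta_k,
# each with quadrature weight Delta_k (a uniform grid, for simplicity).
theta_grid = np.linspace(0.01, 10, 200)
delta = theta_grid[1] - theta_grid[0]

def log_post_theta(theta):
    # log f(theta | y) up to a constant: x has been integrated out
    # analytically (this is what the Laplace ratio approximates in general).
    post_prec = theta + n
    log_marg = 0.5 * np.log(theta / post_prec) + (n * ybar) ** 2 / (2 * post_prec)
    return log_marg - theta  # -theta is the Exp(1) log-prior

lp = log_post_theta(theta_grid)
w = np.exp(lp - lp.max()) * delta
w /= w.sum()  # normalized mixture weights f(theta_k | y) * Delta_k

# Step 3: conditional posterior of x at each theta_k (exactly Gaussian here)
cond_mean = n * ybar / (theta_grid + n)
cond_sd = 1.0 / np.sqrt(theta_grid + n)

# Step 4: integrate -- mix the conditionals into the marginal of x
x = np.linspace(-1, 3, 400)
marginal_x = (w[:, None] * norm.pdf(x[None, :], cond_mean[:, None],
                                    cond_sd[:, None])).sum(axis=0)
```

In real models the latent field is high-dimensional and non-conjugate, so steps 1 and 3 use Laplace-type approximations instead of closed forms, but the structure of the computation is exactly this.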

Why INLA is Fast

MCMC Sampling

  • Generates dependent samples from the posterior
  • Requires convergence diagnostics
  • Monte Carlo error decreases slowly (\(\sim 1/\sqrt{N}\))
  • Can struggle with high-dimensional posteriors
  • Flexible for almost any model

INLA Approximation

  • Deterministic numerical integration
  • No convergence concerns
  • Accuracy controlled by approximation choice
  • Exploits sparsity for speed
  • Restricted to latent Gaussian models

Exploiting Sparsity

The latent field \(\mathbf{x}\) has a sparse precision matrix \(\mathbf{Q}\), where zeros indicate conditional independence. For example, in a first-order autoregressive (AR(1)) model each \(x_t\) depends only on its immediate neighbours, so \(\mathbf{Q}\) is tridiagonal.

Sparse matrix algorithms compute Gaussian conditionals in \(O(n)\) to \(O(n^{1.5})\) time instead of \(O(n^3)\), enabling inference on large latent fields.
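A sketch of why this matters, using SciPy's general sparse LU as a stand-in for the specialized GMRF solvers; the autocorrelation value is illustrative.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import splu

# An AR(1) latent field has a tridiagonal precision matrix Q, so solving
# Q x = b (the workhorse operation behind the Gaussian approximation) is
# O(n) rather than O(n^3).
n, rho = 10_000, 0.9
main = np.full(n, 1 + rho**2)
main[0] = main[-1] = 1.0
Q = diags([np.full(n - 1, -rho), main, np.full(n - 1, -rho)],
          offsets=[-1, 0, 1], format="csc")

b = np.ones(n)
x = splu(Q).solve(b)   # sparse LU factorization exploits the banded structure
```

Storing this \(10{,}000 \times 10{,}000\) matrix densely would need about 800 MB; the sparse representation holds only the three non-zero diagonals, and the factorization never fills them in for a banded matrix.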

Approximation Strategy

At each integration point, INLA needs to approximate the conditional marginal \(f(x_i \mid \boldsymbol{\theta}, \mathbf{y})\). The default strategy uses a Variational Bayes (VB) correction to the Gaussian approximation, which provides fast and accurate results for the vast majority of models.

Alternative strategies (gaussian, simplified.laplace, laplace) are available for users who need finer control, but the VB default works well in practice and is what most users should use.

What pyINLA Provides

Architecture

The INLA methodology was developed by Håvard Rue, who also designed and built the core computational engine in C/C++. This engine powers both official interfaces: R-INLA for R users and pyINLA for the Python ecosystem. Both wrappers leverage the same high-performance backend while offering idiomatic APIs for their respective languages.

pyINLA brings the INLA methodology to Python.

The Python interface integrates seamlessly with NumPy, pandas, and matplotlib for end-to-end Bayesian workflows.
