Fixed Effects - pyINLA

What is a Fixed Effect?

A fixed effect is a covariate whose coefficient is shared across all observations: a single parameter $\beta_j$ that does not vary by group, location, or time. In a typical regression formula:

$$\eta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}$$

Each $\beta_j$ is one fixed effect. The intercept $\beta_0$ is a fixed effect too, automatically added unless you remove it.

Fixed vs Random

Fixed effect: one coefficient for the whole dataset (e.g. an overall age slope).
Random effect: coefficients indexed by a group, location, or time, modeled as a Gaussian field with shared hyperparameters (e.g. a per-region intercept).

Fixed effects suit small sets of distinct, non-exchangeable levels (treatment arms, sex). Random effects suit exchangeable groups, partial pooling, or many sparsely observed levels.

The Linear Predictor

In generalized linear (mixed) models, the expected value of the response $\mathbf{y}$ is connected to a linear predictor $\boldsymbol{\eta}$ via a link function $g(\cdot)$:

$$g(\mathbb{E}[\mathbf{y} \mid \boldsymbol{\beta},\,\cdot]) = \boldsymbol{\eta} \quad\Longleftrightarrow\quad g(\mathbb{E}[y_i \mid \boldsymbol{\beta},\,\cdot]) = \eta_i$$

What goes into $\boldsymbol{\eta}$?

The linear predictor decomposes as:

$\boldsymbol{\eta} = \text{offset} + \underbrace{\mathbf{Z}\,\boldsymbol{\beta}}_{\text{fixed effects}} + \text{random effects}$

Fixed effects $(\mathbf{Z}\,\boldsymbol{\beta})$: intercept and covariates declared in model['fixed']. Coefficients $\boldsymbol{\beta}$ are global (shared across all observations).
Random effects: latent components in model['random'] (iid, group-specific, temporal, spatial fields). They capture structure or extra variability across observations.
Offset: a known contribution (e.g. $\log(\text{exposure})$ in rate models).

In INLA terminology the entire contribution to $\boldsymbol{\eta}$ (including $\mathbf{Z}\boldsymbol{\beta}$) is part of the latent field: fixed effects are simply the latent components with global coefficients.
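
The decomposition above can be sketched numerically with plain NumPy. This is only an illustration of the arithmetic, not pyINLA code: the coefficient values, the group-level random intercepts `u`, and the exposures are made up, and a log link is assumed so that the mean is `exp(eta)`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Fixed-effects design matrix Z: intercept column plus one covariate
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 0.3])            # global coefficients (illustrative values)

# Random effects: one intercept per group, looked up by group index
group = np.array([0, 0, 1, 1, 2, 2])
u = np.array([-0.2, 0.0, 0.2])         # illustrative group-level deviations

# Offset: known contribution, here log(exposure) as in a rate model
exposure = np.array([10.0, 12.0, 8.0, 15.0, 9.0, 11.0])
offset = np.log(exposure)

# Linear predictor = offset + fixed effects + random effects
eta = offset + Z @ beta + u[group]

# With a log link, the expected response is exp(eta)
mu = np.exp(eta)
print(np.round(eta, 3))
print(np.round(mu, 3))
```

Every observation shares the same `beta`, while `u[group]` differs by group; that is the distinction between the fixed and random parts of the predictor.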

Fixed Effects in the Latent Field

In INLA, fixed effects are part of the latent Gaussian field $\mathbf{x}$. By default each coefficient gets an independent Gaussian prior (the correlation_matrix key described below requests the posterior correlation matrix and is not a prior setting):

$\beta_j \sim \mathcal{N}(\mu_j,\; 1/\tau_j)$

where $\mu_j$ is the prior mean and $\tau_j$ is the prior precision. The pyINLA defaults are listed below.

Term	Default mean	Default precision	Comment
Intercept $\beta_0$	$0$	$0$ (improper, flat)	Improper flat prior: no prior shrinkage on the overall level.
Slopes $\beta_j,\ j\ge 1$	$0$	$0.001$	Very weak, near-flat prior centered at zero. Variance $= 1000$.

You can override these globally or per-coefficient through control['fixed'] (see below).
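
To get a feel for what a given precision means, it can help to convert it to a standard deviation, $\text{sd} = 1/\sqrt{\tau}$. The helper below is a small illustrative function (`prior_sd` is not part of pyINLA) that does this conversion and treats a precision of zero as an improper flat prior.

```python
import math

def prior_sd(prec):
    """Standard deviation implied by a Gaussian prior with precision `prec`."""
    if prec == 0:
        return math.inf  # precision 0 corresponds to an improper flat prior
    return 1.0 / math.sqrt(prec)

for label, prec in [
    ("intercept (default)", 0.0),
    ("slope (default)", 0.001),
    ("slope with prec=1", 1.0),
]:
    sd = prior_sd(prec)
    # Central 95% prior interval around a prior mean of 0
    print(f"{label:20s} prec={prec:<6} sd={sd:8.2f}  95% interval = +/-{1.96 * sd:.2f}")
```

The default slope precision of $0.001$ corresponds to a prior standard deviation of about $31.6$, so whether it is "weak" depends on the scale of the covariate.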

Adding Fixed Effects in pyINLA

pyINLA describes the model with a Python dict. The "fixed" key holds a list of terms: a string is looked up as a column in data, a vector is taken as raw values, and a (name, values) tuple lets you supply a labelled column directly. The intercept is included by default and is controlled separately through model["intercept"].

Quickstart with synthetic data

The example below is fully runnable: it generates a small synthetic dataset and fits a Gaussian regression with three covariates.

import numpy as np
import pandas as pd
from pyinla import pyinla

# Synthetic dataset: y = 2 + 0.05*age + 0.7*(sex=='M') + 0.02*income + noise
rng = np.random.default_rng(0)
n = 200
age    = rng.normal(50, 10, n)
sex    = rng.choice(["F", "M"], size=n)
income = rng.normal(60, 15, n)
y = (2.0
     + 0.05 * age
     + 0.7 * (sex == "M").astype(float)
     + 0.02 * income
     + rng.normal(0, 0.5, n))

df = pd.DataFrame({"y": y, "age": age, "sex": sex, "income": income})

model = {
    "response": "y",
    "fixed":    ["age", "sex", "income"],
}

result = pyinla(model=model, family="gaussian", data=df)
print(result.summary_fixed)

Output

                 mean        sd  0.025quant  0.5quant  0.975quant      mode           kld
(Intercept)  1.858775  0.243689    1.380424  1.858775    2.337126  1.858775  1.989465e-09
age          0.053095  0.003561    0.046104  0.053095    0.060087  0.053095  1.989470e-09
sexM         0.689530  0.068038    0.555975  0.689530    0.823086  0.689530  1.989433e-09
income       0.019892  0.002264    0.015448  0.019892    0.024336  0.019892  1.989472e-09

This fits four fixed effects: (Intercept), age, sexM (the sex column is categorical and gets expanded; F is the reference level), and income.
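
As a rough sanity check that does not involve pyINLA, the same synthetic data can be fitted by ordinary least squares with NumPy. With a Gaussian likelihood and the near-flat default priors, the posterior means should land close to the least-squares estimates. The sketch below regenerates the data with the same seed and builds the design matrix by hand; the variable names are illustrative.

```python
import numpy as np

# Regenerate the Quickstart data (same seed, same order of random draws)
rng = np.random.default_rng(0)
n = 200
age = rng.normal(50, 10, n)
sex = rng.choice(["F", "M"], size=n)
income = rng.normal(60, 15, n)
y = (2.0
     + 0.05 * age
     + 0.7 * (sex == "M").astype(float)
     + 0.02 * income
     + rng.normal(0, 0.5, n))

# Design matrix: intercept, age, dummy for sex == "M" (F is the reference), income
X = np.column_stack([np.ones(n), age, (sex == "M").astype(float), income])
names = ["(Intercept)", "age", "sexM", "income"]

# Ordinary least squares fit
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, b in zip(names, beta_ols):
    print(f"{name:12s} {b: .4f}")
```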

Intercept only ($\eta_i = \beta_0$)

The intercept is included by default, so an empty "fixed" list gives an intercept-only model:

# intercept is included by default
model = {"response": "y", "fixed": []}

# or explicitly
model = {"response": "y", "fixed": [], "intercept": True}

One covariate ($\eta_i = \beta_0 + \beta_1 z_i$)

# Refer to a column 'z' in your DataFrame
model = {"response": "y", "fixed": ["z"]}
result = pyinla(model=model, family="gaussian", data=df)

Multiple covariates ($\eta_i = \beta_0 + \beta_1 z_{1i} + \beta_2 z_{2i}$)

model = {"response": "y", "fixed": ["z1", "z2"]}

Direct vectors (no DataFrame)

You can also pass raw vectors instead of column names. The response can be a vector too, so a DataFrame is optional:

# raw vectors (each must have len == len(y))
model = {"response": y, "fixed": [z]}

# labelled tuples for readable output
model = {"response": y, "fixed": [("z", z)]}

# a labelled (n x p) matrix expands to columns feat1, feat2, ..., featp
model = {"response": y, "fixed": [("feat", X)]}  # X.shape == (n, p)

# raw (n x p) matrix without a label: columns are auto-named
# fixed_<col_idx> (sequential), matching R's `y ~ X` formula expansion
model = {"response": y, "fixed": [X]}        # X.shape == (n, p)
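
If you start from a matrix but want to refer to its columns by name (for example to use them in interaction strings, which need named columns), one option is to wrap the matrix in a DataFrame first. The sketch below uses only NumPy and pandas; the `feat1, feat2, ...` names are chosen here to mirror the labelled-matrix convention and are otherwise arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n, p = 5, 3
X = rng.normal(size=(n, p))   # illustrative covariate matrix
y = rng.normal(size=n)        # illustrative response

# Give each matrix column an explicit name
feat = pd.DataFrame(X, columns=[f"feat{j}" for j in range(1, p + 1)])
df = pd.concat([pd.Series(y, name="y"), feat], axis=1)

# The columns can now be referenced as strings in the model dict
model = {"response": "y", "fixed": list(feat.columns)}
print(df.columns.tolist())
print(model)
```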

Removing the Intercept

Use the token "-1" or "0" in the fixed list, or set model["intercept"] = False:

# with a token
model = {"response": "y", "fixed": ["-1", "age", "sex"]}

# equivalent
model = {"response": "y", "fixed": ["age", "sex"], "intercept": False}

You may also supply a custom intercept vector via model["intercept"] = vector (length $n$); it appears as (Intercept) in the output.

Categorical Covariates

Non-numeric named columns are automatically expanded into dummy variables (via pandas.get_dummies, named name + level with no separator, e.g. groupA, groupB). pyINLA uses treatment contrasts: with an intercept (the default), the first sorted level is dropped to avoid collinearity; without an intercept (intercept=False), all levels are kept.

Worked example: encoding a 3-level group

Suppose group takes values in $\{\text{A}, \text{B}, \text{C}\}$.

Input data (first 6 rows):

i	y	group
1	10.1	A
2	10.5	A
3	15.2	B
4	14.8	B
5	21.9	C
6	20.5	C

Design matrix with intercept (default), first level dropped:

$\eta_i = \beta_0 + \beta_B \, I(\text{group}_i=\text{B}) + \beta_C \, I(\text{group}_i=\text{C})$

i	(Intercept)	groupB	groupC
1	1	0	0
2	1	0	0
3	1	1	0
4	1	1	0
5	1	0	1
6	1	0	1

Interpretation: $\beta_0$ is the mean for group A (baseline); $\beta_B$ and $\beta_C$ are differences from A.

Design matrix without intercept (intercept=False), all levels kept:

$\eta_i = \beta_A \, I(\text{group}_i=\text{A}) + \beta_B \, I(\text{group}_i=\text{B}) + \beta_C \, I(\text{group}_i=\text{C})$

i	groupA	groupB	groupC
1	1	0	0
2	1	0	0
3	0	1	0
4	0	1	0
5	0	0	1
6	0	0	1

Interpretation: $\beta_A$, $\beta_B$, $\beta_C$ are the absolute means for each group.
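
The two design matrices above can be reproduced with pandas.get_dummies, and the stated interpretations checked with a plain least-squares fit. This is a sketch of the encoding only: it does not call pyINLA, and the least-squares fit stands in for the model fit purely to show what the coefficients mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "y": [10.1, 10.5, 15.2, 14.8, 21.9, 20.5],
    "group": ["A", "A", "B", "B", "C", "C"],
})

# With intercept: drop the first sorted level (A) to avoid collinearity
dummies = pd.get_dummies(df["group"], prefix="group", prefix_sep="",
                         drop_first=True, dtype=float)
intercept = pd.Series(1.0, index=df.index, name="(Intercept)")
X_with = pd.concat([intercept, dummies], axis=1)

# Without intercept: keep all levels
X_without = pd.get_dummies(df["group"], prefix="group", prefix_sep="", dtype=float)

# Least-squares coefficients for both encodings
beta_with, *_ = np.linalg.lstsq(X_with.to_numpy(), df["y"].to_numpy(), rcond=None)
beta_without, *_ = np.linalg.lstsq(X_without.to_numpy(), df["y"].to_numpy(), rcond=None)

print(dict(zip(X_with.columns, np.round(beta_with, 3))))        # baseline mean + differences
print(dict(zip(X_without.columns, np.round(beta_without, 3))))  # one mean per group
```

With the intercept, the first coefficient is the mean of group A and the other two are differences from A; without it, each coefficient is a group mean.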

Interactions

Use "z1*z2" for main effects plus their interaction, or "z1:z2" for the product term only. Three-way forms "z1*z2*z3" and "z1:z2:z3" are supported the same way. Duplicates are removed automatically.

$\eta_i = \beta_0 + \beta_1 z_{1i} + \beta_2 z_{2i} + \beta_{12}\, z_{1i} z_{2i}$

# main effects + interaction (expanded internally)
model = {"response": "y", "fixed": ["z1*z2"]}

# equivalent explicit form: both main effects plus the product term
model = {"response": "y", "fixed": ["z1", "z2", "z1:z2"]}

# three-way: z1, z2, z3, all pairwise + triple product
model = {"response": "y", "fixed": ["z1*z2*z3"]}

Interaction strings require the referenced variables to exist as named columns in data. If you don't have a DataFrame, precompute the products and pass them as labelled vectors, e.g. ("z1:z2", z1 * z2).
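
When working without a DataFrame, the expansion can be done by hand. The sketch below shows one way to do it: `expand_star` and `product_column` are hypothetical helpers written for this example (they are not pyINLA functions, and pyINLA's own parser may order terms differently). The result is a list of labelled `(name, values)` tuples in the form described above.

```python
from itertools import combinations
import numpy as np

def expand_star(term):
    """Expand 'a*b*c' into main effects and all interactions, joined with ':'."""
    names = [t.strip() for t in term.split("*")]
    out = []
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            out.append(":".join(combo))
    return out

def product_column(term, columns):
    """Element-wise product of the columns named in an 'a:b' term."""
    parts = [np.asarray(columns[name.strip()], dtype=float) for name in term.split(":")]
    return np.prod(np.column_stack(parts), axis=1)

z1 = np.array([1.0, 2.0, 3.0])
z2 = np.array([0.5, 0.0, -1.0])
columns = {"z1": z1, "z2": z2}

# Labelled vectors equivalent to the string term "z1*z2"
fixed = [(name, product_column(name, columns)) for name in expand_star("z1*z2")]
for name, values in fixed:
    print(name, values)
```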

Nonlinear Transforms

Helpers like I(x^2) are not parsed yet. Build the column yourself and add it to your DataFrame:

df["age_sq"] = df["age"] ** 2
model = {"response": "y", "fixed": ["age", "age_sq"]}

Term Forms (At a Glance)

Each entry in model["fixed"] can be one of:

Form	Example	Notes
String column name	`"z"`	Must exist in `data`. Numeric columns become a single slope; categorical columns expand to dummies.
Series or array	`z` (1D, length $n$)	Used as-is. Length must match the response.
Labelled vector	`("z", z)`	Same as above but gives the column a readable name in the output.
Labelled matrix	`("feat", X)` with `X.shape == (n, p)`	Expands into columns `feat1, feat2, ..., featp`.
Raw matrix	`X` with `X.shape == (n, p)`	Expands into `p` columns auto-named `fixed_<col_idx>` (sequential), mirroring R's `y ~ X` formula behaviour. Use the labelled form above for readable output names.
Interaction	`"z1:z2"`, `"z1*z2"`, `"z1*z2*z3"`	References require named columns in `data`. `*` expands to main effects plus all interactions among them; `:` is the product only.
Intercept tokens	`"-1"`, `"0"`	Drop the intercept. Equivalent to `model["intercept"] = False`.

Other behaviour worth knowing:

Intercept: included by default. Use model["intercept"] = False (or pass a length-$n$ vector for a custom intercept, labelled (Intercept)) or place the token "-1" or "0" inside "fixed" to drop it. Mixing add ("1") and drop ("-1"/"0") tokens in the same list raises an error.
De-duplication and order: duplicate terms are dropped; the order of entries in "fixed" determines the order of fixed-effect components in the output.
Whitespace: term strings are trimmed, so " x1 " resolves to "x1".
Repeated names: if the same covariate is supplied more than once (e.g. as both a column reference and a labelled vector), only the first occurrence is kept and later duplicates are silently dropped.
Missing or non-finite values in numeric covariates are treated as $0$.
Data optional: you can omit data entirely if you pass the response and all fixed terms as vectors. Interaction strings still require named columns.
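
Because missing or non-finite covariate values are treated as $0$ rather than raising an error, it can be worth checking for them before fitting. The snippet below is a generic pandas/NumPy check on a made-up data frame; dropping the affected rows is only one possible remedy, and whether it is appropriate depends on your analysis.

```python
import numpy as np
import pandas as pd

# Illustrative data with one NaN and one infinite value in 'age'
df = pd.DataFrame({
    "y":      [1.0, 2.0, 3.0, 4.0],
    "age":    [50.0, np.nan, 61.0, np.inf],
    "income": [60.0, 55.0, 70.0, 65.0],
})
covariates = ["age", "income"]

# Flag entries that are NaN or +/-inf
values = df[covariates].to_numpy(dtype=float)
bad = ~np.isfinite(values)

# Count problem entries per covariate
bad_counts = pd.Series(bad.sum(axis=0), index=covariates)
print(bad_counts)

# One option: keep only rows where every covariate is finite
clean = df.loc[~bad.any(axis=1)].reset_index(drop=True)
print(clean)
```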

Configuring Priors with `control['fixed']`

The control['fixed'] dictionary controls priors and reporting for all fixed effects. Below are the available keys with their defaults.

Key	Default	Meaning
`mean`	`0.0`	Prior mean for non-intercept fixed effects. Scalar or per-name dict (see How `mean` and `prec` resolve per coefficient).
`prec`	`0.001`	Prior precision for non-intercept fixed effects. Same shape as `mean`.
`mean_intercept`	`0.0`	Prior mean for the intercept.
`prec_intercept`	`0.0`	Prior precision for the intercept (0 means improper flat).
`expand_factor_strategy`	`"model.matrix"`	How to expand categorical covariates. `"model.matrix"` drops a reference level; `"inla"` keeps INLA's internal expansion.
`correlation_matrix`	`False`	If `True`, return the posterior correlation matrix of the fixed effects.
`compute`	`True`	Whether to compute marginal posteriors for fixed effects.
`quantiles`	`None`	Custom quantiles for posterior summaries (defaults to global quantiles).
`cdf`	`None`	Optional CDF evaluation points for fixed-effect marginals.
`remove_names`	`None`	List of fixed-effect names to drop from output.

How `mean` and `prec` resolve per coefficient

Both keys accept either a single number or a dict. They follow the same lookup logic, so the rules below apply to both.

What you pass	How it's read
`"mean": 0.5` scalar	Every non-intercept coefficient gets `0.5`.
`"mean": {"age": 1.0}` dict, no `"default"`	`age` gets `1.0`. Every other coefficient falls back to the pyINLA-wide default (`0.0` for `mean`, `0.001` for `prec`).
`"mean": {"age": 1.0, "default": 0.5}` dict + `"default"`	`age` gets `1.0`. Every other coefficient gets `0.5`.

Concrete example with model["fixed"] = ["age", "sex", "income"]:

control = {"fixed": {"mean": {"age": 1.0, "default": 0.5}}}
# resolved means: age = 1.0, sexM = 0.5, income = 0.5

Names in the dict must match the expanded coefficient names, not the original column names. For a categorical covariate region expanded to regionB, regionC, target the dummies ("regionB": ...), not the source column.
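
The lookup rules in the table can be written out as a few lines of Python. This is a sketch of the described behaviour, not pyINLA's actual implementation; `resolve_prior` is a hypothetical helper name, and `fallback` stands for the pyINLA-wide default (`0.0` for `mean`, `0.001` for `prec`).

```python
def resolve_prior(spec, names, fallback):
    """Resolve a scalar-or-dict prior spec to one value per coefficient name."""
    if not isinstance(spec, dict):
        # Scalar: every non-intercept coefficient gets the same value
        return {name: float(spec) for name in names}
    # Dict: named entries win, then "default", then the global fallback
    default = spec.get("default", fallback)
    return {name: float(spec.get(name, default)) for name in names}

names = ["age", "sexM", "income"]   # expanded coefficient names

print(resolve_prior(0.5, names, fallback=0.0))
print(resolve_prior({"age": 1.0}, names, fallback=0.0))
print(resolve_prior({"age": 1.0, "default": 0.5}, names, fallback=0.0))
```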

Examples

The snippets below all reuse the df data frame and the model dict (fixed terms age, sex, income) from the Quickstart.

Tighter prior on all slopes:

result = pyinla(
    model=model,
    family="gaussian",
    data=df,
    control={"fixed": {"prec": 1.0}},
)
print(result.summary_fixed)

Per-coefficient priors:

control = {
    "fixed": {
        "mean":           {"age": 0.0, "income": 0.0},
        "prec":           {"age": 10.0, "income": 1.0},
        "mean_intercept": 2.0,
        "prec_intercept": 0.01,
    }
}
result = pyinla(model=model, family="gaussian", data=df, control=control)
print(result.summary_fixed)

Request the posterior correlation matrix:

control = {"fixed": {"correlation_matrix": True}}
result = pyinla(model=model, family="gaussian", data=df, control=control)

print(result.summary_fixed)              # per-coef posterior table
print(result.correlation_matrix_fixed)   # labelled NxN DataFrame
print(result.covariance_matrix_fixed)    # companion covariance

Output of result.correlation_matrix_fixed (shown for a model with a single covariate z)

             (Intercept)         z
(Intercept)     1.000000  0.031479
z               0.031479  1.000000

The matrix is a pandas.DataFrame whose rows and columns are labelled with the fixed-effect names after factor expansion ((Intercept), z, sexM, ...). Both correlation_matrix_fixed and covariance_matrix_fixed are None when the flag is off.

What You Get Back

After fitting, the result object exposes posterior summaries for each fixed effect:

`summary_fixed`

Table with one row per fixed effect: posterior mean, sd, quantiles, mode, and KL divergence (kld) from a Gaussian approximation.

Use for: headline numbers and quick reporting.

`marginals_fixed`

Full marginal posterior densities $p(\beta_j \mid y)$ as $(x, y)$ grids, one per coefficient (here $x$ holds the evaluation points and $y$ the density values, not the response).

Use for: plotting, transformations, custom probability statements.
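
A probability statement such as $P(\beta_j > c \mid y)$ can be computed from a density grid by numerical integration. The sketch below assumes the marginal is available as two arrays, evaluation points `x` and density values `density`; how these are stored inside `marginals_fixed` is not shown here, so a Gaussian stand-in grid (using the sexM mean and sd from the Quickstart output) is built for illustration.

```python
import numpy as np

# Stand-in marginal: a Gaussian density on a grid (illustrative only)
mean, sd = 0.689530, 0.068038
x = np.linspace(mean - 6 * sd, mean + 6 * sd, 401)
density = np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def trapezoid(yv, xv):
    """Trapezoidal rule for samples yv at points xv."""
    return float(np.sum(0.5 * (yv[1:] + yv[:-1]) * np.diff(xv)))

# Total mass on the grid (should be close to 1); used to renormalise
total = trapezoid(density, x)

# P(beta > 0.6 | y): integrate the density over grid points at or above 0.6
mask = x >= 0.6
prob_above = trapezoid(density[mask], x[mask]) / total

# Posterior mean recovered from the grid
post_mean = trapezoid(x * density, x) / total

print(f"mass = {total:.4f}, P(beta > 0.6) = {prob_above:.3f}, mean = {post_mean:.4f}")
```

The accuracy of such statements depends on how fine the grid is near the threshold.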

Fixed in `summary_random`?

No: random-effect summaries are kept separate. summary_fixed contains only $\beta$ coefficients.

Tip: if you need the full latent vector, look at summary_linear_predictor.

`correlation_matrix_fixed`

Posterior correlation matrix among the fixed-effect coefficients, as a labelled pandas.DataFrame. Populated only when control['fixed']['correlation_matrix']=True; otherwise None.

Use for: diagnosing collinearity between covariates after fitting.

`covariance_matrix_fixed`

Companion to correlation_matrix_fixed: the posterior covariance with the same row/column labels. Diagonals are coefficient variances ($\text{sd}^2$ from summary_fixed).

Use for: propagating uncertainty into linear combinations of coefficients.
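
For a linear combination $a^\top \boldsymbol{\beta}$, the posterior variance is $a^\top \Sigma\, a$, where $\Sigma$ is the posterior covariance matrix. The sketch below uses a small made-up covariance matrix in place of `covariance_matrix_fixed` (the numbers are illustrative, not fitted output) and also shows how the correlation matrix follows from the covariance.

```python
import numpy as np
import pandas as pd

# Illustrative posterior covariance, labelled like a fixed-effects covariance table
names = ["age", "income"]
cov = pd.DataFrame([[0.04, -0.01],
                    [-0.01, 0.09]], index=names, columns=names)

# Correlation = covariance scaled by the outer product of standard deviations
sd = np.sqrt(np.diag(cov.to_numpy()))
corr = cov / np.outer(sd, sd)
print(corr)

# Linear combination 1*age + 2*income
a = pd.Series({"age": 1.0, "income": 2.0})
sigma = cov.loc[a.index, a.index].to_numpy()   # align rows/columns with the weights
var_combo = float(a.to_numpy() @ sigma @ a.to_numpy())
sd_combo = np.sqrt(var_combo)
print(f"variance = {var_combo:.4f}, sd = {sd_combo:.4f}")
```

Ignoring the off-diagonal term here would give a variance of $0.04 + 4 \cdot 0.09 = 0.40$ instead of $0.36$, which is why the covariance, not just the per-coefficient sd, is needed.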

Connecting Fixed Effects to the Response

pyINLA supports many likelihoods through the family= argument. The three most common choices, written in terms of the linear predictor $\boldsymbol{\eta}$, are below.

Gaussian (identity link)

$\mu_i = \eta_i = \beta_0 + \beta_1 z_i$

Likelihood (for $n$ observations):

$$\mathcal{L}(\boldsymbol{\beta}, \sigma^2 \mid y) = \prod_{i=1}^{n} \mathcal{N}\!\left(y_i \mid \beta_0 + \beta_1 z_i,\; \sigma^2\right)$$

Poisson (log link)

$\log(\lambda_i) = \eta_i = \beta_0 + \beta_1 z_i \;\Rightarrow\; \lambda_i = \exp(\eta_i)$

Likelihood (for $n$ observations):

$$\mathcal{L}(\boldsymbol{\beta} \mid y) = \prod_{i=1}^{n} \frac{\exp\!\big(-\exp(\beta_0 + \beta_1 z_i)\big)\, \exp(\beta_0 + \beta_1 z_i)^{y_i}}{y_i!}$$

Binomial (logit link)

$\text{logit}(p_i) = \eta_i = \beta_0 + \beta_1 z_i \;\Rightarrow\; p_i = \text{logit}^{-1}(\beta_0 + \beta_1 z_i)$

Bernoulli likelihood (for $n$ observations):

$$\mathcal{L}(\boldsymbol{\beta} \mid y) = \prod_{i=1}^{n} \big[\text{logit}^{-1}(\beta_0 + \beta_1 z_i)\big]^{y_i}\, \big[1 - \text{logit}^{-1}(\beta_0 + \beta_1 z_i)\big]^{1 - y_i}$$

Other families are available (Gamma, Negative Binomial, survival models, Beta, etc.). See the Likelihoods page for the full catalogue.
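
The three likelihoods above can be evaluated directly from the linear predictor. The functions below are a NumPy sketch of the log-likelihood formulas (illustrative helpers, not pyINLA functions), written on the log scale for numerical stability; the data and coefficient values are made up.

```python
import math
import numpy as np

def gaussian_loglik(y, eta, sigma2):
    """Sum of log N(y_i | eta_i, sigma2): identity link."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (y - eta) ** 2 / sigma2))

def poisson_loglik(y, eta):
    """Sum of log Poisson(y_i | exp(eta_i)): log link."""
    log_fact = np.array([math.lgamma(k + 1.0) for k in y])  # log(y_i!)
    return float(np.sum(y * eta - np.exp(eta) - log_fact))

def bernoulli_loglik(y, eta):
    """Sum of log Bernoulli(y_i | logit^{-1}(eta_i)): logit link."""
    # log p = -log(1 + exp(-eta)), log(1 - p) = -log(1 + exp(eta))
    return float(np.sum(-y * np.logaddexp(0.0, -eta) - (1 - y) * np.logaddexp(0.0, eta)))

z = np.array([-1.0, 0.0, 1.0, 2.0])
beta0, beta1 = 0.5, 0.3
eta = beta0 + beta1 * z   # same linear predictor for all three families

print(gaussian_loglik(np.array([0.1, 0.6, 0.9, 1.0]), eta, sigma2=0.25))
print(poisson_loglik(np.array([1.0, 2.0, 0.0, 3.0]), eta))
print(bernoulli_loglik(np.array([0.0, 1.0, 1.0, 1.0]), eta))
```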

Interpreting Fixed-Effect Coefficients

The meaning of a coefficient depends on the link scale (see Link Functions). The cards below cover the three most common cases.

Identity link (Gaussian)

Additive on the mean: $\mathbb{E}[y_i] = \eta_i$.

Sign: $\beta_1 > 0$ raises the mean; $\beta_1 < 0$ lowers it.
Magnitude: per $+1$ unit in the covariate $z$, the mean changes by $\beta_1$. For a change $\Delta z$, $\Delta \mathbb{E}[y_i] = \beta_1 \, \Delta z$.

Log link (Poisson, Gamma)

Multiplicative on the rate: $\log(\lambda_i) = \eta_i \Rightarrow \lambda_i = \exp(\eta_i)$.

Sign: $\beta_1 > 0$ increases the rate; $\beta_1 < 0$ decreases it.
Magnitude: per $+1$ unit in $z$, the expected rate $\lambda_i$ is multiplied by $\exp(\beta_1)$. For a change $\Delta z$, the multiplicative factor is $\exp(\beta_1 \Delta z)$, i.e. a $100 \cdot (\exp(\beta_1 \Delta z) - 1)\%$ change.

Logit link (Binomial, Beta)

Multiplicative on the odds: $\text{logit}(p_i) = \eta_i$.

Sign: $\beta_1 > 0$ increases odds and probability; $\beta_1 < 0$ decreases them.
Magnitude: per $+1$ unit in $z$, the odds are multiplied by $\exp(\beta_1)$ (odds ratio). The probability change depends on the baseline $p_{0,i}$: $p_{\text{new}, i} = \text{logit}^{-1}\!\big(\text{logit}(p_{0,i}) + \beta_1 \Delta z\big)$.
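
The rules for the log and logit links can be checked with a few lines of arithmetic. The coefficient value and baseline probabilities below are made up for illustration; the point is that the odds ratio is constant while the change in probability depends on where you start.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

beta1, dz = 0.2, 1.0   # illustrative coefficient and covariate change

# Log link: the rate is multiplied by exp(beta1 * dz)
rate_ratio = np.exp(beta1 * dz)
pct_change = 100 * (rate_ratio - 1)
print(f"rate ratio = {rate_ratio:.4f} ({pct_change:.1f}% change)")

# Logit link: the odds are multiplied by the same factor ...
odds_ratio = np.exp(beta1 * dz)

# ... but the probability change depends on the baseline p0
p_new = {}
for p0 in (0.1, 0.5, 0.9):
    p_new[p0] = inv_logit(logit(p0) + beta1 * dz)
    print(f"p0 = {p0:.1f} -> p_new = {p_new[p0]:.4f} (difference {p_new[p0] - p0:+.4f})")
```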

Fixed‑Effect Extensions

Most fixed covariates belong in model['fixed'] and can have label‑specific priors via control['fixed']. When you need special treatment for a single slope, there are two useful patterns:

Single‑slope prior

Linear

Keep a slope as an ordinary fixed effect and set a label‑specific prior via control['fixed']. Prefer this over a latent linear for most cases.

Constrained slope

Clinear

When a slope must be bounded (e.g., $\beta\ge0$), add a latent clinear term and set range=(low, high).

Priors

Fixed‑Effect Priors

Set priors for the intercept and individual coefficients via control['fixed'] (label‑specific keys).
