
Fixed Effects

Covariates with global, non-grouped coefficients in the linear predictor.

What is a Fixed Effect?

A fixed effect is a covariate whose coefficient is shared across all observations: a single parameter $\beta_j$ that does not vary by group, location, or time. In a typical regression formula:

$$\eta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}$$

Each $\beta_j$ is one fixed effect. The intercept $\beta_0$ is a fixed effect too, automatically added unless you remove it.

Fixed vs Random

  • Fixed effect: one coefficient for the whole dataset (e.g. an overall age slope).
  • Random effect: coefficients indexed by a group, location, or time, modeled as a Gaussian field with shared hyperparameters (e.g. a per-region intercept).

Fixed effects suit small sets of distinct, non-exchangeable levels (treatment arms, sex). Random effects suit exchangeable groups, partial pooling, or many sparsely-observed levels.

The Linear Predictor

In generalized linear (mixed) models, the expected value of the response $\mathbf{y}$ is connected to a linear predictor $\boldsymbol{\eta}$ via a link function $g(\cdot)$:

$$g(\mathbb{E}[\mathbf{y} \mid \boldsymbol{\beta},\,\cdot]) = \boldsymbol{\eta} \quad\Longleftrightarrow\quad g(\mathbb{E}[y_i \mid \boldsymbol{\beta},\,\cdot]) = \eta_i$$

What goes into $\boldsymbol{\eta}$?

The linear predictor decomposes as:

$\boldsymbol{\eta} = \text{offset} + \underbrace{\mathbf{Z}\,\boldsymbol{\beta}}_{\text{fixed effects}} + \text{random effects}$
  • Fixed effects $(\mathbf{Z}\,\boldsymbol{\beta})$: intercept and covariates declared in model['fixed']. Coefficients $\boldsymbol{\beta}$ are global (shared across all observations).
  • Random effects: latent components in model['random'] (iid, group-specific, temporal, spatial fields). They capture structure or extra variability across observations.
  • Offset: a known contribution (e.g. $\log(\text{exposure})$ in rate models).

In INLA terminology the entire contribution to $\boldsymbol{\eta}$ (including $\mathbf{Z}\boldsymbol{\beta}$) is part of the latent field: fixed effects are simply the latent components with global coefficients.
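To make the decomposition concrete, the sketch below assembles $\boldsymbol{\eta}$ by hand with NumPy for a toy dataset. All numbers, the group index `group`, and the random-intercept vector `u` are made up for illustration. This is not pyINLA code, only the arithmetic that the formula describes.

```python
import numpy as np

# Toy design: 4 observations, an intercept column plus one covariate.
Z = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5],
              [1.0, 3.5]])
beta = np.array([2.0, 0.1])          # global coefficients (intercept, slope)

# Hypothetical random intercepts, one per group, picked out per observation.
u = np.array([-0.3, 0.3])
group = np.array([0, 0, 1, 1])

# Known offset, e.g. log(exposure) in a rate model.
offset = np.log(np.array([10.0, 10.0, 20.0, 20.0]))

# offset + fixed effects + random effects
eta = offset + Z @ beta + u[group]
print(eta)
```

The same `beta` multiplies every row of `Z`, while `u[group]` differs by group: that is the practical difference between the fixed and random parts.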

Fixed Effects in the Latent Field

In INLA, fixed effects are part of the latent Gaussian field $\mathbf{x}$. The correlation_matrix option documented below only reports the posterior correlation between coefficients; it does not define a joint prior. A priori, each coefficient gets an independent Gaussian distribution:

$\beta_j \sim \mathcal{N}(\mu_j,\; 1/\tau_j)$

where $\mu_j$ is the prior mean and $\tau_j$ is the prior precision. The pyINLA defaults are listed below.

| Term | Default mean | Default precision | Comment |
|---|---|---|---|
| Intercept $\beta_0$ | $0$ | $0$ (improper, flat) | Improper flat prior: no prior shrinkage on the overall level. |
| Slopes $\beta_j,\ j\ge 1$ | $0$ | $0.001$ | Very weak, near-flat prior centered at zero. Variance $= 1000$. |

You can override these globally or per-coefficient through control['fixed'] (see below).
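Precisions are easier to judge once converted to a standard deviation, $\text{sd} = 1/\sqrt{\tau}$. The short sketch below does that conversion for the default slope precision and for two tighter values chosen purely for illustration. The helper `prior_sd` is a hypothetical name, not part of pyINLA.

```python
import math

def prior_sd(precision):
    """Standard deviation implied by a Gaussian prior precision."""
    return 1.0 / math.sqrt(precision)

for prec in (0.001, 1.0, 10.0):
    sd = prior_sd(prec)
    half_width = 1.96 * sd   # half-width of the central 95% interval of a zero-mean prior
    print(f"prec={prec:g}  sd={sd:.3f}  95% prior interval=(-{half_width:.2f}, {half_width:.2f})")
```

With the default precision of 0.001 the prior standard deviation is about 31.6, so the prior only matters when a covariate is on a scale where slopes of that size are plausible.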

Adding Fixed Effects in pyINLA

pyINLA describes the model with a Python dict. The "fixed" key holds a list of terms: a string is looked up as a column in data, a vector is taken as raw values, and a (name, values) tuple lets you supply a labelled column directly. The intercept is included by default and is controlled separately through model["intercept"].

Quickstart with synthetic data

The example below is fully runnable: it generates a small synthetic dataset and fits a Gaussian regression with three covariates.

import numpy as np
import pandas as pd
from pyinla import pyinla

# Synthetic dataset: y = 2 + 0.05*age + 0.7*(sex=='M') + 0.02*income + noise
rng = np.random.default_rng(0)
n = 200
age    = rng.normal(50, 10, n)
sex    = rng.choice(["F", "M"], size=n)
income = rng.normal(60, 15, n)
y = (2.0
     + 0.05 * age
     + 0.7 * (sex == "M").astype(float)
     + 0.02 * income
     + rng.normal(0, 0.5, n))

df = pd.DataFrame({"y": y, "age": age, "sex": sex, "income": income})

model = {
    "response": "y",
    "fixed":    ["age", "sex", "income"],
}

result = pyinla(model=model, family="gaussian", data=df)
print(result.summary_fixed)

Output

                 mean        sd  0.025quant  0.5quant  0.975quant      mode           kld
(Intercept)  1.858775  0.243689    1.380424  1.858775    2.337126  1.858775  1.989465e-09
age          0.053095  0.003561    0.046104  0.053095    0.060087  0.053095  1.989470e-09
sexM         0.689530  0.068038    0.555975  0.689530    0.823086  0.689530  1.989433e-09
income       0.019892  0.002264    0.015448  0.019892    0.024336  0.019892  1.989472e-09

This fits four fixed effects: (Intercept), age, sexM (the sex column is categorical and gets expanded; F is the reference level), and income.
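With priors this weak and a Gaussian likelihood, the posterior means would be expected to sit close to ordinary least-squares estimates. The sketch below is an independent sanity check that uses only NumPy and regenerates the same synthetic data; it builds the `sexM` dummy by hand and is not how pyINLA computes its results.

```python
import numpy as np

# Regenerate the Quickstart data (same seed, same order of random draws).
rng = np.random.default_rng(0)
n = 200
age    = rng.normal(50, 10, n)
sex    = rng.choice(["F", "M"], size=n)
income = rng.normal(60, 15, n)
y = (2.0 + 0.05 * age + 0.7 * (sex == "M").astype(float)
     + 0.02 * income + rng.normal(0, 0.5, n))

# Design matrix in the order (Intercept), age, sexM, income.
X = np.column_stack([np.ones(n), age, (sex == "M").astype(float), income])

# Ordinary least squares via a numerically stable solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, b in zip(["(Intercept)", "age", "sexM", "income"], beta_hat):
    print(f"{name:12s} {b: .6f}")
```

Large gaps between these estimates and the `mean` column of `summary_fixed` would usually point to an informative prior or a mismatch in how a covariate was encoded.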

Intercept only ($\eta_i = \beta_0$)

With an empty "fixed" list the model contains only the intercept, which is included by default:

# intercept is included by default
model = {"response": "y", "fixed": []}

# or explicitly
model = {"response": "y", "fixed": [], "intercept": True}

One covariate ($\eta_i = \beta_0 + \beta_1 z_i$)

# Refer to a column 'z' in your DataFrame
model = {"response": "y", "fixed": ["z"]}
result = pyinla(model=model, family="gaussian", data=df)

Multiple covariates ($\eta_i = \beta_0 + \beta_1 z_{1i} + \beta_2 z_{2i}$)

model = {"response": "y", "fixed": ["z1", "z2"]}

Direct vectors (no DataFrame)

You can also pass raw vectors instead of column names. The response can be a vector too, so a DataFrame is optional:

# raw vectors (each must have len == len(y))
model = {"response": y, "fixed": [z]}

# labelled tuples for readable output
model = {"response": y, "fixed": [("z", z)]}

# a labelled (n x p) matrix expands to columns feat1, feat2, ..., featp
model = {"response": y, "fixed": [("feat", X)]}  # X.shape == (n, p)

# raw (n x p) matrix without a label: columns are auto-named
# fixed_<col_idx> (sequential), matching R's `y ~ X` formula expansion
model = {"response": y, "fixed": [X]}        # X.shape == (n, p)

Removing the Intercept

Use the token "-1" or "0" in the fixed list, or set model["intercept"] = False:

# with a token
model = {"response": "y", "fixed": ["-1", "age", "sex"]}

# equivalent
model = {"response": "y", "fixed": ["age", "sex"], "intercept": False}

You may also supply a custom intercept vector via model["intercept"] = vector (length $n$); it appears as (Intercept) in the output.

Categorical Covariates

Non-numeric named columns are automatically expanded into dummy variables (via pandas.get_dummies, named name + level with no separator, e.g. groupA, groupB). pyINLA uses treatment contrasts: with an intercept (the default), the first sorted level is dropped to avoid collinearity; without an intercept (intercept=False), all levels are kept.

Worked example: encoding a 3-level group

Suppose group takes values in $\{\text{A}, \text{B}, \text{C}\}$.

Input data (first 6 rows):

| i | y | group |
|---|---|---|
| 1 | 10.1 | A |
| 2 | 10.5 | A |
| 3 | 15.2 | B |
| 4 | 14.8 | B |
| 5 | 21.9 | C |
| 6 | 20.5 | C |

Design matrix with intercept (default), first level dropped:

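$\eta_i = \beta_0 + \beta_B \, I(\text{group}_i=\text{B}) + \beta_C \, I(\text{group}_i=\text{C})$

| i | (Intercept) | groupB | groupC |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 |
| 4 | 1 | 1 | 0 |
| 5 | 1 | 0 | 1 |
| 6 | 1 | 0 | 1 |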

Interpretation: $\beta_0$ is the mean for group A (baseline); $\beta_B$ and $\beta_C$ are differences from A.

Design matrix without intercept (intercept=False), all levels kept:

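$\eta_i = \beta_A \, I(\text{group}_i=\text{A}) + \beta_B \, I(\text{group}_i=\text{B}) + \beta_C \, I(\text{group}_i=\text{C})$

| i | groupA | groupB | groupC |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 0 | 1 | 0 |
| 5 | 0 | 0 | 1 |
| 6 | 0 | 0 | 1 |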

Interpretation: $\beta_A$, $\beta_B$, $\beta_C$ are the absolute means for each group.
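The two encodings can be reproduced outside pyINLA with pandas.get_dummies, which makes the interpretations easy to check numerically. The sketch below rebuilds both design matrices for the six rows above and fits them by least squares. It illustrates the encoding only and is not pyINLA's internal code path.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "y":     [10.1, 10.5, 15.2, 14.8, 21.9, 20.5],
    "group": ["A", "A", "B", "B", "C", "C"],
})

# With intercept: drop the first sorted level (A) and add a column of ones.
X_with = pd.get_dummies(df["group"], prefix="group", prefix_sep="",
                        drop_first=True, dtype=float)
X_with.insert(0, "(Intercept)", 1.0)

# Without intercept: keep all three levels.
X_without = pd.get_dummies(df["group"], prefix="group", prefix_sep="", dtype=float)

# Least-squares coefficients for each encoding.
y = df["y"].to_numpy()
b_with, *_ = np.linalg.lstsq(X_with.to_numpy(), y, rcond=None)
b_without, *_ = np.linalg.lstsq(X_without.to_numpy(), y, rcond=None)

# Baseline mean for A, then differences from A.
print(dict(zip(X_with.columns, map(float, b_with.round(3)))))
# One absolute mean per group.
print(dict(zip(X_without.columns, map(float, b_without.round(3)))))
```

Both fits describe the same three group means; only the parameterisation changes, which is why a prior placed on `groupB` means something different in each encoding.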

Interactions

Use "z1*z2" for main effects plus their interaction, or "z1:z2" for the product term only. Three-way forms "z1*z2*z3" and "z1:z2:z3" are supported the same way. Duplicates are removed automatically.

$\eta_i = \beta_0 + \beta_1 z_{1i} + \beta_2 z_{2i} + \beta_{12}\, z_{1i} z_{2i}$
# main effects + interaction (expanded internally)
model = {"response": "y", "fixed": ["z1*z2"]}

# same model written out: explicit main effects, with ":" adding only the product term
model = {"response": "y", "fixed": ["z1", "z2", "z1:z2"]}

# three-way: z1, z2, z3, all pairwise + triple product
model = {"response": "y", "fixed": ["z1*z2*z3"]}

Interaction strings require the referenced variables to exist as named columns in data. If you don't have a DataFrame, precompute the products and pass them as labelled vectors, e.g. ("z1:z2", z1 * z2).
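The `*` shorthand can be thought of as "every non-empty subset of the variables, joined by `:`". The helper below, `expand_star`, is a hypothetical illustration of that rule written with the standard library; pyINLA's own parser and the order in which it emits terms may differ.

```python
from itertools import combinations

def expand_star(term):
    """Expand 'a*b*c' into main effects plus all interactions, joined by ':'."""
    names = term.split("*")
    expanded = []
    # Order 1 gives the main effects, order 2 the pairwise products, and so on.
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            expanded.append(":".join(combo))
    return expanded

print(expand_star("z1*z2"))
print(expand_star("z1*z2*z3"))
```

Without a DataFrame, the same list tells you which products to precompute and pass as labelled vectors, for example ("z1:z2", z1 * z2).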

Nonlinear Transforms

Helpers like I(x^2) are not parsed yet. Build the column yourself and add it to your DataFrame:

df["age_sq"] = df["age"] ** 2
model = {"response": "y", "fixed": ["age", "age_sq"]}

Term Forms (At a Glance)

Each entry in model["fixed"] can be one of:

| Form | Example | Notes |
|---|---|---|
| String column name | "z" | Must exist in data. Numeric columns become a single slope; categorical columns expand to dummies. |
| Series or array | z (1D, length $n$) | Used as-is. Length must match the response. |
| Labelled vector | ("z", z) | Same as above but gives the column a readable name in the output. |
| Labelled matrix | ("feat", X) with X.shape == (n, p) | Expands into columns feat1, feat2, ..., featp. |
| Raw matrix | X with X.shape == (n, p) | Expands into p columns auto-named fixed_<col_idx> (sequential), mirroring R's y ~ X formula behaviour. Use the labelled form above for readable output names. |
| Interaction | "z1:z2", "z1*z2", "z1*z2*z3" | References require named columns in data. * expands to mains plus all lower-order interactions; : is the product only. |
| Intercept tokens | "-1", "0" | Drop the intercept. Equivalent to model["intercept"] = False. |

Configuring Priors with control['fixed']

The control['fixed'] dictionary controls priors and reporting for all fixed effects. Below are the available keys with their defaults.

| Key | Default | Meaning |
|---|---|---|
| mean | 0.0 | Prior mean for non-intercept fixed effects. Scalar or per-name dict (see How mean and prec resolve per coefficient). |
| prec | 0.001 | Prior precision for non-intercept fixed effects. Same shape as mean. |
| mean_intercept | 0.0 | Prior mean for the intercept. |
| prec_intercept | 0.0 | Prior precision for the intercept (0 means improper flat). |
| expand_factor_strategy | "model.matrix" | How to expand categorical covariates. "model.matrix" drops a reference level; "inla" keeps INLA's internal expansion. |
| correlation_matrix | False | If True, return the posterior correlation matrix of the fixed effects. |
| compute | True | Whether to compute marginal posteriors for fixed effects. |
| quantiles | None | Custom quantiles for posterior summaries (defaults to global quantiles). |
| cdf | None | Optional CDF evaluation points for fixed-effect marginals. |
| remove_names | None | List of fixed-effect names to drop from output. |

How mean and prec resolve per coefficient

Both keys accept either a single number or a dict. They follow the same lookup logic, so the rules below apply to both.

| What you pass | How it's read |
|---|---|
| "mean": 0.5 (scalar) | Every non-intercept coefficient gets 0.5. |
| "mean": {"age": 1.0} (dict, no "default") | age gets 1.0. Every other coefficient falls back to the pyINLA-wide default (0.0 for mean, 0.001 for prec). |
| "mean": {"age": 1.0, "default": 0.5} (dict + "default") | age gets 1.0. Every other coefficient gets 0.5. |

Concrete example with model["fixed"] = ["age", "sex", "income"]:

control = {"fixed": {"mean": {"age": 1.0, "default": 0.5}}}
# resolved means: age = 1.0, sex = 0.5, income = 0.5

Names in the dict must match the expanded coefficient names, not the original column names. For a categorical covariate region expanded to regionB, regionC, target the dummies ("regionB": ...), not the source column.
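The lookup rules above can be written down as a few lines of plain Python. The function `resolve_prior` is a hypothetical stand-in that mirrors the documented behaviour; it is not pyINLA's implementation, and it says nothing about how pyINLA treats dict keys that match no coefficient.

```python
def resolve_prior(spec, coef_names, package_default):
    """Resolve a scalar-or-dict prior spec into one value per coefficient."""
    if not isinstance(spec, dict):
        # Scalar: every non-intercept coefficient gets the same value.
        return {name: float(spec) for name in coef_names}
    # Dict: named entries win, then "default", then the package-wide default.
    fallback = spec.get("default", package_default)
    return {name: float(spec.get(name, fallback)) for name in coef_names}

# Expanded coefficient names (sex has become sexM); the intercept is handled separately.
names = ["age", "sexM", "income"]

print(resolve_prior(0.5, names, 0.0))
print(resolve_prior({"age": 1.0}, names, 0.0))
print(resolve_prior({"age": 1.0, "default": 0.5}, names, 0.0))
print(resolve_prior({"age": 10.0}, names, 0.001))   # same rules applied to prec
```

Writing the resolved values out like this before fitting is a cheap way to confirm that a dict key actually targets the coefficient you intended.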

Examples

The snippets below all reuse the df DataFrame and the model dict from the Quickstart above.

Tighter prior on all slopes:

result = pyinla(
    model=model,
    family="gaussian",
    data=df,
    control={"fixed": {"prec": 1.0}},
)
print(result.summary_fixed)

Per-coefficient priors:

control = {
    "fixed": {
        "mean":           {"age": 0.0, "income": 0.0},
        "prec":           {"age": 10.0, "income": 1.0},
        "mean_intercept": 2.0,
        "prec_intercept": 0.01,
    }
}
result = pyinla(model=model, family="gaussian", data=df, control=control)
print(result.summary_fixed)
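For intuition about how strongly a given prec pulls an estimate: in the simplest conjugate case (one coefficient, Gaussian likelihood with known noise), the posterior mean is a precision-weighted average of the prior mean and the data estimate. The sketch below works through that arithmetic with hypothetical numbers. A real fit with several covariates and an unknown noise variance will not follow this formula exactly.

```python
def gaussian_update(prior_mean, prior_prec, data_estimate, data_prec):
    """Conjugate Gaussian update for a single coefficient."""
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_estimate) / post_prec
    return post_mean, post_prec

# Hypothetical data summary: estimate 0.8 with standard error 0.5 (precision 4).
data_estimate, data_prec = 0.8, 1.0 / 0.5 ** 2

for prior_prec in (0.001, 1.0, 10.0):
    mean, prec = gaussian_update(0.0, prior_prec, data_estimate, data_prec)
    print(f"prior prec={prior_prec:g}: posterior mean={mean:.3f}, sd={prec ** -0.5:.3f}")
```

The prior starts to matter once its precision is comparable to the precision carried by the data, which is why the same prec can be negligible for one covariate and dominant for another.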

Request the posterior correlation matrix:

control = {"fixed": {"correlation_matrix": True}}
result = pyinla(model=model, family="gaussian", data=df, control=control)

print(result.summary_fixed)              # per-coef posterior table
print(result.correlation_matrix_fixed)   # labelled NxN DataFrame
print(result.covariance_matrix_fixed)    # companion covariance

Output of result.correlation_matrix_fixed (shown here for a model with a single covariate z)

             (Intercept)         z
(Intercept)     1.000000  0.031479
z               0.031479  1.000000

The matrix is a pandas.DataFrame whose rows and columns are labelled with the fixed-effect names after factor expansion ((Intercept), z, sexM, ...). Both correlation_matrix_fixed and covariance_matrix_fixed are None when the flag is off.

What You Get Back

After fitting, the result object exposes posterior summaries for each fixed effect:

summary_fixed

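Table with one row per fixed effect: posterior mean, sd, quantiles, mode, and kld, a Kullback-Leibler divergence that measures how far the marginal departs from its Gaussian approximation (values near zero mean the marginal is essentially Gaussian).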

Use for: headline numbers and quick reporting.

marginals_fixed

Full marginal posterior densities $p(\beta_j \mid y)$ as $(x, y)$ grids, one per coefficient.

Use for: plotting, transformations, custom probability statements.

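Are fixed effects in summary_random?

No: random-effect summaries are kept separate. summary_fixed contains only $\beta$ coefficients.

Tip: if you need posterior summaries of the whole linear predictor $\boldsymbol{\eta}$ (fixed and random contributions combined), look at summary_linear_predictor.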

correlation_matrix_fixed

Posterior correlation matrix among the fixed-effect coefficients, as a labelled pandas.DataFrame. Populated only when control['fixed']['correlation_matrix']=True; otherwise None.

Use for: diagnosing collinearity between covariates after fitting.

covariance_matrix_fixed

Companion to correlation_matrix_fixed: the posterior covariance with the same row/column labels. Diagonals are coefficient variances ($\text{sd}^2$ from summary_fixed).

Use for: propagating uncertainty into linear combinations of coefficients.
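As a sketch of that use case, the block below takes a small covariance matrix, derives the matching correlation matrix, and computes the posterior standard deviation of a linear combination $\mathbf{a}^\top \boldsymbol{\beta}$ as $\sqrt{\mathbf{a}^\top \Sigma\, \mathbf{a}}$. The matrix values are invented for illustration; with a fitted model you would presumably start from `result.covariance_matrix_fixed.to_numpy()` instead.

```python
import numpy as np

# Hypothetical posterior covariance for ((Intercept), z); illustrative values only.
cov = np.array([[0.0400, 0.0006],
                [0.0006, 0.0100]])

# Correlation matrix: divide by the outer product of the standard deviations.
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

# Posterior variance of the linear combination a' beta, here beta_0 + 2 * beta_z.
a = np.array([1.0, 2.0])
var_comb = a @ cov @ a

print(corr)
print(np.sqrt(var_comb))
```

Ignoring the off-diagonal term and simply adding variances would misstate the uncertainty whenever coefficients are correlated, which is common for an intercept and an uncentered covariate.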

Connecting Fixed Effects to the Response

pyINLA supports many likelihoods through the family= argument. The three most common choices, written in terms of the linear predictor $\boldsymbol{\eta}$, are below.

Gaussian (identity link)

$\mu_i = \eta_i = \beta_0 + \beta_1 z_i$

Likelihood (for $n$ observations):

$$\mathcal{L}(\boldsymbol{\beta}, \sigma^2 \mid y) = \prod_{i=1}^{n} \mathcal{N}\!\left(y_i \mid \beta_0 + \beta_1 z_i,\; \sigma^2\right)$$

Poisson (log link)

$\log(\lambda_i) = \eta_i = \beta_0 + \beta_1 z_i \;\Rightarrow\; \lambda_i = \exp(\eta_i)$

Likelihood (for $n$ observations):

$$\mathcal{L}(\boldsymbol{\beta} \mid y) = \prod_{i=1}^{n} \frac{\exp\!\big(-\exp(\beta_0 + \beta_1 z_i)\big)\, \exp(\beta_0 + \beta_1 z_i)^{y_i}}{y_i!}$$

Binomial (logit link)

$\text{logit}(p_i) = \eta_i = \beta_0 + \beta_1 z_i \;\Rightarrow\; p_i = \text{logit}^{-1}(\beta_0 + \beta_1 z_i)$

Likelihood for binary responses (Bernoulli, i.e. binomial with one trial per observation), for $n$ observations:

$$\mathcal{L}(\boldsymbol{\beta} \mid y) = \prod_{i=1}^{n} \big[\text{logit}^{-1}(\beta_0 + \beta_1 z_i)\big]^{y_i}\, \big[1 - \text{logit}^{-1}(\beta_0 + \beta_1 z_i)\big]^{1 - y_i}$$

Other families are available (Gamma, Negative Binomial, survival models, Beta, etc.). See the Likelihoods page for the full catalogue.
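The three likelihoods above are easier to compare on the log scale. The sketch below evaluates each log-likelihood for a one-covariate model using only the standard library. The function names and the tiny datasets are hypothetical, and the code is meant to mirror the formulas, not to reproduce pyINLA's internals.

```python
import math

def eta(beta0, beta1, z):
    """Linear predictor beta0 + beta1 * z for each observation."""
    return [beta0 + beta1 * zi for zi in z]

def gaussian_loglik(y, z, beta0, beta1, sigma2):
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - (yi - e) ** 2 / (2 * sigma2)
               for yi, e in zip(y, eta(beta0, beta1, z)))

def poisson_loglik(y, z, beta0, beta1):
    # log of exp(-lambda) * lambda^y / y!  with lambda = exp(eta)
    return sum(yi * e - math.exp(e) - math.lgamma(yi + 1)
               for yi, e in zip(y, eta(beta0, beta1, z)))

def bernoulli_loglik(y, z, beta0, beta1):
    # log of p^y * (1 - p)^(1 - y)  with p = logit^{-1}(eta)
    total = 0.0
    for yi, e in zip(y, eta(beta0, beta1, z)):
        p = 1.0 / (1.0 + math.exp(-e))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

z = [0.0, 1.0, 2.0]
print(gaussian_loglik([1.1, 1.9, 3.2], z, 1.0, 1.0, 0.25))
print(poisson_loglik([1, 3, 6], z, 0.0, 1.0))
print(bernoulli_loglik([0, 1, 1], z, -1.0, 1.0))
```

In every case the coefficients enter only through `eta`; the family decides how `eta` is mapped to the mean and how deviations from it are scored.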

Interpreting Fixed-Effect Coefficients

The meaning of a coefficient depends on the link scale (see Link Functions). The cards below cover the three most common cases.

Identity link (Gaussian)

Additive on the mean: $\mathbb{E}[y_i] = \eta_i$.

  • Sign: $\beta_1 > 0$ raises the mean; $\beta_1 < 0$ lowers it.
  • Magnitude: per $+1$ unit in the covariate $z$, the expected mean changes by $\beta_1$. For a change $\Delta z$, $\Delta \mathbb{E}[y_i] = \beta_1 \, \Delta z$.

Log link (Poisson, Gamma)

Multiplicative on the rate: $\log(\lambda_i) = \eta_i \Rightarrow \lambda_i = \exp(\eta_i)$.

  • Sign: $\beta_1 > 0$ increases the rate; $\beta_1 < 0$ decreases it.
  • Magnitude: per $+1$ unit in $z$, the expected rate $\lambda_i$ is multiplied by $\exp(\beta_1)$. For a change $\Delta z$, the multiplicative factor is $\exp(\beta_1 \Delta z)$, i.e. a $100 \cdot (\exp(\beta_1 \Delta z) - 1)\%$ change.

Logit link (Binomial, Beta)

Multiplicative on the odds: $\text{logit}(p_i) = \eta_i$.

  • Sign: $\beta_1 > 0$ increases odds and probability; $\beta_1 < 0$ decreases them.
  • Magnitude: per $+1$ unit in $z$, the odds are multiplied by $\exp(\beta_1)$ (odds ratio). The probability change depends on the baseline $p_{0,i}$: $p_{\text{new}, i} = \text{logit}^{-1}\!\big(\text{logit}(p_{0,i}) + \beta_1 \Delta z\big)$.
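The magnitude rules for the log and logit links reduce to a couple of lines of arithmetic. The sketch below uses a hypothetical coefficient of 0.7 and shows why the same odds ratio translates into different probability changes depending on the baseline. The helper names are illustrative.

```python
import math

def rate_ratio(beta, delta_z=1.0):
    """Multiplicative change in the rate (log link) or odds (logit link)."""
    return math.exp(beta * delta_z)

def new_probability(p0, beta, delta_z=1.0):
    """Probability after shifting the logit of p0 by beta * delta_z."""
    logit_p0 = math.log(p0 / (1.0 - p0))
    return 1.0 / (1.0 + math.exp(-(logit_p0 + beta * delta_z)))

beta1 = 0.7   # hypothetical posterior mean of a coefficient
print(f"multiplier per +1 unit: {rate_ratio(beta1):.3f}")
print(f"percent change in rate: {100 * (rate_ratio(beta1) - 1):.1f}%")
for p0 in (0.05, 0.5, 0.9):
    print(f"baseline p={p0:.2f} -> new p={new_probability(p0, beta1):.3f}")
```

Applying these transforms to the posterior quantiles of $\beta_1$, not just to its mean, gives credible intervals on the ratio scale, because `exp` is monotone.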

Other links (cloglog, neglog, probit, ...)

For survival, robust, and asymmetric models, see the Link Functions page for sign and magnitude rules. For Weibull survival the variant setting determines whether $\exp(\beta_j)$ is a hazard ratio or a scale-multiplier on the survival time.

Posterior probability statements

Bayesian inference lets you go beyond a point estimate. Useful queries:

  • $P(\beta_j > 0 \mid y)$: probability the effect is positive.
  • $P(\beta_j > c \mid y)$: probability of exceeding a meaningful threshold $c$.
  • Credible intervals from the marginal at any level.

All of these come straight from marginals_fixed[name].
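A minimal sketch of such queries follows, assuming a marginal is available as a grid of values `x` with matching density values `dens`. Because the exact container returned by marginals_fixed is not shown here, the block builds a stand-in Gaussian density from the Quickstart summary for age (mean 0.053095, sd 0.003561). With a real fit, `x` and `dens` would be read from the result object instead.

```python
import numpy as np

# Stand-in for the age marginal: a Gaussian density evaluated on a grid.
mean, sd = 0.053095, 0.003561
x = np.linspace(mean - 6 * sd, mean + 6 * sd, 2001)
dens = np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Cumulative distribution by trapezoidal integration, normalised to end at 1.
steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(x)
cdf = np.concatenate([[0.0], np.cumsum(steps)])
cdf /= cdf[-1]

def prob_greater(c):
    """P(beta > c | y), interpolating the CDF at c."""
    return 1.0 - np.interp(c, x, cdf)

def quantile(q):
    """Invert the CDF by linear interpolation."""
    return np.interp(q, cdf, x)

print(prob_greater(0.0))                 # probability the effect is positive
print(prob_greater(0.05))                # probability of exceeding c = 0.05
print(quantile(0.05), quantile(0.95))    # 90% credible interval
```

Working from the grid, not from mean and sd alone, keeps these statements valid when the marginal is skewed, as it often is for non-Gaussian likelihoods.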

Fixed-Effect Extensions

Most fixed covariates belong in model['fixed'] and can have label-specific priors via control['fixed']. When you need special treatment for a single slope, there are two useful patterns: