Fixed Effects
Covariates with global, non-grouped coefficients in the linear predictor.
What is a Fixed Effect?
A fixed effect is a covariate whose coefficient is shared across all observations: a single parameter $\beta_j$ that does not vary by group, location, or time. In a typical regression formula:
$$\eta_i = \beta_0 + \beta_1 z_{1i} + \beta_2 z_{2i} + \cdots + \beta_p z_{pi}$$
Each $\beta_j$ is one fixed effect. The intercept $\beta_0$ is a fixed effect too, automatically added unless you remove it.
Fixed vs Random
- Fixed effect: one coefficient for the whole dataset (e.g. an overall age slope).
- Random effect: coefficients indexed by a group, location, or time, modeled as a Gaussian field with shared hyperparameters (e.g. a per-region intercept).
Fixed effects suit small sets of distinct, non-exchangeable levels (treatment arms, sex). Random effects suit exchangeable groups, partial pooling, or many sparsely-observed levels.
The Linear Predictor
In generalized linear (mixed) models, the expected value of the response $\mathbf{y}$ is connected to a linear predictor $\boldsymbol{\eta}$ via a link function $g(\cdot)$:
$$g\big(\mathbb{E}[y_i]\big) = \eta_i, \qquad i = 1, \dots, n$$
What goes into $\boldsymbol{\eta}$?
The linear predictor decomposes as:
$$\boldsymbol{\eta} = \mathbf{Z}\,\boldsymbol{\beta} + \sum_{k} \mathbf{f}_k + \mathbf{o}$$
- Fixed effects $(\mathbf{Z}\,\boldsymbol{\beta})$: intercept and covariates declared in model['fixed']. Coefficients $\boldsymbol{\beta}$ are global (shared across all observations).
- Random effects $(\mathbf{f}_k)$: latent components in model['random'] (iid, group-specific, temporal, spatial fields). They capture structure or extra variability across observations.
- Offset $(\mathbf{o})$: a known contribution (e.g. $\log(\text{exposure})$ in rate models).
In INLA terminology the entire contribution to $\boldsymbol{\eta}$ (including $\mathbf{Z}\boldsymbol{\beta}$) is part of the latent field: fixed effects are simply the latent components with global coefficients.
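To make the decomposition concrete, here is a minimal NumPy sketch that assembles $\boldsymbol{\eta}$ by hand for a toy dataset. The design matrix, coefficient values, per-group intercepts and exposures are made-up illustrations; in a real fit pyINLA builds $\mathbf{Z}$ from model['fixed'] and estimates $\boldsymbol{\beta}$ and the random effects itself.

```python
import numpy as np

# Toy data: 4 observations, one covariate z, two groups, known exposure.
z = np.array([0.5, 1.0, 1.5, 2.0])
group = np.array([0, 0, 1, 1])            # group index of each observation
exposure = np.array([10.0, 20.0, 10.0, 40.0])

# Fixed effects: design matrix Z = [1, z] times ONE global beta vector.
Z = np.column_stack([np.ones_like(z), z])
beta = np.array([0.2, 0.8])               # illustrative beta_0, beta_1

# Random effect: one intercept per group, looked up by group index.
u = np.array([-0.3, 0.3])                 # illustrative per-group values
f = u[group]

# Offset: known log-exposure, added with no coefficient to estimate.
offset = np.log(exposure)

eta = Z @ beta + f + offset
print(eta)
```

Every row uses the same $\boldsymbol{\beta}$ (the fixed part), while f picks a different value per group (the random part).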
Fixed Effects in the Latent Field
In INLA, fixed effects are part of the latent Gaussian field $\mathbf{x}$. By default each coefficient gets an independent Gaussian prior:
$$\beta_j \sim \mathcal{N}\big(\mu_j,\ \tau_j^{-1}\big)$$
where $\mu_j$ is the prior mean and $\tau_j$ is the prior precision. The pyINLA defaults are listed below.
| Term | Default mean | Default precision | Comment |
|---|---|---|---|
| Intercept $\beta_0$ | $0$ | $0$ (improper, flat) | Improper flat prior: no prior shrinkage on the overall level. |
| Slopes $\beta_j,\ j\ge 1$ | $0$ | $0.001$ | Very weak, near-flat prior centered at zero. Variance $= 1000$. |
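Because the prior is specified through its precision, its variance is $1/\tau_j$. A quick back-of-the-envelope check (plain Python, no pyINLA calls) shows how little the default slope prior constrains $\beta_j$: its central 95% interval is roughly $\pm 62$.

```python
import math
from statistics import NormalDist

def prior_interval(mean, prec, level=0.95):
    """Central prior interval for beta_j ~ N(mean, 1/prec)."""
    if prec == 0:
        # Precision 0 is the improper flat prior: there is no finite interval.
        return (-math.inf, math.inf)
    sd = 1.0 / math.sqrt(prec)                    # variance = 1 / precision
    zq = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return (mean - zq * sd, mean + zq * sd)

print(prior_interval(0.0, 0.001))  # default slope prior
print(prior_interval(0.0, 1.0))    # a much tighter prior (prec = 1)
print(prior_interval(0.0, 0.0))    # default intercept prior (flat)
```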
You can override these globally or per-coefficient through control['fixed'] (see below).
Adding Fixed Effects in pyINLA
pyINLA describes the model with a Python dict. The "fixed" key holds a list of terms: a string is looked up as a column in data, a vector is taken as raw values, and a (name, values) tuple lets you supply a labelled column directly. The intercept is included by default and is controlled separately through model["intercept"].
Quickstart with synthetic data
The example below is fully runnable: it generates a small synthetic dataset and fits a Gaussian regression with three covariates.
import numpy as np
import pandas as pd
from pyinla import pyinla
# Synthetic dataset: y = 2 + 0.05*age + 0.7*(sex=='M') + 0.02*income + noise
rng = np.random.default_rng(0)
n = 200
age = rng.normal(50, 10, n)
sex = rng.choice(["F", "M"], size=n)
income = rng.normal(60, 15, n)
y = (2.0 + 0.05 * age + 0.7 * (sex == "M").astype(float) + 0.02 * income
     + rng.normal(0, 0.5, n))
df = pd.DataFrame({"y": y, "age": age, "sex": sex, "income": income})
model = {
    "response": "y",
    "fixed": ["age", "sex", "income"],
}
result = pyinla(model=model, family="gaussian", data=df)
print(result.summary_fixed)
Output
                 mean        sd  0.025quant  0.5quant  0.975quant      mode           kld
(Intercept)  1.858775  0.243689    1.380424  1.858775    2.337126  1.858775  1.989465e-09
age          0.053095  0.003561    0.046104  0.053095    0.060087  0.053095  1.989470e-09
sexM         0.689530  0.068038    0.555975  0.689530    0.823086  0.689530  1.989433e-09
income       0.019892  0.002264    0.015448  0.019892    0.024336  0.019892  1.989472e-09
This fits four fixed effects: (Intercept), age, sexM (the sex column is categorical and gets expanded; F is the reference level), and income.
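With a Gaussian likelihood and these near-flat default priors, the posterior means should sit very close to ordinary least-squares estimates. The sketch below regenerates the same synthetic data and solves the least-squares problem with NumPy as an independent sanity check. It does not call pyINLA, and OLS is only a stand-in here, not what pyINLA computes.

```python
import numpy as np

# Regenerate the quickstart data (same seed, same draw order).
rng = np.random.default_rng(0)
n = 200
age = rng.normal(50, 10, n)
sex = rng.choice(["F", "M"], size=n)
income = rng.normal(60, 15, n)
y = (2.0 + 0.05 * age + 0.7 * (sex == "M").astype(float) + 0.02 * income
     + rng.normal(0, 0.5, n))

# Design matrix matching the fixed effects: (Intercept), age, sexM, income.
Z = np.column_stack([np.ones(n), age, (sex == "M").astype(float), income])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)

for name, b in zip(["(Intercept)", "age", "sexM", "income"], coef):
    print(f"{name:12s} {b: .6f}")
```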
Intercept only ($\eta_i = \beta_0$)
An empty "fixed" list gives an intercept-only model, because the intercept is included by default:
# intercept is included by default
model = {"response": "y", "fixed": []}
# or explicitly
model = {"response": "y", "fixed": [], "intercept": True}
One covariate ($\eta_i = \beta_0 + \beta_1 z_i$)
# Refer to a column 'z' in your DataFrame
model = {"response": "y", "fixed": ["z"]}
result = pyinla(model=model, family="gaussian", data=df)
Multiple covariates ($\eta_i = \beta_0 + \beta_1 z_{1i} + \beta_2 z_{2i}$)
model = {"response": "y", "fixed": ["z1", "z2"]}
Direct vectors (no DataFrame)
You can also pass raw vectors instead of column names. The response can be a vector too, so a DataFrame is optional:
# raw vectors (each must have len == len(y))
model = {"response": y, "fixed": [z]}
# labelled tuples for readable output
model = {"response": y, "fixed": [("z", z)]}
# a (n x p) matrix expands to columns name1, name2, ...
model = {"response": y, "fixed": [("feat", X)]}  # X.shape == (n, p)
# raw (n x p) matrix without a label: columns are auto-named
# fixed_<col_idx> (sequential), matching R's `y ~ X` formula expansion
model = {"response": y, "fixed": [X]}  # X.shape == (n, p)
Removing the Intercept
Use the token "-1" or "0" in the fixed list, or set model["intercept"] = False:
# with a token
model = {"response": "y", "fixed": ["-1", "age", "sex"]}
# equivalent
model = {"response": "y", "fixed": ["age", "sex"], "intercept": False}
You may also supply a custom intercept vector via model["intercept"] = vector (length $n$); it appears as (Intercept) in the output.
Categorical Covariates
Non-numeric named columns are automatically expanded into dummy variables (via pandas.get_dummies, named name + level with no separator, e.g. groupA, groupB). pyINLA uses treatment contrasts: with an intercept (the default), the first sorted level is dropped to avoid collinearity; without an intercept (intercept=False), all levels are kept.
Worked example: encoding a 3-level group
Suppose group takes values in $\{\text{A}, \text{B}, \text{C}\}$.
Input data (first 6 rows):
| i | y | group |
|---|---|---|
| 1 | 10.1 | A |
| 2 | 10.5 | A |
| 3 | 15.2 | B |
| 4 | 14.8 | B |
| 5 | 21.9 | C |
| 6 | 20.5 | C |
Design matrix with intercept (default), first level dropped:
| i | (Intercept) | groupB | groupC |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 |
| 4 | 1 | 1 | 0 |
| 5 | 1 | 0 | 1 |
| 6 | 1 | 0 | 1 |
Interpretation: $\beta_0$ is the mean for group A (baseline); $\beta_B$ and $\beta_C$ are differences from A.
Design matrix without intercept (intercept=False), all levels kept:
| i | groupA | groupB | groupC |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 0 | 1 | 0 |
| 5 | 0 | 0 | 1 |
| 6 | 0 | 0 | 1 |
Interpretation: $\beta_A$, $\beta_B$, $\beta_C$ are the absolute means for each group.
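The same encoding can be reproduced with pandas.get_dummies to check which column is which. This is a sketch of treatment-contrast encoding, not pyINLA's internal code. Least squares stands in for the posterior mean (a reasonable proxy under the near-flat default priors) to confirm the interpretations above: the intercept model returns the A mean plus the B and C differences, and the no-intercept model returns the three group means.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "y": [10.1, 10.5, 15.2, 14.8, 21.9, 20.5],
    "group": ["A", "A", "B", "B", "C", "C"],
})

# With an intercept: drop the first sorted level (A), which becomes the baseline.
dummies = pd.get_dummies(df["group"], prefix="group", prefix_sep="",
                         drop_first=True, dtype=float)
X_int = pd.concat([pd.Series(1.0, index=df.index, name="(Intercept)"), dummies],
                  axis=1)

# Without an intercept: keep every level.
X_noint = pd.get_dummies(df["group"], prefix="group", prefix_sep="", dtype=float)

# Least squares as a stand-in for the near-flat-prior posterior mean.
y = df["y"].to_numpy()
b_int, *_ = np.linalg.lstsq(X_int.to_numpy(), y, rcond=None)
b_noint, *_ = np.linalg.lstsq(X_noint.to_numpy(), y, rcond=None)
print(X_int.columns.tolist(), b_int.round(2).tolist())
print(X_noint.columns.tolist(), b_noint.round(2).tolist())
```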
Interactions
Use "z1*z2" for main effects plus their interaction, or "z1:z2" for the product term only. Three-way forms "z1*z2*z3" and "z1:z2:z3" are supported the same way. Duplicates are removed automatically.
# main effects + interaction (expanded internally)
model = {"response": "y", "fixed": ["z1*z2"]}
# ":" adds only the product; here the main effects are listed explicitly,
# so this is equivalent to the "z1*z2" line above
model = {"response": "y", "fixed": ["z1", "z2", "z1:z2"]}
# three-way: z1, z2, z3, all pairwise + triple product
model = {"response": "y", "fixed": ["z1*z2*z3"]}
Interaction strings require the referenced variables to exist as named columns in data. If you don't have a DataFrame, precompute the products and pass them as labelled vectors, e.g. ("z1:z2", z1 * z2).
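If you are passing vectors rather than a DataFrame, the expansion rule can be mimicked by hand. The helpers below (expand_star and labelled_products are hypothetical names, not pyINLA functions) spell out what "z1*z2*z3" is documented to produce and build the matching labelled product columns for model["fixed"]. The term order shown is an assumption; pyINLA's own ordering may differ.

```python
from itertools import combinations

import numpy as np

def expand_star(term):
    """Hypothetical helper: expand 'a*b*c' into mains plus every interaction."""
    factors = [f.strip() for f in term.split("*")]
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms

def labelled_products(terms, columns):
    """Hypothetical helper: turn term names into (name, values) tuples.

    `columns` maps variable names to 1-D arrays; ':' terms become products.
    """
    out = []
    for t in terms:
        values = np.prod([np.asarray(columns[v], dtype=float)
                          for v in t.split(":")], axis=0)
        out.append((t, values))
    return out

z1 = np.array([1.0, 2.0, 3.0])
z2 = np.array([0.5, 0.5, 2.0])
fixed = labelled_products(expand_star("z1*z2"), {"z1": z1, "z2": z2})
print([name for name, _ in fixed])   # ['z1', 'z2', 'z1:z2']
```

The resulting list can be dropped into a model such as {"response": y, "fixed": fixed}.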
Nonlinear Transforms
Helpers like I(x^2) are not parsed yet. Build the column yourself and add it to your DataFrame:
df["age_sq"] = df["age"] ** 2
model = {"response": "y", "fixed": ["age", "age_sq"]}
Term Forms (At a Glance)
Each entry in model["fixed"] can be one of:
| Form | Example | Notes |
|---|---|---|
| String column name | "z" | Must exist in data. Numeric columns become a single slope; categorical columns expand to dummies. |
| Series or array | z (1D, length $n$) | Used as-is. Length must match the response. |
| Labelled vector | ("z", z) | Same as above but gives the column a readable name in the output. |
| Labelled matrix | ("feat", X) with X.shape == (n, p) | Expands into columns feat1, feat2, ..., featp. |
| Raw matrix | X with X.shape == (n, p) | Expands into p columns auto-named fixed_<col_idx> (sequential), mirroring R's y ~ X formula behaviour. Use the labelled form above for readable output names. |
| Interaction | "z1:z2", "z1*z2", "z1*z2*z3" | The referenced variables must be named columns in data. * expands to the main effects plus every interaction among them; : is the product only. |
| Intercept tokens | "-1", "0" | Drop the intercept. Equivalent to model["intercept"] = False. |
Other behaviour worth knowing:
- Intercept: included by default. Drop it with model["intercept"] = False or by placing the token "-1" or "0" inside "fixed"; pass a length-$n$ vector as model["intercept"] for a custom intercept (labelled (Intercept)). Mixing add ("1") and drop ("-1"/"0") tokens in the same list raises an error.
- De-duplication and order: duplicate terms are dropped; the order of entries in "fixed" determines the order of fixed-effect components in the output.
- Whitespace: term strings are trimmed, so " x1 " resolves to "x1".
- Repeated names: if the same covariate is supplied more than once (e.g. as both a column reference and a labelled vector), only the first occurrence is kept and later duplicates are silently dropped.
- Missing or non-finite values in numeric covariates are treated as $0$.
- Data optional: you can omit data entirely if you pass the response and all fixed terms as vectors. Interaction strings still require named columns.
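The string-handling rules above can be summarised in a few lines. This is a hypothetical sketch (normalize_fixed_terms is not a pyINLA function) that only handles string terms, ignoring vectors, tuples and interaction expansion; it is meant to make the rules concrete, not to reproduce pyINLA's actual parser.

```python
def normalize_fixed_terms(terms):
    """Hypothetical sketch of the trimming, de-duplication and intercept rules.

    Returns (clean_terms, keep_intercept).
    """
    add_tokens, drop_tokens = {"1"}, {"-1", "0"}
    seen, clean = set(), []
    has_add = has_drop = False
    for t in terms:
        t = t.strip()                 # " x1 " -> "x1"
        if t in add_tokens:
            has_add = True
            continue
        if t in drop_tokens:
            has_drop = True
            continue
        if t not in seen:             # keep the first occurrence only
            seen.add(t)
            clean.append(t)
    if has_add and has_drop:
        raise ValueError("cannot mix '1' with '-1'/'0' in 'fixed'")
    return clean, not has_drop

print(normalize_fixed_terms([" x1 ", "x2", "x1", "-1"]))  # (['x1', 'x2'], False)
```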
Configuring Priors with control['fixed']
The control['fixed'] dictionary controls priors and reporting for all fixed effects. Below are the available keys with their defaults.
| Key | Default | Meaning |
|---|---|---|
| mean | 0.0 | Prior mean for non-intercept fixed effects. Scalar or per-name dict (see How mean and prec resolve per coefficient). |
| prec | 0.001 | Prior precision for non-intercept fixed effects. Same shape as mean. |
| mean_intercept | 0.0 | Prior mean for the intercept. |
| prec_intercept | 0.0 | Prior precision for the intercept (0 means improper flat). |
| expand_factor_strategy | "model.matrix" | How to expand categorical covariates. "model.matrix" drops a reference level; "inla" keeps INLA's internal expansion. |
| correlation_matrix | False | If True, return the posterior correlation matrix of the fixed effects. |
| compute | True | Whether to compute marginal posteriors for fixed effects. |
| quantiles | None | Custom quantiles for posterior summaries (defaults to global quantiles). |
| cdf | None | Optional CDF evaluation points for fixed-effect marginals. |
| remove_names | None | List of fixed-effect names to drop from output. |
How mean and prec resolve per coefficient
Both keys accept either a single number or a dict. They follow the same lookup logic, so the rules below apply to both.
| What you pass | How it's read |
|---|---|
| Scalar: "mean": 0.5 | Every non-intercept coefficient gets 0.5. |
| Dict without "default": "mean": {"age": 1.0} | age gets 1.0. Every other coefficient falls back to the pyINLA-wide default (0.0 for mean, 0.001 for prec). |
| Dict with "default": "mean": {"age": 1.0, "default": 0.5} | age gets 1.0. Every other coefficient gets 0.5. |
Concrete example with model["fixed"] = ["age", "sex", "income"]:
control = {"fixed": {"mean": {"age": 1.0, "default": 0.5}}}
# resolved means: age = 1.0, sexM = 0.5, income = 0.5
Names in the dict must match the expanded coefficient names, not the original column names. For a categorical covariate region expanded to regionB, regionC, target the dummies ("regionB": ...), not the source column.
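The lookup rules can be written as a small function. resolve is a hypothetical helper, not part of pyINLA, and the names list assumes the sex column expands to sexM as in the quickstart.

```python
BUILTIN = {"mean": 0.0, "prec": 0.001}   # pyINLA-wide defaults from the table above

def resolve(key, spec, names):
    """Hypothetical sketch: map a 'mean'/'prec' spec onto expanded coefficient names."""
    if spec is None:
        spec = BUILTIN[key]
    if not isinstance(spec, dict):
        # Scalar: every non-intercept coefficient gets the same value.
        return {name: float(spec) for name in names}
    # Dict: per-name value, else the dict's "default", else the built-in default.
    fallback = spec.get("default", BUILTIN[key])
    return {name: float(spec.get(name, fallback)) for name in names}

# Expanded names for model["fixed"] = ["age", "sex", "income"]
names = ["age", "sexM", "income"]
print(resolve("mean", {"age": 1.0, "default": 0.5}, names))
print(resolve("prec", {"age": 10.0}, names))
```

In this sketch a key that matches no expanded name (such as "sex") is simply ignored, which is why the dict must target sexM.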
Examples
The snippets below all reuse the df data frame and model dict from the quickstart above.
Tighter prior on all slopes:
result = pyinla(
model=model,
family="gaussian",
data=df,
control={"fixed": {"prec": 1.0}},
)
print(result.summary_fixed)
Per-coefficient priors:
control = {
"fixed": {
"mean": {"age": 0.0, "income": 0.0},
"prec": {"age": 10.0, "income": 1.0},
"mean_intercept": 2.0,
"prec_intercept": 0.01,
}
}
result = pyinla(model=model, family="gaussian", data=df, control=control)
print(result.summary_fixed)
Request the posterior correlation matrix:
control = {"fixed": {"correlation_matrix": True}}
result = pyinla(model=model, family="gaussian", data=df, control=control)
print(result.summary_fixed) # per-coef posterior table
print(result.correlation_matrix_fixed) # labelled NxN DataFrame
print(result.covariance_matrix_fixed) # companion covariance
Output of result.correlation_matrix_fixed (for a model with an intercept and a single covariate z):
             (Intercept)         z
(Intercept)     1.000000  0.031479
z               0.031479  1.000000
The matrix is a pandas.DataFrame whose rows and columns are labelled with the fixed-effect names after factor expansion ((Intercept), z, sexM, ...). Both correlation_matrix_fixed and covariance_matrix_fixed are None when the flag is off.
What You Get Back
After fitting, the result object exposes posterior summaries for each fixed effect:
summary_fixed
Table with one row per fixed effect: posterior mean, sd, quantiles, mode, and KL divergence (kld) from a Gaussian approximation.
Use for: headline numbers and quick reporting.
marginals_fixed
Full marginal posterior densities $p(\beta_j \mid y)$ as $(x, y)$ grids, one per coefficient.
Use for: plotting, transformations, custom probability statements.
Are fixed effects in summary_random?
No: random-effect summaries are kept separate, and summary_fixed contains only the $\beta$ coefficients.
Tip: for the fitted linear predictor $\boldsymbol{\eta}$, look at summary_linear_predictor.
correlation_matrix_fixed
Posterior correlation matrix among the fixed-effect coefficients, as a labelled pandas.DataFrame. Populated only when control['fixed']['correlation_matrix']=True; otherwise None.
Use for: diagnosing collinearity between covariates after fitting.
covariance_matrix_fixed
Companion to correlation_matrix_fixed: the posterior covariance with the same row/column labels. Diagonals are coefficient variances ($\text{sd}^2$ from summary_fixed).
Use for: propagating uncertainty into linear combinations of coefficients.
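A typical use of the covariance matrix is the variance of a linear combination $\mathbf{a}^\top\boldsymbol{\beta}$, which is $\mathbf{a}^\top\boldsymbol{\Sigma}\,\mathbf{a}$; the correlation matrix is the same matrix rescaled by the standard deviations. The sketch below uses a made-up 2×2 covariance in the same labelled-DataFrame layout so it runs without a fit; substitute result.covariance_matrix_fixed and the matching posterior means in practice. The resulting sd is a Gaussian approximation to the posterior of the combination.

```python
import numpy as np
import pandas as pd

labels = ["(Intercept)", "z"]
# Made-up posterior covariance and means, laid out like
# result.covariance_matrix_fixed; replace with real fit output in practice.
cov = pd.DataFrame([[0.0400, 0.0010],
                    [0.0010, 0.0025]], index=labels, columns=labels)
post_mean = pd.Series([1.0, 0.5], index=labels)

# Linear combination a' beta: here the linear predictor at z = 2.
a = pd.Series({"(Intercept)": 1.0, "z": 2.0}).reindex(labels).to_numpy()
C = cov.to_numpy()
est = float(a @ post_mean.to_numpy())
sd = float(np.sqrt(a @ C @ a))          # sqrt(a' Sigma a)

# Correlation from covariance: divide by the outer product of the sds.
sds = np.sqrt(np.diag(C))
corr = cov / np.outer(sds, sds)
print(f"estimate {est:.3f}, sd {sd:.4f}")
print(corr)
```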
Connecting Fixed Effects to the Response
pyINLA supports many likelihoods through the family= argument. The three most common choices, written in terms of the linear predictor $\boldsymbol{\eta}$, are below.
Gaussian (identity link)
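Likelihood (for $n$ observations), where $\sigma^2$ is the observation variance:
$$p(\mathbf{y} \mid \boldsymbol{\eta}, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \eta_i)^2}{2\sigma^2}\right)$$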
Poisson (log link)
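Likelihood (for $n$ observations):
$$p(\mathbf{y} \mid \boldsymbol{\eta}) = \prod_{i=1}^{n} \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}, \qquad \lambda_i = \exp(\eta_i)$$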
Binomial (logit link)
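Bernoulli likelihood (for $n$ observations, $y_i \in \{0, 1\}$):
$$p(\mathbf{y} \mid \boldsymbol{\eta}) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}, \qquad p_i = \text{logit}^{-1}(\eta_i) = \frac{1}{1 + e^{-\eta_i}}$$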
Other families are available (Gamma, Negative Binomial, survival models, Beta, etc.). See the Likelihoods page for the full catalogue.
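Writing the three log-likelihoods out per observation shows how each link turns $\eta_i$ into the distribution's parameter; summing over $i$ gives the log-likelihood of all $n$ observations. These are plain-Python illustrations of the formulas, not pyINLA internals, and they do not guard against overflow for very large $\eta_i$.

```python
import math

def loglik_gaussian(y, eta, sigma2):
    """log N(y | eta, sigma2): identity link, so the mean is eta itself."""
    return -0.5 * math.log(2 * math.pi * sigma2) - (y - eta) ** 2 / (2 * sigma2)

def loglik_poisson(y, eta):
    """log Poisson(y | lambda) with lambda = exp(eta) (log link)."""
    # y*log(lambda) - lambda - log(y!)
    return y * eta - math.exp(eta) - math.lgamma(y + 1)

def loglik_bernoulli(y, eta):
    """log Bernoulli(y | p) with p = 1 / (1 + exp(-eta)) (logit link)."""
    # y*log(p) + (1-y)*log(1-p) simplifies to y*eta - log(1 + exp(eta))
    return y * eta - math.log1p(math.exp(eta))

ys, etas = [0, 1, 3], [0.0, 0.5, 1.2]
print(sum(loglik_poisson(y, e) for y, e in zip(ys, etas)))
```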
Interpreting Fixed-Effect Coefficients
The meaning of a coefficient depends on the link scale (see Link Functions). The cards below cover the three most common cases.
Identity link (Gaussian)
Additive on the mean: $\mathbb{E}[y_i] = \eta_i$.
- Sign: $\beta_1 > 0$ raises the mean; $\beta_1 < 0$ lowers it.
- Magnitude: per $+1$ unit in the covariate $z$, the expected mean changes by $\beta_1$. For a change $\Delta z$, $\Delta \mathbb{E}[y_i] = \beta_1 \, \Delta z$.
Log link (Poisson, Gamma)
Multiplicative on the rate: $\log(\lambda_i) = \eta_i \Rightarrow \lambda_i = \exp(\eta_i)$.
- Sign: $\beta_1 > 0$ increases the rate; $\beta_1 < 0$ decreases it.
- Magnitude: per $+1$ unit in $z$, the expected rate $\lambda_i$ is multiplied by $\exp(\beta_1)$. For a change $\Delta z$, the multiplicative factor is $\exp(\beta_1 \Delta z)$, i.e. a $100 \cdot (\exp(\beta_1 \Delta z) - 1)\%$ change.
Logit link (Binomial, Beta)
Multiplicative on the odds: $\text{logit}(p_i) = \eta_i$.
- Sign: $\beta_1 > 0$ increases odds and probability; $\beta_1 < 0$ decreases them.
- Magnitude: per $+1$ unit in $z$, the odds are multiplied by $\exp(\beta_1)$ (odds ratio). The probability change depends on the baseline $p_{0,i}$: $p_{\text{new}, i} = \text{logit}^{-1}\!\big(\text{logit}(p_{0,i}) + \beta_1 \Delta z\big)$.
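The three rules translate directly into small conversion functions. The coefficient $\beta_1 = 0.05$, the change $\Delta z = 10$ and the baseline probability $p_0 = 0.2$ are illustrative values, not output from a fit.

```python
import math

def identity_effect(beta, dz):
    """Change in E[y] for a change dz in the covariate (identity link)."""
    return beta * dz

def log_effect(beta, dz):
    """Multiplicative factor on the rate and the implied % change (log link)."""
    factor = math.exp(beta * dz)
    return factor, 100.0 * (factor - 1.0)

def logit_effect(beta, dz, p0):
    """Odds ratio and new probability from a baseline p0 (logit link)."""
    odds_ratio = math.exp(beta * dz)
    eta0 = math.log(p0 / (1.0 - p0))                  # logit(p0)
    p_new = 1.0 / (1.0 + math.exp(-(eta0 + beta * dz)))
    return odds_ratio, p_new

beta, dz = 0.05, 10.0
print(identity_effect(beta, dz))        # additive change in the mean
print(log_effect(beta, dz))             # factor ~1.649, i.e. ~64.9 % higher rate
print(logit_effect(beta, dz, p0=0.2))   # odds ratio ~1.649, new p ~0.292
```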
Other links (cloglog, neglog, probit, ...)
For survival, robust, and asymmetric models, see the Link Functions page for sign and magnitude rules. For Weibull survival the variant setting determines whether $\exp(\beta_j)$ is a hazard ratio or a scale-multiplier on the survival time.
Posterior probability statements
Bayesian inference lets you go beyond a point estimate. Useful queries:
- $P(\beta_j > 0 \mid y)$: probability the effect is positive.
- $P(\beta_j > c \mid y)$: probability of exceeding a meaningful threshold $c$.
- Credible intervals from the marginal at any level.
All of these come straight from marginals_fixed[name].
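Assuming you have pulled one coefficient's marginal out of marginals_fixed as two arrays (grid points x and density values), each query above reduces to integrating that grid. The exact container returned by marginals_fixed[name] is not spelled out here, so the sketch takes plain arrays, and it uses a Gaussian stand-in built from the quickstart's age row instead of a real fitted marginal.

```python
import numpy as np

def grid_cdf(x, dens):
    """Normalised CDF of a density tabulated on an increasing grid x."""
    # Trapezoidal rule written out, to avoid depending on np.trapz/np.trapezoid.
    steps = np.diff(x) * 0.5 * (dens[1:] + dens[:-1])
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    return cdf / cdf[-1]

def prob_greater(x, dens, c):
    """P(beta > c | y) from a tabulated marginal."""
    return 1.0 - float(np.interp(c, x, grid_cdf(x, dens)))

def quantile(x, dens, q):
    """q-quantile of a tabulated marginal, by inverting the CDF."""
    return float(np.interp(q, grid_cdf(x, dens), x))

# Gaussian stand-in from the quickstart's age row (mean, sd). A real marginal
# from marginals_fixed need not be exactly Gaussian.
m, s = 0.053095, 0.003561
x = np.linspace(m - 8 * s, m + 8 * s, 2001)
dens = np.exp(-0.5 * ((x - m) / s) ** 2)     # unnormalised is fine

print(prob_greater(x, dens, 0.0))            # P(beta_age > 0 | y)
print(prob_greater(x, dens, 0.05))           # P(beta_age > 0.05 | y)
print(quantile(x, dens, 0.025), quantile(x, dens, 0.975))  # 95% interval
```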
Fixed‑Effect Extensions
Most fixed covariates belong in model['fixed'] and can have label‑specific priors via control['fixed']. When you need special treatment for a single slope, there are two useful patterns:
Linear
Keep a slope as an ordinary fixed effect and set a label‑specific prior via control['fixed']. Prefer this over a latent linear for most cases.
Clinear
When a slope must be bounded (e.g., $\beta\ge0$), add a latent clinear term and set range=(low, high).
Fixed‑Effect Priors
Set priors for the intercept and individual coefficients via control['fixed'] (label‑specific keys).