Logistic Distribution Regression
A complete tutorial on Bayesian linear regression with Logistic likelihood, providing robust modeling of continuous data with heavier tails than the Gaussian distribution.
Introduction
In this tutorial, we fit a Bayesian linear regression model where the response variable follows a Logistic distribution. The Logistic distribution is a continuous probability distribution with a symmetric, bell-shaped curve but heavier tails compared to the Gaussian distribution.
This makes the Logistic distribution particularly useful when modeling data that may contain outliers, as it provides a more robust alternative to normal errors while maintaining computational tractability.
The Model
We consider a linear regression model with $n$ observations. For each observation $i = 1, \ldots, n$, the response $y_i$ is modeled as:
where:
- $y_i \in \mathbb{R}$ is the continuous response variable
- $z_i \in \mathbb{R}$ is the covariate
- $\beta_0$ is the intercept
- $\beta_1$ is the slope coefficient
- $\kappa = \tau s \pi / \sqrt{3}$ is the scale parameter, where $\tau > 0$ is the precision and $s > 0$ is a fixed scaling factor
- $\varepsilon_i$ is the Logistic noise term
Linear Predictor
The linear predictor $\eta_i$ links the mean of the response to the covariates through a linear combination:
For the Logistic likelihood with identity link, the location parameter equals the linear predictor:
In matrix notation, we can write $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$, where $\mathbf{X}$ is the design matrix with a column of ones (for the intercept) and a column containing the covariate values.
Likelihood Function
The probability density function for the Logistic distribution is:
where $\kappa = \tau s \pi / \sqrt{3}$. The variance of each observation is:
The likelihood for the full dataset $\mathbf{y} = (y_1, \ldots, y_n)^\top$ is the product of individual densities:
Prior on the Precision
The precision parameter $\tau$ controls the spread of the distribution. In pyINLA, we work with the log-transformed precision $\theta = \log(\tau)$ to ensure positivity:
The default prior is a log-gamma distribution with parameters $a = 1$ and $b = 5 \times 10^{-5}$:
This corresponds to a Gamma$(a, b)$ prior on $\tau$ itself, which is vague enough to let the data inform the posterior while ensuring proper behavior.
Dataset
The dataset contains $n = 1000$ observations with two columns:
| Column | Description | Type |
|---|---|---|
y | Continuous response variable (Logistic distributed) | float |
z | Covariate (predictor variable) | float |
Download the CSV file and place it in your working directory.
Implementation in pyINLA
We now fit the model using pyINLA. The key steps are:
- Load the data into a pandas DataFrame
- Define the model structure (response and fixed effects)
- Specify the likelihood family as
'logistic' - Call the
pyinla()function
import pandas as pd
from pyinla import pyinla
# Load data
df = pd.read_csv('dataset_logistic.csv')
# Define model: y ~ 1 + z (intercept + slope)
model = {
'response': 'y',
'fixed': ['1', 'z']
}
# Specify control options
control = {
'compute': {
'dic': True,
'cpo': True,
'mlik': True,
'return_marginals': True
}
}
# Fit the model with logistic family
result = pyinla(
model=model,
family='logistic',
data=df,
control=control
)
# View posterior summaries
print(result.summary_fixed)
print(result.summary_hyperpar)
Extracting Results
The result object contains all posterior inference. Here are the key attributes:
# Fixed effects: posterior summaries for beta_0 and beta_1
print(result.summary_fixed)
# Extract posterior means
intercept = result.summary_fixed.loc['(Intercept)', 'mean']
slope = result.summary_fixed.loc['z', 'mean']
# Hyperparameters: posterior summary for precision tau
print(result.summary_hyperpar)
# Access the marginal likelihood for model comparison
print(result.mlik)
The summary_fixed DataFrame includes columns for mean, sd, 0.025quant, 0.5quant, 0.975quant, and mode.
Results and Diagnostics
The posterior summaries provide estimates for $\beta_0$, $\beta_1$, and the precision $\tau$. The true parameters used for data generation were $\beta_0 = 1$, $\beta_1 = 1$, and $\tau = 1$.
To reproduce these figures locally, download the render_logistic_plots.py script and run it alongside the CSV dataset.
Data Generation (Reference)
For reproducibility, the dataset was simulated from the following data-generating process:
with true parameter values: $\beta_0 = 1$, $\beta_1 = 1$, $\tau = 1$ (i.e., $\sigma = 1$).
# Data generation script (for reference only)
import numpy as np
import pandas as pd
def rlogistic(n, mean=0, sd=1, rng=None):
"""Generate random samples from a logistic distribution."""
if rng is None:
rng = np.random.default_rng()
p = rng.uniform(size=n)
A = np.pi / np.sqrt(3)
tauA = A / sd**2
return (tauA * mean - np.log((1 - p) / p)) / tauA
rng = np.random.default_rng(2026)
n = 1000
# Generate covariate
z = rng.normal(0, 0.1, size=n)
# Linear predictor
eta = 1 + z
# Generate logistic response
y = rlogistic(n, mean=eta, sd=1, rng=rng)
df = pd.DataFrame({'y': y, 'z': np.round(z, 6)})
# df.to_csv('dataset_logistic.csv', index=False)