
Link Functions

Understanding how link functions connect your linear predictor to the response variable.

What is a Link Function?

In generalized linear models (GLMs), a link function is a mathematical transformation that connects the linear predictor $\eta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots$ to the expected value of the response variable.

The Core Idea

Your linear predictor $\eta_i$ can take any value from $-\infty$ to $+\infty$. But many response variables have natural constraints:

  • Counts must be non-negative $(0, 1, 2, \ldots)$
  • Probabilities must be between 0 and 1
  • Durations/times must be positive

The link function transforms between these constrained values and the unconstrained linear predictor.

Mathematically, if $\mu_i = \mathbb{E}[y_i]$ is the expected value of the response, then:

$$g(\mu_i) = \eta_i \quad \Leftrightarrow \quad \mu_i = g^{-1}(\eta_i)$$

where $g(\cdot)$ is the link function and $g^{-1}(\cdot)$ is its inverse (the "mean function").

Why Do We Need Link Functions?

Example 1: Modeling Counts with Poisson

Suppose you're modeling the number of accidents per day. Your linear predictor might be:

$\eta_i = 2.5 - 0.3 \times x_i$   where $x_i$ = safety training hours

Without a link function, if $x = 10$, you'd predict $\eta = 2.5 - 3 = -0.5$ accidents, which is impossible!

The log link solves this by setting $\log(\mu_i) = \eta_i$, so $\mu_i = e^{\eta_i} = e^{-0.5} \approx 0.61$ accidents. The mean is always positive, as required.
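The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch using only the standard library; the function names are illustrative and are not part of pyINLA.

```python
import math


def linear_predictor(training_hours: float) -> float:
    """Linear predictor from the example: eta = 2.5 - 0.3 * x."""
    return 2.5 - 0.3 * training_hours


def expected_accidents(training_hours: float) -> float:
    """Apply the inverse of the log link: mu = exp(eta)."""
    return math.exp(linear_predictor(training_hours))


# eta becomes negative for large x, but mu = exp(eta) stays positive.
for hours in (0, 5, 10, 20):
    eta = linear_predictor(hours)
    mu = expected_accidents(hours)
    print(f"x = {hours:>2}  eta = {eta:+.2f}  mu = {mu:.3f}")
```

At $x = 10$ the linear predictor is $-0.5$, while the mean on the response scale is about 0.61.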

Example 2: Modeling Probabilities with Binomial

For binary outcomes (success/failure), probabilities must satisfy $0 < p < 1$. The linear predictor could give any value, but the logit link constrains the probability:

$\text{logit}(p_i) = \log\left(\frac{p_i}{1-p_i}\right) = \eta_i \quad \Rightarrow \quad p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}$

This ensures probabilities are always between 0 and 1, regardless of the predictor values.
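As a rough illustration, the logit and its inverse can be written directly from the formula above. The helper names are hypothetical, and the inverse uses a branch on the sign of $\eta$ only to avoid overflow in `math.exp` for large negative or positive inputs.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Inverse logit, e^eta / (1 + e^eta), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Any real-valued linear predictor maps to a value between 0 and 1.
for eta in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"eta = {eta:+.1f}  p = {inv_logit(eta):.4f}")
```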

Common Link Functions

Link Functions by Likelihood Family

Continuous (Real-valued)

Response can be any real number.

| Family | Default Link | Available Links | Why This Default? |
|---|---|---|---|
| Gaussian | identity | identity | Response is unbounded; no transformation needed |
| Logistic | identity | identity | Unbounded response with heavier tails than Gaussian |
| Student's t | identity | identity | Robust to outliers; models the location parameter |
| Skew Normal | identity | identity | Asymmetric but still unbounded response |

Positive Continuous

Response must be positive $(y > 0)$.

| Family | Default Link | Available Links | Why This Default? |
|---|---|---|---|
| Gamma | log | log, identity, neglog, quantile | Log ensures positive mean; the usual choice in practice (the canonical GLM link is the inverse) |
| Exponential | log | log, neglog | Positive rate parameter; neglog for AFT survival |
| Log-Normal | identity | identity, neglog | Models $\log(y) \sim N(\mu, \sigma^2)$; $\mu$ is unbounded |
| Weibull | log | log, neglog, quantile | Log for positive scale; neglog for AFT interpretation |
| Log-Logistic | log | log, neglog | Positive scale parameter; common in survival analysis |

Count Data

Response is a non-negative integer $(y \in \{0, 1, 2, \ldots\})$.

| Family | Default Link | Available Links | Why This Default? |
|---|---|---|---|
| Poisson | log | log | Canonical link; ensures positive mean rate |
| Negative Binomial | log | log, logoffset, quantile | Same as Poisson; handles overdispersion |
| Negative Binomial (type 2) | logit | logit, loga, cauchit, probit, cloglog, ccloglog, loglog | Models success probability $p \in (0,1)$ |

Binary & Probability

Response is a probability or proportion $(0 < y < 1)$.

| Family | Default Link | Available Links | Why This Default? |
|---|---|---|---|
| Binomial | logit | logit, probit, cloglog, ccloglog, loglog, loga, cauchit, robit, sn | Canonical link; log-odds interpretation |
| Beta | logit | logit, probit, cloglog, ccloglog, loglog, loga, cauchit | Maps $(0,1)$ to real line; intuitive for proportions |
| Beta-Binomial | logit | logit, probit, cloglog, ccloglog, loglog, loga, cauchit, robit, sn | Same as binomial; probability parameter in $(0,1)$ |

Special Link Functions

Choosing a Link Function

General Guidelines

  1. Start with the default. Default links are well-tested, and many of them are canonical links with convenient statistical properties.
  2. Consider interpretability. Logit gives odds ratios, log gives multiplicative effects.
  3. Match the domain. Ensure the link maps to the correct range for your response.
  4. Check model fit. Compare DIC, WAIC, or CPO across different links if unsure.

When to Use Alternative Links

| Situation | Consider | Reason |
|---|---|---|
| Extreme probabilities (rare events) | cloglog or loglog | Better behavior near 0 or 1 |
| Outliers in probability models | cauchit or robit | Heavier tails provide robustness |
| Survival analysis (AFT) | neglog | Positive coefficients mean longer survival |
| Asymmetric probability response | sn (skew normal) | Allows asymmetric dose-response curves |
| Known exposure in count data | logoffset | Properly accounts for varying observation periods |
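One way to see why cloglog behaves differently from logit near 0 and 1 is to compare the two inverse links at $\eta$ and $-\eta$. The sketch below uses only the formulas from this page; the helper names are illustrative. The logit curve is symmetric, so the two probabilities sum to 1, while the cloglog curve is not.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit: e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta: float) -> float:
    """Inverse complementary log-log: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (0.5, 1.0, 2.0):
    logit_sum = inv_logit(eta) + inv_logit(-eta)
    cloglog_sum = inv_cloglog(eta) + inv_cloglog(-eta)
    # Symmetric links give a sum of exactly 1; cloglog does not.
    print(f"eta = {eta}: logit sum = {logit_sum:.4f}, cloglog sum = {cloglog_sum:.4f}")
```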

Interpreting Coefficients by Link Function

Each coefficient $\beta$ tells you: when a covariate increases by one unit, how does the response change? The link function determines the scale on which this change is expressed. To convert back to the natural scale of the response, you often exponentiate.

| Link | When covariate increases by 1 unit... | exp($\beta$) gives | Example |
|---|---|---|---|
| identity | The mean response changes by $\beta$ (directly) | Not needed | $\beta = 0.5$ → mean response increases by 0.5 units |
| log | $\log(\text{mean response})$ changes by $\beta$ | Rate ratio: the mean response is multiplied by exp($\beta$) | $\beta = 0.3$ → $e^{0.3} = 1.35$ → mean response increases by 35% |
| logit | The log-odds of the event change by $\beta$ | Odds ratio: the odds are multiplied by exp($\beta$) | $\beta = -1.4$ → $e^{-1.4} = 0.25$ → odds decrease by 75% |
| probit | $\Phi^{-1}(p)$ changes by $\beta$ | No simple exp() interpretation | $\beta = 0.5$ → probability increases, but the amount depends on the baseline $p$ |
| cloglog | $\log(-\log(1-p))$ changes by $\beta$ | Hazard ratio (in discrete-time survival models) | $\beta = 0.7$ → $e^{0.7} = 2.01$ → hazard doubles |
| neglog | $-\log(\text{survival time})$ changes by $\beta$ (AFT models) | Time ratio (inverse direction) | $\beta = 0.5$ → $e^{-0.5} = 0.61$ → survival time multiplied by 0.61 |

The General Rule

For logarithmic links (log, logit, cloglog), exponentiating the coefficient gives a multiplicative effect: on the mean for log, on the odds for logit, and on the hazard for cloglog:

  • exp($\beta$) > 1 → increase. Percentage increase = (exp($\beta$) − 1) × 100
  • exp($\beta$) < 1 → decrease. Percentage decrease = (1 − exp($\beta$)) × 100
  • exp($\beta$) = 1 (i.e., $\beta = 0$) → no effect
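The percentage rule in the bullets above can be sketched as a small helper. The function name is hypothetical; the coefficient values are the ones used in the interpretation table on this page.

```python
import math


def percent_change(beta: float) -> float:
    """Percentage change implied by a coefficient on a logarithmic link.

    Positive values are increases and negative values are decreases,
    so both bullet formulas collapse into (exp(beta) - 1) * 100.
    """
    return (math.exp(beta) - 1.0) * 100.0


examples = {
    "log, beta = 0.3": 0.3,
    "logit, beta = -1.4": -1.4,
    "cloglog, beta = 0.7": 0.7,
}

for label, beta in examples.items():
    print(f"{label}: exp(beta) = {math.exp(beta):.2f}, change = {percent_change(beta):+.1f}%")
```

Which quantity the percentage applies to (mean, odds, or hazard) still depends on the link.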

For the identity link (Gaussian), coefficients are directly interpretable without any transformation.

For the probit link, there is no simple exp() interpretation. Use marginal effects, or multiply probit coefficients by about 1.6 to approximate the corresponding logit coefficients, which can then be exponentiated to give approximate odds ratios.
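The 1.6 scaling can be checked numerically. The sketch below compares the probit curve $\Phi(x)$ with a logistic curve evaluated at $1.6x$ over a grid, then applies the same factor to an example probit coefficient. The grid, the example coefficient, and the variable names are illustrative assumptions; the scaled coefficient is treated as an approximate logit coefficient, and its exponential as an approximate odds ratio.

```python
import math
from statistics import NormalDist

SCALE = 1.6  # rule-of-thumb factor quoted in the text


def inv_probit(eta: float) -> float:
    """Inverse probit: the standard normal CDF."""
    return NormalDist().cdf(eta)


def inv_logit(eta: float) -> float:
    """Inverse logit: e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Largest gap between the two curves on a grid from -4 to 4.
grid = [i / 10 for i in range(-40, 41)]
max_gap = max(abs(inv_probit(x) - inv_logit(SCALE * x)) for x in grid)

beta_probit = 0.5                               # example probit coefficient
approx_beta_logit = SCALE * beta_probit         # approximate logit coefficient
approx_odds_ratio = math.exp(approx_beta_logit) # approximate odds ratio

print(f"max gap between curves: {max_gap:.4f}")
print(f"approximate odds ratio: {approx_odds_ratio:.2f}")
```

The curves agree to within a couple of percentage points on this grid, which is why the conversion is only a rough guide.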

Concrete Examples

| Model | Coefficient | Interpretation |
|---|---|---|
| Gaussian (identity link): predicting income | $\beta_{\text{education}} = 3200$ | Each additional year of education increases income by \$3,200 |
| Poisson (log link): counting hospital visits | $\beta_{\text{age}} = 0.02$ | $e^{0.02} = 1.02$ → each year of age increases visits by 2% |
| Binomial (logit link): modeling vote choice | $\beta_{\text{female}} = -0.18$ | $e^{-0.18} = 0.84$ → women have 16% lower odds of voting for Bush |
| Poisson (log link): disease counts with offset | $\beta_{\text{pollution}} = 0.15$ | $e^{0.15} = 1.16$ → 16% higher disease rate per unit pollution increase |
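For the disease-count example, the sketch below assumes the standard Poisson construction in which the exposure enters as an offset, $\log(\mu_i) = \log(E_i) + \eta_i$, so that $\mu_i = E_i \, e^{\eta_i}$. The intercept, the exposure values, and the function name are made up for illustration and do not come from pyINLA; only $\beta_{\text{pollution}} = 0.15$ is taken from the table.

```python
import math

BETA_0 = -6.0           # hypothetical intercept (log baseline rate)
BETA_POLLUTION = 0.15   # coefficient from the table above


def expected_count(exposure: float, pollution: float) -> float:
    """Expected count with a log-exposure offset: mu = E * exp(eta)."""
    eta = BETA_0 + BETA_POLLUTION * pollution
    return exposure * math.exp(eta)


# The rate ratio for a one-unit pollution increase does not depend on exposure.
for exposure in (1_000.0, 50_000.0):
    low = expected_count(exposure, pollution=2.0)
    high = expected_count(exposure, pollution=3.0)
    print(f"E = {exposure:>8.0f}: counts {low:.2f} -> {high:.2f}, ratio = {high / low:.4f}")
```

The expected counts scale with the exposure, but the ratio stays at $e^{0.15}$ in both cases.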

Quick Reference: All Link Functions

| Link | Formula $g(\mu)$ | Inverse $\mu = g^{-1}(\eta)$ | Domain |
|---|---|---|---|
| identity | $\mu$ | $\eta$ | $(-\infty, \infty)$ |
| log | $\log(\mu)$ | $e^\eta$ | $(0, \infty)$ |
| neglog | $-\log(\mu)$ | $e^{-\eta}$ | $(0, \infty)$ |
| logit | $\log(\mu/(1-\mu))$ | $e^\eta/(1+e^\eta)$ | $(0, 1)$ |
| probit | $\Phi^{-1}(\mu)$ | $\Phi(\eta)$ | $(0, 1)$ |
| cloglog | $\log(-\log(1-\mu))$ | $1 - e^{-e^\eta}$ | $(0, 1)$ |
| ccloglog | $\log(-\log(\mu))$ | $e^{-e^\eta}$ | $(0, 1)$ |
| loglog | $-\log(-\log(\mu))$ | $e^{-e^{-\eta}}$ | $(0, 1)$ |
| cauchit | $\tan(\pi(\mu-0.5))$ | $0.5 + \arctan(\eta)/\pi$ | $(0, 1)$ |
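The formulas in this table can be transcribed into code and checked for consistency. The following is a sketch using only the Python standard library; the `LINKS` dictionary and `round_trip_error` helper are illustrative names, not pyINLA objects. Each entry pairs $g$ with $g^{-1}$, and the check confirms that $g^{-1}(g(\mu))$ recovers $\mu$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# name -> (g, g_inverse), transcribed from the table above
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "neglog": (lambda mu: -math.log(mu), lambda eta: math.exp(-eta)),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
    "ccloglog": (lambda mu: math.log(-math.log(mu)),
                 lambda eta: math.exp(-math.exp(eta))),
    "loglog": (lambda mu: -math.log(-math.log(mu)),
               lambda eta: math.exp(-math.exp(-eta))),
    "cauchit": (lambda mu: math.tan(math.pi * (mu - 0.5)),
                lambda eta: 0.5 + math.atan(eta) / math.pi),
}


def round_trip_error(name: str, mu: float) -> float:
    """Absolute error of g_inverse(g(mu)) compared with mu."""
    g, g_inv = LINKS[name]
    return abs(g_inv(g(mu)) - mu)


# mu = 0.3 lies inside the domain of every link in the table.
for name in LINKS:
    print(f"{name:<9} round-trip error at mu = 0.3: {round_trip_error(name, 0.3):.2e}")
```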