Negative Binomial Distribution

Parametrization

The Negative Binomial distribution for a random vector \(\pmb{y} = (y_1, y_2, \dots, y_m)\) of count observations is defined by the probability mass function:

\[\text{Prob}(y_i) = \frac{\Gamma(y_i + n_i)}{\Gamma(n_i) \Gamma(y_i + 1)} p_i^{n_i} (1 - p_i)^{y_i}, \quad y_i = 0, 1, 2, \ldots, \quad i = 1, 2, \dots, m\]

where:

\(n_i > 0\) is the size (dispersion) parameter. For integer values it can be read as the number of successes to wait for; it must be strictly positive but need not be an integer.
\(p_i \in (0, 1)\) is the probability of success in each trial.
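The mass function above can be evaluated directly with log-gamma functions, which also covers non-integer sizes. The following is a minimal sketch using only the Python standard library; the function names `nbinom_logpmf` and `nbinom_pmf` are illustrative and are not part of pyINLA.

```python
import math


def nbinom_logpmf(y: int, n: float, p: float) -> float:
    """Log of Gamma(y+n) / (Gamma(n) Gamma(y+1)) * p^n * (1-p)^y."""
    if y < 0 or n <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need y >= 0, n > 0 and 0 < p < 1")
    return (
        math.lgamma(y + n) - math.lgamma(n) - math.lgamma(y + 1)
        + n * math.log(p)
        + y * math.log1p(-p)  # log(1 - p), accurate for small p
    )


def nbinom_pmf(y: int, n: float, p: float) -> float:
    return math.exp(nbinom_logpmf(y, n, p))


# A non-integer size is allowed; the probabilities still sum to one.
total = sum(nbinom_pmf(y, 2.5, 0.4) for y in range(500))
```

Working on the log scale avoids overflow in the gamma functions when \(y_i\) or \(n_i\) is large.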

Figure: Negative Binomial PMF. **Left**: \(n=5\) fixed, varying \(p \in \{0.2, 0.5, 0.8\}\). **Right**: \(p=0.5\) fixed, varying \(n \in \{3, 5, 8\}\).

Mean and Variance

The mean and variance of each observation are:

\[\mu_i = n_i \frac{1 - p_i}{p_i}, \qquad \sigma_i^2 = \mu_i \left(1 + \frac{\mu_i}{n_i}\right), \quad i = 1, 2, \dots, m\]

Because \(\mu_i / n_i > 0\), the variance always exceeds the mean (overdispersion), which distinguishes the negative binomial from the Poisson distribution, whose variance equals its mean.
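As a quick numerical check of the mean and variance formulas, the sketch below converts between the \((n, p)\) and \((\mu, n)\) descriptions. The helper names are illustrative only; the inverse relation \(p = n / (n + \mu)\) follows from solving \(\mu = n(1 - p)/p\) for \(p\).

```python
def nbinom_moments(n: float, p: float) -> tuple:
    """Return (mean, variance) for size n and success probability p."""
    mu = n * (1.0 - p) / p
    var = mu * (1.0 + mu / n)
    return mu, var


def p_from_mean(mu: float, n: float) -> float:
    """Invert mu = n (1 - p) / p to recover p."""
    return n / (n + mu)


mu, var = nbinom_moments(5.0, 0.5)   # n = 5, p = 0.5
p_back = p_from_mean(mu, 5.0)        # recovers the original p
```

With \(n = 5\) and \(p = 0.5\) the mean is 5 and the variance is 10, so the variance is twice the mean.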

Link Function

The mean is linked to the linear predictor \(\pmb{\eta} = (\eta_1, \eta_2, \dots, \eta_m)\) by:

\[\mu_i = E_i \exp(\eta_i), \quad i = 1, 2, \dots, m\]

or in vector form:

\[\pmb{\mu} = \pmb{E} \circ \exp(\pmb{\eta})\]

where \(\pmb{E} = (E_1, E_2, \dots, E_m)\) represents known constants (exposure), and \(\log(\pmb{E})\) is the offset of \(\pmb{\eta}\).
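To illustrate why \(\log(\pmb{E})\) acts as an offset, the sketch below computes the mean both as \(E_i \exp(\eta_i)\) and as \(\exp(\eta_i + \log E_i)\). It is plain Python with made-up values; `mean_from_predictor` is a hypothetical helper, not a pyINLA function.

```python
import math


def mean_from_predictor(eta, E=None):
    """mu_i = E_i * exp(eta_i); E defaults to 1 for every observation."""
    if E is None:
        E = [1.0] * len(eta)
    if len(E) != len(eta):
        raise ValueError("E and eta must have the same length")
    return [e * math.exp(h) for e, h in zip(E, eta)]


eta = [0.0, 0.5, -1.0]       # illustrative linear predictor values
E = [1.0, 2.0, 10.0]         # illustrative exposures

mu = mean_from_predictor(eta, E)
# Same means, written with log(E) added to the linear predictor.
mu_offset = [math.exp(h + math.log(e)) for e, h in zip(E, eta)]
```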

Possible link functions: default, log.

Hyperparameters

The only hyperparameter is \(\theta\), which sets the dispersion parameter \(n\) (size). How \(\theta\) maps to \(n\) depends on the chosen variant:

variant=0 (default): The dispersion parameter is a scalar:

\[n = \exp(\theta)\]

variant=1: The dispersion parameter scales with exposure:

\[n_i = E_i \exp(\theta), \quad i = 1, 2, \dots, m\]

variant=2: The dispersion parameter scales with the user-supplied scale vector:

\[n_i = s_i \exp(\theta), \quad i = 1, 2, \dots, m\]

where \(s_i\) is the scale for each observation, and the prior is defined on \(\theta\).
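The three variants differ only in the per-observation weight applied to \(\exp(\theta)\). The sketch below spells this out; `size_per_observation` is a hypothetical helper written for illustration and does not reflect how pyINLA computes the size internally.

```python
import math


def size_per_observation(theta, m, variant=0, E=None, scale=None):
    """Return the list (n_1, ..., n_m) implied by theta and the variant."""
    base = math.exp(theta)
    if variant == 0:
        return [base] * m            # n = exp(theta) for every observation
    if variant == 1:
        weights = E                  # n_i = E_i * exp(theta)
    elif variant == 2:
        weights = scale              # n_i = s_i * exp(theta)
    else:
        raise ValueError("variant must be 0, 1 or 2")
    if weights is None or len(weights) != m:
        raise ValueError("need one weight per observation")
    return [w * base for w in weights]


theta = math.log(10.0)               # the default initial value
n0 = size_per_observation(theta, 2)
n1 = size_per_observation(theta, 2, variant=1, E=[1.0, 2.0])
n2 = size_per_observation(theta, 2, variant=2, scale=[0.5, 4.0])
```

With \(\theta = \log(10)\) this gives sizes of 10 and 10 for variant 0, 10 and 20 for variant 1, and 5 and 40 for variant 2.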

Key: size

Default prior specification:

Prior: pc.mgamma
Parameters: [7]
Initial value: log(10) = 2.303
Fixed: False (estimated)

When translated into control['family']['hyper'], the default entry looks like this:

control = {
    'family': {
        'hyper': [{
            'id': 'size',
            'prior': 'pc.mgamma',
            'param': [7],
            'initial': 2.303,
            'fixed': False,
        }]
    }
}

Each entry in control['family']['hyper'] may contain these keys:

id - Hyperparameter identifier (size). Can be omitted for the first (and only) hyperparameter.
prior - Prior distribution name
param - Prior parameters (list)
initial - Initial value of \(\theta\), i.e. on the log scale (for example log(10) = 2.303 for a size of 10)
fixed - Whether to fix the hyperparameter (True/False)

Allowed priors for \(\theta = \log(n)\):

Prior	Param shape	Notes
`pc.mgamma` (default)	`[U]` with `U > 0`	PC prior on the dispersion. Recommended weakly-informative default.
`loggamma`	`[shape, rate]`, both positive	Conjugate prior on the precision-like quantity.
`normal` / `gaussian`	`[mean, precision]` on \(\theta\)	Soft Gaussian prior on the log-size.
`flat`	`[]`	Improper flat. Pass `param=[]` explicitly.
`logtnormal`	`[location, scale]`	Truncated-normal on \(\theta\).

An unknown prior name raises an error of the form `pyinla safety check: unknown prior '...'`, accompanied by a "Did you mean ..." hint. A param list of the wrong length likewise triggers a safety error before the C engine runs.

Note: The PC-prior (pc.mgamma) is the schema default for the size hyperparameter and is recommended for all three variants (0, 1, 2).
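The param shapes in the table can be checked before a model is fitted. The sketch below builds a `control['family']['hyper']` entry and verifies the param length against the table; `make_size_hyper` and `PARAM_LENGTHS` are hypothetical names introduced here, and the check raises a plain `ValueError` rather than pyINLA's own safety error.

```python
# Expected param lengths, copied from the table of allowed priors.
PARAM_LENGTHS = {
    "pc.mgamma": 1,
    "loggamma": 2,
    "normal": 2,
    "gaussian": 2,
    "flat": 0,
    "logtnormal": 2,
}


def make_size_hyper(prior, param, initial=None, fixed=False):
    """Build one hyper entry for the size hyperparameter (illustrative)."""
    if prior not in PARAM_LENGTHS:
        raise ValueError(f"unknown prior {prior!r}")
    if len(param) != PARAM_LENGTHS[prior]:
        raise ValueError(
            f"{prior} expects {PARAM_LENGTHS[prior]} parameter(s), got {len(param)}"
        )
    entry = {"id": "size", "prior": prior, "param": list(param), "fixed": fixed}
    if initial is not None:
        entry["initial"] = initial
    return entry


# Gaussian prior on theta with mean 0 and precision 0.01 (illustrative values).
control = {"family": {"hyper": [make_size_hyper("normal", [0.0, 0.01])]}}
flat_entry = make_size_hyper("flat", [])   # param=[] passed explicitly
```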

Validation Rules

pyINLA enforces several validation rules for nbinomial models to ensure correct specification:

Exposure (E)

Key: E

When providing the E (exposure) argument:

All values must be strictly positive (> 0)
Length must match the number of observations

# With exposure
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(model=model, family="nbinomial", data=df, E=df["exposure"].to_numpy())

Scale

Key: scale

When providing the scale argument (used with variant=2):

All values must be strictly positive (> 0)
Length must match the number of observations

# With scale (variant=2)
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(
    model=model, family="nbinomial", data=df,
    E=df["E"].to_numpy(),
    scale=df["scale"].to_numpy(),
    control={"family": {"variant": 2}}
)
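The rules for E and scale are the same: strictly positive values, one per observation. A pre-flight check along these lines can catch problems before calling pyinla. This is only a sketch: `check_positive_vector` is a hypothetical helper, and it raises `ValueError` rather than whatever exception pyINLA itself uses.

```python
def check_positive_vector(values, n_obs, name):
    """Verify that `values` has n_obs entries, all strictly positive."""
    values = list(values)
    if len(values) != n_obs:
        raise ValueError(
            f"{name} has length {len(values)}, expected {n_obs} observations"
        )
    bad = [i for i, v in enumerate(values) if not v > 0]
    if bad:
        raise ValueError(f"{name} must be > 0; offending positions: {bad}")
    return values


y = [3, 0, 7]                                  # illustrative counts
E_ok = check_positive_vector([1.0, 2.5, 0.1], len(y), "E")
scale_ok = check_positive_vector([1, 1, 2], len(y), "scale")
```

The comparison `not v > 0` also rejects NaN entries, since every comparison with NaN is false.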

Variant

Key: control['family']['variant']

The variant must be one of:

0 (default): Scalar dispersion parameter
1: Dispersion scales with exposure
2: Dispersion scales with the scale argument

# variant=1: dispersion scales with exposure
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(
    model=model, family="nbinomial", data=df,
    E=df["E"].to_numpy(),
    control={"family": {"variant": 1}}
)

Hyperparameters

Key: control['family']['hyper']

Hyperparameter configuration is allowed for nbinomial. The size hyperparameter controls the dispersion.

# Custom prior on dispersion parameter
model = {"response": "y", "fixed": ["1", "x"]}
control = {
    'family': {
        'hyper': [{
            'id': 'size',
            'prior': 'pc.mgamma',
            'param': [10.0]
        }]
    }
}
result = pyinla(model=model, family="nbinomial", data=df, control=control)

# Fixed dispersion (not estimated)
control = {
    'family': {
        'hyper': [{
            'id': 'size',
            'initial': 2.303,  # log(10), so size = 10
            'fixed': True
        }]
    }
}
result = pyinla(model=model, family="nbinomial", data=df, control=control)

Ntrials Not Allowed

The Ntrials argument is not allowed for nbinomial. Use nbinomial2 if you need fixed trial counts.

Allowed Link Functions

Key: control['family']['link']

These link functions are supported:

log (default)

Response Values

The response variable \(\pmb{y}\) must consist of non-negative integers (counts: 0, 1, 2, ...). pyINLA raises PyINLAError if any response value is negative or non-integer.
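The same rule can be checked on the data before fitting. The sketch below is a hypothetical helper, not pyINLA's validator: it raises `ValueError` instead of PyINLAError, and it assumes that integer-valued floats such as 2.0 count as valid, which may differ from pyINLA's behaviour.

```python
def check_counts(y):
    """Return y as a list of ints, or raise if any value is not a count."""
    counts = []
    for i, v in enumerate(y):
        # Assumption: values like 2.0 are accepted as the integer 2.
        if not float(v).is_integer():
            raise ValueError(f"y[{i}] = {v!r} is not an integer")
        if v < 0:
            raise ValueError(f"y[{i}] = {v!r} is negative")
        counts.append(int(v))
    return counts


clean = check_counts([0, 1, 2.0, 15])
```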

Specification

family="nbinomial"
Arguments: \(\pmb{y}\) (response, required), \(\pmb{E}\) (exposure, optional, default = 1), and scale (optional, default = 1).
Choose variant with control={'family': {'variant': 0}} (default), {'variant': 1}, or {'variant': 2}.

Notes

As \(n \to \infty\), the negative binomial converges to the Poisson distribution. For numerical reasons, the Poisson limit is used in place of the negative binomial whenever \(n\) is so large that:

\[\frac{\mu}{n} < 10^{-4}\]
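The convergence to the Poisson distribution can be seen numerically by holding the mean fixed and increasing the size. The sketch below uses only the standard library and the mean parametrization \(p = n / (n + \mu)\); the function names are illustrative and the chosen values of \(\mu\) and \(n\) are arbitrary.

```python
import math


def nbinom_pmf_mean(y, mu, n):
    """Negative binomial pmf written in terms of the mean mu and size n."""
    log_p = -math.log1p(mu / n)                # log(n / (n + mu))
    log_q = math.log(mu) - math.log(n + mu)    # log(mu / (n + mu))
    return math.exp(
        math.lgamma(y + n) - math.lgamma(n) - math.lgamma(y + 1)
        + n * log_p + y * log_q
    )


def poisson_pmf(y, mu):
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


mu = 3.0
gaps = []
for n in (1.0, 100.0, 3.0e5):                  # mu / n = 3, 0.03, 1e-5
    gap = max(
        abs(nbinom_pmf_mean(y, mu, n) - poisson_pmf(y, mu)) for y in range(30)
    )
    gaps.append(gap)
    print(f"mu/n = {mu / n:g}: largest pmf difference = {gap:.2e}")
```

The largest difference shrinks as \(\mu / n\) decreases, which is why switching to the Poisson form below the threshold changes the likelihood very little.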

The nbinomial2 Distribution

The negative binomial distribution is also available in its "pure form", in which \(y_i\) counts the failures observed before the \(n_i\)-th success, the last experiment being a success:

\[\text{Prob}(y_i) = \binom{y_i + n_i - 1}{n_i - 1} (1 - p_i)^{y_i} p_i^{n_i}, \quad y_i = 0, 1, 2, \ldots, \quad i = 1, 2, \dots, m\]

where:

\(n_i = 1, 2, \ldots\) is the (fixed) number of successes at which the experiment stops.
\(p_i\) is the probability of success in each independent trial.
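For integer \(n_i\) the binomial coefficient can be computed exactly. The sketch below evaluates the nbinomial2 mass function with `math.comb`; `nbinomial2_pmf` is an illustrative name, not a pyINLA function.

```python
import math


def nbinomial2_pmf(y: int, n: int, p: float) -> float:
    """C(y + n - 1, n - 1) * (1 - p)^y * p^n for integer n >= 1."""
    if y < 0 or n < 1 or not 0.0 < p < 1.0:
        raise ValueError("need y >= 0, integer n >= 1 and 0 < p < 1")
    return math.comb(y + n - 1, n - 1) * (1.0 - p) ** y * p ** n


# Probability of exactly 2 failures before the 3rd success when p = 0.5.
prob = nbinomial2_pmf(2, 3, 0.5)
total = sum(nbinomial2_pmf(y, 3, 0.5) for y in range(200))
```

Here `prob` is \(\binom{4}{2} \cdot 0.5^2 \cdot 0.5^3 = 0.1875\). Since \(\binom{y + n - 1}{n - 1} = \Gamma(y + n) / (\Gamma(n)\,\Gamma(y + 1))\) for integer \(n\), this coincides with the gamma-function form used for nbinomial.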

Link Function for nbinomial2

The probability \(p_i\) is linked to the linear predictor \(\eta_i\) via the logit link (default):

\[p_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}, \quad i = 1, 2, \dots, m\]

Possible link functions: default, logit, loga, cauchit, probit, cloglog, ccloglog, loglog.
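For reference, the default logit link and its inverse can be written as below. This is a plain-Python sketch with illustrative names; the branch on the sign of \(\eta\) is a common way to avoid overflow in `exp` and is a choice made here, not necessarily how pyINLA evaluates the link.

```python
import math


def inv_logit(eta: float) -> float:
    """p = exp(eta) / (1 + exp(eta)), evaluated without overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """eta = log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p) - math.log1p(-p)


p_mid = inv_logit(0.0)              # eta = 0 corresponds to p = 0.5
eta_back = logit(inv_logit(1.3))    # round trip recovers eta
```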

Hyperparameters for nbinomial2

None.

Validation Rules for nbinomial2

pyINLA enforces several validation rules for nbinomial2 models to ensure correct specification:

Ntrials (Required)

Key: Ntrials

The Ntrials argument is required for nbinomial2:

Must be provided for each observation
Values must be positive integers
Length must match the number of observations

# nbinomial2 requires Ntrials
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(model=model, family="nbinomial2", data=df, Ntrials=df["n"].to_numpy())
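To see what a response and its Ntrials value look like together, the sketch below simulates the stopping experiment directly: trials are repeated until \(n\) successes have occurred, and the failures are counted. It uses only the standard library; `draw_nbinomial2` is a hypothetical helper and the seed, \(n\) and \(p\) are arbitrary.

```python
import random


def draw_nbinomial2(n: int, p: float, rng: random.Random) -> int:
    """Count failures until the n-th success, each trial succeeding w.p. p."""
    successes = 0
    failures = 0
    while successes < n:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


rng = random.Random(0)               # fixed seed for reproducibility
n, p = 3, 0.5
draws = [draw_nbinomial2(n, p, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
theoretical_mean = n * (1.0 - p) / p
```

Each simulated value plays the role of one \(y_i\), with the corresponding Ntrials entry equal to \(n\). The sample mean should be close to \(n(1 - p)/p = 3\).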

Exposure Not Allowed

The E (exposure) argument is not allowed for nbinomial2. Use nbinomial if you need exposure.

Scale Not Allowed

The scale argument is not allowed for nbinomial2. Use nbinomial if you need scale.

No Hyper Configuration

Key: control['family']['hyper']

The nbinomial2 likelihood has no hyperparameters, so do not provide a control['family']['hyper'] configuration.

Allowed Link Functions

Key: control['family']['link']

These link functions are supported:

logit (default)
loga
cauchit
probit
cloglog
ccloglog
loglog

Specification for nbinomial2

family="nbinomial2"
Required arguments: \(\pmb{y}\) (response) and Ntrials (the value of \(n\) for each observation).

Worked examples

Negative Binomial Regression