Learn PyINLA

Negative Binomial Distribution

The negative binomial distribution models count data with overdispersion (variance exceeds the mean). It describes the number of failures before achieving a specified number of successes in independent Bernoulli trials. This distribution is widely used in ecology, epidemiology, and other fields where count data exhibits extra-Poisson variation.


Parametrization

The Negative Binomial distribution for a random vector \(\pmb{y} = (y_1, y_2, \dots, y_m)\) of count observations is defined by the probability mass function:

\[\text{Prob}(y_i) = \frac{\Gamma(y_i + n_i)}{\Gamma(n_i) \Gamma(y_i + 1)} p_i^{n_i} (1 - p_i)^{y_i}, \quad y_i = 0, 1, 2, \ldots, \quad i = 1, 2, \dots, m\]

where:

  • \(n_i > 0\) is the size (number of successes), also called the dispersion parameter. It must be strictly positive but need not be an integer.

  • \(p_i \in (0, 1)\) is the probability of success in each trial.

Figure: Negative Binomial PMF under different parameter values. Left: \(n=5\) fixed, varying \(p \in \{0.2, 0.5, 0.8\}\). Right: \(p=0.5\) fixed, varying \(n \in \{3, 5, 8\}\).
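
The probability mass function above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name nbinom_pmf is an illustrative choice and is not part of pyINLA. It works on the log scale through math.lgamma, which also handles non-integer \(n\).

```python
import math


def nbinom_pmf(y: int, n: float, p: float) -> float:
    """Negative binomial PMF for size n > 0 and success probability 0 < p < 1."""
    if y < 0:
        return 0.0
    # Work on the log scale so large y or n do not overflow the gamma function.
    log_pmf = (
        math.lgamma(y + n) - math.lgamma(n) - math.lgamma(y + 1)
        + n * math.log(p)
        + y * math.log1p(-p)
    )
    return math.exp(log_pmf)


# Parameter values from the left panel of the figure: n = 5, p in {0.2, 0.5, 0.8}.
for p in (0.2, 0.5, 0.8):
    total = sum(nbinom_pmf(y, 5, p) for y in range(400))
    print(f"p={p}: P(y=0)={nbinom_pmf(0, 5, p):.5f}, sum over y<400={total:.6f}")
```

The truncated sums should be numerically indistinguishable from 1, which is a quick sanity check on the formula.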

Mean and Variance

The mean and variance of each observation are:

\[\mu_i = n_i \frac{1 - p_i}{p_i}, \qquad \sigma_i^2 = \mu_i \left(1 + \frac{\mu_i}{n_i}\right), \quad i = 1, 2, \dots, m\]

Because \(\mu_i^2 / n_i > 0\), the variance always exceeds the mean (overdispersion), which distinguishes the negative binomial from the Poisson distribution, where the two are equal.
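
As a check on these formulas, the sketch below compares the closed-form mean and variance with moments computed by summing the PMF numerically. The helper names are illustrative assumptions, not pyINLA functions, and the sum is truncated at 500 terms, which is ample for the values used here.

```python
import math


def nb_mean_var(n: float, p: float) -> tuple:
    """Closed-form mean and variance of the negative binomial."""
    mu = n * (1.0 - p) / p
    var = mu * (1.0 + mu / n)
    return mu, var


def nbinom_pmf(y: int, n: float, p: float) -> float:
    """PMF in the Gamma-function form, evaluated on the log scale."""
    log_pmf = (
        math.lgamma(y + n) - math.lgamma(n) - math.lgamma(y + 1)
        + n * math.log(p)
        + y * math.log1p(-p)
    )
    return math.exp(log_pmf)


n, p = 5.0, 0.5
mu, var = nb_mean_var(n, p)

# Numerical moments from the (truncated) PMF.
ys = range(500)
num_mean = sum(y * nbinom_pmf(y, n, p) for y in ys)
num_var = sum((y - num_mean) ** 2 * nbinom_pmf(y, n, p) for y in ys)

print(f"closed form: mean={mu}, variance={var}")
print(f"numerical:   mean={num_mean:.6f}, variance={num_var:.6f}")
```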

The mean is linked to the linear predictor \(\pmb{\eta} = (\eta_1, \eta_2, \dots, \eta_m)\) by:

\[\mu_i = E_i \exp(\eta_i), \quad i = 1, 2, \dots, m\]

or in vector form:

\[\pmb{\mu} = \pmb{E} \circ \exp(\pmb{\eta})\]

where \(\pmb{E} = (E_1, E_2, \dots, E_m)\) is a vector of known constants (the exposure), so that \(\log(\pmb{E})\) enters the linear predictor \(\pmb{\eta}\) as an offset.
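
To connect the link function back to the parametrization, note that solving \(\mu_i = n_i (1 - p_i) / p_i\) for \(p_i\) gives \(p_i = n_i / (n_i + \mu_i)\). The sketch below applies this for a single observation; the function names are illustrative and not part of pyINLA.

```python
import math


def mean_from_predictor(eta: float, E: float = 1.0) -> float:
    """Mean under the log link with exposure: mu = E * exp(eta)."""
    return E * math.exp(eta)


def success_prob(mu: float, n: float) -> float:
    """Success probability implied by mean mu and size n: p = n / (n + mu)."""
    return n / (n + mu)


eta, E, n = math.log(2.0), 3.0, 2.0
mu = mean_from_predictor(eta, E)   # 3 * exp(log 2), i.e. about 6
p = success_prob(mu, n)            # 2 / (2 + 6), i.e. about 0.25

# Round trip: the mean formula n * (1 - p) / p recovers mu.
print(mu, p, n * (1.0 - p) / p)
```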

Possible link functions: default, log, logoffset.

Hyperparameters

The only hyperparameter is the dispersion parameter \(n\) (size). How it is expressed through the internal parameter \(\theta\) depends on the chosen variant:

variant=0 (default): The dispersion parameter is a scalar:

\[n = \exp(\theta)\]

variant=1: The dispersion parameter scales with exposure:

\[n_i = E_i \exp(\theta), \quad i = 1, 2, \dots, m\]

variant=2: The dispersion parameter scales with the scale argument:

\[n_i = s_i \exp(\theta), \quad i = 1, 2, \dots, m\]

where \(s_i\) is the scale value for observation \(i\). In all three variants the prior is defined on \(\theta\).
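
The three variants can be summarized in a few lines of code. This is only a sketch of the formulas above, not pyINLA's internal implementation; the function name size_per_observation is hypothetical, and missing E or scale values are assumed to default to 1.

```python
import math
from typing import List, Optional, Sequence


def size_per_observation(
    theta: float,
    m: int,
    variant: int = 0,
    E: Optional[Sequence[float]] = None,
    scale: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the size n_i for each of m observations under the given variant."""
    base = math.exp(theta)
    if variant == 0:
        return [base] * m              # n = exp(theta), shared by all observations
    if variant == 1:
        weights = E                    # n_i = E_i * exp(theta)
    elif variant == 2:
        weights = scale                # n_i = s_i * exp(theta)
    else:
        raise ValueError("variant must be 0, 1, or 2")
    if weights is None:
        weights = [1.0] * m            # assumed default of 1
    if len(weights) != m:
        raise ValueError("weights must have one value per observation")
    return [w * base for w in weights]


print(size_per_observation(math.log(10), 3))
print(size_per_observation(math.log(10), 3, variant=1, E=[0.5, 1.0, 2.0]))
```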

Key: size

Default prior specification:

  • Prior: pc.mgamma

  • Parameters: [7]

  • Initial value: log(10) = 2.303

  • Fixed: False (estimated)

When translated into control['family']['hyper'], the default entry looks like this:

control = {
    'family': {
        'hyper': [{
            'id': 'size',
            'prior': 'pc.mgamma',
            'param': [7],
            'initial': 2.303,
            'fixed': False,
        }]
    }
}

Each entry in control['family']['hyper'] may contain these keys:

  • id - Hyperparameter identifier (size). Can be omitted for the first (and only) hyperparameter.

  • prior - Prior distribution name

  • param - Prior parameters (list)

  • initial - Initial value of \(\theta\), that is, of the size on the log scale (for example, log(10) = 2.303 corresponds to size = 10)

  • fixed - Whether to fix the hyperparameter (True/False)
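
Because initial is given on the log scale, it is easy to pass the size itself by mistake. The sketch below uses a hypothetical helper, size_hyper_entry, which is not part of pyINLA, to convert a desired size into an entry with the keys listed above.

```python
import math


def size_hyper_entry(size: float, fixed: bool = True) -> dict:
    """Hypothetical helper: build a hyper entry that starts (or fixes) the size at `size`."""
    if size <= 0:
        raise ValueError("size must be strictly positive")
    # 'initial' lives on the log scale, so size = 10 becomes log(10).
    return {"id": "size", "initial": math.log(size), "fixed": fixed}


control = {"family": {"hyper": [size_hyper_entry(10.0)]}}
print(control)
```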

Note: The PC-prior is available for variant=1.

Validation Rules

pyINLA enforces several validation rules for nbinomial models to ensure correct specification:

Exposure (E)

Key: E

When providing the E (exposure) argument:

  • All values must be strictly positive (> 0)

  • Length must match the number of observations

# With exposure
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(model=model, family="nbinomial", data=df, E=df["exposure"].to_numpy())

Scale

Key: scale

When providing the scale argument (used with variant=2):

  • All values must be strictly positive (> 0)

  • Length must match the number of observations

# With scale (variant=2)
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(
    model=model, family="nbinomial", data=df,
    E=df["E"].to_numpy(),
    scale=df["scale"].to_numpy(),
    control={"family": {"variant": 2}}
)

Variant

Key: control['family']['variant']

The variant must be one of:

  • 0 (default): Scalar dispersion parameter

  • 1: Dispersion scales with exposure

  • 2: Dispersion scales with the scale argument

# variant=1: dispersion scales with exposure
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(
    model=model, family="nbinomial", data=df,
    E=df["E"].to_numpy(),
    control={"family": {"variant": 1}}
)

Hyperparameters

Key: control['family']['hyper']

Hyperparameter configuration is allowed for nbinomial. The size hyperparameter controls the dispersion.

# Custom prior on dispersion parameter
model = {"response": "y", "fixed": ["1", "x"]}
control = {
    'family': {
        'hyper': [{
            'id': 'size',
            'prior': 'pc.mgamma',
            'param': [10.0]
        }]
    }
}
result = pyinla(model=model, family="nbinomial", data=df, control=control)

# Fixed dispersion (not estimated)
control = {
    'family': {
        'hyper': [{
            'id': 'size',
            'initial': 2.303,  # log(10), so size = 10
            'fixed': True
        }]
    }
}
result = pyinla(model=model, family="nbinomial", data=df, control=control)

Ntrials Not Allowed

The Ntrials argument is not allowed for nbinomial. Use nbinomial2 if you need fixed trial counts.

Allowed Link Functions

Key: control['family']['link']

These link functions are supported:

  • log (default)

  • logoffset

Response Values

Response variable \(\pmb{y}\) must be non-negative integers (counts: 0, 1, 2, ...). pyINLA will raise PyINLAError if any response value is negative or non-integer.
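
It can be convenient to check inputs before calling pyinla, so that problems surface with a message tied to your own data. The following is a hypothetical pre-flight check that mirrors the documented rules for the response, E, and scale. It is not pyINLA's own validation code, it raises ValueError rather than PyINLAError, and it assumes that whole-number floats such as 3.0 count as valid counts.

```python
import math
from typing import Optional, Sequence


def check_nbinomial_inputs(
    y: Sequence[float],
    E: Optional[Sequence[float]] = None,
    scale: Optional[Sequence[float]] = None,
) -> None:
    """Hypothetical check of the documented nbinomial rules; raises ValueError."""
    # Response: finite, non-negative, whole numbers.
    for i, v in enumerate(y):
        if not math.isfinite(v) or v < 0 or v != math.floor(v):
            raise ValueError(f"y[{i}]={v!r} is not a non-negative integer")
    # E and scale: same length as y, strictly positive.
    for name, values in (("E", E), ("scale", scale)):
        if values is None:
            continue
        if len(values) != len(y):
            raise ValueError(f"{name} has length {len(values)}, expected {len(y)}")
        for i, v in enumerate(values):
            if not v > 0:  # also rejects NaN
                raise ValueError(f"{name}[{i}]={v!r} is not strictly positive")


check_nbinomial_inputs([0, 3, 7], E=[1.0, 2.5, 0.1])  # passes silently
```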

Specification

  • family="nbinomial"

  • Arguments: \(\pmb{y}\) (response, required), \(\pmb{E}\) (exposure, optional, default = 1), and scale (optional, default = 1).

  • Choose variant with control={'family': {'variant': 0}} (default), {'variant': 1}, or {'variant': 2}.

Notes

As \(n \to \infty\), the negative binomial converges to the Poisson distribution. For numerical reasons, the Poisson limit is used whenever \(n\) is so large relative to the mean that:

\[\frac{\mu}{n} < 10^{-4}\]
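
The convergence to the Poisson distribution is easy to see numerically. The sketch below fixes the mean at \(\mu = 4\), increases \(n\), and reports the largest pointwise difference between the two PMFs. The function names are illustrative, and this is a standalone check of the limit rather than the code INLA uses to switch between the two.

```python
import math


def nbinom_pmf_mean(y: int, mu: float, n: float) -> float:
    """Negative binomial PMF parameterized by mean mu and size n (p = n / (n + mu))."""
    log_pmf = (
        math.lgamma(y + n) - math.lgamma(n) - math.lgamma(y + 1)
        - n * math.log1p(mu / n)                  # n * log(p)
        + y * (math.log(mu) - math.log(n + mu))   # y * log(1 - p)
    )
    return math.exp(log_pmf)


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson PMF evaluated on the log scale."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def max_gap(mu: float, n: float, y_max: int = 60) -> float:
    """Largest absolute PMF difference over y = 0, ..., y_max - 1."""
    return max(
        abs(nbinom_pmf_mean(y, mu, n) - poisson_pmf(y, mu)) for y in range(y_max)
    )


mu = 4.0
for n in (1.0, 10.0, 1e3, 1e5):
    print(f"n={n:g}  mu/n={mu / n:g}  max |NB - Poisson| = {max_gap(mu, n):.2e}")
```

Only the last case, with \(\mu / n = 4 \times 10^{-5}\), falls below the \(10^{-4}\) threshold stated above.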

The nbinomial2 Distribution

The negative binomial distribution is also available in its "pure form", as the number of failures \(y_i\) observed before the \(n_i\)-th success, where the last experiment is a success:

\[\text{Prob}(y_i) = \binom{y_i + n_i - 1}{n_i - 1} (1 - p_i)^{y_i} p_i^{n_i}, \quad y_i = 0, 1, 2, \ldots, \quad i = 1, 2, \dots, m\]

where:

  • \(n_i = 1, 2, \ldots\) is the (fixed) number of successes before stopping.

  • \(p_i\) is the probability of success in each independent trial.
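
For integer \(n_i\), the binomial-coefficient form above coincides with the Gamma-function form used for nbinomial, since \(\Gamma(k + 1) = k!\). The following sketch checks this numerically; both function names are illustrative and not part of pyINLA.

```python
import math


def nbinomial2_pmf(y: int, n: int, p: float) -> float:
    """PMF in the binomial-coefficient form, for integer n >= 1."""
    return math.comb(y + n - 1, n - 1) * (1.0 - p) ** y * p ** n


def nbinom_pmf_gamma(y: int, n: float, p: float) -> float:
    """PMF in the Gamma-function form, valid for any n > 0."""
    log_pmf = (
        math.lgamma(y + n) - math.lgamma(n) - math.lgamma(y + 1)
        + n * math.log(p)
        + y * math.log1p(-p)
    )
    return math.exp(log_pmf)


# The two forms agree up to floating-point error when n is an integer.
for y in range(5):
    a = nbinomial2_pmf(y, 3, 0.5)
    b = nbinom_pmf_gamma(y, 3, 0.5)
    print(y, a, abs(a - b) < 1e-12)
```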

The probability \(p_i\) is linked to the linear predictor \(\eta_i\) via the logit link (default):

\[p_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}, \quad i = 1, 2, \dots, m\]
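
When evaluating the inverse logit by hand, a direct translation of the formula overflows for large positive \(\eta_i\). The sketch below is one common numerically stable way to compute it; the function name inv_logit is an illustrative choice, and this is not a statement about how pyINLA computes the link internally.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit p = exp(eta) / (1 + exp(eta)), avoiding overflow in exp."""
    if eta >= 0:
        # Equivalent form 1 / (1 + exp(-eta)); exp(-eta) cannot overflow here.
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # eta < 0, so z lies in [0, 1)
    return z / (1.0 + z)


for eta in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(eta, inv_logit(eta))
```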

Possible link functions: default, logit, loga, cauchit, probit, cloglog, ccloglog, loglog.

Hyperparameters for nbinomial2

None.

Validation Rules for nbinomial2

pyINLA enforces several validation rules for nbinomial2 models to ensure correct specification:

Ntrials (Required)

Key: Ntrials

The Ntrials argument is required for nbinomial2:

  • Must be provided for each observation

  • Values must be positive integers

  • Length must match the number of observations

# nbinomial2 requires Ntrials
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(model=model, family="nbinomial2", data=df, Ntrials=df["n"].to_numpy())

Exposure Not Allowed

The E (exposure) argument is not allowed for nbinomial2. Use nbinomial if you need exposure.

Scale Not Allowed

The scale argument is not allowed for nbinomial2. Use nbinomial if you need scale.

No Hyper Configuration

Key: control['family']['hyper']

The nbinomial2 likelihood has no hyperparameters. Do NOT provide control['family']['hyper'] configuration.

Allowed Link Functions

Key: control['family']['link']

These link functions are supported:

  • logit (default)

  • loga

  • cauchit

  • probit

  • cloglog

  • ccloglog

  • loglog

Specification for nbinomial2

  • family="nbinomial2"

  • Required arguments: \(\pmb{y}\) (response) and Ntrials (the value of \(n\) for each observation).