Learn PyINLA

Binomial Distribution

The binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent Bernoulli trials. Each trial has the same probability of success. This distribution is widely used in statistics for modeling binary outcomes.


Parametrization

The probability mass function (PMF) for the binomial distribution, considering \(\pmb{y}\) as a vector, is expressed as:

\[f(\pmb{y}) = \prod_{i=1}^{m} \binom{n_i}{y_i} p_i^{y_i} (1 - p_i)^{n_i - y_i},\]

where:

  • \(\pmb{y} = (y_1, y_2, \ldots, y_m)\) is the vector of observed success counts, one per trial group.

  • \(\pmb{n} = (n_1, n_2, \ldots, n_m)\) is the vector of the number of trials in each group.

  • \(\pmb{p} = (p_1, p_2, \ldots, p_m)\) is the vector of probabilities of success in each trial group.

  • \(y_i = 0, 1, 2, \ldots, n_i\) represents the number of successes in the \(i\)-th trial group.
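The PMF above can be evaluated directly with the standard library. The following is a minimal sketch; the helper names binom_pmf and joint_pmf are illustrative and are not part of pyINLA.

```python
from math import comb, prod


def binom_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ Binomial(n, p); zero outside 0..n."""
    if not 0 <= y <= n:
        return 0.0
    return comb(n, y) * p**y * (1.0 - p) ** (n - y)


def joint_pmf(ys, ns, ps) -> float:
    """Product of the m independent binomial terms in f(y)."""
    return prod(binom_pmf(y, n, p) for y, n, p in zip(ys, ns, ps))


# One group (m = 1): 5 successes in 10 trials with p = 0.5.
single = binom_pmf(5, 10, 0.5)
print(single)  # 0.24609375 (= 252 / 1024)

# Two groups (m = 2): the joint PMF is the product of the two terms.
joint = joint_pmf([5, 1], [10, 2], [0.5, 0.5])
print(joint)
```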

Assume \(m = 1\). Figure 1 uses two panels to show how the binomial parameters \(n\) and \(p\) shape the probability mass function (PMF).

Figure 1: Two binomial PMF plots. Left plot: \(n=10\) is fixed and \(p\) varies (0.2, 0.5, 0.8). Right plot: \(p=0.5\) is fixed and \(n\) varies (5, 10, 15). Each parameter affects the distribution of successes differently.

Left plot (fixed \(n=10\)): As \(p\) changes from 0.2 to 0.8, the mass of the distribution shifts from left (fewer successes) to right (more successes).

Right plot (fixed \(p=0.5\)): As \(n\) increases, the range of possible successes (0 to \(n\)) expands, so the probability mass spreads over more values and the peak of the PMF becomes lower.
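The two effects described for the panels can be checked numerically without plotting. This sketch uses the same parameter values as the figure; the helper names are illustrative only.

```python
from math import comb


def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1.0 - p) ** (n - y)


def pmf_table(n, p):
    """PMF values for y = 0, 1, ..., n."""
    return [binom_pmf(y, n, p) for y in range(n + 1)]


# Left panel: n = 10 fixed. The most likely count moves right as p grows.
modes = {}
for p in (0.2, 0.5, 0.8):
    table = pmf_table(10, p)
    modes[p] = table.index(max(table))
print(modes)

# Right panel: p = 0.5 fixed. More trials mean more possible outcomes,
# so the largest single probability shrinks as n grows.
peaks = {n: max(pmf_table(n, 0.5)) for n in (5, 10, 15)}
print(peaks)
```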

Mean and Variance

The mean and variance vectors for the binomial case are:

\[\pmb{\mu} = \pmb{n} \circ \pmb{p} \quad \text{and} \quad \pmb{\sigma}^2 = \pmb{n} \circ \pmb{p} \circ (1 - \pmb{p}),\]

where \(\circ\) denotes element-wise multiplication.
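As a quick check of these formulas, the element-wise products can be compared with moments obtained by summing over the PMF. This is a small sketch with made-up values for \(\pmb{n}\) and \(\pmb{p}\); the function names are illustrative.

```python
from math import comb


def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1.0 - p) ** (n - y)


def moments_by_summation(n, p):
    """Mean and variance computed directly from the PMF."""
    mean = sum(y * binom_pmf(y, n, p) for y in range(n + 1))
    second = sum(y * y * binom_pmf(y, n, p) for y in range(n + 1))
    return mean, second - mean**2


# Illustrative vectors with m = 2 groups.
ns = [10, 20]
ps = [0.2, 0.5]

# Element-wise formulas: mu = n * p and sigma^2 = n * p * (1 - p).
mu = [n * p for n, p in zip(ns, ps)]
sigma2 = [n * p * (1.0 - p) for n, p in zip(ns, ps)]

for n, p, m, v in zip(ns, ps, mu, sigma2):
    print((m, v), moments_by_summation(n, p))
```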

The probability vector \(\pmb{p}\) is linked to the linear predictor vector \(\pmb{\eta}\) via a link function. The default is the logit link:

\[\pmb{p}(\pmb{\eta}) = \frac{\exp(\pmb{\eta})}{1 + \exp(\pmb{\eta})}.\]
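For reference, the inverse logit can be written so that it does not overflow for large \(|\eta|\). This is a generic numerical sketch, not pyINLA's internal code.

```python
from math import exp, log


def inv_logit(eta: float) -> float:
    """p(eta) = exp(eta) / (1 + exp(eta)), evaluated without overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + exp(-eta))
    z = exp(eta)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """eta = log(p / (1 - p)) for 0 < p < 1."""
    return log(p / (1.0 - p))


print(inv_logit(0.0))         # 0.5
print(inv_logit(logit(0.3)))  # close to 0.3
```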

Supported link functions:

  • default: same as logit

  • logit (default): \(\eta = \log\bigl(\frac{p}{1-p}\bigr)\)

  • probit: \(\eta = \Phi^{-1}(p)\) where \(\Phi\) is the standard normal CDF

  • cloglog: \(\eta = \log(-\log(1-p))\) (complementary log-log)

  • ccloglog: \(\eta = -\log(-\log(p))\)

  • cauchit: \(\eta = \tan(\pi(p - 0.5))\)

  • log: \(\eta = \log(p)\)

  • loglog: \(\eta = -\log(-\log(p))\)

  • loga: log-a link (requires parameter a with \(0 < a \le 1\) via control={'family': {'link': 'loga', 'control.link': {'a': ...}}})

  • robit: robit link (t-distribution based)

  • sn: skew-normal link

  • powerlogit: power logit link

To specify a link function, use control={'family': {'link': 'probit'}}.
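To make a few of the listed formulas concrete, the sketch below pairs each link with its inverse and checks that they round-trip. It covers only logit, probit, cloglog, cauchit and log, uses the standard library, and the LINKS dictionary is an illustrative construct rather than anything pyINLA exposes.

```python
from math import atan, exp, log, pi, tan
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, mean 0 and sd 1

# name -> (link: p -> eta, inverse link: eta -> p)
LINKS = {
    "logit": (lambda p: log(p / (1.0 - p)), lambda eta: 1.0 / (1.0 + exp(-eta))),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "cloglog": (lambda p: log(-log(1.0 - p)), lambda eta: 1.0 - exp(-exp(eta))),
    "cauchit": (lambda p: tan(pi * (p - 0.5)), lambda eta: 0.5 + atan(eta) / pi),
    "log": (log, exp),
}

p = 0.3
for name, (link, inverse) in LINKS.items():
    eta = link(p)
    # The last column should recover p for every link.
    print(f"{name:8s} eta = {eta: .4f}  p = {inverse(eta):.4f}")
```

Note that the log link maps \(p \in (0, 1)\) to \(\eta < 0\) only, so a positive linear predictor would give a probability above 1 under that link.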

Hyperparameters

The binomial likelihood has no hyperparameters. The success probability \(p\) is fully determined by the linear predictor \(\eta\) through the link function.

Validation Rules

pyINLA enforces several validation rules for binomial models to ensure correct specification:

Ntrials Argument

Key: Ntrials

The Ntrials argument specifies the number of trials \(n_i\) for each observation. The validation rules are:

  • Aggregated binomial: When \(y_i\) can be any integer from 0 to \(n_i\), you must provide Ntrials.

  • Bernoulli (binary): When all responses are strictly 0 or 1, Ntrials can be omitted (implicitly \(n_i = 1\)).

  • Length requirement: Ntrials must have the same length as the response vector.

  • Positive integers: All entries in Ntrials must be positive integers.

# Bernoulli (binary 0/1): Ntrials not required
result = pyinla(model={'response': 'y', 'fixed': ['1', 'x']}, family="binomial", data=df)

# Aggregated binomial: Ntrials required
result = pyinla(model={'response': 'successes', 'fixed': ['1', 'x']}, family="binomial", data=df, Ntrials=df["trials"])
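The Ntrials rules listed above can be mirrored in a small pre-check on your own data before fitting. This is a sketch only: resolve_ntrials is a hypothetical helper, it raises a plain ValueError, and it does not reproduce pyINLA's actual validation code or messages.

```python
def resolve_ntrials(y, ntrials=None):
    """Return validated trial counts (hypothetical helper, not pyINLA API)."""
    y = list(y)
    if ntrials is None:
        # Bernoulli case: allowed only when every response is 0 or 1.
        if all(v in (0, 1) for v in y):
            return [1] * len(y)
        raise ValueError("Ntrials is required unless all responses are 0 or 1")
    ntrials = list(ntrials)
    if len(ntrials) != len(y):
        raise ValueError("Ntrials must have the same length as the response")
    for n in ntrials:
        if int(n) != n or n <= 0:
            raise ValueError(f"Ntrials entries must be positive integers, got {n!r}")
    return [int(n) for n in ntrials]


print(resolve_ntrials([0, 1, 1]))       # [1, 1, 1]
print(resolve_ntrials([3, 0], [5, 4]))  # [5, 4]
```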

Link Function

Key: control['family']['link']

The link function can be specified via the control dictionary. If not specified, the default logit link is used.

# Default logit link
result = pyinla(model={'response': 'y', 'fixed': ['1', 'x']}, family="binomial", data=df)

# Probit link
result = pyinla(model={'response': 'y', 'fixed': ['1', 'x']}, family="binomial", data=df,
                control={'family': {'link': 'probit'}})

# Complementary log-log link
result = pyinla(model={'response': 'y', 'fixed': ['1', 'x']}, family="binomial", data=df,
                control={'family': {'link': 'cloglog'}})

Variant Parameter

Key: control['family']['variant']

The variant parameter selects between the standard binomial and the negative binomial variant:

  • variant=0 (default): Standard binomial distribution.

  • variant=1: Negative binomial variant (see section below).

# n holds the trial counts, one entry per row of df
# Standard binomial (default)
result = pyinla(model={'response': 'y', 'fixed': ['1', 'x']}, family="binomial", data=df, Ntrials=n)

# Negative binomial variant (here the data enter through Ntrials)
result = pyinla(model={'response': 'y', 'fixed': ['1', 'x']}, family="binomial", data=df, Ntrials=n,
                control={'family': {'variant': 1}})

Response Values

The response variable \(\pmb{y}\) must satisfy:

  • All values must be non-negative integers (0, 1, 2, ...)

  • All values must be less than or equal to the corresponding Ntrials value

pyINLA will raise PyINLAError if any response value is negative, non-integer, or exceeds its trial count.
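The same conditions can be checked on the data before calling pyinla(). In this sketch check_response is a hypothetical helper and ValueError stands in for PyINLAError; it illustrates the rules and is not pyINLA's implementation.

```python
def check_response(y, ntrials):
    """Raise ValueError on the first response that breaks a rule (illustrative)."""
    for i, (yi, ni) in enumerate(zip(y, ntrials)):
        if int(yi) != yi:
            raise ValueError(f"y[{i}] = {yi!r} is not an integer")
        if yi < 0:
            raise ValueError(f"y[{i}] = {yi!r} is negative")
        if yi > ni:
            raise ValueError(f"y[{i}] = {yi!r} exceeds Ntrials[{i}] = {ni!r}")


check_response([0, 3, 5], [5, 5, 5])  # passes silently

try:
    check_response([2, 6], [5, 5])
except ValueError as err:
    print(err)
```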

Specification

  • family="binomial"

  • Required arguments:

    • \(\pmb y\) and \(\pmb n\): the observed success counts \(y_i\) (the response) and the trial counts \(n_i\) (Ntrials). Ntrials may be omitted only when every response is 0 or 1.

  • Optional arguments:

    • control={'family': {'variant': 0}} for standard binomial (default), and control={'family': {'variant': 1}} for the negative binomial variant.

Negative Binomial Variant

When variant=1 is specified, the negative binomial distribution is used instead of the standard binomial. The probability mass function is:

\[f(\pmb{n}) = \prod_{i=1}^{m} \binom{n_i-1}{y_i-1} p_i^{y_i} (1-p_i)^{n_i-y_i}\]

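Here \(\pmb{y} = (y_1, y_2, \ldots, y_m)\) with \(y_i = 1, 2, \ldots\) is given in advance, and the modelled quantity is \(n_i\), the number of trials needed to reach \(y_i\) successes, so that \(n_i - y_i = 0, 1, 2, \ldots\).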

Note: In this variant, the "data" enters via the Ntrials argument (since \(\pmb{y}\) is predetermined), which may seem unconventional.
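To see that this PMF is a distribution over the trial count \(n\) for fixed \(y\), it can be evaluated and summed over \(n = y, y+1, \ldots\). A minimal sketch with illustrative values follows; negbin_variant_pmf is not a pyINLA function.

```python
from math import comb


def negbin_variant_pmf(n: int, y: int, p: float) -> float:
    """P(N = n): probability that the y-th success occurs on trial n."""
    if y < 1 or n < y:
        return 0.0
    return comb(n - 1, y - 1) * p**y * (1.0 - p) ** (n - y)


y, p = 3, 0.5

# Probabilities for the first few possible trial counts.
for n in range(y, y + 4):
    print(n, negbin_variant_pmf(n, y, p))

# Summing far enough into the tail gives a total close to 1,
# and the mean trial count is close to y / p.
support = range(y, 400)
total = sum(negbin_variant_pmf(n, y, p) for n in support)
mean_n = sum(n * negbin_variant_pmf(n, y, p) for n in support)
print(total, mean_n)
```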

Specification for Expert Version

  • family="xbinomial"

  • Required arguments:

    • \(\pmb y\) and \(\pmb n\): observed data (\(y_i\) for each observation) and trials (Ntrials).

  • Optional arguments:

    • scale=q, which scales the probability with \(0 < \pmb q \le 1\) into \(\pmb p'\), where \(\pmb p' = \pmb q \circ \pmb p(\pmb \eta)\). By default, \(\pmb q = \pmb 1\). Note that “fitted values” will still be \(\pmb p(\pmb \eta)\).

Expert version notes: The expert version (xbinomial) allows non-integer values for both \(\pmb{y}\) and \(\pmb{n}\). The condition \(0 \le y_i \le n_i\) still applies. The normalizing constant is computed using the integer parts (floor) of \(y\) and \(n\). Note that this extension can make the marginal likelihood estimate less interpretable.
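One way to read the description above is sketched below. It assumes that the log-likelihood keeps the binomial form with \(p' = q\,p(\eta)\) under the default logit link, and that only the binomial coefficient uses the floored values. Both points are assumptions drawn from the text, not a transcription of the actual implementation, and xbinomial_loglik is a hypothetical name.

```python
from math import comb, exp, floor, log


def inv_logit(eta: float) -> float:
    """Inverse logit, evaluated without overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + exp(-eta))
    z = exp(eta)
    return z / (1.0 + z)


def xbinomial_loglik(y: float, n: float, eta: float, q: float = 1.0) -> float:
    """Assumed log-likelihood for one observation (illustrative only)."""
    if not (0 <= y <= n and 0 < q <= 1):
        raise ValueError("need 0 <= y <= n and 0 < q <= 1")
    p_scaled = q * inv_logit(eta)              # p' = q * p(eta)
    log_const = log(comb(floor(n), floor(y)))  # constant from integer parts
    return log_const + y * log(p_scaled) + (n - y) * log(1.0 - p_scaled)


# With integer y, n and q = 1 this reduces to the ordinary binomial log-PMF.
print(exp(xbinomial_loglik(5, 10, 0.0)))

# Non-integer y and n with a scaled probability.
print(xbinomial_loglik(2.5, 10.7, 0.0, q=0.5))
```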

Note: If the response is a factor (a categorical variable), it must be converted to {0, 1} before calling pyinla(), as this conversion is not done automatically.
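A conversion of this kind might look as follows. The labels and the choice of "yes" as the success level are made up for illustration; with a pandas column, an equivalent expression would be (df["outcome"] == "yes").astype(int), where the column name is again hypothetical.

```python
# Hypothetical categorical response with two levels.
labels = ["yes", "no", "no", "yes", "yes"]

# Decide explicitly which level counts as a success (assumption: "yes").
success_level = "yes"

# Guard against unexpected levels before encoding.
unexpected = set(labels) - {"yes", "no"}
if unexpected:
    raise ValueError(f"unexpected levels: {sorted(unexpected)}")

# Encode successes as 1 and everything else as 0.
y = [1 if label == success_level else 0 for label in labels]
print(y)  # [1, 0, 0, 1, 1]
```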