Beta-Binomial Distribution

Parametrization

The Beta-Binomial distribution for a random vector \(\pmb{y} = (y_1, y_2, \dots, y_n)\) of count observations arises from a hierarchical model. The probability of success \(p_i\) follows a Beta distribution:

\[\pi(p_i) = \frac{1}{B(\alpha, \beta)} p_i^{\alpha-1} (1-p_i)^{\beta-1}, \quad \alpha > 0, \; \beta > 0, \quad i = 1, 2, \dots, n\]

Given \(p_i\), the response \(y_i\) is Binomial:

\[\pi(y_i \mid p_i) = \binom{N_i}{y_i} p_i^{y_i} (1-p_i)^{N_i-y_i}, \quad y_i = 0, 1, \ldots, N_i\]

Marginalizing over \(p_i\), the distribution of \(y_i\) is Beta-Binomial:

\[\pi(y_i) = \binom{N_i}{y_i} \frac{B(y_i + \alpha, N_i - y_i + \beta)}{B(\alpha, \beta)}, \quad i = 1, 2, \dots, n\]

where \(B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\) is the Beta function.
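
As a rough numerical check of the marginal formula above, the following sketch evaluates the PMF on the log scale using only the Python standard library. The helper names `log_beta` and `betabinom_pmf` and the example values (\(\alpha = 2\), \(\beta = 3\), \(N = 10\)) are illustrative assumptions and are not part of pyINLA.

```python
import math

def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_pmf(y, n_trials, alpha, beta):
    """Beta-Binomial probability of y successes out of n_trials."""
    if not 0 <= y <= n_trials:
        return 0.0
    log_p = (math.log(math.comb(n_trials, y))
             + log_beta(y + alpha, n_trials - y + beta)
             - log_beta(alpha, beta))
    return math.exp(log_p)

# Illustrative values only: alpha = 2, beta = 3, N = 10
pmf = [betabinom_pmf(y, 10, 2.0, 3.0) for y in range(11)]
total = sum(pmf)  # equals 1 up to floating-point error
```

Working on the log scale avoids overflow in the Gamma functions when \(N_i\), \(\alpha\), or \(\beta\) are large.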

Figure: Beta-Binomial PMFs for different \((\alpha, \beta)\) pairs with fixed \(N=10\). The shape of the distribution changes with the mean probability and the level of overdispersion.

Mean and Variance

The mean and variance of each observation are:

\[\mu_i = N_i \mu_p, \qquad \sigma_i^2 = N_i \mu_p (1 - \mu_p) \left(1 + (N_i - 1) \rho\right), \quad i = 1, 2, \dots, n\]

where:

\(\mu_p = \frac{\alpha}{\alpha + \beta}\) is the mean probability from the Beta distribution.
\(\rho = \frac{1}{\alpha + \beta + 1}\) is the overdispersion parameter (\(0 < \rho < 1\)).

The term \((1 + (N_i - 1)\rho)\) inflates the variance beyond the standard Binomial, capturing overdispersion.
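
The formulas above can be sketched in a few lines of Python. Since \(\rho = 1/(\alpha + \beta + 1)\) implies \(\alpha + \beta = (1 - \rho)/\rho\), the pair \((\mu_p, \rho)\) can also be mapped back to \((\alpha, \beta)\). The function names and example values below are illustrative assumptions, not pyINLA functions.

```python
def moments(n_trials, alpha, beta):
    """Mean and variance of a Beta-Binomial observation."""
    mu_p = alpha / (alpha + beta)
    rho = 1.0 / (alpha + beta + 1.0)
    mean = n_trials * mu_p
    var = n_trials * mu_p * (1.0 - mu_p) * (1.0 + (n_trials - 1) * rho)
    return mean, var

def shape_from_mean_rho(mu_p, rho):
    """Recover (alpha, beta) from the mean probability and overdispersion."""
    total = (1.0 - rho) / rho  # alpha + beta
    return mu_p * total, (1.0 - mu_p) * total

# Illustrative values: alpha = 2, beta = 3, N = 10
mean, var = moments(10, 2.0, 3.0)
binomial_var = 10 * 0.4 * 0.6  # plain Binomial variance with the same mean
alpha, beta = shape_from_mean_rho(0.4, 1.0 / 6.0)
```

For these values the Beta-Binomial variance is larger than the Binomial variance by the factor \(1 + 9\rho\), with \(\rho = 1/6\).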

Link Function

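For each observation \(i\), the mean probability \(\mu_p\) is linked to the corresponding element \(\eta_i\) of the linear predictor \(\pmb{\eta} = (\eta_1, \eta_2, \dots, \eta_n)\), so \(\mu_p\) may differ between observations. With the logit link (default):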

\[\mu_p = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}, \quad i = 1, 2, \dots, n\]

Possible link functions: logit (default), loga, cauchit, probit, cloglog, ccloglog, loglog, robit, sn.
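
To illustrate how a link maps the linear predictor to a probability, here is a minimal sketch of four inverse links using their standard textbook definitions. It is an assumption that pyINLA's logit, probit, cloglog, and cauchit links match these definitions exactly. The remaining link names are not sketched here, and the function names are hypothetical.

```python
import math

def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def inv_probit(eta):
    """Standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse complementary log-log: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def inv_cauchit(eta):
    """Standard Cauchy CDF evaluated at eta."""
    return 0.5 + math.atan(eta) / math.pi

# Each inverse link maps the same eta to a (possibly different) probability
eta = 0.0
probs = {
    "logit": inv_logit(eta),
    "probit": inv_probit(eta),
    "cloglog": inv_cloglog(eta),
    "cauchit": inv_cauchit(eta),
}
```

The symmetric links (logit, probit, cauchit) map \(\eta = 0\) to 0.5, whereas the asymmetric cloglog link does not.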

Hyperparameters

The betabinomial likelihood has one hyperparameter: the overdispersion parameter \(\rho \in (0, 1)\). It is represented internally on the logit scale:

\[\theta = \mathrm{logit}(\rho), \qquad \rho = \frac{\exp(\theta)}{1 + \exp(\theta)}\]

where the prior is defined on \(\theta\).

Hyperparameter \(\theta\) (rho)

The default configuration assigns a gaussian prior to \(\theta\) with mean 0 and precision 0.4. The initial value is set to \(\theta = 0\), corresponding to \(\rho = 0.5\).
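
To get a feel for what the default prior implies on the \(\rho\) scale, the sketch below converts the precision to a standard deviation and pushes a central 95% interval for \(\theta\) through the inverse logit. The helper names are illustrative and not part of pyINLA.

```python
import math

def logit(rho):
    """Map rho in (0, 1) to the internal scale theta."""
    return math.log(rho / (1.0 - rho))

def inv_logit(theta):
    """Map theta back to rho in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-theta))

# Default prior on theta: mean 0, precision 0.4
prior_mean, prior_prec = 0.0, 0.4
prior_sd = 1.0 / math.sqrt(prior_prec)  # precision = 1 / variance

# Central 95% interval for theta, transformed to the rho scale
z = 1.959964  # 97.5% quantile of the standard normal
rho_lo = inv_logit(prior_mean - z * prior_sd)
rho_hi = inv_logit(prior_mean + z * prior_sd)
```

With these defaults the interval for \(\rho\) runs from roughly 0.04 to 0.96, so the prior is fairly diffuse over the unit interval.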

Key: rho

When translated into control['family']['hyper'], the default entry is:

control = {
    'family': {
        'hyper': [{
            'id': 'rho',
            'prior': 'gaussian',
            'param': [0.0, 0.4],
            'initial': 0,
            'fixed': False,
        }]
    }
}

Each entry in control['family']['hyper'] may contain these keys:

id - Hyperparameter identifier (rho). Can be omitted for the first (and only) hyperparameter.
prior - Prior distribution name
param - Prior parameters (list)
initial - Initial value of \(\theta\) (i.e. on the logit scale, not on the \(\rho\) scale)
fixed - Whether to fix the hyperparameter (True/False)

The overdispersion hyperparameter (rho) accepts any prior from the prior registry. The most commonly useful choices on \(\theta = \mathrm{logit}(\rho)\) are:

Prior	Param shape	Use when
`gaussian` (default) / `normal`	`[mean, precision]` on \(\theta\)	Soft Gaussian prior on the logit-transformed overdispersion.
`logitbeta`	`[a, b]`, both positive	Beta prior on \(\rho \in (0, 1)\) via the logit map.
`flat`	`[]`	Improper flat. Pass `param=[]` explicitly.
`loggamma`	`[shape, rate]`, both positive	Less common on this slot; accepted by the engine.
`logtnormal`	`[location, scale]`	Truncated-normal on \(\theta\).

The overdispersion can also be pinned at a known value by setting fixed=True, in which case no prior is required. An unknown prior name raises an error of the form pyinla safety check: unknown prior '...' before the engine runs, with a "Did you mean ..." hint for likely typos. A param list of the wrong length triggers a similar safety-check error.
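
As an example of a non-default choice from the table, the sketch below builds a control entry that uses the logitbeta prior. The parameter values [1.0, 1.0] and the starting value \(\rho = 0.1\) are arbitrary illustrations, not recommendations. The keys follow the hyper entry format described above.

```python
import math

# Starting value for rho, expressed on the internal logit scale
rho_start = 0.1  # illustrative choice
theta_start = math.log(rho_start / (1.0 - rho_start))

control = {
    'family': {
        'hyper': [{
            'id': 'rho',
            'prior': 'logitbeta',
            'param': [1.0, 1.0],     # [a, b], both positive (illustrative values)
            'initial': theta_start,  # initial value is given on the logit scale
            'fixed': False,
        }]
    }
}
```

This dictionary would then be passed as the control argument in the same way as the default configuration.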

Validation Rules

pyINLA enforces several validation rules for betabinomial models to ensure correct specification:

Ntrials (Required)

Key: Ntrials

The Ntrials argument is required for betabinomial:

Must be provided for each observation
Values must be positive integers
Length must match the number of observations

# betabinomial requires Ntrials
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(model=model, family="betabinomial", data=df, Ntrials=df["n"].to_numpy())

Hyperparameters

Key: control['family']['hyper']

When configuring hyperparameters (see the Hyperparameters section above for the full prior registry and key listing):

If prior is omitted, the schema default is used (gaussian with param=[0, 0.4]); a fixed entry can also omit prior and param.
If specified, the prior name must be in the registry. An unknown name raises a pyinla safety check: unknown prior '...' error.
param length must match the prior's expected count.

# Custom hyperparameter configuration (default values shown)
model = {"response": "y", "fixed": ["1", "x"]}
control = {
    'family': {
        'hyper': [{
            'id': 'rho',
            'prior': 'gaussian',
            'param': [0.0, 0.4],
            'initial': 0,
            'fixed': False
        }]
    }
}
result = pyinla(model=model, family="betabinomial", data=df,
                Ntrials=df["n"].to_numpy(), control=control)

# Fixed overdispersion (not estimated)
control = {
    'family': {
        'hyper': [{
            'id': 'rho',
            'initial': 0.0,  # logit(0.5) = 0, so rho = 0.5
            'fixed': True
        }]
    }
}
result = pyinla(model=model, family="betabinomial", data=df,
                Ntrials=df["n"].to_numpy(), control=control)

Variant

Key: control['family']['variant']

The variant parameter is allowed for betabinomial:

0 (default): Standard parameterization
1: Alternative parameterization

Response Values

The response values \(\pmb{y}\) must be non-negative integers with \(0 \leq y_i \leq N_i\) for each observation. pyINLA raises PyINLAError if any response value is negative, non-integer, or greater than its corresponding Ntrials value.
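
The Ntrials and response rules can be checked before calling the model with a small helper such as the one below. This is a hypothetical pre-check written for illustration, not pyINLA's own validation code. It raises ValueError rather than PyINLAError and makes no attempt to handle missing values.

```python
def check_betabinomial_inputs(y, ntrials):
    """Raise ValueError if y or ntrials break the betabinomial input rules."""
    if len(y) != len(ntrials):
        raise ValueError("Ntrials length must match the number of observations")
    for i, (yi, ni) in enumerate(zip(y, ntrials)):
        # Ntrials: positive integers
        if not float(ni).is_integer() or ni <= 0:
            raise ValueError(f"Ntrials[{i}] must be a positive integer, got {ni!r}")
        # Response: non-negative integers no larger than Ntrials
        if not float(yi).is_integer() or yi < 0:
            raise ValueError(f"y[{i}] must be a non-negative integer, got {yi!r}")
        if yi > ni:
            raise ValueError(f"y[{i}]={yi!r} exceeds Ntrials[{i}]={ni!r}")

# Valid input passes silently
check_betabinomial_inputs([0, 3, 10], [10, 10, 10])
```

Running a check like this on the data frame columns first can make specification errors easier to locate than an error raised inside the model call.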

Exposure Not Allowed

The E (exposure) argument is not allowed for betabinomial. Use nbinomial or poisson if you need exposure.

Scale Not Allowed

The scale argument is not allowed for betabinomial.

Allowed Link Functions

Key: control['family']['link']

These link functions are supported:

logit (default)
loga
cauchit
probit
cloglog
ccloglog
loglog
robit
sn

# With probit link
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(model=model, family="betabinomial", data=df,
                Ntrials=df["n"].to_numpy(), control={"family": {"link": "probit"}})

Specification

family="betabinomial"
Required arguments: \(\pmb{y}\) (response) and Ntrials (\(N_i\)).
Optional: control['family']['variant'] (0 or 1), control['family']['hyper'], control['family']['link'].

Worked examples

Beta-Binomial Regression