Learn PyINLA

Beta Distribution

The Beta distribution is a continuous probability distribution defined on the interval (0, 1). It is commonly used to model random variables representing proportions or probabilities, such as success rates, percentages, or concentration indices.


Parametrization

For a vector of observations \(\pmb{y} = (y_1, y_2, \dots, y_n)\) on \((0, 1)\), each \(y_i\) follows a Beta distribution with probability density function:

\[\pi(y_i) = \frac{1}{B(a_i, b_i)} y_i^{a_i-1}(1-y_i)^{b_i-1}, \quad 0 < y_i < 1, \quad i = 1, 2, \dots, n\]

where \(a_i > 0\) and \(b_i > 0\) are shape parameters, and \(B(a_i, b_i)\) is the Beta function:

\[B(a_i, b_i) = \frac{\Gamma(a_i)\Gamma(b_i)}{\Gamma(a_i+b_i)}\]
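As a minimal sketch, the density above can be evaluated with the Python standard library alone. The function names `log_beta_fn` and `beta_pdf` are illustrative and not part of pyINLA; the computation works on the log scale because the Gamma function overflows quickly for large shape parameters.

```python
import math


def log_beta_fn(a: float, b: float) -> float:
    """Return log B(a, b), computed from log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(y: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at y, for 0 < y < 1 and a, b > 0."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly inside (0, 1)")
    if a <= 0.0 or b <= 0.0:
        raise ValueError("shape parameters must be positive")
    log_density = (
        (a - 1.0) * math.log(y)
        + (b - 1.0) * math.log1p(-y)
        - log_beta_fn(a, b)
    )
    return math.exp(log_density)


# Beta(1, 1) is the uniform distribution, so its density is 1 everywhere.
uniform_density = beta_pdf(0.3, 1.0, 1.0)

# Beta(2, 2) has density 6 * y * (1 - y), which equals 1.5 at y = 0.5.
symmetric_density = beta_pdf(0.5, 2.0, 2.0)
```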

Figure: Beta PDFs for different parameter values. The density becomes more peaked as either parameter grows, and its mass shifts toward 1 when \(a > b\) and toward 0 when \(a < b\).

Mean and Variance

The Beta distribution is reparameterized using the mean \(\mu_i\) and precision parameter \(\phi_i\):

\[\mu_i = \frac{a_i}{a_i + b_i}, \qquad \phi_i = a_i + b_i, \quad i = 1, 2, \dots, n\]

Under this parameterization, the mean and variance are:

\[\text{E}(y_i) = \mu_i, \qquad \text{Var}(y_i) = \frac{\mu_i(1-\mu_i)}{1+\phi_i}, \quad i = 1, 2, \dots, n\]

The shape parameters are recovered as:

\[a_i = \mu_i \phi_i, \qquad b_i = \phi_i (1-\mu_i), \quad i = 1, 2, \dots, n\]

The precision parameter \(\phi_i\) controls the variance: for fixed \(\mu_i\), larger \(\phi_i\) results in smaller variance.
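The two parameterizations can be checked against each other numerically. The sketch below uses illustrative helper names (none of them come from pyINLA) and compares the variance formula in \((\mu, \phi)\) with the textbook variance \(ab / ((a+b)^2 (a+b+1))\) in terms of the shape parameters.

```python
def shapes_from_mean_precision(mu: float, phi: float) -> tuple[float, float]:
    """Map (mu, phi) to the shape parameters (a, b)."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly inside (0, 1)")
    if phi <= 0.0:
        raise ValueError("phi must be positive")
    return mu * phi, phi * (1.0 - mu)


def mean_precision_from_shapes(a: float, b: float) -> tuple[float, float]:
    """Map the shape parameters (a, b) back to (mu, phi)."""
    return a / (a + b), a + b


def beta_variance(mu: float, phi: float) -> float:
    """Variance of the Beta distribution in the mean-precision form."""
    return mu * (1.0 - mu) / (1.0 + phi)


a, b = shapes_from_mean_precision(0.25, 8.0)   # a = 2, b = 6
mu, phi = mean_precision_from_shapes(a, b)     # round trip back to (0.25, 8)

# Both expressions give the same variance (1/48 for this example).
var_mean_precision = beta_variance(0.25, 8.0)
var_shapes = a * b / ((a + b) ** 2 * (a + b + 1.0))
```

Raising \(\phi\) from 8 to 80 at the same mean shrinks the variance from \(1/48\) to \(0.1875/81\), which is the behaviour described above.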

The mean is linked to the linear predictor \(\pmb{\eta} = (\eta_1, \eta_2, \dots, \eta_n)\) using the logit link (default):

\[\mu_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}, \quad i = 1, 2, \dots, n\]

Possible link functions: logit (default), loga, cauchit, probit, cloglog, ccloglog, loglog.
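For the default logit link, the mapping between the linear predictor and the mean can be sketched as follows. This is an illustration of the formula only, not pyINLA's internal code; the branch on the sign of `eta` is a common way to avoid overflow in `exp` for large positive or negative predictors.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a mean in [0, 1]."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # safe: eta < 0, so exp(eta) cannot overflow
    return z / (1.0 + z)


def logit(mu: float) -> float:
    """Logit link: maps a mean in (0, 1) to the linear predictor scale."""
    return math.log(mu) - math.log1p(-mu)


# eta = 0 corresponds to mu = 0.5; the mapping is symmetric around that point.
means = [inv_logit(eta) for eta in (-2.0, 0.0, 2.0)]
```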

Censoring

In some applications, observations close to 0 or 1 are censored and recorded exactly as 0 or 1. A censoring threshold \(0 < \delta < 0.5\) can be specified:

  • Observations \(y_i \leq \delta\) are treated as censored at 0.

  • Observations \(y_i \geq 1 - \delta\) are treated as censored at 1.

By default, no censoring is applied (\(\delta = 0\)).
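The censoring rule can be illustrated with a small classifier. The function `classify_observation` and its labels are hypothetical; the sketch only reproduces the thresholding described above and does not show how pyINLA evaluates the likelihood contribution of a censored observation.

```python
def classify_observation(y: float, delta: float = 0.0) -> str:
    """Label an observation as censored at 0, censored at 1, or observed."""
    if not 0.0 <= delta < 0.5:
        raise ValueError("delta must lie in [0, 0.5)")
    if delta == 0.0:
        # No censoring: the response must lie strictly inside (0, 1).
        if not 0.0 < y < 1.0:
            raise ValueError("without censoring, y must lie in (0, 1)")
        return "observed"
    if not 0.0 <= y <= 1.0:
        raise ValueError("with censoring, y must lie in [0, 1]")
    if y <= delta:
        return "censored_at_0"
    if y >= 1.0 - delta:
        return "censored_at_1"
    return "observed"


# With delta = 0.05, values at or below 0.05 and at or above 0.95 are censored.
labels = [classify_observation(y, delta=0.05) for y in (0.0, 0.03, 0.5, 0.96, 1.0)]
```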

Hyperparameters

The Beta likelihood has one hyperparameter: the precision parameter \(\phi > 0\). It is represented internally on the log scale:

\[\theta = \log(\phi), \qquad \phi = \exp(\theta)\]

With an optional scale vector \(\pmb{s} = (s_1, s_2, \dots, s_n)\), the observation-specific precision is:

\[\phi_i = s_i \cdot \phi = s_i \exp(\theta), \quad i = 1, 2, \dots, n\]
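A short sketch of how the internal value \(\theta\) and a scale vector combine into observation-specific precisions. The helper name `observation_precisions` is illustrative only and is not a pyINLA function.

```python
import math


def observation_precisions(theta: float, scale: list[float]) -> list[float]:
    """Return phi_i = s_i * exp(theta) for each scale value s_i."""
    if any(s <= 0.0 for s in scale):
        raise ValueError("all scale values must be strictly positive")
    phi = math.exp(theta)
    return [s * phi for s in scale]


theta = math.log(10.0)  # internal log-scale value, so phi = 10
phis = observation_precisions(theta, [0.5, 1.0, 2.0])  # about 5, 10 and 20

# At mu = 0.5, a larger scale value gives a smaller observation variance.
variances = [0.5 * (1.0 - 0.5) / (1.0 + phi_i) for phi_i in phis]
```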

Hyperparameter \(\theta\) (phi)

The default configuration assigns a loggamma prior to \(\theta\) with shape and rate parameters \((1, 0.1)\). The initial value is set to \(\theta = \log(10) \approx 2.303\).
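To make the default prior concrete, assume (as the loggamma name suggests) that it corresponds to \(\phi \sim \text{Gamma}(a, b)\) with shape \(a\) and rate \(b\). A change of variables \(\phi = \exp(\theta)\) then gives the implied density on the internal scale:

\[\pi(\phi) = \frac{b^{a}}{\Gamma(a)} \phi^{a-1} e^{-b\phi}, \qquad \pi(\theta) = \pi(\phi)\left|\frac{d\phi}{d\theta}\right| = \frac{b^{a}}{\Gamma(a)} \exp\left(a\theta - b e^{\theta}\right)\]

Under that assumption, the default \((a, b) = (1, 0.1)\) gives

\[\pi(\theta) = 0.1 \exp\left(\theta - 0.1 e^{\theta}\right), \qquad \text{E}(\phi) = \frac{a}{b} = 10, \qquad \text{Var}(\phi) = \frac{a}{b^{2}} = 100\]

so the default initial value \(\theta = \log(10)\) sits at the prior mean of \(\phi\).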

Key: phi (not prec)

When translated into control['family']['hyper'], the default entry is:

control = {
    'family': {
        'hyper': [{
            'id': 'phi',
            'prior': 'loggamma',
            'param': [1.0, 0.1],
            'initial': 2.303,
            'fixed': False,
        }]
    }
}

Each entry in control['family']['hyper'] may contain these keys:

  • id - Hyperparameter identifier (phi). Can be omitted for the first (and only) hyperparameter.

  • prior - Prior distribution name

  • param - Prior parameters (list)

  • initial - Initial value of \(\theta = \log(\phi)\), given on the log scale

  • fixed - Whether to fix the hyperparameter (True/False)

The precision hyperparameter (phi) accepts any prior from the prior registry. The most commonly useful choices for a positive precision are:

| Prior | Param shape | Use when |
| --- | --- | --- |
| loggamma (default) | [shape, rate], both positive | Conjugate prior on the log-precision; widely used. |
| pc.prec | [U, alpha] with P(sd > U) = alpha | Penalised-complexity prior on the standard deviation; weakly informative on a meaningful scale. |
| normal / gaussian | [mean, precision] on \(\theta\) | Soft Gaussian prior on the log-precision; useful for nudging without strong shrinkage. |
| flat | [] | Improper flat. Pass param=[] explicitly. |
| logtnormal | [location, scale] | Truncated-normal on \(\theta\); bounded informative prior. |

The precision can also be pinned at a known value by setting fixed=True, in which case no prior is required. An unknown prior name is rejected before the engine runs with the error pyinla safety check: unknown prior '...', which includes a "Did you mean ..." hint for likely typos. A param list of the wrong length triggers a similar safety error.

Validation Rules

pyINLA enforces several validation rules for beta models to ensure correct specification:

Response Values

Key: response variable

Response values must satisfy:

  • Without censoring: All values must be strictly in (0, 1) - exclusive bounds

  • With censoring (beta.censor.value set): Values can be in [0, 1] - inclusive bounds

Censoring Threshold (beta.censor.value)

Key: control['family']['beta.censor.value']

When providing the censoring threshold:

  • Must be in the range [0, 0.5) - values at or above 0.5 are rejected

  • Default is 0 (no censoring)

# With censoring threshold
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(model=model, family="beta", data=df,
                control={"family": {"beta.censor.value": 0.05}})

Scale

Key: scale

When providing the scale argument:

  • All values must be strictly positive (> 0)

  • Length must match the number of observations
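The response, censoring-threshold, and scale rules above can be mirrored in a small pre-check that runs before a model is fitted. This is a sketch only: `check_beta_inputs` is a hypothetical helper, not a pyINLA function, and its error messages are invented for illustration.

```python
from typing import Optional, Sequence


def check_beta_inputs(
    y: Sequence[float],
    scale: Optional[Sequence[float]] = None,
    censor_value: float = 0.0,
) -> None:
    """Raise ValueError if the inputs break the rules listed above."""
    if not 0.0 <= censor_value < 0.5:
        raise ValueError("beta.censor.value must lie in [0, 0.5)")

    if censor_value > 0.0:
        # With censoring, exact 0 and 1 are allowed.
        bad = [v for v in y if not 0.0 <= v <= 1.0]
    else:
        # Without censoring, the bounds are exclusive.
        bad = [v for v in y if not 0.0 < v < 1.0]
    if bad:
        raise ValueError(f"{len(bad)} response value(s) outside the allowed range")

    if scale is not None:
        if len(scale) != len(y):
            raise ValueError("scale must have one value per observation")
        if any(s <= 0.0 for s in scale):
            raise ValueError("all scale values must be strictly positive")


# Passes: interior responses with a matching, positive scale vector.
check_beta_inputs([0.2, 0.7], scale=[1.0, 2.0])

# Passes: exact 0 and 1 are accepted once a censoring threshold is set.
check_beta_inputs([0.0, 1.0, 0.4], censor_value=0.05)
```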

Hyperparameters

Key: control['family']['hyper']

When configuring hyperparameters:

  • If prior is omitted, the schema default is used (loggamma with param=[1, 0.1]); a fixed entry can also omit prior and param.

  • If specified, the prior name must be in the registry (see the table above for the most common choices). Unknown names raise a pyinla safety check: unknown prior error.

  • param length must match the prior's expected count.

# Custom loggamma prior on precision
# (assumes df and a strictly positive scale vector of matching length are already defined)
model = {"response": "y", "fixed": ["1", "z"]}
control = {
    'family': {
        'hyper': [{
            'id': 'phi',
            'prior': 'loggamma',
            'param': [1.0, 0.5],
            'initial': 1.609,  # log(5), so phi = 5
            'fixed': False
        }]
    }
}
result = pyinla(model=model, family="beta", data=df, scale=scale, control=control)

# PC prior on precision
control = {
    'family': {
        'hyper': [{
            'id': 'phi',
            'prior': 'pc.prec',
            'param': [1.0, 0.01],
            'initial': 1.609,
            'fixed': False
        }]
    }
}
result = pyinla(model=model, family="beta", data=df, scale=scale, control=control)

# Fixed precision (not estimated)
control = {
    'family': {
        'hyper': [{
            'id': 'phi',
            'initial': 2.303,  # log(10), so phi = 10
            'fixed': True
        }]
    }
}
result = pyinla(model=model, family="beta", data=df, scale=scale, control=control)

Exposure Not Allowed

The E (exposure) argument is not allowed for beta. Use nbinomial or poisson if you need exposure.

Ntrials Not Allowed

The Ntrials argument is not allowed for beta. Use binomial or betabinomial if you need trial counts.

Variant Not Allowed

The control['family']['variant'] option is not allowed for beta.

Allowed Link Functions

Key: control['family']['link']

These link functions are supported:

  • logit (default)

  • loga

  • cauchit

  • probit

  • cloglog

  • ccloglog

  • loglog

# With probit link
model = {"response": "y", "fixed": ["1", "x"]}
result = pyinla(model=model, family="beta", data=df,
                control={"family": {"link": "probit"}})

Specification

  • family="beta"

  • Required arguments: \(\pmb{y}\) (response).

  • Optional arguments: \(\pmb{s}\) (scale, default = 1) and beta.censor.value (\(\delta\), default = 0).
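To try the specification end to end, one option is to simulate data from a Beta regression with a logit link and known parameters. The sketch below uses only the standard library; the coefficient values, sample size, and variable names are illustrative assumptions. The final fitting call is left as a comment because it simply mirrors the examples on this page, and the import of `pyinla` is not shown in the source.

```python
import math
import random

random.seed(1)

n = 200
beta0, beta1 = -0.5, 1.0   # illustrative regression coefficients
phi = 10.0                 # illustrative precision

x = [random.uniform(-1.0, 1.0) for _ in range(n)]
y = []
for xi in x:
    eta = beta0 + beta1 * xi                # linear predictor
    mu = 1.0 / (1.0 + math.exp(-eta))       # logit link (default)
    a, b = mu * phi, (1.0 - mu) * phi       # shape parameters from (mu, phi)
    y.append(random.betavariate(a, b))      # draw y_i from Beta(a, b)

# Column-oriented data; this could be wrapped in a data frame before fitting.
data = {"x": x, "y": y}

# Model specification in the same form as the examples on this page.
model = {"response": "y", "fixed": ["1", "x"]}

# Fitting step, assuming pyinla is imported and df holds the columns above:
# result = pyinla(model=model, family="beta", data=df)
```

With a covariate confined to \((-1, 1)\), the simulated means stay well inside the unit interval, so the responses satisfy the strict \((0, 1)\) requirement without any censoring threshold.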