Beta Regression with Censoring

Bayesian Beta regression with censored observations at the boundaries (0 and 1).

Introduction

In some applications, observations very close to 0 or 1 are recorded as exactly 0 or 1 due to measurement limitations or rounding. The Beta distribution is defined on the open interval (0, 1), so exact 0s and 1s require special handling.

The Beta likelihood in pyINLA supports censoring: observations at or beyond a threshold are treated as censored rather than exact values.

Censoring Mechanism

A censoring threshold $\delta \in (0, 0.5)$ is specified:

For example, with $\delta = 0.05$:

$$ y_i = \begin{cases} 0 & \text{if } y_i \leq 0.05 \\ 1 & \text{if } y_i \geq 0.95 \\ y_i & \text{otherwise} \end{cases} $$

Dataset

The dataset contains $n = 1000$ observations with two columns:

ColumnDescriptionType
yResponse (may include 0s and 1s)float
zCovariate (predictor variable)float

Download the CSV file and place it in your working directory.

Implementation in pyINLA

We fit the model using pyINLA with censoring enabled via beta.censor.value:

import pandas as pd
from pyinla import pyinla

# Load data
df = pd.read_csv('dataset_beta_censored.csv')
cens = 0.05  # censoring threshold

# Define model
model = {
    'response': 'y',
    'fixed': ['1', 'z']
}

# Control settings with censoring
control = {
    'family': {
        'beta.censor.value': cens
    }
}

# Fit the Beta regression model with censoring
result = pyinla(
    model=model,
    family='beta',
    data=df,
    control=control
)

# View posterior summaries
print(result.summary_fixed)
print(result.summary_hyperpar)

Extracting Results

The result object contains all posterior inference:

import numpy as np

# Fixed effects: posterior summaries for beta_0 and beta_1
print(result.summary_fixed)

# Extract posterior means
intercept = result.summary_fixed.loc['(Intercept)', 'mean']
slope = result.summary_fixed.loc['z', 'mean']

# Compute predicted probabilities (logit link)
eta_hat = intercept + slope * df['z'].to_numpy()
mu_hat = np.exp(eta_hat) / (1 + np.exp(eta_hat))

# Compute residuals
residuals = df['y'].to_numpy() - mu_hat

# Hyperparameters: precision phi
print(result.summary_hyperpar)

The summary_fixed DataFrame includes columns for mean, sd, 0.025quant, 0.5quant, 0.975quant, and mode.

Results and Diagnostics

The posterior summaries provide estimates for $\beta_0$, $\beta_1$, and the precision $\phi$. Censored observations (at 0 and 1) are shown separately.

Observed vs fitted mean with censored points highlighted
Observed values $y_i$ versus fitted means $\hat{\mu}_i$. Censored points (at 0 or 1) shown in red.
Residuals versus fitted mean
Residuals $y_i - \hat{\mu}_i$ versus fitted means. Censored points shown in red.

To reproduce these figures locally, download the render_beta_plots.py script and run it alongside the CSV dataset.

Data Generation (Reference)

For reproducibility, the dataset was simulated with fixed precision $\phi = 5$:

$$ z_i \sim \mathcal{N}(0, 0.2), \quad \eta_i = 1 + z_i, \quad \mu_i = \text{logit}^{-1}(\eta_i) $$

Then $y_i \sim \text{Beta}(\mu_i \phi, \phi(1-\mu_i))$ with censoring applied at $\delta = 0.05$.

True parameter values: $\beta_0 = 1$, $\phi = 5$.