Weibull Survival Analysis

A tutorial on Bayesian survival analysis with Weibull likelihood, handling right-censored time-to-event data.

Introduction

Survival analysis deals with time-to-event data where some observations may be censored (the event was not observed during the study period). The Weibull distribution is a popular choice for survival models because its hazard function can model increasing, decreasing, or constant hazard rates depending on the shape parameter.

This example demonstrates fitting a Weibull survival model with right-censored data using pyINLA's weibullsurv family and the inla_surv response object.

The Survival Model

For survival data, we observe pairs $(t_i, \delta_i)$ where $t_i$ is the observed time and $\delta_i$ is the event indicator:

The Weibull hazard function is:

$$ h(t) = \alpha \lambda t^{\alpha - 1} $$

The hazard function shape depends on $\alpha$:

Handling Censored Data

The survival likelihood accounts for censoring. For an observation $(t_i, \delta_i)$:

$$ L_i = \begin{cases} f(t_i) & \text{if } \delta_i = 1 \text{ (event)} \\ S(t_i) & \text{if } \delta_i = 0 \text{ (censored)} \end{cases} $$

where $f(t)$ is the PDF and $S(t) = 1 - F(t)$ is the survival function.

In pyINLA, this is handled automatically using the inla_surv function to create the survival response object.

Dataset

The dataset contains $n = 1000$ observations with approximately 56% censoring:

ColumnDescriptionType
yObserved time (min of event time and censoring time)float
xCovariate (standardized)float
eventEvent indicator (1 = event, 0 = censored)int

Implementation in pyINLA

Fitting a Weibull survival model requires using inla_surv to create the survival response:

import pandas as pd
from pyinla import pyinla, inla_surv

# Load survival data
df = pd.read_csv('dataset_weibull_surv_v0.csv')

# Create survival response object
y_surv = inla_surv(
    time=df['y'].to_numpy(),
    event=df['event'].to_numpy()
)

# Define model with survival response
model = {
    'response': y_surv,
    'fixed': ['1', 'x']
}

# Specify variant
control = {
    'family': {
        'variant': 0
    },
    'compute': {
        'dic': True,
        'cpo': True,
        'mlik': True
    }
}

# Fit the survival model
result = pyinla(
    model=model,
    family='weibullsurv',
    data=df,
    control=control
)

# View results
print(result.summary_fixed)
print(result.summary_hyperpar)

Interpreting Results

The fixed effects represent the log-scale relationship with the hazard:

A positive $\beta_1$ means higher values of $x$ are associated with higher $\lambda$, which corresponds to shorter expected survival times.

The hyperparameter summary shows the posterior for the shape parameter $\alpha$:

Using Variant 1

For variant 1 parameterization:

# Load variant 1 data
df_v1 = pd.read_csv('dataset_weibull_surv_v1.csv')

y_surv_v1 = inla_surv(
    time=df_v1['y'].to_numpy(),
    event=df_v1['event'].to_numpy()
)

model_v1 = {'response': y_surv_v1, 'fixed': ['1', 'x']}
control_v1 = {'family': {'variant': 1}}

result_v1 = pyinla(
    model=model_v1,
    family='weibullsurv',
    data=df_v1,
    control=control_v1
)

print(result_v1.summary_fixed)
print(result_v1.summary_hyperpar)

Results and Diagnostics

The posterior summaries provide estimates for $\beta_0$, $\beta_1$, and the shape parameter $\alpha$. The true parameters used for data generation were $\beta_0 = 1.0$, $\beta_1 = 2.2$, and $\alpha = 1.1$.

Scatter plot of observed time vs fitted median survival time
Observed times $y_i$ versus fitted median survival times.
Residuals versus fitted median survival time
Residuals $y_i - \hat{m}_i$ versus fitted median.
Posterior density for the shape parameter alpha
Posterior density for the shape parameter $\alpha$.

To reproduce these figures locally, download the render_weibullsurv_plots.py script and run it alongside the CSV dataset.

Data Generation (Reference)

The survival data was simulated with random censoring:

# Data generation with censoring
import numpy as np
import pandas as pd

rng = np.random.default_rng(2026)
n = 1000
alpha = 1.1
beta0, beta1 = 1.0, 2.2

x = rng.uniform(-1, 1, size=n)
x = (x - x.mean()) / x.std()

eta = beta0 + beta1 * x
lam = np.exp(eta)

# True event times (variant=0)
scale_v0 = lam ** (-1/alpha)
y_true = rng.weibull(alpha, size=n) * scale_v0

# Censoring times (exponential random censoring)
cens_time = rng.exponential(scale=np.median(y_true), size=n)

# Observed times and event indicator
y_obs = np.minimum(y_true, cens_time)
event = (y_true <= cens_time).astype(int)

df = pd.DataFrame({
    'y': y_obs,
    'x': np.round(x, 6),
    'event': event
})

print(f"Censoring rate: {100 * (1 - event.mean()):.1f}%")
# df.to_csv('dataset_weibull_surv_v0.csv', index=False)

Summary

Key points for Weibull survival analysis in pyINLA: