Negative Binomial Variant

Using the binomial family with variant=1 to model negative binomial data.

Introduction

In this tutorial, we demonstrate how to use the binomial family with the variant=1 option to fit a negative binomial model. This variant is useful when modeling the number of failures before a given number of successes is achieved.

The negative binomial variant swaps the roles of successes and failures, with the "data" entering through the Ntrials argument rather than the response.

The Model

When variant=1 is specified, the negative binomial distribution is used instead of the standard binomial. The probability mass function is:

$$ f(\pmb{n}) = \prod_{i=1}^{m} \binom{n_i-1}{y_i-1} p_i^{y_i} (1-p_i)^{n_i-y_i} $$

where:

Note: In this variant, the "data" enters via the Ntrials argument (since $\pmb{y}$ is predetermined), which may seem unconventional.

Linear Predictor

The linear predictor $\eta_i$ links the log-odds of success to the covariates:

$$ \eta_i = \beta_0 + \beta_1 z_i $$

where $\beta_0$ is the intercept and $\beta_1$ is the slope coefficient for covariate $z_i$.

The probability of success is:

$$ p_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)} = \text{logit}^{-1}(\eta_i) $$

Hyperparameters

Like the standard binomial, the negative binomial variant has no hyperparameters. The model is fully determined by the regression coefficients $\boldsymbol{\beta}$.

Dataset

The dataset contains $n = 100$ observations with three columns:

ColumnDescriptionType
yNumber of successes (predetermined)integer
zCovariate (predictor variable)float
NtrialsTotal count ($y + \text{failures}$)integer

Download the CSV file and place it in your working directory.

Implementation in pyINLA

We fit the model using pyINLA by specifying control_family={'variant': 1}:

  1. Load the data into a pandas DataFrame
  2. Extract the Ntrials vector
  3. Define the model structure (response and fixed effects)
  4. Call pyinla() with family='binomial' and control_family={'variant': 1}
import pandas as pd
from pyinla import pyinla

# Load data
df = pd.read_csv('dataset_nbinomial.csv')
Ntrials = df['Ntrials'].to_numpy()

# Define model: y ~ 1 + z (intercept + slope)
model = {
    'response': 'y',
    'fixed': ['1', 'z']
}

# Fit the negative binomial variant model
result = pyinla(
    model=model,
    family='binomial',
    data=df,
    Ntrials=Ntrials,
    control_family={'variant': 1}
)

# View posterior summaries for fixed effects
print(result.summary_fixed)

Extracting Results

The result object contains all posterior inference:

# Fixed effects: posterior summaries for beta_0 and beta_1
print(result.summary_fixed)

# Extract posterior means
intercept = result.summary_fixed.loc['(Intercept)', 'mean']
slope = result.summary_fixed.loc['z', 'mean']

# Compute predicted probabilities
import numpy as np
eta_hat = intercept + slope * df['z'].to_numpy()
p_hat = np.exp(eta_hat) / (1 + np.exp(eta_hat))

The summary_fixed DataFrame includes columns for mean, sd, 0.025quant, 0.5quant, 0.975quant, and mode.

Results and Diagnostics

The posterior summaries provide point estimates and credible intervals for $\beta_0$ and $\beta_1$.

Observed Ntrials vs expected total count
Observed total count (Ntrials) versus expected total count $y_i / \hat{p}_i$.
Residuals versus expected total count
Residuals (Ntrials $- y_i / \hat{p}_i$) versus expected total count.

To reproduce these figures locally, download the render_nbinomial_plots.py script and run it alongside the CSV dataset.

Data Generation (Reference)

For reproducibility, the dataset was simulated from the following data-generating process:

$$ z_i \stackrel{\text{iid}}{\sim} \mathcal{N}(0, 1), \quad y_i \sim \text{Uniform}\{1, 2, 3\}, \quad n_i = y_i + \text{NegBin}(y_i, p_i) $$

where $p_i = \text{logit}^{-1}(\beta_0 + \beta_1 z_i)$ with true parameter values: $\beta_0 = 1$, $\beta_1 = 1$.

# Data generation script (for reference only)
import numpy as np
import pandas as pd

rng = np.random.default_rng(2026)
n = 100
beta0, beta1 = 1.0, 1.0

z = rng.normal(size=n)
eta = beta0 + beta1 * z
prob = np.exp(eta) / (1 + np.exp(eta))

# y is predetermined number of successes
y = rng.integers(1, 4, size=n)  # y = sample(1:3, size=n, replace=TRUE)
# Ntrials = y + number of failures (negative binomial)
Ntrials = y + rng.negative_binomial(n=y, p=prob)

df = pd.DataFrame({'y': y, 'z': np.round(z, 6), 'Ntrials': Ntrials})
# df.to_csv('dataset_nbinomial.csv', index=False)