Learn PyINLA

Log-Normal Distribution

The log-normal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. It is used to model variables that are positively skewed, meaning they have a long right tail, such as income, stock prices, or survival times.

Parametrization

For a vector of observations \(\pmb{y} = (y_1, y_2, \ldots, y_n)\), each observation \(y_i\) follows a Log-Normal distribution with the following probability density function (PDF):

\[f(y_i \mid \mu_i, \tau) = \frac{\sqrt{\tau}}{y_i \sqrt{2\pi}} \exp\left(-\frac{\tau}{2} (\log(y_i) - \mu_i)^2\right), \quad y_i > 0, \; i = 1, 2, \ldots, n\]

where:

  • \(\pmb{y} = (y_1, y_2, \ldots, y_n)\) represents the observed positive response values.

  • \(\pmb{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)\) represents the location parameters (log-mean) for each observation.

  • \(\tau > 0\) is the precision parameter, controlling the spread of the distribution in the log scale.
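As a quick numerical check of the density above, here is a minimal sketch that uses only the Python standard library. The helper names `lognormal_pdf` and `integrate_pdf` are illustrative assumptions and are not part of pyINLA.

```python
import math

def lognormal_pdf(y, mu, tau):
    """Log-normal density in the precision parametrization (illustrative helper)."""
    if y <= 0 or tau <= 0:
        raise ValueError("y and tau must be strictly positive")
    return (math.sqrt(tau) / (y * math.sqrt(2.0 * math.pi))
            * math.exp(-0.5 * tau * (math.log(y) - mu) ** 2))

def integrate_pdf(mu, tau, lo=-12.0, hi=12.0, steps=20000):
    """Trapezoidal integral of the density over y, carried out on u = log(y)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        # dy = exp(u) du, so the integrand is f(exp(u)) * exp(u)
        total += weight * lognormal_pdf(math.exp(u), mu, tau) * math.exp(u)
    return total * h

density_at_one = lognormal_pdf(1.0, mu=0.0, tau=1.0)  # equals 1 / sqrt(2 * pi)
total_mass = integrate_pdf(mu=0.0, tau=4.0)           # should be close to 1
print(density_at_one, total_mass)
```

Integrating on the log scale turns the integrand into a normal density in \(u = \log(y)\), which is why a plain trapezoidal rule is accurate here.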

Figure 1 illustrates how the Log-Normal PDF changes with the standard deviation in the log scale, \(\sigma = 1/\sqrt{\tau}\), while fixing \(\mu = 0\).

Figure 1: Log-Normal PDFs for different \(\sigma\) values at \(\mu=0\). As \(\sigma\) increases (i.e., \(\tau\) decreases), the distribution spreads out and becomes more right-skewed.

Mean and Variance

The mean of each observation \(y_i\) in the original scale is given by:

\[\text{E}(y_i) = \exp\left(\mu_i + \frac{1}{2\tau}\right)\]

The variance in the original scale is:

\[\text{Var}(y_i) = \exp\left(2\mu_i + \frac{1}{\tau}\right) \left(\exp\left(\frac{1}{\tau}\right) - 1\right)\]
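The two formulas above can be checked against simulated draws. The following is a minimal sketch using only the standard library; the values of `mu` and `tau` are hypothetical, and the function names are illustrative rather than part of pyINLA.

```python
import math
import random

def lognormal_mean(mu, tau):
    """E(y) = exp(mu + 1 / (2 * tau))."""
    return math.exp(mu + 1.0 / (2.0 * tau))

def lognormal_variance(mu, tau):
    """Var(y) = exp(2 * mu + 1 / tau) * (exp(1 / tau) - 1)."""
    return math.exp(2.0 * mu + 1.0 / tau) * (math.exp(1.0 / tau) - 1.0)

mu, tau = 0.0, 4.0               # hypothetical values
sigma = 1.0 / math.sqrt(tau)     # log-scale standard deviation, here 0.5

rng = random.Random(0)
draws = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print("mean:    ", lognormal_mean(mu, tau), sample_mean)
print("variance:", lognormal_variance(mu, tau), sample_var)
```

Note that \(\exp(\mu_i)\) is the median of \(y_i\), not its mean; the mean is larger by the factor \(\exp(1/(2\tau))\).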

The location parameter \(\mu_i\) is linked to the linear predictor \(\eta_i\) using the identity link (default):

\[\mu_i = \eta_i, \quad i = 1, 2, \ldots, n\]

In vector form:

\[\pmb{\mu} = \pmb{\eta}\]

Available link functions for each model form:

  • Regression (lognormal): default, identity
  • Survival (lognormalsurv): default, identity

Hyperparameters

The Log-Normal likelihood has one hyperparameter: the precision \(\tau\) (the inverse of the variance in the log scale). It is represented internally on the log scale, which keeps \(\tau\) strictly positive and improves numerical stability during inference.

Hyperparameter \(\theta\) (precision)

Key: prec

The precision parameter \(\tau\) is represented internally as:

\[\theta = \log(\tau)\]

Thus, \(\theta\) can take any real value, ensuring \(\tau > 0\). The prior is defined on \(\theta\).
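To make the internal scale concrete, the sketch below converts between \(\theta\), \(\tau\), and the log-scale standard deviation \(\sigma = 1/\sqrt{\tau}\). The helper names are illustrative and not part of pyINLA.

```python
import math

def theta_to_tau(theta):
    """Precision from the internal parameter: tau = exp(theta)."""
    return math.exp(theta)

def tau_to_theta(tau):
    """Internal parameter from the precision: theta = log(tau)."""
    if tau <= 0:
        raise ValueError("tau must be strictly positive")
    return math.log(tau)

def theta_to_sigma(theta):
    """Log-scale standard deviation: sigma = 1 / sqrt(tau) = exp(-theta / 2)."""
    return math.exp(-0.5 * theta)

for theta in (-2.0, 0.0, 2.0):
    print(theta, theta_to_tau(theta), theta_to_sigma(theta))
```

An initial value of \(\theta = 0\) therefore corresponds to \(\tau = 1\) and \(\sigma = 1\), while \(\theta = 2\) corresponds to \(\tau \approx 7.39\) and \(\sigma \approx 0.37\).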

Default settings:

  • Name: log precision
  • Initial value: 0
  • Fixed: False
  • Prior: loggamma
  • Parameters: [1, 5e-05]

When translated into control['family']['hyper'], the default entry looks like:

control = {
    'family': {
        'hyper': [{
            'id': 'prec',
            'prior': 'loggamma',
            'param': [1, 5e-05],
            'initial': 0,
            'fixed': False,
        }]
    }
}
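To see what this default implies, the sketch below evaluates the prior density on \(\theta\). It assumes that pyINLA follows the R-INLA convention in which a loggamma prior with parameters \([a, b]\) means \(\tau \sim \text{Gamma}(a, b)\) with shape \(a\) and rate \(b\); this convention is an assumption here and should be checked against the pyINLA prior documentation. Under it, \(\log \pi(\theta) = a \log b - \log \Gamma(a) + a\theta - b e^{\theta}\).

```python
import math

def loggamma_logpdf(theta, a, b):
    """Log density of theta = log(tau) when tau ~ Gamma(shape=a, rate=b) (assumed convention)."""
    return a * math.log(b) - math.lgamma(a) + a * theta - b * math.exp(theta)

a, b = 1.0, 5e-05                 # default parameters from the table above
prior_mean_tau = a / b            # mean of tau under Gamma(a, b)
mode_theta = math.log(a / b)      # theta that maximizes the density on the theta scale

# Trapezoidal check that the density on theta integrates to about 1
lo, hi, steps = -30.0, 15.0, 20000
h = (hi - lo) / steps
total_mass = 0.0
for k in range(steps + 1):
    weight = 0.5 if k in (0, steps) else 1.0
    total_mass += weight * math.exp(loggamma_logpdf(lo + k * h, a, b))
total_mass *= h

print(prior_mean_tau, mode_theta, total_mass)
```

Under this assumption the default prior is very diffuse: the prior mean of \(\tau\) is 20000.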

Validation Rules

pyINLA enforces several validation rules for Log-Normal models to ensure correct specification:

Not Allowed Arguments

The following arguments are not allowed for lognormal likelihoods and will raise PyINLAError:

  • E (exposure) - Only allowed for poisson/nbinomial

  • scale - Only allowed for gaussian/nbinomial/xbinomial/gamma/beta/logistic/t

  • Ntrials - Only allowed for binomial/xbinomial/betabinomial/nbinomial2

  • control['family']['variant'] - Not supported for lognormal

Allowed Link Functions

Key: control['family']['link']

For lognormal:

  • default

  • identity

For lognormalsurv:

  • default

  • identity

# Using identity link explicitly
result = pyinla(model=model, family="lognormal", data=data,
                control={'family': {'link': 'identity'}})

Hyperparameters

Key: control['family']['hyper']

Hyperparameter configuration is allowed for lognormal and lognormalsurv, so the precision prior can be customized:

# Custom precision prior
control = {
    'family': {
        'hyper': [{
            'id': 'prec',
            'prior': 'loggamma',
            'param': [1, 0.001],
            'initial': 2,
            'fixed': False,
        }]
    }
}
result = pyinla(model=model, family="lognormal", data=data, control=control)

Response Values

The response variable \(\pmb{y}\) must be strictly positive (\(y_i > 0\)). pyINLA will raise PyINLAError if any response value is zero or negative.
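It can be convenient to locate offending rows before fitting. The following is a minimal sketch of such a pre-check; `find_nonpositive` is a hypothetical helper written for this example, not a pyINLA function, and the data values are made up.

```python
def find_nonpositive(values):
    """Return the positions of entries that are zero or negative."""
    return [i for i, v in enumerate(values) if v <= 0]

y = [1.2, 0.0, 3.4, -0.5, 2.0]   # hypothetical response values
bad_rows = find_nonpositive(y)

if bad_rows:
    print("Rows with non-positive response:", bad_rows)
```

With a pandas DataFrame, the same check could be applied to a column by passing `df["y"].tolist()`.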

Survival Response

When using lognormalsurv, the response must be created using inla_surv():

from pyinla import pyinla, inla_surv

# Create survival response (event=1 observed, event=0 right-censored)
y_surv = inla_surv(time=df["time"], event=df["event"])
result = pyinla(model={'response': y_surv, 'fixed': ['1', 'x']},
                family="lognormalsurv", data=df)

inla_surv supports the following censoring types:

  • Right censoring: the event has not occurred by the end of observation; only a lower bound on the survival time is known.
  • Left censoring: the event occurred before observation began; only an upper bound is known.
  • Interval censoring: the event is known to have occurred within a known time interval.
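Each censoring type contributes a different term to the likelihood. The sketch below writes out these standard contributions for a log-normal survival time using only the standard library. It illustrates the underlying math and is not pyINLA's internal implementation; how left and interval censoring are encoded in inla_surv() is not shown here, and the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(t, mu, tau):
    """P(T <= t) for log(T) ~ Normal(mu, 1 / tau)."""
    return normal_cdf(math.sqrt(tau) * (math.log(t) - mu))

def censored_contribution(kind, mu, tau, t=None, t_lo=None, t_hi=None):
    """Likelihood contribution of one censored observation."""
    if kind == "right":       # event happens after t: survival function S(t)
        return 1.0 - lognormal_cdf(t, mu, tau)
    if kind == "left":        # event happened before t: F(t)
        return lognormal_cdf(t, mu, tau)
    if kind == "interval":    # event happened in (t_lo, t_hi]: F(t_hi) - F(t_lo)
        return lognormal_cdf(t_hi, mu, tau) - lognormal_cdf(t_lo, mu, tau)
    raise ValueError(f"unknown censoring type: {kind}")

mu, tau = 1.0, 4.0   # hypothetical parameter values
right = censored_contribution("right", mu, tau, t=3.0)
left = censored_contribution("left", mu, tau, t=3.0)
interval = censored_contribution("interval", mu, tau, t_lo=2.0, t_hi=3.0)
print(right, left, interval)
```

An uncensored observation contributes the density \(f(t \mid \mu, \tau)\) itself rather than a probability.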

Survival-only Hyperparameters (cure model)

In addition to the precision prec (which both forms share), lognormalsurv exposes 10 cure-model regression coefficients beta1, ..., beta10 (positional aliases theta2, ..., theta11) for stratified cure-model specifications. The default priors are \(\text{Normal}(-4, 100)\) on beta1 (initial \(-7\)) and \(\text{Normal}(0, 100)\) on beta2, ..., beta10 (initial \(0\)). These are only active when a cure structure is configured; most survival users leave them at the defaults and configure only the prec slot.

Note: lognormalsurv's allowed top-level links are default and identity only (the AFT neglog link used by other survival families is not available here).

Specification

  • family="lognormal" for regression models
  • family="lognormalsurv" for survival models
  • Required arguments:
    • For lognormal: \(\pmb{y}\) (response vector of positive values)
    • For lognormalsurv: \(\pmb{y}\) provided via inla_surv()

Model Comparison

Form        Family         Response                Links               Hyperparameters
Regression  lognormal      Plain y                 identity (default)  prec
Survival    lognormalsurv  inla_surv(time, event)  identity (default)  prec

Regression Example

import pandas as pd
from pyinla import pyinla

# Load data with positive response values
data = pd.read_csv('lognormal_data.csv')

# Define the model
model = {'response': 'y', 'fixed': ['1', 'x']}

# Fit with lognormal family
result = pyinla(
    model=model,
    family='lognormal',
    data=data
)

print(result.summary_fixed)
print(result.summary_hyperpar)
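The regression example above reads lognormal_data.csv, which is not supplied here. One way to try the example is to simulate a data set with known parameters. The sketch below does so with the standard library only; the column names `x` and `y` match the model definition above, while the parameter values and helper names are hypothetical.

```python
import csv
import io
import math
import random

def simulate_lognormal_rows(n=200, beta0=1.0, beta1=0.5, tau=4.0, seed=1):
    """Simulate (x, y) pairs with log(y) ~ Normal(beta0 + beta1 * x, 1 / tau)."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(tau)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        log_y = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        rows.append((x, math.exp(log_y)))   # exp() guarantees y > 0
    return rows

def write_csv(fh, rows):
    """Write the rows with an 'x,y' header to an open text file handle."""
    writer = csv.writer(fh)
    writer.writerow(["x", "y"])
    writer.writerows(rows)

rows = simulate_lognormal_rows()

# Demonstrate the CSV layout in memory; pass a real file handle to save to disk.
buffer = io.StringIO()
write_csv(buffer, rows)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[:3])
```

To create the file used in the example, open it with `open("lognormal_data.csv", "w", newline="")` and pass the handle to `write_csv`. After fitting, the posterior summaries for the intercept and slope should be close to the simulated values of `beta0` and `beta1`.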

Survival Example

import pandas as pd
from pyinla import pyinla, inla_surv

# Load survival data
data = pd.read_csv('survival_data.csv')

# Create survival response object
y_surv = inla_surv(
    time=data['time'].to_numpy(),
    event=data['event'].to_numpy()
)

# Define the model
model = {'response': y_surv, 'fixed': ['1', 'x']}

# Fit with lognormalsurv family
result = pyinla(
    model=model,
    family='lognormalsurv',
    data=data
)

print(result.summary_fixed)
print(result.summary_hyperpar)
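The survival example above expects columns `time`, `event`, and `x`. A minimal sketch for simulating such data with right censoring at a fixed study cutoff follows; the parameter values, the cutoff, and the function name are hypothetical, and the event coding (1 = observed, 0 = right-censored) follows the convention stated earlier on this page.

```python
import math
import random

def simulate_survival_rows(n=200, beta0=1.0, beta1=-0.4, tau=4.0, cutoff=5.0, seed=2):
    """Simulate log-normal survival times, right-censored at a fixed cutoff."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(tau)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        true_time = math.exp(beta0 + beta1 * x + rng.gauss(0.0, sigma))
        event = 1 if true_time <= cutoff else 0       # 0 means still event-free at cutoff
        rows.append({"time": min(true_time, cutoff), "event": event, "x": x})
    return rows

rows = simulate_survival_rows()
n_events = sum(r["event"] for r in rows)
print(f"{n_events} observed events, {len(rows) - n_events} right-censored")
```

The list of dictionaries can be turned into a DataFrame with `pd.DataFrame(rows)` and saved with `to_csv("survival_data.csv", index=False)`.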

Notes

  • lognormalsurv can be used for right-censored, left-censored, and interval-censored data. inla_surv() provides a general way to represent such survival times.
  • If the observed times \(y\) are large, this can cause numerical overflow. If you encounter this problem, try scaling the observations, e.g., time = time / max(time).
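Scaling the times by a constant \(c\) subtracts \(\log(c)\) from every \(\log(y_i)\) and leaves their spread unchanged. With the identity link and an intercept in the model, the rescaling should therefore mainly shift the intercept by \(-\log(c)\), while the other coefficients and the precision \(\tau\) keep their meaning (up to the influence of the priors). The sketch below checks the log-scale shift on hypothetical times.

```python
import math
import statistics

times = [120.0, 340.0, 560.0, 910.0, 1500.0]   # hypothetical observed times
c = max(times)
scaled = [t / c for t in times]

log_raw = [math.log(t) for t in times]
log_scaled = [math.log(t) for t in scaled]

# The log-scale location shifts by log(c); the log-scale spread is unchanged.
shift = statistics.fmean(log_raw) - statistics.fmean(log_scaled)
sd_raw = statistics.stdev(log_raw)
sd_scaled = statistics.stdev(log_scaled)

print(shift, math.log(c))
print(sd_raw, sd_scaled)
```

To report results on the original time scale, add \(\log(c)\) back to the fitted intercept.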