Parametrization
The Log-Normal distribution for a vector of observations \(\pmb{y} = (y_1, y_2, \ldots, y_n)\) has the following probability density function (PDF):
\[f(y_i \mid \mu_i, \tau) = \frac{\sqrt{\tau}}{y_i \sqrt{2\pi}} \exp\left(-\frac{\tau}{2} (\log(y_i) - \mu_i)^2\right), \quad y_i > 0, \; i = 1, 2, \ldots, n\]
where:
\(\pmb{y} = (y_1, y_2, \ldots, y_n)\) represents the observed positive response values.
\(\pmb{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)\) represents the location parameters (log-mean) for each observation.
\(\tau > 0\) is the precision parameter, controlling the spread of the distribution in the log scale.
Figure 1 illustrates how the Log-Normal PDF changes for different shape parameters (standard deviation \(\sigma = 1/\sqrt{\tau}\) in the log scale), while fixing \(\mu = 0\).
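The density above can be evaluated directly from its formula. The following is a minimal sketch using only the Python standard library; the helper name `lognormal_pdf` is illustrative and is not part of pyINLA.

```python
import math

def lognormal_pdf(y, mu, tau):
    """Log-Normal density in the (mu, tau) parametrization given above."""
    if y <= 0 or tau <= 0:
        raise ValueError("y and tau must be strictly positive")
    z = math.log(y) - mu
    norm_const = math.sqrt(tau) / (y * math.sqrt(2.0 * math.pi))
    return norm_const * math.exp(-0.5 * tau * z * z)

# Density at y = 1 with mu = 0 for three log-scale standard deviations
for sigma in (0.25, 0.5, 1.0):
    tau = 1.0 / sigma**2  # precision is the inverse log-scale variance
    print(f"sigma={sigma}: f(1) = {lognormal_pdf(1.0, 0.0, tau):.4f}")
```

Smaller values of \(\sigma\) (larger \(\tau\)) concentrate the density around \(\exp(\mu)\), the median of the distribution.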
Mean and Variance
The mean of each observation \(y_i\) in the original scale is given by:
\[\text{E}(y_i) = \exp\left(\mu_i + \frac{1}{2\tau}\right)\]
The variance in the original scale is:
\[\text{Var}(y_i) = \exp\left(2\mu_i + \frac{1}{\tau}\right) \left(\exp\left(\frac{1}{\tau}\right) - 1\right)\]
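As a quick numerical illustration of the two formulas above, the sketch below computes the mean and variance on the original scale for given \(\mu\) and \(\tau\). The function names are illustrative helpers, not pyINLA functions.

```python
import math

def lognormal_mean(mu, tau):
    """E(y) = exp(mu + 1 / (2 * tau))."""
    return math.exp(mu + 1.0 / (2.0 * tau))

def lognormal_variance(mu, tau):
    """Var(y) = exp(2 * mu + 1 / tau) * (exp(1 / tau) - 1)."""
    return math.exp(2.0 * mu + 1.0 / tau) * (math.exp(1.0 / tau) - 1.0)

mu, tau = 0.0, 4.0  # log-scale standard deviation of 0.5
print("mean:    ", lognormal_mean(mu, tau))
print("variance:", lognormal_variance(mu, tau))
print("median:  ", math.exp(mu))  # the mean always exceeds the median exp(mu)
```

Because of the \(1/(2\tau)\) term, \(\exp(\mu_i)\) is the median rather than the mean, so back-transforming a fitted \(\mu_i\) without this correction underestimates the expected response.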
Link Function
The location parameter \(\mu_i\) is linked to the linear predictor \(\eta_i\) using the identity link (default):
\[\mu_i = \eta_i, \quad i = 1, 2, \ldots, n\]
In vector form:
\[\pmb{\mu} = \pmb{\eta}\]
Available link functions depend on the model variant:
- Regression (lognormal): default, identity
- Survival (lognormalsurv): default, identity
Hyperparameters
The Log-Normal likelihood has one hyperparameter, the precision \(\tau\) (the inverse of the variance on the log scale). It is represented internally on the log scale, which keeps \(\tau\) positive and improves numerical stability during inference.
Hyperparameter \(\theta\) (precision)
Key: prec
The precision parameter \(\tau\) is represented internally as:
\[\theta = \log(\tau)\]
Thus, \(\theta\) can take any real value, ensuring \(\tau > 0\). The prior is defined on \(\theta\).
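Since initial values are given on the \(\theta\) scale, it can help to convert between \(\theta\), \(\tau\), and the log-scale standard deviation \(\sigma = 1/\sqrt{\tau}\). The helpers below are a small illustrative sketch and are not part of pyINLA.

```python
import math

def theta_to_tau(theta):
    """Internal scale to precision: tau = exp(theta)."""
    return math.exp(theta)

def tau_to_theta(tau):
    """Precision to internal scale: theta = log(tau)."""
    if tau <= 0:
        raise ValueError("tau must be strictly positive")
    return math.log(tau)

def tau_to_sigma(tau):
    """Precision to log-scale standard deviation."""
    return 1.0 / math.sqrt(tau)

# An initial value of theta = 0 corresponds to tau = 1 and sigma = 1
print(theta_to_tau(0.0), tau_to_sigma(theta_to_tau(0.0)))

# To start from a log-scale standard deviation of 0.5, use theta = log(4)
print(tau_to_theta(1.0 / 0.5**2))
```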
Default settings:
- Name: log precision
- Initial value: 0
- Fixed: FALSE
- Prior: loggamma
- Parameters: [1, 5e-05]
When translated into control['family']['hyper'], the default entry looks like:
control = {
    'family': {
        'hyper': [{
            'id': 'prec',
            'prior': 'loggamma',
            'param': [1, 5e-05],
            'initial': 0,
            'fixed': False,
        }]
    }
}
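To get a feel for the default prior, the sketch below evaluates a log-gamma density on \(\theta\). It assumes that pyINLA follows the R-INLA convention, in which a loggamma prior with parameters \((a, b)\) on \(\theta = \log(\tau)\) corresponds to \(\tau \sim \text{Gamma}(a, b)\) with shape \(a\) and rate \(b\); this convention is an assumption here, not something stated on this page, and `loggamma_log_density` is an illustrative helper.

```python
import math

def loggamma_log_density(theta, shape, rate):
    """Log density of theta = log(tau) when tau ~ Gamma(shape, rate).

    Assumes the shape/rate convention used by R-INLA for 'loggamma'.
    """
    return (shape * math.log(rate) - math.lgamma(shape)
            + shape * theta - rate * math.exp(theta))

shape, rate = 1.0, 5e-05  # the default 'param' values
print("implied prior mean of tau:", shape / rate)
for theta in (-2.0, 0.0, 2.0, 5.0):
    print(theta, loggamma_log_density(theta, shape, rate))
```

Under this assumption the default prior is very diffuse on the precision scale, which is one reason to supply a more informative prior when the data are sparse.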
Validation Rules
pyINLA enforces several validation rules for Log-Normal models to ensure correct specification:
Not Allowed Arguments
The following arguments are not allowed for lognormal likelihoods and will raise PyINLAError:
- E (exposure): only allowed for poisson/nbinomial
- scale: only allowed for gaussian/nbinomial/xbinomial/gamma/beta/logistic/t
- Ntrials: only allowed for binomial/xbinomial/betabinomial/nbinomial2
- control['family']['variant']: not supported for lognormal
Allowed Link Functions
Key: control['family']['link']
For lognormal:
- default
- identity
For lognormalsurv:
- default
- identity
# Using identity link explicitly
result = pyinla(model=model, family="lognormal", data=data,
                control={'family': {'link': 'identity'}})
Hyperparameters
Key: control['family']['hyper']
Hyperparameter configuration is allowed for lognormal/lognormalsurv, so the precision prior can be customized:
# Custom precision prior
control = {
    'family': {
        'hyper': [{
            'id': 'prec',
            'prior': 'loggamma',
            'param': [1, 0.001],
            'initial': 2,
            'fixed': False,
        }]
    }
}
result = pyinla(model=model, family="lognormal", data=data, control=control)
Response Values
Response variable \(\pmb{y}\) must be strictly positive (\(y_i > 0\)). pyINLA will raise PyINLAError if any response value is zero or negative.
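It can be convenient to check the response before fitting so that offending rows are easy to locate. The following is a minimal sketch; `find_nonpositive` is a hypothetical helper, not a pyINLA function, and it does not reproduce pyINLA's own validation.

```python
def find_nonpositive(y):
    """Return the indices of response values that are zero or negative.

    NaN values are not flagged here, because NaN <= 0 evaluates to False.
    """
    return [i for i, value in enumerate(y) if value <= 0]

y = [1.2, 0.0, 3.4, -0.5]
bad = find_nonpositive(y)
if bad:
    print(f"Non-positive responses at positions {bad}; "
          "the lognormal family requires y > 0.")
```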
Survival Response
When using lognormalsurv, the response must be created using inla_surv():
from pyinla import pyinla, inla_surv
# Create survival response (event=1 observed, event=0 right-censored)
y_surv = inla_surv(time=df["time"], event=df["event"])
result = pyinla(model={'response': y_surv, 'fixed': ['1', 'x']},
                family="lognormalsurv", data=df)
inla_surv supports the following censoring types:
- Right censoring: the event has not occurred by the end of observation; only a lower bound on the survival time is known.
- Left censoring: the event occurred before observation began; only an upper bound is known.
- Interval censoring: the event is known to have occurred within a known time interval.
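To make the three censoring types concrete, the sketch below computes the standard likelihood contribution of a censored observation under a Log-Normal model: the survival function for right censoring, the CDF for left censoring, and a CDF difference for interval censoring. This illustrates the textbook definitions only; the function names are hypothetical and it does not show how pyINLA evaluates these terms internally.

```python
import math

def lognormal_cdf(t, mu, tau):
    """P(T <= t) for a Log-Normal with log-mean mu and log-scale precision tau."""
    z = math.sqrt(tau) * (math.log(t) - mu)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_contribution(kind, mu, tau, t, t_upper=None):
    """Likelihood contribution of one censored observation."""
    if kind == "right":      # event occurs after t
        return 1.0 - lognormal_cdf(t, mu, tau)
    if kind == "left":       # event occurred before t
        return lognormal_cdf(t, mu, tau)
    if kind == "interval":   # event occurred between t and t_upper
        if t_upper is None or t_upper <= t:
            raise ValueError("interval censoring needs t_upper > t")
        return lognormal_cdf(t_upper, mu, tau) - lognormal_cdf(t, mu, tau)
    raise ValueError(f"unknown censoring type: {kind!r}")

mu, tau = 0.0, 1.0
print(censored_contribution("right", mu, tau, t=2.0))
print(censored_contribution("left", mu, tau, t=2.0))
print(censored_contribution("interval", mu, tau, t=1.0, t_upper=2.0))
```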
Survival-only Hyperparameters (cure model)
In addition to the precision prec (which both forms share), lognormalsurv exposes 10 cure-model regression coefficients beta1, ..., beta10 (positional aliases theta2, ..., theta11) for stratified cure-model specifications. The default priors are \(\text{Normal}(-4, 100)\) on beta1 (initial \(-7\)) and \(\text{Normal}(0, 100)\) on beta2, ..., beta10 (initial \(0\)). These are only active when a cure structure is configured; most survival users leave them at the defaults and configure only the prec slot.
Note: lognormalsurv's allowed top-level links are default and identity only (the AFT neglog link used by other survival families is not available here).
Specification
- family="lognormal" for regression models
- family="lognormalsurv" for survival models
- Required arguments:
  - For lognormal: \(\pmb{y}\) (response vector of positive values)
  - For lognormalsurv: \(\pmb{y}\) provided via inla_surv()
Model Comparison
| Form | Family | Response | Links | Hyperparameters |
|---|---|---|---|---|
| Regression | lognormal | Plain y | identity (default) | prec |
| Survival | lognormalsurv | inla_surv(time, event) | identity (default) | prec, plus beta1, ..., beta10 (cure model only) |
Regression Example
import pandas as pd
from pyinla import pyinla
# Load data with positive response values
data = pd.read_csv('lognormal_data.csv')
# Define the model
model = {'response': 'y', 'fixed': ['1', 'x']}
# Fit with lognormal family
result = pyinla(
    model=model,
    family='lognormal',
    data=data
)
print(result.summary_fixed)
print(result.summary_hyperpar)
Survival Example
import pandas as pd
from pyinla import pyinla, inla_surv
# Load survival data
data = pd.read_csv('survival_data.csv')
# Create survival response object
y_surv = inla_surv(
    time=data['time'].to_numpy(),
    event=data['event'].to_numpy()
)
# Define the model
model = {'response': y_surv, 'fixed': ['1', 'x']}
# Fit with lognormalsurv family
result = pyinla(
    model=model,
    family='lognormalsurv',
    data=data
)
print(result.summary_fixed)
print(result.summary_hyperpar)
Notes
- lognormalsurv can be used for right-censored, left-censored, and interval-censored data. A general framework to represent time is given by inla_surv().
- If the observed times \(y\) are large, this can cause numerical overflow. If you encounter this problem, try scaling the observations, e.g., time = time / max(time).
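The scaling suggested above can be sketched as follows. Because \(\log(t / c) = \log(t) - \log(c)\), dividing all times by a constant \(c\) only shifts the location on the log scale: with an intercept in the model, the intercept changes by \(-\log(c)\), while the other coefficients and the precision are unaffected. The helper `scale_times` and the example values are illustrative only.

```python
import math

def scale_times(times):
    """Divide all times by their maximum; return scaled times and the constant."""
    c = max(times)
    if c <= 0:
        raise ValueError("times must contain at least one positive value")
    return [t / c for t in times], c

times = [120.0, 4500.0, 980.0, 15000.0]
scaled, c = scale_times(times)
print(scaled)

# An intercept fitted on the scaled times is shifted by -log(c);
# add log(c) to it to recover the intercept on the original time scale.
print("add to fitted intercept:", math.log(c))
```

Apply the same constant to every time variable passed to inla_surv() (for example both endpoints of an interval-censored observation) so that all times remain on a common scale.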