IIDKD Random Effects

Overview

The IIDKD model (IID k-dimensional) extends the simple IID model to handle correlated multivariate random effects. Instead of independent scalar random effects, each subject has a k-dimensional random vector, and the covariance between dimensions is estimated from the data.

Key features:

Multivariate - each subject has \(k\) correlated random effects
Wishart prior - placed on the \(k \times k\) precision matrix
Flexible covariance - all correlations between dimensions are estimated
Cholesky parameterization - uses \(\mathbf{W} = \mathbf{L}\mathbf{L}^\top\) for numerical stability

This model is ideal when you have multiple random effects per subject that may be correlated (e.g., random intercepts and slopes, or multiple traits per individual).

Mathematical Formulation

For \(m\) subjects with \(k\)-dimensional random effects, the model specifies:

\[\mathbf{u}_i \stackrel{\text{iid}}{\sim} \mathcal{N}_k(\mathbf{0}, \mathbf{W}^{-1}), \quad i = 1, \ldots, m\]

where:

\(\mathbf{u}_i = (u_{i1}, \ldots, u_{ik})^\top\) is the \(k\)-dimensional random effect for subject \(i\)
\(\mathbf{W}\) is the \(k \times k\) precision matrix (inverse covariance)
\(\boldsymbol{\Sigma} = \mathbf{W}^{-1}\) is the covariance matrix

Terminology note: throughout this page, subject refers to one block of \(k\) consecutive positions in the latent field, holding that subject's k-dimensional random vector \(\mathbf{u}_i\). The latent field has total length \(n = k \cdot m\) and is laid out dimension-major (positions 1..\(m\) hold dim 1 across all subjects, positions \(m{+}1\)..\(2m\) hold dim 2, etc.). The separate spec key group is an additional axis of replication and is unrelated to subjects here.

Cholesky Parameterization

The precision matrix is parameterized via Cholesky decomposition:

\[\mathbf{W} = \mathbf{L}\mathbf{L}^\top\]

where \(\mathbf{L}\) is a lower triangular matrix with positive diagonal entries. The \(k(k+1)/2\) unique entries of \(\mathbf{L}\) are mapped one-to-one to the hyperparameters \(\boldsymbol{\theta} = (\theta_1, \ldots, \theta_{k(k+1)/2})\):

Diagonal entries (\(k\) of them, indexed by \(\theta_1, \ldots, \theta_k\)): \[L_{jj} = \exp(\theta_j), \quad j = 1, \ldots, k\] The exponential keeps the diagonal strictly positive.
Strictly-lower entries (\(k(k-1)/2\) of them, indexed by \(\theta_{k+1}, \ldots, \theta_{k(k+1)/2}\)): filled in column-major order. That means first the entries below the diagonal of column 1 (top to bottom), then column 2, and so on: \[L_{21}, L_{31}, \ldots, L_{k1}, \; L_{32}, L_{42}, \ldots, L_{k2}, \; \ldots, \; L_{k,k-1}.\]

For example, with \(k = 4\) the off-diagonals receive (in this order): \(L_{21}, L_{31}, L_{41}, L_{32}, L_{42}, L_{43}\).

Full Precision Matrix

The latent field is laid out dimension-major: positions \(1, \ldots, m\) hold \(u_{i1}\) across subjects \(i = 1, \ldots, m\); positions \(m+1, \ldots, 2m\) hold \(u_{i2}\); and so on. Writing the full latent vector this way, \[\mathbf{u} = (u_{11}, u_{21}, \ldots, u_{m1}, \; u_{12}, u_{22}, \ldots, u_{m2}, \; \ldots, \; u_{1k}, \ldots, u_{mk})^\top,\] the full \(n \times n\) precision matrix is the Kronecker product:

\[\mathbf{Q} \;=\; \mathbf{W} \otimes \mathbf{I}_m\]

where \(\mathbf{I}_m\) is the \(m \times m\) identity. Concretely, \(\mathbf{Q}\) is built from \(k \times k\) blocks, each block of size \(m \times m\): the \((j, \ell)\) block is the scalar \(W_{j\ell}\) times \(\mathbf{I}_m\).

Pattern of \(\mathbf{Q}\) as a function of \(\boldsymbol{\theta}\)

Since \(\mathbf{W} = \mathbf{L}\mathbf{L}^\top\) and \(\mathbf{L}\) is built from the \(\boldsymbol{\theta}\) entries, each block of \(\mathbf{Q}\) is a closed-form expression in \(\boldsymbol{\theta}\). The general entry is

\[W_{j\ell} \;=\; \sum_{p \,\leq\, \min(j,\ell)} L_{jp}\, L_{\ell p},\]

using the column-major \(\boldsymbol{\theta}\) layout established above. Worked out explicitly:

Case \(k = 2\) (three hyperparameters: \(\theta_1, \theta_2, \theta_3\)):

\[\mathbf{L} \;=\; \begin{pmatrix} e^{\theta_1} & 0 \\ \theta_3 & e^{\theta_2} \end{pmatrix}, \qquad \mathbf{W} \;=\; \mathbf{L}\mathbf{L}^\top \;=\; \begin{pmatrix} e^{2\theta_1} & \theta_3\, e^{\theta_1} \\ \theta_3\, e^{\theta_1} & \theta_3^2 + e^{2\theta_2} \end{pmatrix},\]

\[\mathbf{Q} \;=\; \begin{pmatrix} e^{2\theta_1}\,\mathbf{I}_m & \theta_3\,e^{\theta_1}\,\mathbf{I}_m \\ \theta_3\,e^{\theta_1}\,\mathbf{I}_m & (\theta_3^2 + e^{2\theta_2})\,\mathbf{I}_m \end{pmatrix}.\]

Case \(k = 3\) (six hyperparameters: \(\theta_1, \ldots, \theta_6\)):

\[\mathbf{L} \;=\; \begin{pmatrix} e^{\theta_1} & 0 & 0 \\ \theta_4 & e^{\theta_2} & 0 \\ \theta_5 & \theta_6 & e^{\theta_3} \end{pmatrix},\]

\[\mathbf{W} \;=\; \mathbf{L}\mathbf{L}^\top \;=\; \begin{pmatrix} e^{2\theta_1} & \theta_4\, e^{\theta_1} & \theta_5\, e^{\theta_1} \\ \theta_4\, e^{\theta_1} & \theta_4^2 + e^{2\theta_2} & \theta_4\theta_5 + \theta_6\, e^{\theta_2} \\ \theta_5\, e^{\theta_1} & \theta_4\theta_5 + \theta_6\, e^{\theta_2} & \theta_5^2 + \theta_6^2 + e^{2\theta_3} \end{pmatrix},\]

\[\mathbf{Q} \;=\; \begin{pmatrix} e^{2\theta_1}\,\mathbf{I}_m & \theta_4\,e^{\theta_1}\,\mathbf{I}_m & \theta_5\,e^{\theta_1}\,\mathbf{I}_m \\ \theta_4\,e^{\theta_1}\,\mathbf{I}_m & (\theta_4^2 + e^{2\theta_2})\,\mathbf{I}_m & (\theta_4\theta_5 + \theta_6\,e^{\theta_2})\,\mathbf{I}_m \\ \theta_5\,e^{\theta_1}\,\mathbf{I}_m & (\theta_4\theta_5 + \theta_6\,e^{\theta_2})\,\mathbf{I}_m & (\theta_5^2 + \theta_6^2 + e^{2\theta_3})\,\mathbf{I}_m \end{pmatrix}.\]

So \(\mathbf{Q}\) is sparse with a block-of-scaled-identities structure: within each \(m \times m\) block, only the diagonal is nonzero (equal to the block's \(\boldsymbol{\theta}\)-expression on each of its \(m\) positions), and off-diagonal positions inside each block are zero. The cross-subject independence inside one dimension is encoded by the \(\mathbf{I}_m\); the cross-dimension correlation for the same subject is encoded by the \(W_{j\ell}(\boldsymbol{\theta})\) values across blocks.

Fully expanded example: \(k = 2\), \(m = 2\)

With \(k = 2\) dimensions and \(m = 2\) subjects, the latent field has length \(n = k m = 4\), laid out as

\[\mathbf{u} \;=\; (u_{11},\; u_{21},\; u_{12},\; u_{22})^\top.\]

The precision matrix \(\mathbf{Q} = \mathbf{W} \otimes \mathbf{I}_2\) is \(4 \times 4\), with the four \(W_{j\ell}(\boldsymbol{\theta})\) entries each multiplied by the \(2 \times 2\) identity. Writing it out fully:

\[\mathbf{Q} \;=\; \begin{pmatrix} e^{2\theta_1} & 0 & \theta_3\, e^{\theta_1} & 0 \\ 0 & e^{2\theta_1} & 0 & \theta_3\, e^{\theta_1} \\ \theta_3\, e^{\theta_1} & 0 & \theta_3^2 + e^{2\theta_2} & 0 \\ 0 & \theta_3\, e^{\theta_1} & 0 & \theta_3^2 + e^{2\theta_2} \end{pmatrix}.\]

Nonzero entries lie on the main diagonal (positions \((1,1), (2,2), (3,3), (4,4)\)) and on the \(\pm 2\) off-diagonals (positions \((1,3), (3,1), (2,4), (4,2)\)); every other entry is zero. Each \(W_{j\ell}\) appears exactly \(m = 2\) times (once for each subject), and no two subjects are coupled inside the same dimension.

Layout note: the order of the Kronecker product depends on the stacking. With dimension-major (used here), \(\mathbf{Q} = \mathbf{W} \otimes \mathbf{I}_m\); a subject-major stacking would instead give \(\mathbf{I}_m \otimes \mathbf{W}\).

Number of Hyperparameters

The number of hyperparameters depends on the dimension \(k\):

\[\text{Number of hyperparameters} = \frac{k(k+1)}{2}\]

Dimension \(k\)	Hyperparameters	Use Case
2	3	Random intercept + slope
3	6	Three correlated traits
4	10	Four correlated outcomes
5	15	Five-dimensional effects

Computational Consideration

The number of hyperparameters grows quadratically with \(k\). For large \(k\), this can slow down inference, and for higher-dimensional cases the recommended integration strategy is empirical Bayes:

control = {'inla': {'int.strategy': 'eb'}}

The model supports dimensions from 2 to 24.

Wishart Prior

The precision matrix \(\mathbf{W}\) is assigned a Wishart prior:

\[\mathbf{W} \;\sim\; \text{Wishart}_k(r, \mathbf{R}^{-1})\]

with density

\[\pi(\mathbf{W}) \;=\; c^{-1}\, |\mathbf{W}|^{(r-k-1)/2}\, \exp\!\left\{-\tfrac{1}{2}\,\text{tr}(\mathbf{W}\mathbf{R})\right\}, \qquad r > k + 1,\]

and normalising constant

\[c \;=\; 2^{rk/2}\, |\mathbf{R}|^{-r/2}\, \pi^{k(k-1)/4}\, \prod_{j=1}^{k}\Gamma\!\left(\tfrac{r+1-j}{2}\right).\]

Useful moments:

\[\mathbb{E}(\mathbf{W}) \;=\; r\,\mathbf{R}^{-1}, \qquad \mathbb{E}(\mathbf{W}^{-1}) \;=\; \mathbf{R}/(r - k - 1).\]

Hyperparameters of the Prior

The prior is parameterised by \(1 + k(k+1)/2\) numbers:

\[(r,\; R_1, R_2, \ldots, R_k,\; R_{k+1}, \ldots, R_{k(k+1)/2})\]

where \(R_1, \ldots, R_k\) are the diagonal entries of \(\mathbf{R}\) and the remaining \(R_{k+1}, \ldots, R_{k(k+1)/2}\) fill the strictly-lower entries of \(\mathbf{R}\) (\(\mathbf{R}\) is symmetric, so the upper triangle is implied).

The default prior uses a fixed degrees-of-freedom \(r = 30\) (independent of \(k\)) and a scale matrix \(\mathbf{R} = \mathbf{I}_k\) (identity). This gives a weakly informative joint prior on the entries of \(\mathbf{W}\): the data dominates the posterior unless the dataset is very small. Verified by inspecting the parameters line that the engine actually reads.

Induced prior on \(\boldsymbol{\theta}\)

The C engine works with the \(\boldsymbol{\theta}\) hyperparameters, not directly with \(\mathbf{W}\). The Wishart prior on \(\mathbf{W}\) above induces a prior on \(\boldsymbol{\theta}\) by the change-of-variables \(\mathbf{W}(\boldsymbol{\theta}) = \mathbf{L}(\boldsymbol{\theta})\,\mathbf{L}(\boldsymbol{\theta})^\top\):

\[\pi_{\boldsymbol{\theta}}(\boldsymbol{\theta}) \;=\; \pi(\mathbf{W}(\boldsymbol{\theta})) \cdot \bigl|\det \mathbf{J}(\boldsymbol{\theta})\bigr|\]

where \(\mathbf{J}\) is the Jacobian of the map \(\boldsymbol{\theta} \mapsto \mathbf{W}(\boldsymbol{\theta})\).

Closed form under the default \(\mathbf{R} = \mathbf{I}_k\)

When the scale matrix is the identity (the default), Bartlett's decomposition gives an exceptionally clean form: the Cholesky entries are independent, with

Diagonal entries: \[L_{jj}^2 \;\sim\; \chi^2_{\,r - j + 1}, \qquad j = 1, \ldots, k.\] Since \(L_{jj} = e^{\theta_j}\), this translates to a log-chi-square prior on \(\theta_j\): equivalently, \(2\theta_j \sim \log \chi^2_{r-j+1}\), which is a log-gamma distribution with shape \((r-j+1)/2\) and rate \(1/2\). The mode is roughly \(\theta_j \approx \tfrac{1}{2}\log(r - j - 1)\).
Off-diagonal entries: \[L_{ij} \;\sim\; \mathcal{N}(0, 1), \qquad i > j,\] independently. Since the off-diagonal \(\theta\) entries equal the corresponding \(L_{ij}\) directly (no transform), this is the same as \[\theta_{k+1}, \theta_{k+2}, \ldots, \theta_{k(k+1)/2} \;\stackrel{\text{iid}}{\sim}\; \mathcal{N}(0, 1).\]

So under the default, the \(\boldsymbol{\theta}\) prior factorises into: \(k\) log-gamma marginals on the diagonal entries and \(k(k-1)/2\) independent standard normals on the off-diagonal entries. With the default \(r = 30\) and small \(k\), the diagonal priors centre \(\theta_j\) around roughly \(\tfrac{1}{2}\log(28) \approx 1.66\) (so \(L_{jj} \approx 5\) on the mode, giving precisions of order tens). This is a moderately informative reference scale that the data can override.

When \(\mathbf{R} \neq \mathbf{I}_k\)

For a general scale matrix, the closed form above does not hold: \(\boldsymbol{\theta}\) entries become dependent under the prior. The full density still factors as \(\pi(\mathbf{W}(\boldsymbol{\theta})) \cdot |\det \mathbf{J}|\) from the Wishart density, but evaluating it requires substituting the \(\boldsymbol{\theta}\)-parameterised \(\mathbf{W}\) into the matrix expression. The C engine does this computation internally; users typically pick the prior at the \(\mathbf{W}\) level (by setting \(r\) and \(\mathbf{R}\)) and let the engine handle the change of variables.

When to Use IIDKD

iidkd is a building block: it gives you a \(k\)-dimensional correlated random vector per subject with a Wishart-prior precision matrix. Turning that vector into a specific model (intercept-slope, growth curve, multivariate outcomes, …) usually requires additional wiring around the iidkd block: covariate weighting in the linear predictor, an A-matrix, copies of the field via the copy feature-key, or a multi-likelihood setup. The four motivations below describe what iidkd is useful for; the linking wiring is model-specific.

Random Intercepts and Slopes

When both intercept and slope vary by subject and may be correlated. Set \(k=2\) for a 2-dim random vector \((u_{i1}, u_{i2})\) per subject; you then wire \(u_{i1}\) as the intercept and weight \(u_{i2}\) by the slope-covariate via an A-matrix or weighted index.

Multivariate Outcomes

Multiple correlated outcomes measured on the same subjects. Each outcome consumes one dimension of \(\mathbf{u}_i\); typically combined with a multi-likelihood model and the copy feature-key to route the right dimension to the right outcome.

Growth Curve Models

Polynomial individual trajectories. Set \(k = 3\) for (intercept, linear, quadratic); each dimension of \(\mathbf{u}_i\) is weighted by 1, \(t\), \(t^2\) respectively in the linear predictor.

Correlated Traits

Multiple traits measured per individual where genetic or environmental correlations are expected. Each trait maps directly to one dimension of \(\mathbf{u}_i\); no covariate weighting needed when traits are responses (often combined with a multi-likelihood when traits have different distributions).

Specification in pyINLA

Allowed Parameters

The IIDKD random-effect spec dict accepts the following keys. The source of truth is model_defs/latent/iidkd.py's ALLOWED_KEYS set, mirrored 1:1 here.

Parameter	Required	Description
Core (model-specific)
`id`	Yes	Column name containing position indices (integers 1 to \(n = k \times m\)).
`model`	Yes	Must be `'iidkd'`.
`order`	Yes	Dimension \(k\) of the per-subject random vector (integer, 2 to 24).
`n`	Yes	Total latent length = \(k \times m\) where \(m\) is the number of subjects.
`hyper`	No	List of up to \(k(k+1)/2\) dicts overriding the Wishart prior on the \(\boldsymbol{\theta}\) hyperparameters. Default: weakly informative Wishart with \(r = 30\), \(\mathbf{R} = \mathbf{I}_k\).
`constr`	No	If `True`, apply a sum-to-zero constraint per dimension: \(k\) constraints total, one per dim. Default: `False`.
`diagonal`	No	Extra constant added to the precision diagonal for numerical stability (default: `0.0`).
Cross-model common keys
`weights`	No	Per-observation weighting vector applied to the random-effect contribution.

Index Structure

The index column must contain integers from 1 to \(n = k \times m\). The indexing follows dimension-major order: indices 1 to \(m\) correspond to dimension 1, indices \(m+1\) to \(2m\) correspond to dimension 2, and so on.

Missing values (NaN)

pyINLA accepts np.nan / NA in two places without error:

NaN in the index column (id): that observation contributes nothing to the iidkd component. The covariates file records a -1 sentinel for those rows; the rest of the latent field is unaffected. Combines cleanly with constr, diagonal, and weights.
NaN in the weights vector: also accepted. The affected rows contribute no weighted iidkd effect.

See the nan_in_idx_* and nan_in_weights_k2 cases in the iidkd parity suite for the verified combinations.

Per-Parameter Runnable Examples

Click any parameter to expand a minimal runnable snippet. All snippets assume the shared setup block below (\(k = 2\) dimensions, \(m = 30\) subjects, \(n = 60\) latent positions). Replace \(k\) and \(m\) as needed; the only constraint is \(2 \le k \le 24\).

# shared setup for every snippet below
import numpy as np
import pandas as pd
from pyinla import pyinla

rng = np.random.default_rng(42)
k    = 2            # dimensions per subject (2..24)
m    = 30           # subjects
n    = k * m        # total latent positions (= 60)
nobs = 120          # observations (typically more than n)
df = pd.DataFrame({
    'y':   rng.standard_normal(nobs),
    'x':   rng.standard_normal(nobs),
    'idx': rng.integers(1, n + 1, size=nobs),
})

id + model + order + n required four-tuple (minimal call)

# Bare iidkd: 'id' is the column with positions 1..n,
# 'order' = k, 'n' = k * m. All four are required.
formula = {
    'response': 'y',
    'random': [{'id': 'idx', 'model': 'iidkd',
                 'order': k, 'n': n}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

hyper override the Wishart prior

# The first hyper slot (theta1) carries the wishartkd prior. Its
# parameter vector is (r, R_1, ..., R_k, R_{k+1}, ..., R_{k(k+1)/2}):
#   r        = degrees of freedom
#   R_1..R_k = diagonal entries of the scale matrix R
#   rest     = strictly-lower entries of R
# For k=2: (r, R_11, R_22, R_21). Below: tighter prior with r=50, R=I.
formula = {
    'response': 'y',
    'random': [{'id': 'idx', 'model': 'iidkd',
                 'order': k, 'n': n,
                 'hyper': [{'prior': 'wishartkd',
                            'param': [50, 1.0, 1.0, 0.0]}]}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

constr sum-to-zero per dimension

# constr=True produces k sum-to-zero constraints, one per dimension:
# sum_i u_{i1} = 0, ..., sum_i u_{ik} = 0. Useful to remove the
# identifiability tradeoff between the per-dimension intercepts and
# fixed effects in the linear predictor.
formula = {
    'response': 'y',
    'random': [{'id': 'idx', 'model': 'iidkd',
                 'order': k, 'n': n,
                 'constr': True}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

diagonal extra constant on the precision diagonal

# Small jitter for numerical stability (rarely needed for iidkd
# since the Wishart prior already keeps W positive-definite).
formula = {
    'response': 'y',
    'random': [{'id': 'idx', 'model': 'iidkd',
                 'order': k, 'n': n,
                 'diagonal': 1e-4}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

weights per-observation weighting

# Per-observation weighting vector applied to the random-effect
# contribution. Length must equal the number of observations.
w = rng.uniform(0.5, 1.5, size=nobs)
formula = {
    'response': 'y',
    'random': [{'id': 'idx', 'model': 'iidkd',
                 'order': k, 'n': n,
                 'weights': w}],
}
result = pyinla(formula=formula, family='gaussian', data=df)

Hyperparameter Ordering

The hyperparameters map to entries of the lower triangular Cholesky factor \(\mathbf{L}\) in two passes: the first \(k\) entries are the log-diagonal (\(\theta_j = \log L_{jj}\)), and the remaining \(k(k-1)/2\) are the strictly-lower entries in column-major order.

For \(k=2\) (3 hyperparameters)

Index	Element	Transform	Description
\(\theta_1\)	\(L_{11}\)	\(\exp(\theta)\)	Scale for dimension 1
\(\theta_2\)	\(L_{22}\)	\(\exp(\theta)\)	Scale for dimension 2
\(\theta_3\)	\(L_{21}\)	identity	Strictly-lower entry, column 1

For \(k=3\) (6 hyperparameters)

Index	Element	Transform	Description
\(\theta_1\)	\(L_{11}\)	\(\exp(\theta)\)	Scale for dimension 1
\(\theta_2\)	\(L_{22}\)	\(\exp(\theta)\)	Scale for dimension 2
\(\theta_3\)	\(L_{33}\)	\(\exp(\theta)\)	Scale for dimension 3
\(\theta_4\)	\(L_{21}\)	identity	Strictly-lower entry, column 1
\(\theta_5\)	\(L_{31}\)	identity	Strictly-lower entry, column 1
\(\theta_6\)	\(L_{32}\)	identity	Strictly-lower entry, column 2

Interpreting Results

Hyperparameter Summary

The hyperparameters are reported on the internal scale (log for diagonal, identity for off-diagonal). To reconstruct the precision matrix:

import numpy as np

# For k=2, extract 3 hyperparameter means
# Ordering (see Hyperparameter Ordering table):
#   theta[0] = log(L_11)  theta[1] = log(L_22)  theta[2] = L_21
theta = result.summary_hyperpar['mean'].values

# Reconstruct L matrix
L = np.array([
    [np.exp(theta[0]), 0],
    [theta[2], np.exp(theta[1])]
])

# Precision matrix
W = L @ L.T

# Covariance matrix
Sigma = np.linalg.inv(W)

print("Precision matrix W:")
print(W)
print("Covariance matrix Sigma:")
print(Sigma)
print("Correlation:", Sigma[0,1] / np.sqrt(Sigma[0,0] * Sigma[1,1]))

Random Effect Estimates

Random effects are stored in dimension-major order. For \(k=2\) and \(m=30\):

Indices 1-30: Random effects for dimension 1 (e.g., intercepts)
Indices 31-60: Random effects for dimension 2 (e.g., slopes)

# Get random effects
re = result.summary_random['idx']

# Split by dimension
m = 30
intercepts = re.iloc[:m]  # dimension 1
slopes = re.iloc[m:]       # dimension 2

IID - Simple 1-dimensional version (k=1)
Z Model - Alternative for correlated effects via design matrix
Generic0 - Custom precision structure when Wishart prior is not appropriate