Random Effects

Anatomy: how does u attach to y?

For each of ten common spec patterns, this page writes out 10 observations (i = 1, ..., 10) and shows exactly which latent value $u$ each $y_i$ picks up. Observations sharing a color share the same latent value.

CASE 1 OF 10

IID, one latent value per observation

Spec: {'id': 'row_id', 'model': 'iid'} with 10 distinct row_id values

1 hyperparameter: log τ (one precision for all 10 latent values)
$y_i = \mu + u_i,\quad u_i \overset{\text{iid}}{\sim} N(0, \tau^{-1}),\quad i = 1, \dots, 10$

One latent value per observation. Classic use: absorbing overdispersion. n_levels = n_obs = 10.

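i = 1: $y_1 = \mu + u_1$
i = 2: $y_2 = \mu + u_2$
i = 3: $y_3 = \mu + u_3$
i = 4: $y_4 = \mu + u_4$
i = 5: $y_5 = \mu + u_5$
i = 6: $y_6 = \mu + u_6$
i = 7: $y_7 = \mu + u_7$
i = 8: $y_8 = \mu + u_8$
i = 9: $y_9 = \mu + u_9$
i = 10: $y_{10} = \mu + u_{10}$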
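The bookkeeping behind case 1 can be written as an observation-to-latent matrix A with $\eta = \mu + A u$. Below is a minimal plain-NumPy sketch, not pyINLA internals. The values of μ and τ are arbitrary illustrations.

```python
import numpy as np

# Case 1 sketch: one latent value per observation, so the observation-to-latent
# matrix A is the 10 x 10 identity and eta = mu + A @ u reduces to mu + u.
rng = np.random.default_rng(0)
n_obs = 10
mu, tau = 2.0, 4.0                                     # illustrative values, not estimates
A = np.eye(n_obs)                                      # row i has a single 1 in column i
u = rng.normal(0.0, 1.0 / np.sqrt(tau), size=n_obs)    # u_i iid N(0, 1/tau)
eta = mu + A @ u

print(A.shape, np.allclose(eta, mu + u))
```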
CASE 2 OF 10

IID, grouped (the hierarchical case)

Spec: {'id': 'school_id', 'model': 'iid'} with 4 schools, 10 students

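1 hyperparameter: log τ (one precision for the 4 school effects)
$y_i = \mu + u_{g(i)},\quad u_k \overset{\text{iid}}{\sim} N(0, \tau^{-1}),\quad k = 1, \dots, 4$, where $g(i)$ is the school of student $i$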

Many observations share the same latent value (students in schools). n_levels (4) is smaller than n_obs (10), so the same u appears on several rows.

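i = 1: $y_1 = \mu + u_1$
i = 2: $y_2 = \mu + u_1$
i = 3: $y_3 = \mu + u_1$
i = 4: $y_4 = \mu + u_2$
i = 5: $y_5 = \mu + u_2$
i = 6: $y_6 = \mu + u_2$
i = 7: $y_7 = \mu + u_3$
i = 8: $y_8 = \mu + u_3$
i = 9: $y_9 = \mu + u_4$
i = 10: $y_{10} = \mu + u_4$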
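In the grouped case, A has exactly one 1 per row, placed in the column of that row's school. Below is a minimal NumPy sketch. It assumes the level set is the sorted unique ids.

```python
import numpy as np

# Case 2 sketch: 10 students in 4 schools; school_id[i] plays the role of g(i).
school_id = np.array([1, 1, 1, 2, 2, 2, 3, 3, 4, 4])
levels = np.unique(school_id)               # sorted unique levels: [1, 2, 3, 4]
col = np.searchsorted(levels, school_id)    # 0-based column of u for each row

# A[i, k] = 1 when observation i picks up u_k; exactly one 1 per row.
A = np.zeros((school_id.size, levels.size))
A[np.arange(school_id.size), col] = 1.0

print(A.sum(axis=0))   # number of rows sharing each school's latent value
```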
CASE 3 OF 10

IID, replicated (nrep independent copies of the same iid block)

Spec: {'id': 'level', 'model': 'iid', 'replicate': rep, 'nrep': 2}

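1 hyperparameter: log τ (shared across both replicates)
$y_i = \mu + u_{\ell(i),\,r(i)},\quad u_{k,r} \overset{\text{iid}}{\sim} N(0, \tau^{-1}),\quad k = 1, \dots, 5,\ r = 1, 2$, where $\ell(i)$ is the level and $r(i)$ the replicate of observation $i$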

The whole iid block is repeated nrep times, independently. Each observation carries two integer indices: its level (1..5) and which replicate it belongs to (1..2). Together they pick out one of 5 × 2 = 10 latent values. The two replicate blocks share one precision τ (estimated jointly), but the realized u-draws across reps are uncorrelated: same level in different reps gives a different u.

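The two index vectors (each of length 10):
i:       1, 2, 3, 4, 5, 6, 7, 8, 9, 10
level:   1, 2, 3, 4, 5, 1, 2, 3, 4, 5
rep:     1, 1, 1, 1, 1, 2, 2, 2, 2, 2
Both vectors have length 10. level has 5 unique values (each appearing twice); rep has 2 unique values. Together they index 5 × 2 = 10 independent latent values.
rep = 1
i = 1: $y_1 = \mu + u_{1,1}$
i = 2: $y_2 = \mu + u_{2,1}$
i = 3: $y_3 = \mu + u_{3,1}$
i = 4: $y_4 = \mu + u_{4,1}$
i = 5: $y_5 = \mu + u_{5,1}$
rep = 2
i = 6: $y_6 = \mu + u_{1,2}$
i = 7: $y_7 = \mu + u_{2,2}$
i = 8: $y_8 = \mu + u_{3,2}$
i = 9: $y_9 = \mu + u_{4,2}$
i = 10: $y_{10} = \mu + u_{5,2}$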

Why do the colors split into two families? Cool colors (indigo → teal) mark rep = 1; warm colors (pink → amber) mark rep = 2. Observations 1 and 6 both have level = 1, but their badges have different colors because they pick up different latent values ($u_{1,1}$ vs $u_{1,2}$).
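The two-index bookkeeping can be seen by flattening (level, rep) into a single slot of a stacked vector of length 10. The ordering below (all of rep 1, then all of rep 2) is an assumption made for illustration. It is not taken from pyINLA.

```python
import numpy as np

# Case 3 sketch: each row carries (level, rep) and picks one of 5 x 2 slots.
level = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
rep   = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
n_levels, nrep = 5, 2

# Assumed stacking: replicate 1 block first, then replicate 2 block.
flat = (level - 1) + n_levels * (rep - 1)   # 0-based slot in the stacked vector

# Prior covariance of the stacked vector: all slots independent, same tau.
tau = 4.0
cov = np.eye(n_levels * nrep) / tau

print(flat)
print(cov[flat[0], flat[5]])   # obs 1 vs obs 6: same level, different rep
```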

CASE 4 OF 10

IID + control.group (correlation across a second axis)

Spec: {'id': 'region', 'model': 'iid', 'group': week, 'ngroup': 2, 'control.group': {'model': 'ar1'}}

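2 hyperparameters: log τ + AR1 ρ (precision shared across regions; the second hyperparameter is the across-group AR1 correlation)
$y_i = \mu + u_{k(i),\,g(i)}$, where $k(i)$ is the region and $g(i)$ the week of observation $i$
Within group $g$: $u_{\cdot,g} \overset{\text{iid}}{\sim} N(0, \tau^{-1})$  |  across $g$: $\operatorname{corr}(u_{k,1}, u_{k,2}) = \rho$ (AR1)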

This is the same 2D layout as case 3 (5 regions × 2 weeks = 10 latent values), but now the second axis carries correlation, not independent copies. Within each week the 5 regions are still iid. Between the two weeks, the matching latent values are tied together: $u_{k,1}$ and $u_{k,2}$ are one AR1 step apart, so their correlation is ρ. So obs 1 (region 1, week 1) and obs 6 (region 1, week 2) are not the same value, but they are correlated: knowing one tells you something about the other.

The two index vectors (each of length 10):
i:       1, 2, 3, 4, 5, 6, 7, 8, 9, 10
region:  1, 2, 3, 4, 5, 1, 2, 3, 4, 5
week:    1, 1, 1, 1, 1, 2, 2, 2, 2, 2
This is the same shape as case 3 (10 obs, 5 regions, 2 weeks, 5 × 2 = 10 latent values). The difference is the control.group dict: setting 'model': 'ar1' turns the across-group axis from iid into AR1.
g = 1 (week 1)
i = 1: $y_1 = \mu + u_{1,1}$
i = 2: $y_2 = \mu + u_{2,1}$
i = 3: $y_3 = \mu + u_{3,1}$
i = 4: $y_4 = \mu + u_{4,1}$
i = 5: $y_5 = \mu + u_{5,1}$
g = 2 (week 2), AR1-correlated with g = 1
i = 6: $y_6 = \mu + u_{1,2}$
i = 7: $y_7 = \mu + u_{2,2}$
i = 8: $y_8 = \mu + u_{3,2}$
i = 9: $y_9 = \mu + u_{4,2}$
i = 10: $y_{10} = \mu + u_{5,2}$

Why do the colors come in light/dark pairs? Each region gets one hue. The saturated shade marks week 1 and the lighter shade marks week 2. The same hue says "same region, AR1-correlated"; a different shade says "different week, so a different draw." Contrast this with case 3, where rep 1 and rep 2 use unrelated cool / warm families because they are independent.

Case 3 vs case 4: the data layout, the index vectors (level/region and rep/week) and the 5 × 2 = 10 latent values are identical. Case 4 names the second axis with group/ngroup instead of replicate/nrep, and it adds a control.group dict (absent in case 3, set to {'model': 'ar1'} here). That dict is what swaps independence for correlation along the second axis.
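The AR1 group prior on the stacked 10-vector can be sketched as a Kronecker product: an AR1 correlation across weeks, times the iid covariance I/τ within a week. The week-major ordering and the unit-variance AR1 scaling are assumptions of this NumPy sketch. They are not a statement about pyINLA's internal layout.

```python
import numpy as np

def ar1_corr(ngroup, rho):
    """Stationary AR1 correlation matrix: entry (g, h) is rho ** |g - h|."""
    idx = np.arange(ngroup)
    return rho ** np.abs(idx[:, None] - idx[None, :])

n_regions, ngroup = 5, 2
tau, rho = 4.0, 0.7                      # illustrative values

# Assumed ordering: week 1 block (regions 1..5), then week 2 block.
# Across weeks: AR1 correlation; within a week: iid with variance 1/tau.
cov = np.kron(ar1_corr(ngroup, rho), np.eye(n_regions) / tau)

sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
print(corr[0, 5])   # region 1 week 1 vs region 1 week 2: equals rho up to rounding
print(corr[0, 6])   # region 1 week 1 vs region 2 week 2: zero
```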
CASE 5 OF 10

RW1 (ordered, neighbors are correlated)

Spec: {'id': 'time', 'model': 'rw1'} with 10 ordered time points

1 hyperparameter: log τ (one precision for the RW1 increments)
$y_i = \mu + u_i,\quad u_i - u_{i-1} \overset{\text{iid}}{\sim} N(0, \tau^{-1}),\quad i = 2, \dots, 10$

One latent value per ordered position, like IID, but neighbors are tied together by an increment penalty. Colors shade smoothly to remind you of the neighbor coupling.

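i = 1: $y_1 = \mu + u_1$
i = 2: $y_2 = \mu + u_2$
i = 3: $y_3 = \mu + u_3$
i = 4: $y_4 = \mu + u_4$
i = 5: $y_5 = \mu + u_5$
i = 6: $y_6 = \mu + u_6$
i = 7: $y_7 = \mu + u_7$
i = 8: $y_8 = \mu + u_8$
i = 9: $y_9 = \mu + u_9$
i = 10: $y_{10} = \mu + u_{10}$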
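The RW1 prior only sees increments, so a prior draw can be built from iid increments and then centered to impose the sum-to-zero constraint. Below is a minimal NumPy sketch with an illustrative τ. It is not pyINLA code.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 10, 4.0

# RW1 prior: increments u_i - u_{i-1} are iid N(0, 1/tau). The overall level of
# the path is not determined by the prior, so build the path from increments
# (starting at 0) and then center it (sum-to-zero constraint).
incr = rng.normal(0.0, 1.0 / np.sqrt(tau), size=n - 1)
path = np.concatenate([[0.0], np.cumsum(incr)])
u = path - path.mean()

print(np.allclose(np.diff(u), incr), np.isclose(u.sum(), 0.0))
```

Centering does not change any increment, which is why the constraint only removes the unidentified level and leaves the prior on neighbors intact.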
CASE 6 OF 10

The group key on its own (what does it actually do?)

Spec: {'id': 'region', 'model': 'iid', 'group': week, 'ngroup': 2}   (no control.group set)

1 hyperparameter: log τ (the default control.group is iid, so there is no second hyperparameter)
$y_i = \mu + u_{k(i),\,g(i)}$
Within group $g$: $u_{\cdot,g} \overset{\text{iid}}{\sim} N(0, \tau^{-1})$  |  across $g$: independent (default control.group is iid)

group introduces a second indexing axis on top of id. By itself, with no control.group dict, the across-group axis defaults to iid: all 5 × 2 = 10 latent values are independent draws. Mathematically, this case is identical to case 3 (replicate). The point of group is not what it does on its own; it is what it lets you do next: add a control.group dict to make the across-group axis correlated (case 4).

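The two index vectors (each of length 10):
i:       1, 2, 3, 4, 5, 6, 7, 8, 9, 10
region:  1, 2, 3, 4, 5, 1, 2, 3, 4, 5
week:    1, 1, 1, 1, 1, 2, 2, 2, 2, 2
This is the same shape as case 3 and case 4 (10 obs, 5 regions, 2 weeks, 5 × 2 = 10 latent values). Only the spec dict changes between the three cases.
g = 1 (week 1)
i = 1: $y_1 = \mu + u_{1,1}$
i = 2: $y_2 = \mu + u_{2,1}$
i = 3: $y_3 = \mu + u_{3,1}$
i = 4: $y_4 = \mu + u_{4,1}$
i = 5: $y_5 = \mu + u_{5,1}$
g = 2 (week 2), independent of g = 1 (default iid)
i = 6: $y_6 = \mu + u_{1,2}$
i = 7: $y_7 = \mu + u_{2,2}$
i = 8: $y_8 = \mu + u_{3,2}$
i = 9: $y_9 = \mu + u_{4,2}$
i = 10: $y_{10} = \mu + u_{5,2}$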

Why do the colors split into two families (cool / warm)? The scheme is the same as in case 3: cool for g = 1, warm for g = 2. Independent groups get unrelated hue families. Compare this with case 4, where the same hue (saturated vs light) marks the AR1 tie between matched regions.

The three sister cases at a glance:
  • Case 3 (replicate + nrep): two independent copies. Posterior output is indexed by (region, rep).
  • Case 6 (group + ngroup, no control.group): mathematically the same as case 3. The keyword choice mostly affects output labeling and what extensions you can add later.
  • Case 4 (case 6 + 'control.group': {'model': 'ar1'}): adds AR1 correlation across groups. This is what group exists for.
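The three sister cases differ only in the across-axis correlation matrix. The sketch below reuses the Kronecker form, with the same assumed week-major ordering. It checks that case 6 reproduces case 3 exactly while case 4 does not. The helper latent_cov is hypothetical, not a pyINLA function.

```python
import numpy as np

def latent_cov(n_levels, across_corr, tau):
    """Hypothetical helper: prior covariance of the stacked vector,
    built as (across-axis correlation) kron (iid block covariance I / tau)."""
    return np.kron(across_corr, np.eye(n_levels) / tau)

n_levels, n_second = 5, 2
tau, rho = 4.0, 0.7                                        # illustrative values

cov_case3 = np.eye(n_levels * n_second) / tau              # 10 independent values
cov_case6 = latent_cov(n_levels, np.eye(n_second), tau)    # group, default iid
cov_case4 = latent_cov(n_levels, np.array([[1.0, rho],
                                           [rho, 1.0]]), tau)  # group + AR1

print(np.array_equal(cov_case3, cov_case6))   # identical priors
print(np.array_equal(cov_case3, cov_case4))   # AR1 adds cross-week covariance
```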
CASE 7 OF 10

RW1 cyclic (neighbors wrap around: position n ties back to position 1)

Spec: {'id': 'hour', 'model': 'rw1', 'cyclic': True}

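1 hyperparameter: log τ (same as RW1; cyclic adds an edge to the graph, not a hyperparameter)
$y_i = \mu + u_i$
$u_i - u_{i-1} \overset{\text{iid}}{\sim} N(0, \tau^{-1})$ for $i = 2, \dots, n$  |  plus  $u_1 - u_n \sim N(0, \tau^{-1})$, independent of the other increments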

This is the same setup as case 5 (linear RW1): one latent value per ordered position, with a smooth-with-neighbors prior. The only difference is the graph topology: setting cyclic: True adds one extra increment $u_1 - u_n$ to the prior, so the start and end of the index are tied together. Use it when the index wraps naturally (hour-of-day, day-of-week, month-of-year, any angular variable).

Neighbor graph: positions 1, 2, ..., 10 are joined in a ring. The extra wrap edge (pink, dashed) connects position 10 back to position 1.

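i = 1: $y_1 = \mu + u_1$
i = 2: $y_2 = \mu + u_2$
i = 3: $y_3 = \mu + u_3$
i = 4: $y_4 = \mu + u_4$
i = 5: $y_5 = \mu + u_5$
i = 6: $y_6 = \mu + u_6$
i = 7: $y_7 = \mu + u_7$
i = 8: $y_8 = \mu + u_8$
i = 9: $y_9 = \mu + u_9$
i = 10: $y_{10} = \mu + u_{10}$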

Precision matrix Q = τR: linear vs cyclic

The structure matrix R for cyclic RW1 is a circulant tridiagonal: 2 on every diagonal entry, −1 on every nearest-neighbor off-diagonal, plus −1 in the two corners for the wrap edge between positions 1 and n. Compared with linear RW1, the only differences are the four highlighted cells: the boundary diagonals jump from 1 to 2 (positions 1 and n now have two neighbors each), and the two corners pick up −1 entries (the wrap edge).

Linear RW1 (cyclic = False): pure tridiagonal, rank n−1

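$$R_{\text{linear}} = \begin{pmatrix}
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 2 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 2 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1
\end{pmatrix}$$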

Diagonal: 1, 2, 2, ..., 2, 2, 1. Off-diagonal: −1. Corners: 0.

Cyclic RW1: circulant tridiagonal, rank still n−1

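$$R_{\text{cyclic}} = \begin{pmatrix}
2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
-1 & 2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 2 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 2 & -1 \\
-1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 2
\end{pmatrix}$$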

Diagonal: 2, 2, ..., 2. Off-diagonal: −1. Corners: −1 (dashed, wrap edge).

Summary of changes (4 cells out of 100):
  • R[1,1] and R[n,n]: 1 → 2 (endpoints gain a neighbor)
  • R[1,n] and R[n,1]: 0 → −1 (the wrap edge)
  • Every other cell is unchanged.
Does the wrap edge remove the need for a constraint? No. The null space of R is still spanned by the constant vector (1, 1, ..., 1): adding the same number to every $u_i$ leaves every increment (including the wrap one) unchanged, so the rank of R stays at n − 1. Cyclic RW1 needs a sum-to-zero constraint just like linear RW1, and constr=True is on by default in pyINLA. What the wrap edge adds is a penalty on the start and end of the path drifting apart. What it does not remove is the freedom to shift the whole curve up or down.
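Three claims can be checked numerically by building $R = D^\top D$ from an explicit list of increments: the four-cell difference, the shared rank n − 1, and the constant null vector. Below is a NumPy sketch. rw1_structure is a hypothetical helper, not a pyINLA function.

```python
import numpy as np

def rw1_structure(n, cyclic=False):
    """Hypothetical helper: R = D^T D, where each row of D is one increment u_b - u_a."""
    edges = [(i, i + 1) for i in range(n - 1)]   # linear neighbor pairs (0-based)
    if cyclic:
        edges.append((n - 1, 0))                  # wrap edge: increment u_1 - u_n
    D = np.zeros((len(edges), n))
    for row, (a, b) in enumerate(edges):
        D[row, a], D[row, b] = -1.0, 1.0
    return D.T @ D

n = 10
R_lin, R_cyc = rw1_structure(n), rw1_structure(n, cyclic=True)

print(np.argwhere(R_lin != R_cyc) + 1)        # 1-based cells that change
print(np.linalg.matrix_rank(R_lin), np.linalg.matrix_rank(R_cyc))
print(np.allclose(R_cyc @ np.ones(n), 0.0))   # constant vector still in the null space
```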

Why do the colors form a wheel? The 10 badges trace the hue circle in steps of 36°. Position 10 (purple-red) sits visually next to position 1 (red), mirroring its prior neighbor relationship. Compare this with case 5, where the colors run from teal to violet along a line: in linear RW1, position 10 has no special tie to position 1.

Linear (case 5) vs cyclic (case 7):
  • Both have 1 hyperparameter (log τ), both have rank n − 1 (one null direction: the overall level), both default to constr=True.
  • Linear RW1 has n−1 = 9 neighbor pairs; cyclic RW1 has n = 10 (one extra wrap edge).
  • Cyclic samples are tied at the ends; linear samples are free to drift.
CASE 8 OF 10

IID with weights (per-observation design scaling)

Spec: {'id': 'group_id', 'model': 'iid', 'weights': w}   (w is a 1-D numpy array of length n_obs)

1 hyperparameter: log τ (weights are fixed data, not parameters, so they do not add a hyperparameter)
$y_i = \mu + w_i \, u_{g(i)}$
$u_k \overset{\text{iid}}{\sim} N(0, \tau^{-1}),\ k = 1, 2, 3$  |  $w_i$ are fixed inputs (not estimated)

weights scales the random-effect contribution per observation: $w_i$ multiplies $u_{g(i)}$ in the linear predictor. The latent vector is the same as in case 2 (one u per group). What changes is the design matrix: instead of putting a 1 wherever observation i picks up group g, it puts $w_i$. Observations sharing a group therefore keep the same color (same u), but the badge label carries the weight factor.

This is not a likelihood weight. The values $w_i$ here do not change how much each $y_i$ contributes to the log-likelihood; they multiply the random effect on the linear-predictor side. If you want likelihood weights (importance / inverse-probability weighting), set the family-level weights argument instead.
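Index + weight vectors (each of length 10):
i:         1, 2, 3, 4, 5, 6, 7, 8, 9, 10
group_id:  1, 1, 1, 2, 2, 2, 3, 3, 3, 3
weights:   1.0, 2.0, 0.5, 1.5, 1.0, 2.0, 1.0, 0.5, 1.5, 2.0
There are 3 unique groups (3, 3 and 4 observations), so the iid block still has just 3 latent values $u_1, u_2, u_3$. The weights are not estimated: they enter the design matrix directly.
i = 1: $y_1 = \mu + 1.0 \cdot u_1$
i = 2: $y_2 = \mu + 2.0 \cdot u_1$
i = 3: $y_3 = \mu + 0.5 \cdot u_1$
i = 4: $y_4 = \mu + 1.5 \cdot u_2$
i = 5: $y_5 = \mu + 1.0 \cdot u_2$
i = 6: $y_6 = \mu + 2.0 \cdot u_2$
i = 7: $y_7 = \mu + 1.0 \cdot u_3$
i = 8: $y_8 = \mu + 0.5 \cdot u_3$
i = 9: $y_9 = \mu + 1.5 \cdot u_3$
i = 10: $y_{10} = \mu + 2.0 \cdot u_3$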

Reading the badges. Color = group (indigo / green / amber). The "$w_i \cdot u_{g(i)}$" label inside each badge is the literal contribution to that row's $\eta_i$. Rows 1, 2 and 3 all share $u_1$ but multiply it by 1.0, 2.0 and 0.5 respectively. Same u, different pull.
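The weighted design matrix has the same sparsity pattern as in case 2, with $w_i$ in place of the 1s. Below is a minimal NumPy sketch using the index and weight vectors of this case. The latent values u are made up for illustration.

```python
import numpy as np

group_id = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])
w = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 2.0, 1.0, 0.5, 1.5, 2.0])

levels = np.unique(group_id)
col = np.searchsorted(levels, group_id)      # 0-based column of u for each row

# Same nonzero pattern as the unweighted case 2 matrix, but the nonzero in
# row i is w_i instead of 1.
A = np.zeros((group_id.size, levels.size))
A[np.arange(group_id.size), col] = w

mu = 0.0
u = np.array([0.3, -0.2, 0.1])               # illustrative values for u_1, u_2, u_3
eta = mu + A @ u
print(eta[:3])                               # rows 1-3: 1.0*u_1, 2.0*u_1, 0.5*u_1
```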

Three practical uses for weights:
  • Exposure-style scaling on an RE: a group's effect is multiplied per row by population, time at risk, or area.
  • Linear slope inside an f() block: set id = 1 for every row (one level) and weights = $x_i$. Then $u_1$ plays the role of a slope, and each $\eta_i$ gains the term $x_i \cdot u_1$.
  • Custom design contrasts: fractional or signed shares of a group's u (positive and negative weights are allowed).
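The slope trick in the second bullet reduces to a one-column design matrix equal to x. Below is a NumPy sketch with a hypothetical covariate x.

```python
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])     # hypothetical covariate values
slope_id = np.ones_like(x, dtype=int)       # every row points at level 1

# One level, so A has a single column, and with weights = x that column is x.
A = np.zeros((x.size, 1))
A[np.arange(x.size), slope_id - 1] = x

u1 = 0.8                                    # illustrative slope value
print(np.allclose(A @ np.array([u1]), x * u1))   # eta_i gains x_i * u_1
```

Because $u_1$ keeps its iid N(0, τ^{-1}) prior, this behaves like a slope with a Gaussian prior whose precision is estimated.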
CASE 9 OF 10

IID with values (reserve slots for unobserved levels)

Spec: {'id': 'school_id', 'model': 'iid', 'values': [1, 2, 3, 4, 5]}   (10 students across 4 schools; school 3 has no data yet)

1 hyperparameter: log τ (values is a fixed list, not a parameter; it declares 5 latent slots regardless of what is observed)
$y_i = \mu + u_{\text{school\_id}(i)}$
$u_k \overset{\text{iid}}{\sim} N(0, \tau^{-1}),\ k = 1, \dots, 5$  |  the level set is fixed by values, not inferred from the data

values declares the complete set of allowed levels for the id column. Without it, pyINLA infers the level set from sort(unique(id)) in the data, so the latent vector has one entry per observed level. With values, the latent vector has one entry per declared level, including levels that never appear in the data. Those extra slots are governed purely by the prior, because no $y_i$ row updates them.

What the data has vs what values declares:
i:           1, 2, 3, 4, 5, 6, 7, 8, 9, 10
school_id:   1, 1, 1, 2, 2, 4, 4, 5, 5, 5
values declares: [1, 2, 3, 4, 5]
The data has 4 distinct schools (1, 2, 4, 5). values declares 5 levels. School 3 is declared but no observation refers to it.
i = 1: $y_1 = \mu + u_1$ (school 1)
i = 2: $y_2 = \mu + u_1$ (school 1)
i = 3: $y_3 = \mu + u_1$ (school 1)
i = 4: $y_4 = \mu + u_2$ (school 2)
i = 5: $y_5 = \mu + u_2$ (school 2)
i = 6: $y_6 = \mu + u_4$ (school 4)
i = 7: $y_7 = \mu + u_4$ (school 4)
i = 8: $y_8 = \mu + u_5$ (school 5)
i = 9: $y_9 = \mu + u_5$ (school 5)
i = 10: $y_{10} = \mu + u_5$ (school 5)
The full latent vector (length 5, set by values):
$u_1$: school 1, data-informed
$u_2$: school 2, data-informed
$u_3$: no data, prior only
$u_4$: school 4, data-informed
$u_5$: school 5, data-informed
Why reserve slots for unobserved levels?
  • Prediction at unseen groups: you want a posterior for school 3 even though you have no rows for it yet. With values, that posterior is just the prior $u_3 \sim N(0, \tau^{-1})$; τ is still informed by the observed schools.
  • Stable indexing across runs: same values list across fits guarantees the same k means the same school in every posterior table.
  • Document the level set in the spec: makes the model self-describing instead of depending on whatever happens to be in df['school_id'] today.
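The effect of values on the design can be sketched by mapping each school_id to its position in the declared list, which gives school 3 an all-zero column. The second half is a deliberately simplified check. Conditional on a fixed τ, a fixed μ and a Gaussian likelihood with known noise variance, the posterior of $u_3$ is exactly its prior. All three conditions are assumptions of this sketch; INLA itself integrates over τ.

```python
import numpy as np

school_id = np.array([1, 1, 1, 2, 2, 4, 4, 5, 5, 5])
values = np.array([1, 2, 3, 4, 5])           # declared level set; school 3 unobserved

col = np.searchsorted(values, school_id)     # position of each row's level in `values`
A = np.zeros((school_id.size, values.size))
A[np.arange(school_id.size), col] = 1.0
print(A.sum(axis=0))                         # the column for school 3 is empty

# Simplification: fixed tau and mu, Gaussian likelihood with known noise variance.
# Then u | y is Gaussian with precision tau*I + A^T A / sigma2.
rng = np.random.default_rng(2)
tau, sigma2, mu = 4.0, 0.5, 1.0              # illustrative values
y = mu + rng.normal(size=school_id.size)     # simulated data, not real observations
Q_post = tau * np.eye(values.size) + A.T @ A / sigma2
mean_post = np.linalg.solve(Q_post, A.T @ (y - mu) / sigma2)
var_post = np.diag(np.linalg.inv(Q_post))

k = 2                                        # 0-based slot of school 3
print(mean_post[k], var_post[k], 1 / tau)    # prior mean 0 and prior variance 1/tau
```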
CASE 10 OF 10

Seasonal (period m: windowed sums of length m are iid)

Spec: {'id': 'time', 'model': 'seasonal', 'season.length': 5}   (10 ordered time points = 2 full cycles)

1 hyperparameter: log τ (precision of each windowed sum; season.length is fixed metadata, not a parameter)
$y_i = \mu + u_i$
$u_i + u_{i+1} + \dots + u_{i+m-1} \overset{\text{iid}}{\sim} N(0, \tau^{-1}),\quad i = 1, \dots, n - m + 1$

The seasonal model puts the prior on sliding sums of length m, not on neighbor differences: the sum of every m consecutive latent values gets a small N(0, τ^{-1}) prior. Patterns that repeat with period m and roughly average to zero over each cycle sit comfortably under this prior; non-periodic drift accumulates inside the windows and is penalized. The model is intrinsic, with rank deficiency m − 1. The unpenalized directions are period-m patterns that sum to zero over one cycle. A constant shift is not among them, so an intercept does not absorb them; they are pinned down by the data.

The single index vector (length 10), period m = 5:
i:          1, 2, 3, 4, 5, 6, 7, 8, 9, 10
time:       1, 2, 3, 4, 5, 6, 7, 8, 9, 10
position:   1, 2, 3, 4, 5, 1, 2, 3, 4, 5
The 10 ordered observations span 2 full cycles of length 5. Position 1 of cycle 1 (i = 1) and position 1 of cycle 2 (i = 6) share a hue but are not forced equal: the seasonal model constrains sums, not individual matches.
cycle 1 (i = 1..5)
i = 1: $y_1 = \mu + u_1$ (pos 1, cycle 1)
i = 2: $y_2 = \mu + u_2$ (pos 2, cycle 1)
i = 3: $y_3 = \mu + u_3$ (pos 3, cycle 1)
i = 4: $y_4 = \mu + u_4$ (pos 4, cycle 1)
i = 5: $y_5 = \mu + u_5$ (pos 5, cycle 1)
cycle 2 (i = 6..10), same positions, not forced equal
i = 6: $y_6 = \mu + u_6$ (pos 1, cycle 2)
i = 7: $y_7 = \mu + u_7$ (pos 2, cycle 2)
i = 8: $y_8 = \mu + u_8$ (pos 3, cycle 2)
i = 9: $y_9 = \mu + u_9$ (pos 4, cycle 2)
i = 10: $y_{10} = \mu + u_{10}$ (pos 5, cycle 2)
The seasonal constraints: every consecutive sum of length m = 5 gets its own $N(0, \tau^{-1})$ prior
window 1: $u_1 + u_2 + u_3 + u_4 + u_5 \sim N(0, \tau^{-1})$
window 2: $u_2 + u_3 + u_4 + u_5 + u_6 \sim N(0, \tau^{-1})$
window 3: $u_3 + u_4 + u_5 + u_6 + u_7 \sim N(0, \tau^{-1})$
window 4: $u_4 + u_5 + u_6 + u_7 + u_8 \sim N(0, \tau^{-1})$
window 5: $u_5 + u_6 + u_7 + u_8 + u_9 \sim N(0, \tau^{-1})$
window 6: $u_6 + u_7 + u_8 + u_9 + u_{10} \sim N(0, \tau^{-1})$

For n = 10 observations and m = 5, there are n − m + 1 = 6 sliding windows. Each one shrinks a 5-element sum toward zero. This favors patterns whose values inside any cycle of length 5 roughly cancel.
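A quick numerical check of what the windows see: a period-5 pattern that sums to zero over a cycle has all six window sums equal to zero, while a linear drift does not. Below is a NumPy sketch; the pattern values and the drift slope are arbitrary.

```python
import numpy as np

def window_sums(u, m):
    """All n - m + 1 sums of m consecutive values."""
    return np.convolve(u, np.ones(m), mode="valid")

m, n = 5, 10
t = np.arange(n)

seasonal = np.tile([2.0, -1.0, 0.5, -0.5, -1.0], 2)   # period 5, sums to zero per cycle
drift = 0.3 * t                                        # non-periodic trend

print(len(window_sums(seasonal, m)))                   # n - m + 1 windows
print(np.allclose(window_sums(seasonal, m), 0.0))      # invisible to the prior
print(window_sums(drift, m))                           # nonzero and growing
```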

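The resulting precision matrix $Q = \tau R$ (structure matrix R for n = 10, m = 5)

Stack the six window constraints into the $(n - m + 1) \times n$ matrix D. Row i is $[0, \dots, 0, 1, 1, 1, 1, 1, 0, \dots, 0]$, with the five 1s at positions i, ..., i + 4. Then $R = D^\top D$, and entry $R_{ab}$ counts how many windows contain both positions a and b.

$$R = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 2 & 2 & 2 & 2 & 1 & 0 & 0 & 0 & 0 \\
1 & 2 & 3 & 3 & 3 & 2 & 1 & 0 & 0 & 0 \\
1 & 2 & 3 & 4 & 4 & 3 & 2 & 1 & 0 & 0 \\
1 & 2 & 3 & 4 & 5 & 4 & 3 & 2 & 1 & 0 \\
0 & 1 & 2 & 3 & 4 & 5 & 4 & 3 & 2 & 1 \\
0 & 0 & 1 & 2 & 3 & 4 & 4 & 3 & 2 & 1 \\
0 & 0 & 0 & 1 & 2 & 3 & 3 & 3 & 2 & 1 \\
0 & 0 & 0 & 0 & 1 & 2 & 2 & 2 & 2 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}$$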
  • Bandwidth = 2m − 1 = 9 diagonals: $R_{ab} = 0$ whenever $|a - b| \ge m$, because no length-m window can contain both positions.
  • Interior diagonal = m = 5: positions in the middle of the index sit inside all m overlapping windows. Positions near the boundary sit in fewer (a triangular ramp 1, 2, 3, 4, 5 from each corner).
  • Interior off-diagonals decay linearly: row 5 reads 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, which is the autocorrelation of the all-ones vector of length m with itself.
  • Rank deficiency = m − 1 = 4: there are m − 1 directions that every windowed sum misses. Concretely, adding any period-m pattern whose m values sum to zero (for example +1 at positions ≡ 1 (mod 5) and −1 at positions ≡ 2 (mod 5)) leaves every length-m sum unchanged. A constant shift is not one of these directions, since it changes every sum by m times the shift. An intercept therefore does not absorb them; they are pinned by the data, because each $y_i$ observes its $u_i$ directly.
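The matrix and the bullet-point claims above can be reproduced by building D explicitly. Below is a NumPy sketch for n = 10, m = 5.

```python
import numpy as np

n, m = 10, 5

# Row s of D sums u_s, ..., u_{s+m-1}: ones in columns s..s+m-1 (0-based).
D = np.zeros((n - m + 1, n))
for s in range(n - m + 1):
    D[s, s:s + m] = 1.0
R = D.T @ D

print(R.astype(int))                       # should match the 10 x 10 matrix above
print(n - np.linalg.matrix_rank(R))        # rank deficiency m - 1

# A zero-sum period-m pattern lies in the null space; a constant does not.
v = np.tile([1.0, -1.0, 0.0, 0.0, 0.0], n // m)
print(np.allclose(R @ v, 0.0), np.allclose(R @ np.ones(n), 0.0))
```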
Side-by-side: RW1 (case 5) vs RW1 cyclic (case 7) vs Seasonal (this case)

| Aspect | RW1 (case 5) | RW1 cyclic (case 7) | Seasonal (case 10) |
|---|---|---|---|
| Prior is on | neighbor increments $u_i - u_{i-1}$ | same as RW1, plus the wrap pair $u_1 - u_n$ | sliding sums of m consecutive values |
| Built-in period | none, just smoothness | the full length n (one big loop) | configurable m via season.length |
| Forces $u_{i+m} = u_i$? | no, there is no period | no; the wrap only ties $u_n$ to $u_1$ as neighbors | no, only the sums are constrained |
| Typical sample | smooth random walk | smooth walk that closes back on itself | oscillation around zero with cycle m |
| When to reach for it | monotone trend, drift, no recurring shape | angle, day-of-year, anything truly periodic with one full cycle of data | monthly seasonality across years, weekly across weeks, etc. |
Mnemonic.
  • RW1 ties neighbors together: good for smoothness, no period.
  • RW1 cyclic ties neighbors and closes the loop end-to-start: good when the index itself is a circle (angles, day-of-year).
  • Seasonal ties every windowed sum of length m: good when the pattern repeats every m steps and you have several cycles of data.