CASE 1 OF 10: IID, one latent value per observation
Spec: {'id': 'row_id', 'model': 'iid'} with 10 distinct row_id values
One latent value per observation. Classic use: absorbing overdispersion. n_levels = n_obs = 10.
| i=1 | y1 | = | μ | + | u1 |
| i=2 | y2 | = | μ | + | u2 |
| i=3 | y3 | = | μ | + | u3 |
| i=4 | y4 | = | μ | + | u4 |
| i=5 | y5 | = | μ | + | u5 |
| i=6 | y6 | = | μ | + | u6 |
| i=7 | y7 | = | μ | + | u7 |
| i=8 | y8 | = | μ | + | u8 |
| i=9 | y9 | = | μ | + | u9 |
| i=10 | y10 | = | μ | + | u10 |
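The overdispersion use can be illustrated with a small NumPy simulation. This is not pyINLA code: the Poisson likelihood, μ = log 5 and τ = 4 are assumptions chosen for illustration. Adding one iid Gaussian value per row to the log-rate pushes the variance of the counts well above their mean, which is the extra spread this random effect is meant to absorb.

```python
import numpy as np

# Hypothetical simulation (not pyINLA output): eta_i = mu + u_i, u_i ~ N(0, 1/tau),
# with Poisson counts y_i ~ Poisson(exp(eta_i)).
rng = np.random.default_rng(0)
n_obs = 100_000                  # large n so sample moments are stable
mu, tau = np.log(5.0), 4.0       # assumed values; sd(u) = 1/sqrt(tau) = 0.5

u = rng.normal(0.0, 1.0 / np.sqrt(tau), size=n_obs)   # one u per row
y_plain = rng.poisson(np.exp(mu), size=n_obs)          # no random effect
y_iid = rng.poisson(np.exp(mu + u))                    # per-row iid effect

# Plain Poisson: variance close to the mean. With the iid effect: variance >> mean.
print("plain:", y_plain.mean(), y_plain.var())
print("iid:  ", y_iid.mean(), y_iid.var())
```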
CASE 2 OF 10: IID, grouped (the hierarchical case)
Spec: {'id': 'school_id', 'model': 'iid'} with 4 schools, 10 students
Many observations share the same latent value (students in schools). n_levels (4) is smaller than n_obs (10), so the same u appears on several rows.
| i=1 | y1 | = | μ | + | u1 |
| i=2 | y2 | = | μ | + | u1 |
| i=3 | y3 | = | μ | + | u1 |
| i=4 | y4 | = | μ | + | u2 |
| i=5 | y5 | = | μ | + | u2 |
| i=6 | y6 | = | μ | + | u2 |
| i=7 | y7 | = | μ | + | u3 |
| i=8 | y8 | = | μ | + | u3 |
| i=9 | y9 | = | μ | + | u4 |
| i=10 | y10 | = | μ | + | u4 |
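A sketch of the observation-to-latent mapping for case 2, written as a dense matrix A with η = μ + A·u. The name A, the dense layout and the numeric values of μ and u are illustrative assumptions; this is not a pyINLA object. Each row holds a single 1 in the column of its school, so several rows point at the same u.

```python
import numpy as np

school_id = np.array([1, 1, 1, 2, 2, 2, 3, 3, 4, 4])  # one entry per student
levels = np.unique(school_id)                          # sorted unique ids: [1 2 3 4]
col = np.searchsorted(levels, school_id)               # 0-based column for each row

# Hypothetical dense design matrix: a single 1 per row, in its school's column.
A = np.zeros((school_id.size, levels.size))
A[np.arange(school_id.size), col] = 1.0

mu = 0.3                                   # hypothetical intercept
u = np.array([0.5, -0.2, 0.1, -0.4])       # hypothetical draws u1..u4
eta = mu + A @ u                           # rows 1-3 all pick up u1, and so on

print(A.sum(axis=0))   # students per school
```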
CASE 3 OF 10: IID, replicated (nrep independent copies of the same iid block)
Spec: {'id': 'level', 'model': 'iid', 'replicate': rep, 'nrep': 2}
The whole iid block is repeated nrep times, independently. Each observation carries two integer indices: its level (1..5) and which replicate it belongs to (1..2). Together they pick out one of 5 × 2 = 10 latent values. The two replicate blocks share one precision τ (estimated jointly), but the realized u-draws across reps are uncorrelated: same level in different reps gives a different u.
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| level | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
| rep | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
level has 5 unique values (each appearing twice); rep has 2 unique values. Together they index 5 × 2 = 10 independent latent values.
| rep = 1 | |||||
| i=1 | y1 | = | μ | + | u1,1 |
| i=2 | y2 | = | μ | + | u2,1 |
| i=3 | y3 | = | μ | + | u3,1 |
| i=4 | y4 | = | μ | + | u4,1 |
| i=5 | y5 | = | μ | + | u5,1 |
| rep = 2 | |||||
| i=6 | y6 | = | μ | + | u1,2 |
| i=7 | y7 | = | μ | + | u2,2 |
| i=8 | y8 | = | μ | + | u3,2 |
| i=9 | y9 | = | μ | + | u4,2 |
| i=10 | y10 | = | μ | + | u5,2 |
Why do the colors split into two families? Cool colors (indigo → teal) mark rep = 1; warm colors (pink → amber) mark rep = 2. Observations 1 and 6 both have level = 1, but their badges have different colors because they pick up different latent values ($u_{1,1}$ vs $u_{1,2}$).
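The two-index bookkeeping of case 3 can be sketched in NumPy. The rep-major ordering of the 10 latent slots (all 5 levels of rep 1, then all 5 of rep 2), the matrix name A and the value of τ are assumptions for illustration, not taken from pyINLA. The point is that obs 1 and obs 6 land in different slots, so their random-effect covariance is zero.

```python
import numpy as np

level = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
rep = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
n_level, n_rep = 5, 2

# Assumed rep-major stacking: 0-based latent slot for each observation.
k = (rep - 1) * n_level + (level - 1)
A = np.zeros((level.size, n_level * n_rep))
A[np.arange(level.size), k] = 1.0

tau = 2.0                                   # hypothetical shared precision
Sigma_u = np.eye(n_level * n_rep) / tau     # all 10 latent values independent
Sigma_eta = A @ Sigma_u @ A.T               # covariance of the random part of eta

print(k[0], k[5])         # obs 1 and obs 6 use different latent slots
print(Sigma_eta[0, 5])    # same level, different rep: uncorrelated
```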
CASE 4 OF 10: IID + control.group (correlation across a second axis)
Spec: {'id': 'region', 'model': 'iid', 'group': week, 'ngroup': 2, 'control.group': {'model': 'ar1'}}
Prior: within group $g$, $u_{\cdot,g} \overset{\text{iid}}{\sim} N(0, \tau^{-1})$; across groups, $\operatorname{corr}(u_{k,1}, u_{k,2}) = \rho$ (AR1).
Same 2D layout as case 3 (5 regions × 2 weeks = 10 latent values), but now the second axis carries correlation, not independent copies. Within each week the 5 regions are still iid. Between the two weeks, the matching latent values are tied together: $u_{k,1}$ and $u_{k,2}$ share a one-step AR1 correlation ρ. So obs 1 (region 1, week 1) and obs 6 (region 1, week 2) are not the same value, but they are correlated: knowing one tells you something about the other.
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| region | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
| week | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
control.group dict: setting 'model': 'ar1' turns the across-group axis from iid into AR1.
| g = 1 (week 1) | |||||
| i=1 | y1 | = | μ | + | u1,1 |
| i=2 | y2 | = | μ | + | u2,1 |
| i=3 | y3 | = | μ | + | u3,1 |
| i=4 | y4 | = | μ | + | u4,1 |
| i=5 | y5 | = | μ | + | u5,1 |
| g = 2 (week 2), AR1-correlated with g = 1 | |||||
| i=6 | y6 | = | μ | + | u1,2 |
| i=7 | y7 | = | μ | + | u2,2 |
| i=8 | y8 | = | μ | + | u3,2 |
| i=9 | y9 | = | μ | + | u4,2 |
| i=10 | y10 | = | μ | + | u5,2 |
Why do the colors come in light/dark pairs? Each region gets one hue. The saturated shade marks week 1, the lighter shade marks week 2. Same hue says "same region, AR1-correlated"; different shade says "different week, so different draw." Contrast with case 3, where rep 1 and rep 2 use unrelated cool / warm families because they are independent.
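Compared with case 3: the same two index vectors (level/region and rep/week) and the same 5 × 2 = 10 latent values. Apart from naming the second axis with group/ngroup instead of replicate/nrep, the only difference is the control.group dict (absent in case 3, set to {'model': 'ar1'} here). That single key swaps independence for correlation along the second axis.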
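A sketch of the covariance implied by case 4's prior line, assuming every $u_{k,g}$ has variance 1/τ and matched regions in adjacent weeks have correlation ρ. The week-major stacking and the values of τ and ρ are illustrative; pyINLA's internal parameterization of the group AR1 may differ. The structure is a Kronecker product: an AR1 correlation matrix across weeks times an identity across regions.

```python
import numpy as np

tau, rho = 2.0, 0.7          # hypothetical hyperparameter values
n_region, n_week = 5, 2

# AR1 correlation across weeks: entry (g, h) = rho ** |g - h|
g = np.arange(n_week)
C_week = rho ** np.abs(g[:, None] - g[None, :])

# Regions are iid within a week, so the joint covariance is a Kronecker product.
# Assumed stacking: index = (week - 1) * n_region + (region - 1).
Sigma = np.kron(C_week, np.eye(n_region)) / tau
corr = Sigma / np.sqrt(np.outer(np.diag(Sigma), np.diag(Sigma)))

print(corr[0, 5])   # region 1 week 1 vs region 1 week 2: equals rho
print(corr[0, 1])   # region 1 vs region 2 in the same week: 0
```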
CASE 5 OF 10: RW1 (ordered, neighbors are correlated)
Spec: {'id': 'time', 'model': 'rw1'} with 10 ordered time points
One latent value per ordered position, like IID, but neighbors are tied together by an increment penalty. Colors shade smoothly to remind you of the neighbor coupling.
| i=1 | y1 | = | μ | + | u1 |
| i=2 | y2 | = | μ | + | u2 |
| i=3 | y3 | = | μ | + | u3 |
| i=4 | y4 | = | μ | + | u4 |
| i=5 | y5 | = | μ | + | u5 |
| i=6 | y6 | = | μ | + | u6 |
| i=7 | y7 | = | μ | + | u7 |
| i=8 | y8 | = | μ | + | u8 |
| i=9 | y9 | = | μ | + | u9 |
| i=10 | y10 | = | μ | + | u10 |
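To see what the increment penalty implies, here is a sketch that draws an RW1 path by cumulating iid Gaussian increments $u_i - u_{i-1} \sim N(0, 1/\tau)$. The values of n and τ are illustrative, and centering the path stands in for the sum-to-zero constraint that pins the otherwise free overall level.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 10, 4.0                          # hypothetical values

# Increments u_i - u_{i-1}, i = 2..n, are iid N(0, 1/tau).
steps = rng.normal(0.0, 1.0 / np.sqrt(tau), size=n - 1)
u = np.concatenate([[0.0], np.cumsum(steps)])   # start anywhere, e.g. 0
u -= u.mean()                                   # remove the free overall level

print(np.round(u, 2))
```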
CASE 6 OF 10: The group key on its own (what does it actually do?)
Spec: {'id': 'region', 'model': 'iid', 'group': week, 'ngroup': 2} (no control.group set)
Prior: within group $g$, $u_{\cdot,g} \overset{\text{iid}}{\sim} N(0, \tau^{-1})$; across groups: independent (the default control.group is iid, so there is no second hyperparameter).
group introduces a second indexing axis on top of id. By itself, with no control.group dict, the across-group axis defaults to iid: all 5 × 2 = 10 latent values are independent draws. Mathematically, this case is identical to case 3 (replicate). The point of group is not what it does on its own; it is what it lets you do next: add a control.group dict to make the across-group axis correlated (case 4).
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| region | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
| week | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| g = 1 (week 1) | |||||
| i=1 | y1 | = | μ | + | u1,1 |
| i=2 | y2 | = | μ | + | u2,1 |
| i=3 | y3 | = | μ | + | u3,1 |
| i=4 | y4 | = | μ | + | u4,1 |
| i=5 | y5 | = | μ | + | u5,1 |
| g = 2 (week 2), independent of g = 1 (default iid) | |||||
| i=6 | y6 | = | μ | + | u1,2 |
| i=7 | y7 | = | μ | + | u2,2 |
| i=8 | y8 | = | μ | + | u3,2 |
| i=9 | y9 | = | μ | + | u4,2 |
| i=10 | y10 | = | μ | + | u5,2 |
Why do the colors split into two families (cool / warm)? Same scheme as case 3: cool for g = 1, warm for g = 2. Independent groups get unrelated hue families. Compare with case 4, where the same hue (saturated vs light) marks the AR1 tie between matched regions.
Putting the three two-axis cases side by side:
- Case 3 (replicate + nrep): two independent copies. Posterior output is indexed by (level, rep).
- Case 6 (group + ngroup, no control.group): mathematically the same as case 3. The keyword choice mostly affects output labeling and what extensions you can add later.
- Case 4 (case 6 + 'control.group': {'model': 'ar1'}): adds AR1 correlation across groups. This is what group exists for.
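The three bullets can be checked numerically by writing down the latent covariance in each case. This is a sketch with hypothetical τ and ρ, using the same (rep/week)-major stacking as the tables above; it shows that cases 3 and 6 give the identical matrix τ⁻¹I, and that case 4 changes only the 10 cells linking $u_{k,1}$ and $u_{k,2}$.

```python
import numpy as np

tau, rho, n_k = 2.0, 0.7, 5               # hypothetical values; 5 levels per copy

Sigma_case3 = np.eye(2 * n_k) / tau                    # replicate: independent copies
Sigma_case6 = np.kron(np.eye(2), np.eye(n_k)) / tau    # group, default iid across groups
C_ar1 = np.array([[1.0, rho], [rho, 1.0]])             # AR1 correlation for 2 groups
Sigma_case4 = np.kron(C_ar1, np.eye(n_k)) / tau        # group + control.group ar1

print(np.array_equal(Sigma_case3, Sigma_case6))        # same model
diff = np.argwhere(~np.isclose(Sigma_case4, Sigma_case6))
print(len(diff))   # the (k,1)-(k,2) pairs, counted in both orders
```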
CASE 7 OF 10: RW1 cyclic (neighbors wrap around: position n ties back to position 1)
Spec: {'id': 'hour', 'model': 'rw1', 'cyclic': True}
Prior: $u_i - u_{i-1} \overset{\text{iid}}{\sim} N(0, \tau^{-1})$ for $i = 2, \dots, n$, plus $u_1 - u_n \sim N(0, \tau^{-1})$, independent of the others.
Same setup as case 5 (linear RW1): one latent value per ordered position, smooth-with-neighbors prior. The only difference is the graph topology: setting cyclic: True adds one extra increment $u_1 - u_n$ to the prior, so the start and end of the index are tied together. Use it when the index wraps naturally (hour-of-day, day-of-week, month-of-year, any angular variable).
Neighbor graph (figure): position 10 wraps to position 1 (pink dashed edge).
| i=1 | y1 | = | μ | + | u1 |
| i=2 | y2 | = | μ | + | u2 |
| i=3 | y3 | = | μ | + | u3 |
| i=4 | y4 | = | μ | + | u4 |
| i=5 | y5 | = | μ | + | u5 |
| i=6 | y6 | = | μ | + | u6 |
| i=7 | y7 | = | μ | + | u7 |
| i=8 | y8 | = | μ | + | u8 |
| i=9 | y9 | = | μ | + | u9 |
| i=10 | y10 | = | μ | + | u10 |
Precision matrix Q = τR: linear vs cyclic
The structure matrix R for cyclic RW1 is a circulant tridiagonal: 2 on every diagonal entry, −1 on every nearest-neighbor off-diagonal, plus −1 in the two corners for the wrap edge between positions 1 and n. Compared with linear RW1, only four cells differ: the boundary diagonals jump from 1 to 2 (positions 1 and n now have two neighbors each), and the two corners pick up −1 entries (the wrap edge).
Linear RW1 (cyclic = False): pure tridiagonal, rank n−1
Diagonal: 1, 2, 2, ..., 2, 2, 1. Off-diagonal: −1. Corners: 0.
Cyclic RW1: circulant tridiagonal, rank still n−1
Diagonal: 2, 2, ..., 2. Off-diagonal: −1. Corners: −1 (dashed, wrap edge).
- R[1,1] and R[n,n]: 1 → 2 (endpoints gain a neighbor)
- R[1,n] and R[n,1]: 0 → −1 (the wrap edge)
- Every other cell is unchanged.
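Why is the rank still n − 1? Adding the same constant to every $u_i$ leaves every increment unchanged, including the wrap increment $u_1 - u_n$, so the constant vector is still in the null space of R. What the wrap edge does remove is the freedom for the start and end of the path to drift apart; what it does not remove is the freedom to shift the whole curve up or down. That remaining direction is why constr=True (a sum-to-zero constraint on u) is on by default in pyINLA.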
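Both structure matrices can be assembled from their increments as $R = D^\top D$, where each row of D is one increment. The sketch below (the helper name rw1_structure is hypothetical, not a pyINLA function) confirms that only the four cells listed above differ, that both matrices have rank n − 1, and that the constant vector is still in the null space of the cyclic version.

```python
import numpy as np


def rw1_structure(n, cyclic=False):
    """Sketch: structure matrix R = D^T D for RW1, built from its increments."""
    rows = []
    for i in range(n - 1):              # increments u_{i+1} - u_i
        d = np.zeros(n)
        d[i], d[i + 1] = -1.0, 1.0
        rows.append(d)
    if cyclic:                          # extra wrap increment u_1 - u_n
        d = np.zeros(n)
        d[n - 1], d[0] = -1.0, 1.0
        rows.append(d)
    D = np.vstack(rows)
    return D.T @ D


n = 10
R_lin, R_cyc = rw1_structure(n), rw1_structure(n, cyclic=True)

changed = sorted(map(tuple, np.argwhere(R_lin != R_cyc).tolist()))
print(changed)                                        # 0-based (row, col) pairs
print(np.linalg.matrix_rank(R_lin), np.linalg.matrix_rank(R_cyc))
print(np.allclose(R_cyc @ np.ones(n), 0))             # constant shift still free
```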
Why do the colors form a wheel? The 10 badges trace the hue circle in steps of 36°. Position 10 (purple-red) sits visually next to position 1 (red), mirroring its neighbor relationship in the prior. Compare case 5, where colors run from teal to violet along a line: in linear RW1, position 10 has no special tie to position 1.
- Both have 1 hyperparameter (log τ), both have rank n − 1 (one null direction: the overall level), and both default to constr=True.
- Linear RW1 has n − 1 = 9 neighbor pairs; cyclic RW1 has n = 10 (one extra wrap edge).
- Cyclic samples are tied at the ends; linear samples are free to drift.
CASE 8 OF 10: IID with weights (per-observation design scaling)
Spec: {'id': 'group_id', 'model': 'iid', 'weights': w} (w is a 1-D numpy array of length n_obs)
Prior: $u_k \overset{\text{iid}}{\sim} N(0, \tau^{-1})$, $k = 1, \dots, 3$; the $w_i$ are fixed inputs (not estimated).
weights scales the random-effect contribution per observation: $w_i$ multiplies $u_{g(i)}$ in the linear predictor. The latent vector is built as in case 2 (one u per group). What changes is the design matrix: instead of putting a 1 wherever observation i picks up group g, it puts $w_i$. So observations sharing a group keep the same colour (same u), but the badge label carries the weight factor.
weights argument instead.
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| group_id | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| weights | 1.0 | 2.0 | 0.5 | 1.5 | 1.0 | 2.0 | 1.0 | 0.5 | 1.5 | 2.0 |
| i=1 | y1 | = | μ | + | 1.0 · u1 |
| i=2 | y2 | = | μ | + | 2.0 · u1 |
| i=3 | y3 | = | μ | + | 0.5 · u1 |
| i=4 | y4 | = | μ | + | 1.5 · u2 |
| i=5 | y5 | = | μ | + | 1.0 · u2 |
| i=6 | y6 | = | μ | + | 2.0 · u2 |
| i=7 | y7 | = | μ | + | 1.0 · u3 |
| i=8 | y8 | = | μ | + | 0.5 · u3 |
| i=9 | y9 | = | μ | + | 1.5 · u3 |
| i=10 | y10 | = | μ | + | 2.0 · u3 |
Reading the badges. Colour = group (indigo / green / amber). The "$w \cdot u_g$" label inside each badge is the literal contribution to that row's $\eta_i$. Rows 1, 2 and 3 all share $u_1$ but multiply it by 1.0, 2.0 and 0.5 respectively. Same u, different pull.
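The weighted design matrix for case 8 can be sketched directly: row i gets $w_i$ in column g(i) instead of a 1. The matrix name A and the values of μ and u are illustrative assumptions, not pyINLA output.

```python
import numpy as np

group_id = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])
w = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 2.0, 1.0, 0.5, 1.5, 2.0])

levels = np.unique(group_id)
A = np.zeros((group_id.size, levels.size))
# Put w_i (not 1) in row i, column g(i).
A[np.arange(group_id.size), np.searchsorted(levels, group_id)] = w

mu = 0.0                          # hypothetical intercept
u = np.array([0.4, -0.3, 0.2])    # hypothetical u1, u2, u3
eta = mu + A @ u                  # rows 1-3: 1.0*u1, 2.0*u1, 0.5*u1

print(eta[:3])
```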
When to reach for weights:
- Exposure-style scaling on an RE: a group's effect is multiplied per row by population, time at risk, or area.
- Linear slope inside an f() block: set id = 1 (one level) on every row and weights = x_i. Then $u_1$ plays the role of a slope (row i adds $x_i \cdot u_1$ to $\eta_i$).
- Custom design contrasts: fractional or signed shares of a group's u (positive and negative weights are allowed).
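The slope trick in the second bullet reduces to a one-column design matrix equal to x. A minimal sketch, with a hypothetical covariate x and a hypothetical draw of $u_1$:

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # hypothetical covariate
idx = np.ones_like(x, dtype=int)          # id = 1 on every row (one level)

# Weights go where the 1s of an unweighted design would be.
A = np.zeros((x.size, 1))
A[np.arange(x.size), idx - 1] = x

u1 = 0.8                                  # hypothetical draw of the "slope"
contribution = A @ np.array([u1])         # each row adds x_i * u1 to eta_i
print(np.allclose(contribution, x * u1))
```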
CASE 9 OF 10: IID with values (reserve slots for unobserved levels)
Spec: {'id': 'school_id', 'model': 'iid', 'values': [1, 2, 3, 4, 5]} (10 students across 4 schools; school 3 has no data yet)
Prior: $u_k \overset{\text{iid}}{\sim} N(0, \tau^{-1})$, $k = 1, \dots, 5$; the level set is fixed by values, not inferred from the data (values is a fixed list, not a parameter; it declares 5 latent slots regardless of what is observed).
values declares the complete set of allowed levels for the id column. Without it, pyINLA infers the level set from sort(unique(id)) in the data, so the latent vector has one entry per observed level. With values, the latent vector has one entry per declared level, even those that never appear in the data. Those extra slots are governed purely by the prior; there are no yi rows that would update them.
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| school_id | 1 | 1 | 1 | 2 | 2 | 4 | 4 | 5 | 5 | 5 |

values declares: [1, 2, 3, 4, 5], i.e. 5 levels. School 3 is declared but no observation refers to it.
| i=1 | y1 | = | μ | + | u1 | school 1 |
| i=2 | y2 | = | μ | + | u1 | school 1 |
| i=3 | y3 | = | μ | + | u1 | school 1 |
| i=4 | y4 | = | μ | + | u2 | school 2 |
| i=5 | y5 | = | μ | + | u2 | school 2 |
| i=6 | y6 | = | μ | + | u4 | school 4 |
| i=7 | y7 | = | μ | + | u4 | school 4 |
| i=8 | y8 | = | μ | + | u5 | school 5 |
| i=9 | y9 | = | μ | + | u5 | school 5 |
| i=10 | y10 | = | μ | + | u5 | school 5 |
When to reach for values:
- Prediction at unseen groups: you want a posterior for school 3 even though you have no rows for it yet. With values, the posterior for $u_3$ given τ is just its prior, $N(0, \tau^{-1})$; τ itself is still informed by the observed schools.
- Stable indexing across runs: the same values list across fits guarantees that the same k means the same school in every posterior table.
- Document the level set in the spec: this makes the model self-describing instead of depending on whatever happens to be in df['school_id'] today.
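The stable-indexing point can be seen by building the design matrix twice: once from the observed level set sort(unique(id)) and once from the declared values list. The helper design() is hypothetical and only mimics the mapping described above; it is not pyINLA's internal code.

```python
import numpy as np

school_id = np.array([1, 1, 1, 2, 2, 4, 4, 5, 5, 5])


def design(ids, level_set):
    """Hypothetical helper: row i gets a 1 in the column of its level."""
    col = {lev: j for j, lev in enumerate(level_set)}
    A = np.zeros((len(ids), len(level_set)))
    for i, v in enumerate(np.asarray(ids).tolist()):
        A[i, col[v]] = 1.0
    return A


A_inferred = design(school_id, sorted(set(school_id.tolist())))  # levels [1, 2, 4, 5]
A_declared = design(school_id, [1, 2, 3, 4, 5])                   # from values

print(A_inferred.shape, A_declared.shape)
print(A_declared[:, 2].sum())   # school 3's slot is never touched by any row
# School 4 (row 6) lands in a different 0-based column in the two versions.
print(np.argmax(A_inferred[5]), np.argmax(A_declared[5]))
```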
CASE 10 OF 10: Seasonal (period m: windowed sums of length m are iid)
Spec: {'id': 'time', 'model': 'seasonal', 'season.length': 5} (10 ordered time points = 2 full cycles)
Prior: $u_i + u_{i+1} + \dots + u_{i+m-1} \overset{\text{iid}}{\sim} N(0, \tau^{-1})$, $i = 1, \dots, n - m + 1$ (season.length is fixed metadata, not a parameter).
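The seasonal model puts the prior on sliding sums of length m, not on neighbor differences. Every consecutive window of m latent values is constrained to have a small sum, distributed as $N(0, \tau^{-1})$. Patterns that repeat with period m and average roughly zero over each cycle sit comfortably under this prior; non-periodic drift accumulates inside the windows and is penalized. The model is intrinsic: it has rank deficiency m − 1, because any zero-sum pattern with period m leaves every window sum at exactly zero and is not penalized at all. A constant shift is not one of those directions (it changes every window sum by m times the shift), so the overall level belongs in the intercept, while the shape of the repeating cycle is left to the data.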
| i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| time | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| position | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
| cycle 1 (i = 1..5) | ||||||
| i=1 | y1 | = | μ | + | u1 | pos 1, cycle 1 |
| i=2 | y2 | = | μ | + | u2 | pos 2, cycle 1 |
| i=3 | y3 | = | μ | + | u3 | pos 3, cycle 1 |
| i=4 | y4 | = | μ | + | u4 | pos 4, cycle 1 |
| i=5 | y5 | = | μ | + | u5 | pos 5, cycle 1 |
| cycle 2 (i = 6..10), same positions, not forced equal | ||||||
| i=6 | y6 | = | μ | + | u6 | pos 1, cycle 2 |
| i=7 | y7 | = | μ | + | u7 | pos 2, cycle 2 |
| i=8 | y8 | = | μ | + | u8 | pos 3, cycle 2 |
| i=9 | y9 | = | μ | + | u9 | pos 4, cycle 2 |
| i=10 | y10 | = | μ | + | u10 | pos 5, cycle 2 |
For n = 10 obs and m = 5, there are n − m + 1 = 6 sliding windows. Each one constrains a 5-element sum to be small. This favours patterns whose values inside any cycle of length 5 roughly cancel.
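A quick way to see what the prior rewards is to compute the six window sums for a few candidate vectors. In this sketch the test vectors are made up for illustration: a period-5 pattern whose cycle sums to zero has every window sum exactly zero, while a slow linear drift and a constant shift both produce non-zero sums and are penalized.

```python
import numpy as np

m, n = 5, 10


def window_sums(u, m):
    """All n - m + 1 sliding sums of length m, the quantities the prior penalizes."""
    return np.convolve(u, np.ones(m), mode="valid")


candidates = {
    "periodic, zero-sum": np.tile([2.0, -1.0, 0.5, -0.5, -1.0], 2),  # period 5
    "linear drift": np.linspace(-1.0, 1.0, n),                         # no period
    "constant shift": np.ones(n),                                      # level only
}

# Up to the factor tau/2, the prior penalty is the sum of squared window sums.
for name, u in candidates.items():
    s = window_sums(u, m)
    print(f"{name:20s} windows={len(s)} penalty={np.sum(s ** 2):.3f}")
```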
Stack the six window constraints into the (n − m + 1) × n window matrix D (row i is [0,…,0, 1,1,1,1,1, 0,…,0] with the five 1s at positions i..i+4); then $R = D^\top D$. Entry $R_{ab}$ counts how many windows contain both positions a and b.
|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 2 | 2 | 2 | 2 | 1 | 0 | 0 | 0 | 0 |
| 3 | 1 | 2 | 3 | 3 | 3 | 2 | 1 | 0 | 0 | 0 |
| 4 | 1 | 2 | 3 | 4 | 4 | 3 | 2 | 1 | 0 | 0 |
| 5 | 1 | 2 | 3 | 4 | 5 | 4 | 3 | 2 | 1 | 0 |
| 6 | 0 | 1 | 2 | 3 | 4 | 5 | 4 | 3 | 2 | 1 |
| 7 | 0 | 0 | 1 | 2 | 3 | 4 | 4 | 3 | 2 | 1 |
| 8 | 0 | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 2 | 1 |
| 9 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 2 | 2 | 1 |
| 10 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
- Bandwidth = 2m − 1 = 9: $R_{ab} = 0$ whenever $|a - b| \ge m$ (no length-m window can contain both).
- Interior diagonal = m = 5: positions in the middle of the index sit inside all m overlapping windows; positions near the boundary sit in fewer (a triangular ramp 1, 2, 3, 4, 5 from each corner).
- Interior off-diagonals decay linearly: row 5 reads 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, the autocorrelation of the all-ones vector of length m with itself.
- Rank deficiency = m − 1 = 4: there are m − 1 directions that no windowed sum sees. Concretely, adding any m-periodic pattern ($p_{i+m} = p_i$) whose values over one period sum to zero leaves every length-m sum unchanged, and such patterns form an (m − 1)-dimensional space. The constant vector is not among them (a constant c gives window sums m·c), so an intercept does not absorb any of these directions; they are pinned only by the data.
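The table and the bullets above can be reproduced with a few lines of NumPy. This sketch builds the window matrix D, forms $R = D^\top D$, and checks the rank deficiency, a zero-sum periodic null vector, and that a constant vector is not in the null space.

```python
import numpy as np

n, m = 10, 5

# Window matrix D: row s has ones at positions s..s+m-1 (0-based).
D = np.zeros((n - m + 1, n))
for s in range(n - m + 1):
    D[s, s:s + m] = 1.0
R = D.T @ D                      # R[a, b] = number of windows containing a and b

print(R.astype(int)[4])          # row 5 of the table
print(np.diag(R).astype(int))    # boundary ramp up to m, then back down
print(n - np.linalg.matrix_rank(R))   # rank deficiency, expected m - 1

# A null direction: period-m pattern whose values over one period sum to zero.
p = np.tile([1.0, -1.0, 0.0, 0.0, 0.0], n // m)
print(np.allclose(R @ p, 0))            # no window sum sees it
print(np.allclose(R @ np.ones(n), 0))   # a constant shift is penalized
```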
| Aspect | RW1 (case 5) | RW1 cyclic (case 7) | Seasonal (case 10) |
|---|---|---|---|
| Prior is on | neighbor increments $u_i - u_{i-1}$ | same as RW1, plus the wrap pair $u_n - u_1$ | sliding sums of m consecutive values |
| Built-in period | none, just smoothness | the full length n (one big loop) | configurable m via season.length |
| Forces $u_{i+m} = u_i$? | no period at all | no; the wrap only makes $u_n$ and $u_1$ neighbors (i = n ↔ i = 1) | no, only the sums are constrained |
| Typical sample | smooth random walk | smooth walk that closes back on itself | oscillation around zero with cycle m |
| When to reach for it | monotone trend, drift, no recurring shape | angle, day-of-year, anything truly periodic with one full cycle of data | monthly seasonality across years, weekly across weeks, etc. |
- RW1 ties neighbors together: good for smoothness, no period.
- RW1 cyclic ties neighbors and closes the loop end-to-start: good when the index itself is a circle (angles, day-of-year).
- Seasonal ties every windowed sum of length m: good when the pattern repeats every m steps and you have several cycles of data.