Runtime & Threads
A small set of pyINLA arguments control how the fit is executed: how many threads to use, where intermediate files live, whether to keep them after the run, and how verbose the engine should be. None of these change the model. They only affect speed and bookkeeping.
What counts as a runtime parameter
Runtime parameters are infrastructure: they govern execution, file placement, and output verbosity. The numerical fit is identical regardless of the values you choose. So you can safely tune these for performance or convenience without worrying about reproducibility of the results.
Quick reference
The four runtime knobs documented here are num_threads, working_directory, keep, and verbose. All four are pure infrastructure: they affect only speed and bookkeeping, never the inferred posterior.
Parallelism: num_threads
pyINLA can run the underlying C engine with a configurable number of threads. The argument accepts two forms: a plain integer (num_threads=4) or a colon-separated string (num_threads='4:2'). The string form 'A:B' requests two-level parallelism: A outer threads, each of which can spawn up to B inner threads. In the benchmark below, the fastest setting is a combination ('4:2'), although its lead over the best plain integer ('4:1') is small.
```python
from pyinla import pyinla

# df: your data, with columns 'y' and 'x'
result = pyinla(
    model={'response': 'y', 'fixed': ['1', 'x']},
    family='gaussian',
    data=df,
    num_threads='4:2',  # 4 outer, 2 inner -- typically the sweet spot
)
```
What the two levels do: outer parallelises the hyperparameter (theta) evaluations the optimiser explores; inner parallelises the linear algebra (Cholesky / BLAS) inside each evaluation. Spending all your threads on one side wastes the other side's parallelism. num_threads=4 is shorthand for '4:1' (4 outer, 1 inner): convenient, but it leaves the inner level entirely sequential.
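To make the outer/inner split concrete, here is a toy model of the same structure in plain Python: an outer pool of `outer` workers, each running one "theta evaluation" that hands its own work to a pool of up to `inner` workers. The names `evaluate_theta` and `two_level` and the sum-of-squares workload are invented stand-ins. This is not how pyINLA's C engine is implemented, and because of Python's GIL these threads illustrate the structure rather than deliver a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_theta(theta, data, inner):
    """One outer task: a single 'theta evaluation' that uses up to `inner` threads."""
    chunks = [data[i::inner] for i in range(inner)]  # split the inner work `inner` ways
    with ThreadPoolExecutor(max_workers=inner) as inner_pool:
        # Stand-in for the Cholesky/BLAS work inside one evaluation.
        partials = inner_pool.map(lambda chunk: sum(theta * x * x for x in chunk), chunks)
        return sum(partials)


def two_level(thetas, data, outer, inner):
    """Evaluate every theta, running up to `outer` evaluations concurrently."""
    with ThreadPoolExecutor(max_workers=outer) as outer_pool:
        return list(outer_pool.map(lambda t: evaluate_theta(t, data, inner), thetas))


# Roughly the shape of num_threads='4:2': 4 concurrent evaluations, 2 inner threads each.
print(two_level([0.5, 1.0, 2.0], [1, 2, 3], outer=4, inner=2))  # [7.0, 14.0, 28.0]
```

With `outer=4, inner=2`, at most four evaluations are in flight at once, and each of them uses at most two inner threads for its own work, which is the 4 × 2 budget that '4:2' describes.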
The A:B[:C] form
The optional C is passed straight through to the engine for a deeper level of concurrency. Some examples:
```python
num_threads = '4:2'    # 4 outer threads, 2 inner each <-- often the best
num_threads = '4:2:1'  # 4 outer, 2 inner, 1 at the deepest level
num_threads = ':2'     # use all CPUs at the outer level, 2 inner each
num_threads = '4:'     # 4 outer, 1 inner (same as 4)
```
Measured speedups
Benchmark on a generic0 random-effect model (N=3000 levels, NOBS=30000 observations), median of 3 trials, on a 22-core machine. Speedup is relative to '1:1':
| outer \ inner | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| 1 | 1.00x | 1.14x | 1.27x | 1.23x |
| 2 | 1.18x | 1.36x | 1.33x | 0.80x |
| 4 | 1.40x | 1.43x | 1.23x | -- |
| 8 | 1.38x | 1.21x | -- | -- |
| 16 | 1.26x | -- | -- | -- |
Reading this: num_threads=4 (i.e. '4:1') gets 1.40x. Splitting the same four threads as '2:2' gives 1.36x, and '1:4' only 1.27x. With eight threads, '4:2' gives the best result in the table, 1.43x, ahead of '8:1' (1.38x), '2:4' (1.33x) and '1:8' (1.23x). More threads do not help: '16:1' falls back to 1.26x, and '2:8' drops to 0.80x, slower than a single thread. Numbers vary by problem size and hardware; this is the shape, not a universal table.
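Because the table is a shape rather than a universal answer, it can be worth timing a few settings on your own model. The sketch below follows the same method as the benchmark (median of several trials, speedup relative to '1:1'). `time_configs` is a hypothetical helper, not part of pyINLA; it takes any callable that runs one complete fit for a given num_threads string, so you can wrap your own `pyinla(...)` call in it.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_configs(fit: Callable[[str], object],
                 configs: Iterable[str],
                 trials: int = 3,
                 baseline: str = "1:1") -> Dict[str, float]:
    """Return each config's speedup relative to `baseline` (higher is faster).

    `fit` must run one complete fit for the given num_threads string. Each
    config is timed `trials` times and the median wall-clock time is used.
    """
    configs = list(configs)
    if baseline not in configs:
        configs.insert(0, baseline)  # always time the reference setting
    medians = {}
    for cfg in configs:
        times = []
        for _ in range(trials):
            start = time.perf_counter()
            fit(cfg)
            times.append(time.perf_counter() - start)
        medians[cfg] = statistics.median(times)
    return {cfg: medians[baseline] / t for cfg, t in medians.items()}


# Example (assumes df and the model from the earlier snippet):
# speedups = time_configs(
#     lambda nt: pyinla(model={'response': 'y', 'fixed': ['1', 'x']},
#                       family='gaussian', data=df, num_threads=nt),
#     ['4:1', '2:2', '4:2', '8:1'],
# )
```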
Use num_threads='4:2' as a default starting point on a multi-core machine, then tune A and B by hand if you care about the last 5-10%. Avoid making A*B larger than the number of physical cores, because thread contention can make the fit slower. The plain integer form (num_threads=4) is fine when you don't want to think about it, but it leaves the inner level unused.
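If you want a script to pick that starting point automatically, a small heuristic can start from '4:2' and shrink it until A*B fits within the core count. `suggest_num_threads` is hypothetical and encodes only this page's advice: it gives up inner threads first, since '4:1' beat '2:2' in the benchmark above. Note that `os.cpu_count()` reports logical CPUs, so pass the physical core count explicitly when you know it.

```python
import os


def suggest_num_threads(cores=None, outer=4, inner=2):
    """Start from 'outer:inner' and shrink it until outer * inner <= cores."""
    if cores is None:
        cores = os.cpu_count() or 1  # logical CPUs, which may exceed physical cores
    cores = max(1, cores)
    while outer * inner > cores:
        if inner > 1:
            inner -= 1  # give up inner threads first ('4:1' beat '2:2' in the benchmark)
        else:
            outer -= 1
    return f"{outer}:{inner}"


# Example:
# result = pyinla(model=..., family='gaussian', data=df,
#                 num_threads=suggest_num_threads(cores=8))
```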
Flexible spelling
The parser is forgiving with separators: spaces, tabs, commas, and colons are interchangeable. "4, 2, 1", "4 2 1", and "4:2:1" all mean the same thing. An empty string, a bare ":", or omitting num_threads entirely resolves to "16:1", the same default R-INLA uses.
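The spelling rules above fit in a few lines. `normalize_num_threads` is a hypothetical sketch that rewrites any accepted form into canonical 'A:B[:C]' text, following the rules on this page (including the ':2' example, where an empty outer field means all CPUs). It is not pyINLA's actual parser, whose edge-case behaviour may differ.

```python
import os
import re

DEFAULT_NUM_THREADS = "16:1"  # documented default when num_threads is omitted or empty


def normalize_num_threads(spec=None):
    """Return num_threads in canonical 'A:B[:C]' form (illustrative only)."""
    if spec is None:
        return DEFAULT_NUM_THREADS
    if isinstance(spec, int):
        return f"{spec}:1"  # a plain integer N means N outer threads, 1 inner
    # Spaces, tabs, commas and colons are interchangeable; a run counts as one separator.
    parts = re.split(r"[\s,:]+", spec.strip())
    if all(p == "" for p in parts):
        return DEFAULT_NUM_THREADS  # "" and ":" fall back to the default
    outer = parts[0] or str(os.cpu_count() or 1)  # ":2" -> all CPUs at the outer level
    inner = (parts[1] if len(parts) > 1 else "") or "1"  # "4" and "4:" -> 1 inner
    deeper = [p for p in parts[2:] if p]  # optional C, passed through unchanged
    fields = [outer, inner, *deeper]
    if not all(f.isdigit() for f in fields):
        raise ValueError(f"unrecognised num_threads value: {spec!r}")
    return ":".join(fields)
```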
Where files live: working_directory
pyINLA writes intermediate artifacts (the engine's working files and log) to a temporary location by default. To pin them somewhere stable, pass working_directory:
```python
result = pyinla(
    model=..., family='gaussian', data=df,
    working_directory='/tmp/my_inla_run',
)
```
Use this when you want to:
- Inspect what pyINLA actually sent to the engine.
- Compare one fit against another from disk.
- Keep a per-run audit trail when iterating on a model.
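For the audit-trail case, one simple pattern is to give every fit its own directory name. `new_run_dir` is a hypothetical helper: it makes sure the base directory exists and returns a unique, timestamped path inside it, but leaves creating the run directory itself to pyINLA, since this page does not say how pyINLA treats a directory that already exists. Pair it with keep=True so the directory survives the call.

```python
import uuid
from datetime import datetime
from pathlib import Path


def new_run_dir(base="inla_runs", label="run"):
    """Return a unique, not-yet-created path such as inla_runs/run-20240101-120000-1a2b3c4d."""
    Path(base).mkdir(parents=True, exist_ok=True)  # only the parent is created
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    suffix = uuid.uuid4().hex[:8]  # guards against two runs started in the same second
    return Path(base) / f"{label}-{stamp}-{suffix}"


# Example:
# run_dir = new_run_dir(label="gaussian-baseline")
# result = pyinla(model=..., family='gaussian', data=df,
#                 working_directory=str(run_dir), keep=True)
```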
Holding on to artifacts: keep
By default pyINLA cleans up its working_directory when the call returns. Pass keep=True to preserve everything:
```python
result = pyinla(
    model=..., family='gaussian', data=df,
    working_directory='/tmp/my_inla_run',
    keep=True,
)
```
You then get a populated artifacts directory that you can re-open, diff, or pass to downstream tooling. Combined with working_directory, this is the standard pattern for inspecting or reproducing a run after the fact.
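For the diff step, the standard library's `filecmp.dircmp` is enough to compare two kept directories. `diff_runs` is a hypothetical helper; it reports only which files are missing or differ, and makes no assumptions about what pyINLA names its artifacts.

```python
import filecmp
from pathlib import Path


def diff_runs(dir_a, dir_b):
    """Recursively compare two kept working directories.

    Returns (only_in_a, only_in_b, differing) as sorted lists of relative paths.
    dircmp uses filecmp's shallow comparison: files with identical size and
    modification time are treated as equal without reading their contents.
    """
    only_a, only_b, differing = [], [], []

    def walk(cmp, prefix):
        only_a.extend(str(prefix / name) for name in cmp.left_only)
        only_b.extend(str(prefix / name) for name in cmp.right_only)
        differing.extend(str(prefix / name) for name in cmp.diff_files)
        for name, sub in cmp.subdirs.items():  # recurse into common subdirectories
            walk(sub, prefix / name)

    walk(filecmp.dircmp(dir_a, dir_b), Path("."))
    return sorted(only_a), sorted(only_b), sorted(differing)


# Example:
# only_a, only_b, changed = diff_runs('/tmp/run_a', '/tmp/run_b')
```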
Engine output: verbose
Pass verbose=True to have pyINLA stream the C engine's progress to stdout (iteration counts, theta updates, gradient norms). This is essential when a fit hangs or fails to converge:
```python
result = pyinla(
    model=..., family='gaussian', data=df,
    verbose=True,
)
```
When set, you'll see lines like:
```text
Compute initial values...
Iter[0] RMS(err) = 1.000, update with step-size = 0.979
Iter[1] RMS(err) = 0.437, update with step-size = 1.059
Optimise using DEFAULT METHOD
maxld= -440.8797 fn= 1 theta= -0.0050 20.0000 ...
```
Off by default to keep notebooks readable.
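If you save the verbose output (for example by redirecting a script's stdout to a file), the progress lines can be pulled out to see how the printed values evolve over a run. `parse_progress` is a hypothetical sketch whose regular expressions assume the line format in the sample above; that format is not specified on this page and may change between engine versions.

```python
import re

# Patterns assume the line format shown in the sample output above.
ITER_RE = re.compile(r"Iter\[(\d+)\]\s+RMS\(err\)\s*=\s*([-+\d.eE]+)")
MAXLD_RE = re.compile(r"maxld=\s*([-+\d.eE]+)\s+fn=\s*(\d+)")


def parse_progress(log_text):
    """Return ([(iter, rms_err), ...], [(fn, maxld), ...]) from saved verbose output."""
    rms = [(int(m.group(1)), float(m.group(2))) for m in ITER_RE.finditer(log_text)]
    maxld = [(int(m.group(2)), float(m.group(1))) for m in MAXLD_RE.finditer(log_text)]
    return rms, maxld


# Example:
# with open("fit.log") as fh:
#     rms, maxld = parse_progress(fh.read())
```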
Why this matters: the no-effect guarantee
Every runtime knob on this page is exercised in pyINLA's test suite to confirm it has no effect on the fit. The posterior summaries at num_threads=1, num_threads=4, and num_threads=8 are identical. The same is true for verbose, keep, and working_directory: changing them changes only the runtime experience, never the inferred model.
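You can run the same kind of check against your own model. `check_same_fit` is a hypothetical helper: it refits once per runtime setting and compares named posterior summaries against the first fit, exactly by default. Turning a pyINLA result into a `{name: value}` mapping is left to you; `summarise` in the usage comment is a placeholder, not a pyINLA function.

```python
import math
from typing import Callable, Iterable, Mapping


def check_same_fit(fit: Callable[[object], Mapping[str, float]],
                   settings: Iterable[object],
                   rel_tol: float = 0.0) -> bool:
    """Return True if every setting yields the same named summaries as the first one.

    With the default rel_tol=0.0 the comparison is exact, matching the
    "identical" claim; pass a small rel_tol to allow rounding-level noise.
    """
    settings = list(settings)
    reference = fit(settings[0])
    for setting in settings[1:]:
        other = fit(setting)
        if set(other) != set(reference):
            return False  # different parameters were summarised
        for name, value in reference.items():
            if not math.isclose(other[name], value, rel_tol=rel_tol, abs_tol=0.0):
                return False
    return True


# Example (summarise is a placeholder you would write yourself):
# assert check_same_fit(
#     lambda nt: summarise(pyinla(model=..., family='gaussian', data=df, num_threads=nt)),
#     [1, 4, 8],
# )
```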
Summary
| Parameter | Type | Default | Effect |
|---|---|---|---|
| num_threads | int or str A:B[:C] | 16:1 | Outer / inner / engine-level parallelism. No effect on the fit. |
| working_directory | str / Path | auto temp dir | Where intermediate artifacts are written. |
| keep | bool | False | Preserve working_directory after the call. |
| verbose | bool | False | Stream the C engine's progress to stdout. |