Performance

Runtime & Threads

A small set of pyINLA arguments controls how the fit is executed: how many threads to use, where intermediate files live, whether to keep them after the run, and how verbose the engine should be. None of these change the model. They only affect speed and bookkeeping.

What counts as a runtime parameter

Runtime parameters are infrastructure: they govern execution, file placement, and output verbosity. The numerical fit is identical regardless of the values you choose. So you can safely tune these for performance or convenience without worrying about reproducibility of the results.

Quick reference

The four runtime knobs documented here are num_threads, working_directory, keep, and verbose. All four are pure infrastructure: they affect only speed and bookkeeping, never the inferred posterior.

Parallelism: `num_threads`

pyINLA can run the inner C engine with a configurable number of threads. The argument accepts two forms: a plain integer (num_threads=4) or a colon-separated string of the form 'A:B' (num_threads='4:2'). The string form requests two-level parallelism: A outer threads, each of which can spawn up to B inner threads. A combined form can beat a plain integer, but not at every thread budget, so treat the benchmark under "Measured speedups" as the guide rather than a fixed rule.

from pyinla import pyinla

result = pyinla(
    model={'response': 'y', 'fixed': ['1', 'x']},
    family='gaussian',
    data=df,
    num_threads='4:2',     # 4 outer, 2 inner -- typically the sweet spot
)

What the two levels do: outer parallelises the hyperparameter (theta) evaluations the optimiser explores; inner parallelises the linear algebra (Cholesky / BLAS) inside each evaluation. Spending all your threads on one side wastes the other side's parallelism. num_threads=4 is shorthand for '4:1' (4 outer, 1 inner): convenient, but it leaves the inner level entirely sequential.

The `A:B[:C]` form

The optional C is passed straight through to the engine for a deeper level of concurrency. Some examples:

num_threads = '4:2'      # 4 outer threads, 2 inner each  <-- often the best
num_threads = '4:2:1'    # 4 outer, 2 inner, 1 at the deepest level
num_threads = ':2'       # use all CPUs at the outer level, 2 inner each
num_threads = '4:'       # 4 outer, 1 inner (same as 4)

Measured speedups

Benchmark on a generic0 random-effect model (N=3000 levels, NOBS=30000 observations), median of 3 trials, on a 22-core machine. Speedup is relative to '1:1':

outer \ inner	1	2	4	8
1	1.00x	1.14x	1.27x	1.23x
2	1.18x	1.36x	1.33x	0.80x
4	1.40x	1.43x	1.23x	--
8	1.38x	1.21x	--	--
16	1.26x	--	--	--

Reading this: num_threads=4 (i.e. '4:1') gets 1.40x. The same total thread budget split as '2:2' gets 1.36x; doubling the budget to '4:2' tops out at 1.43x, the best cell in the table. Pure inner forms like '1:4' and '1:8' trail (1.27x, 1.23x). Past '4:1', adding outer threads gives no further gain (1.38x at '8:1', 1.26x at '16:1'), and piling threads onto the inner level (e.g. '2:8' at 0.80x) can end up slower than the '1:1' baseline. Numbers vary by problem size and hardware; this is the shape, not a universal table.

Practical guidance: reach for num_threads='4:2' as a default starting point on a multi-core machine, then tune A and B by hand if you care about the last 5-10%. Avoid A*B larger than the number of physical cores: thread contention can make things slower. The plain integer form (num_threads=4) is fine when you don't want to think about it, but it leaves the inner level on the table.
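Tuning A and B by hand amounts to timing the same fit over a small grid of splits. The sketch below is one way to do that, not part of pyINLA: candidate_specs, time_thread_specs, and speedups are hypothetical helper names, and fit stands for any callable you supply that runs your model with a given num_threads. With the default levels and cores=22, candidate_specs yields the 14 combinations that carry a number in the table above (the "--" cells are the ones where A*B exceeds 22).

```python
import statistics
import time


def candidate_specs(cores, outer_levels=(1, 2, 4, 8, 16), inner_levels=(1, 2, 4, 8)):
    """List 'A:B' strings whose total thread count A*B fits within `cores`."""
    return [
        f"{a}:{b}"
        for a in outer_levels
        for b in inner_levels
        if a * b <= cores
    ]


def time_thread_specs(fit, specs, trials=3):
    """Return {spec: median wall-clock seconds} for fit(num_threads=spec).

    `fit` is a user-supplied callable (hypothetical here); it is called
    `trials` times per spec and the median is kept, as in the benchmark.
    """
    timings = {}
    for spec in specs:
        samples = []
        for _ in range(trials):
            start = time.perf_counter()
            fit(num_threads=spec)
            samples.append(time.perf_counter() - start)
        timings[spec] = statistics.median(samples)
    return timings


def speedups(timings, baseline="1:1"):
    """Express each timing as a speedup relative to the baseline spec."""
    base = timings[baseline]
    return {spec: base / seconds for spec, seconds in timings.items()}
```

A fit callable can be as small as a lambda that forwards num_threads to the pyinla call shown earlier, with your own model, family, and data filled in. Note that the cores argument is whatever budget you choose; the guidance above is about physical cores, which may be fewer than the logical CPU count your operating system reports.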

Flexible spelling

The parser is forgiving with separators: spaces, tabs, commas, and colons are interchangeable. "4, 2, 1", "4 2 1", and "4:2:1" all mean the same thing. An empty string, a bare ":", or omitting num_threads entirely resolves to "16:1", the same default R-INLA uses.
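To make the spelling rules concrete, here is a minimal sketch that applies them as described on this page. It is not pyINLA's actual parser: normalize_num_threads is a hypothetical name, resolving "all CPUs" through os.cpu_count() is an assumption, and so is returning None when the deepest level C is not given.

```python
import os
import re

DEFAULT_SPEC = "16:1"  # documented fallback when nothing is specified


def normalize_num_threads(spec=None, cpu_count=None):
    """Return (outer, inner, deepest) for a num_threads value.

    Illustrative only: mirrors the rules described in the text, not
    pyINLA's internal implementation.
    """
    if spec is None:
        spec = ""
    if isinstance(spec, int):
        spec = str(spec)  # num_threads=4 is shorthand for '4:1'

    # Spaces, tabs, commas, and colons are interchangeable separators.
    # Only surrounding whitespace is stripped, so a leading or trailing
    # colon still produces an empty field (':2' -> ['', '2']).
    tokens = re.split(r"[\s,:]+", spec.strip())
    if not any(tokens):
        tokens = DEFAULT_SPEC.split(":")  # '', ':' or omitted
    if len(tokens) > 3:
        raise ValueError(f"expected at most three levels, got {spec!r}")
    tokens += [""] * (3 - len(tokens))
    outer_s, inner_s, deepest_s = tokens

    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # assumption: logical CPU count
    outer = int(outer_s) if outer_s else cpu_count  # ':2' -> all CPUs outer
    inner = int(inner_s) if inner_s else 1          # '4:' -> 1 inner
    deepest = int(deepest_s) if deepest_s else None
    return outer, inner, deepest
```

For example, normalize_num_threads("4, 2, 1") and normalize_num_threads("4:2:1") both give (4, 2, 1), while an empty string gives (16, 1, None).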

Where files live: `working_directory`

pyINLA writes intermediate artifacts (the engine's working files and log) to a temporary location by default. To pin them somewhere stable, pass working_directory:

result = pyinla(
    model=..., family='gaussian', data=df,
    working_directory='/tmp/my_inla_run',
)

Use this when you want to:

Inspect what pyINLA actually sent to the engine.
Compare one fit against another from disk.
Keep a per-run audit trail when iterating on a model.

Holding on to artifacts: `keep`

By default pyINLA cleans up its working_directory when the call returns. Pass keep=True to preserve everything:

result = pyinla(
    model=..., family='gaussian', data=df,
    working_directory='/tmp/my_inla_run',
    keep=True,
)

You then get a populated artifacts directory that you can re-open, diff, or pass to downstream tooling. Combined with working_directory, this is the standard pattern for inspecting or reproducing a run after the fact.
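Once a run directory has been kept, the standard library is enough to see what is in it or to compare two runs. The sketch below makes no assumption about which files pyINLA writes; list_artifacts and diff_runs are hypothetical helper names, and the directory paths are whatever you passed as working_directory.

```python
import filecmp
from pathlib import Path


def list_artifacts(run_dir):
    """Return the relative paths of all files under run_dir, sorted."""
    root = Path(run_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


def diff_runs(run_a, run_b):
    """Compare two kept run directories file by file.

    Returns which files exist only on one side and which shared files
    differ in content (byte-for-byte comparison).
    """
    a, b = Path(run_a), Path(run_b)
    files_a, files_b = set(list_artifacts(a)), set(list_artifacts(b))
    changed = sorted(
        name
        for name in files_a & files_b
        if not filecmp.cmp(a / name, b / name, shallow=False)
    )
    return {
        "only_a": sorted(files_a - files_b),
        "only_b": sorted(files_b - files_a),
        "changed": changed,
    }
```

Calling diff_runs('/tmp/my_inla_run', '/tmp/another_run') on two kept directories returns a dictionary with the keys only_a, only_b, and changed. Files such as logs may differ between runs for reasons unrelated to the model (timestamps, for instance), so a non-empty changed list is a starting point for inspection rather than evidence of a different fit.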

Engine output: `verbose`

Pass verbose=True to have pyINLA stream the C engine's progress to stdout (iteration counts, theta updates, gradient norms). This is essential when a fit hangs or fails to converge:

result = pyinla(
    model=..., family='gaussian', data=df,
    verbose=True,
)

When set, you'll see lines like:

Compute initial values...
    Iter[0] RMS(err) = 1.000, update with step-size = 0.979
    Iter[1] RMS(err) = 0.437, update with step-size = 1.059
Optimise using DEFAULT METHOD
maxld= -440.8797 fn=  1 theta= -0.0050 20.0000 ...

verbose is off by default to keep notebooks readable.
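If you capture this output, the optimiser lines are regular enough to pull apart with a small regular expression. The following is a sketch based only on the sample line shown above; parse_maxld_line is a hypothetical helper, and real engine output may contain variations (other fields, different number formats) that it does not handle.

```python
import re

# Matches lines shaped like: maxld= -440.8797 fn=  1 theta= -0.0050 20.0000 ...
_MAXLD_LINE = re.compile(r"maxld=\s*(\S+)\s+fn=\s*(\d+)\s+theta=\s*(.*)")


def parse_maxld_line(line):
    """Extract maxld, fn, and the theta values from one optimiser line.

    Returns None for lines that do not match the expected shape.
    """
    match = _MAXLD_LINE.search(line)
    if match is None:
        return None
    theta = []
    for token in match.group(3).split():
        try:
            theta.append(float(token))
        except ValueError:
            break  # stop at the first non-numeric token, e.g. a trailing "..."
    return {
        "maxld": float(match.group(1)),
        "fn": int(match.group(2)),
        "theta": theta,
    }
```

Applied to each captured line in turn, this gives a sequence of maxld and theta values you can plot or scan to see whether the optimiser is still making progress.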

Why this matters: the no-effect guarantee

Every runtime knob on this page is exercised in pyINLA's test suite to confirm it has no effect on the fit. The posterior summaries at num_threads=1, num_threads=4, and num_threads=8 are identical. The same is true for verbose, keep, and working_directory: changing them changes only the runtime experience, never the inferred model.

Practical implication: you can crank threads in production, drop them in CI, and route output to a temp folder, all without re-validating the fit. The model the engine sees is the same.
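If you would like to confirm this on your own model, one option is to fit it under two settings and compare the summaries you care about. This page does not describe how to read summaries off a result, so the sketch below takes plain dictionaries of numbers that you extract yourself; summaries_match is a hypothetical helper, and the default tolerances are an assumption (pass rel_tol=0 and abs_tol=0 to demand exact equality).

```python
import math


def summaries_match(reference, other, rel_tol=1e-9, abs_tol=1e-12):
    """Check that two {name: value} summaries agree entry by entry.

    Both arguments are plain dicts of floats, e.g. posterior means keyed
    by parameter name, extracted from two fits by the caller.
    """
    if reference.keys() != other.keys():
        return False
    return all(
        math.isclose(reference[name], other[name], rel_tol=rel_tol, abs_tol=abs_tol)
        for name in reference
    )
```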

Summary

Parameter	Type	Default	Effect
num_threads	int or str `A:B[:C]`	16:1	Outer / inner / engine-level parallelism. No effect on the fit.
working_directory	str / Path	auto temp dir	Where intermediate artifacts are written.
keep	bool	False	Preserve `working_directory` after the call.
verbose	bool	False	Stream the C engine's progress to stdout.
