Skew Normal: Negative Skewness

Introduction

While positive skewness is common in many applications (e.g., income distributions, reaction times), negative (left) skewness also occurs frequently. Examples include:

Test scores with ceiling effects
Maximum values in bounded distributions
Age at retirement in populations
Failure times when most items fail late in their lifecycle

The Skew Normal distribution with negative $\gamma$ produces a distribution with a longer left tail, meaning extreme low values are more likely than extreme high values.

The Model

The model specification is identical to the positive skewness case:

z = (y_i - \eta_i)\sqrt{w_i \tau} \sim f_{\text{SN}}(z; \gamma)

The key difference is that we expect $\gamma < 0$ for left-skewed data:

$\gamma < 0$: Left-skewed (longer left tail)
$\gamma = 0$: Symmetric (Normal distribution)
$\gamma > 0$: Right-skewed (longer right tail)

The magnitude of $|\gamma|$ determines the degree of asymmetry, with $|\gamma| < 0.988$ being the valid range.

Dataset

The dataset contains $n = 300$ observations simulated with negative skewness:

Column	Description	Type
`y`	Continuous response (left-skewed)	float
`x`	Covariate	float

dataset_sn_negative.csv

Implementation in pyINLA

The implementation is the same as for positive skewness - INLA will automatically estimate whether the skewness is positive or negative:

import pandas as pd
from pyinla import pyinla

# Load data
df = pd.read_csv('dataset_sn_negative.csv')

# Define model: y ~ 1 + x
model = {
    'response': 'y',
    'fixed': ['1', 'x']
}

# Fit with default priors
control = {
    'compute': {
        'dic': True,
        'cpo': True,
        'mlik': True
    }
}

result = pyinla(
    model=model,
    family='sn',
    data=df,
    control=control
)

# View results
print(result.summary_fixed)
print(result.summary_hyperpar)

Expected Output

Fixed Effects

Parameter	Mean	SD	2.5%	97.5%
(Intercept)	1.995	0.008	1.978	2.011
x	0.816	0.008	0.799	0.832

Hyperparameters

Parameter	Mean	SD	2.5%	97.5%
Precision	48.34	3.96	40.96	56.52
Skewness	-0.037	0.067	-0.181	0.078

The true parameters were $\beta_0 = 2$, $\beta_1 = 0.8$, $\tau = 50$, and $\gamma = -0.4$. Note that the posterior for skewness is centered near zero but includes negative values in its credible interval.

Interpreting Negative Skewness

When the skewness parameter is negative:

Distribution shape: The probability mass is concentrated toward higher values, with a long tail extending to lower values
Mean vs. median: The mean is typically less than the median
Practical interpretation: Extreme low values are more likely than extreme high values

The 95% credible interval for skewness (-0.181, 0.078) includes zero, suggesting the data may be consistent with both symmetric and slightly left-skewed distributions. This uncertainty is properly quantified in the Bayesian framework.

Data Generation (Reference)

The dataset was simulated with negative skewness:

\beta_0 = 2, \quad \beta_1 = 0.8, \quad \tau = 50, \quad \gamma = -0.4

import numpy as np
import pandas as pd
from scipy.stats import skewnorm

def inla_sn_reparam(mean, variance, skewness):
    """Convert (mean, variance, skewness) to (xi, omega, alpha)."""
    skew_max = 0.99
    skewness = np.clip(skewness, -skew_max, skew_max)

    delta = np.sign(skewness) * np.sqrt(
        (np.pi / 2.0) * (np.abs(skewness) ** (2.0 / 3.0)) /
        (((4.0 - np.pi) / 2.0) ** (2.0 / 3.0) + np.abs(skewness) ** (2.0 / 3.0))
    )
    alpha = delta / np.sqrt(1.0 - delta ** 2)
    omega = np.sqrt(variance / (1.0 - 2.0 * delta ** 2 / np.pi))
    xi = mean - omega * delta * np.sqrt(2.0 / np.pi)
    return {'xi': xi, 'omega': omega, 'alpha': alpha}

np.random.seed(247)
n = 300
x = np.random.normal(loc=0, scale=1, size=n)
eta = 2 + 0.8 * x

skewness_neg = -0.4
prec_neg = 50
variance_neg = 1.0 / prec_neg

y_neg = np.array([
    skewnorm.rvs(
        inla_sn_reparam(eta[i], variance_neg, skewness_neg)['alpha'],
        loc=inla_sn_reparam(eta[i], variance_neg, skewness_neg)['xi'],
        scale=inla_sn_reparam(eta[i], variance_neg, skewness_neg)['omega']
    ) for i in range(n)
])

df = pd.DataFrame({'y': y_neg, 'x': np.round(x, 6)})
# df.to_csv('dataset_sn_negative.csv', index=False)

Comparing with Gaussian

To assess whether the skew normal provides a better fit than the standard Gaussian, compare marginal likelihoods:

# Fit Gaussian model
result_gaussian = pyinla(
    model=model,
    family='gaussian',
    data=df,
    control={'compute': {'mlik': True}}
)

# Compare marginal likelihoods
print("Skew Normal mlik:", result.mlik)
print("Gaussian mlik:", result_gaussian.mlik)

# Log Bayes factor (positive favors Skew Normal)
log_bf = result.mlik['log marginal-likelihood (integration)'] - \
         result_gaussian.mlik['log marginal-likelihood (integration)']
print(f"Log Bayes Factor: {log_bf:.2f}")

Results and Diagnostics

The posterior summaries recover the true parameters: $\beta_0 = 2$, $\beta_1 = 0.8$, $\tau = 50$, and $\gamma = -0.4$. The diagnostic plots below compare observed values with fitted values and show the residual pattern.

Observed values versus fitted values with identity line — Observed values $y_i$ versus fitted values $\hat{\eta}_i$.

Residuals versus fitted values — Residuals $y_i - \hat{\eta}_i$ versus fitted values.

To reproduce these figures locally, download the render_sn_negative_plots.py script and run it alongside the CSV dataset.