Skew Normal: Negative Skewness

A tutorial on modeling left-skewed data using the Skew Normal distribution with negative skewness parameter.

Introduction

While positive skewness is common in many applications (e.g., income distributions, reaction times), negative (left) skewness also occurs frequently. Examples include:

The Skew Normal distribution with negative $\gamma$ produces a distribution with a longer left tail, meaning extreme low values are more likely than extreme high values.

The Model

The model specification is identical to the positive skewness case:

$$ z = (y_i - \eta_i)\sqrt{w_i \tau} \sim f_{\text{SN}}(z; \gamma) $$

The key difference is that we expect $\gamma < 0$ for left-skewed data:

The magnitude of $|\gamma|$ determines the degree of asymmetry, with $|\gamma| < 0.988$ being the valid range.

Dataset

The dataset contains $n = 300$ observations simulated with negative skewness:

ColumnDescriptionType
yContinuous response (left-skewed)float
xCovariatefloat

Implementation in pyINLA

The implementation is the same as for positive skewness - INLA will automatically estimate whether the skewness is positive or negative:

import pandas as pd
from pyinla import pyinla

# Load data
df = pd.read_csv('dataset_sn_negative.csv')

# Define model: y ~ 1 + x
model = {
    'response': 'y',
    'fixed': ['1', 'x']
}

# Fit with default priors
control = {
    'compute': {
        'dic': True,
        'cpo': True,
        'mlik': True
    }
}

result = pyinla(
    model=model,
    family='sn',
    data=df,
    control=control
)

# View results
print(result.summary_fixed)
print(result.summary_hyperpar)

Expected Output

Fixed Effects

ParameterMeanSD2.5%97.5%
(Intercept)1.9950.0081.9782.011
x0.8160.0080.7990.832

Hyperparameters

ParameterMeanSD2.5%97.5%
Precision48.343.9640.9656.52
Skewness-0.0370.067-0.1810.078

The true parameters were $\beta_0 = 2$, $\beta_1 = 0.8$, $\tau = 50$, and $\gamma = -0.4$. Note that the posterior for skewness is centered near zero but includes negative values in its credible interval.

Interpreting Negative Skewness

When the skewness parameter is negative:

The 95% credible interval for skewness (-0.181, 0.078) includes zero, suggesting the data may be consistent with both symmetric and slightly left-skewed distributions. This uncertainty is properly quantified in the Bayesian framework.

Data Generation (Reference)

The dataset was simulated with negative skewness:

$$ \beta_0 = 2, \quad \beta_1 = 0.8, \quad \tau = 50, \quad \gamma = -0.4 $$
import numpy as np
import pandas as pd
from scipy.stats import skewnorm

def inla_sn_reparam(mean, variance, skewness):
    """Convert (mean, variance, skewness) to (xi, omega, alpha)."""
    skew_max = 0.99
    skewness = np.clip(skewness, -skew_max, skew_max)

    delta = np.sign(skewness) * np.sqrt(
        (np.pi / 2.0) * (np.abs(skewness) ** (2.0 / 3.0)) /
        (((4.0 - np.pi) / 2.0) ** (2.0 / 3.0) + np.abs(skewness) ** (2.0 / 3.0))
    )
    alpha = delta / np.sqrt(1.0 - delta ** 2)
    omega = np.sqrt(variance / (1.0 - 2.0 * delta ** 2 / np.pi))
    xi = mean - omega * delta * np.sqrt(2.0 / np.pi)
    return {'xi': xi, 'omega': omega, 'alpha': alpha}

np.random.seed(247)
n = 300
x = np.random.normal(loc=0, scale=1, size=n)
eta = 2 + 0.8 * x

skewness_neg = -0.4
prec_neg = 50
variance_neg = 1.0 / prec_neg

y_neg = np.array([
    skewnorm.rvs(
        inla_sn_reparam(eta[i], variance_neg, skewness_neg)['alpha'],
        loc=inla_sn_reparam(eta[i], variance_neg, skewness_neg)['xi'],
        scale=inla_sn_reparam(eta[i], variance_neg, skewness_neg)['omega']
    ) for i in range(n)
])

df = pd.DataFrame({'y': y_neg, 'x': np.round(x, 6)})
# df.to_csv('dataset_sn_negative.csv', index=False)

Comparing with Gaussian

To assess whether the skew normal provides a better fit than the standard Gaussian, compare marginal likelihoods:

# Fit Gaussian model
result_gaussian = pyinla(
    model=model,
    family='gaussian',
    data=df,
    control={'compute': {'mlik': True}}
)

# Compare marginal likelihoods
print("Skew Normal mlik:", result.mlik)
print("Gaussian mlik:", result_gaussian.mlik)

# Log Bayes factor (positive favors Skew Normal)
log_bf = result.mlik['log marginal-likelihood (integration)'] - \
         result_gaussian.mlik['log marginal-likelihood (integration)']
print(f"Log Bayes Factor: {log_bf:.2f}")

Results and Diagnostics

The posterior summaries recover the true parameters: $\beta_0 = 2$, $\beta_1 = 0.8$, $\tau = 50$, and $\gamma = -0.4$. The diagnostic plots below compare observed values with fitted values and show the residual pattern.

Observed values versus fitted values with identity line
Observed values $y_i$ versus fitted values $\hat{\eta}_i$.
Residuals versus fitted values
Residuals $y_i - \hat{\eta}_i$ versus fitted values.

To reproduce these figures locally, download the render_sn_negative_plots.py script and run it alongside the CSV dataset.