Pareto distribution

The Pareto distribution models phenomena where a small fraction of observations accounts for most of the total effect. It is the mathematical foundation of the 80/20 rule and is used to describe wealth, city sizes, earthquake sizes, and internet traffic: settings where extreme events are far more common than a normal distribution would predict.

Definition

A random variable \(X\) follows a Pareto distribution with scale parameter \(x_m > 0\) (minimum possible value) and shape parameter \(\alpha > 0\) (tail index), written \(X \sim \text{Pareto}(x_m, \alpha)\), if its probability density function (PDF) is:

\[f(x) = \frac{\alpha\, x_m^\alpha}{x^{\alpha+1}}, \quad x \geq x_m\]

The CDF has a clean closed form:

\[F(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha, \quad x \geq x_m\]

The survival function \(P(X > x) = (x_m/x)^\alpha\) decays as a power law in \(x\): doubling \(\alpha\) squares the probability of exceeding any threshold, so smaller values of \(\alpha\) mean heavier tails. This is what makes the Pareto “heavy-tailed”: the tail decays much more slowly than the exponential \(e^{-\lambda x}\).
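
The three functions above translate directly into code. The following is a minimal sketch in plain Python; the function names `pareto_pdf`, `pareto_cdf`, and `pareto_sf` are illustrative choices, not part of any library.

```python
def pareto_pdf(x, x_m, alpha):
    """Density alpha * x_m**alpha / x**(alpha + 1) for x >= x_m, else 0."""
    if x < x_m:
        return 0.0
    return alpha * x_m**alpha / x ** (alpha + 1)


def pareto_cdf(x, x_m, alpha):
    """P(X <= x) = 1 - (x_m / x)**alpha for x >= x_m, else 0."""
    if x < x_m:
        return 0.0
    return 1.0 - (x_m / x) ** alpha


def pareto_sf(x, x_m, alpha):
    """Survival function P(X > x) = (x_m / x)**alpha for x >= x_m, else 1."""
    if x < x_m:
        return 1.0
    return (x_m / x) ** alpha


# The exceedance probability at alpha = 2 is the square of the one at alpha = 1.
p_alpha_1 = pareto_sf(4.0, 1.0, 1.0)
p_alpha_2 = pareto_sf(4.0, 1.0, 2.0)
print(p_alpha_1, p_alpha_2)
```

Returning 0 (or 1 for the survival function) below \(x_m\) reflects that the distribution has no mass there.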

Figure: PDF and CDF of the Pareto distribution for different shape parameters.

Properties

For \(X \sim \text{Pareto}(x_m, \alpha)\):

  1. Expected Value (Mean)

\[E(X) = \frac{\alpha\, x_m}{\alpha - 1}, \quad \text{only defined for } \alpha > 1\]

For \(\alpha \leq 1\), the mean is infinite.

  2. Variance

\[\text{Var}(X) = \frac{\alpha\, x_m^2}{(\alpha-1)^2(\alpha-2)}, \quad \text{only defined for } \alpha > 2\]

For \(1 < \alpha \leq 2\), the mean exists but the variance is infinite.

  3. Skewness

\[\text{Skewness} = \frac{2(1+\alpha)}{\alpha - 3}\sqrt{\frac{\alpha-2}{\alpha}}, \quad \text{for } \alpha > 3\]

Undefined for \(\alpha \leq 3\). Always positive (right-skewed).

  4. Excess kurtosis

\[g_2 = \frac{6(\alpha^3 + \alpha^2 - 6\alpha - 2)}{\alpha(\alpha-3)(\alpha-4)}, \quad \text{for } \alpha > 4\]

  5. Median

\[\text{Median} = x_m \cdot 2^{1/\alpha}\]

  6. Mode

\[\text{Mode} = x_m\]

The distribution is always decreasing: the most likely value is the minimum \(x_m\).

  7. Quantile Function

\[Q(p) = \frac{x_m}{(1-p)^{1/\alpha}}\]
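
The closed forms for the mean, variance, median, and quantile function can be collected in a few helpers. This is a sketch using only the standard library; the helper names are hypothetical, and returning `math.inf` for a non-finite moment is a convention chosen here for illustration. The quantile function also gives a simple sampler by inverse transform.

```python
import math
import random


def pareto_mean(x_m, alpha):
    """Mean alpha * x_m / (alpha - 1); infinite when alpha <= 1."""
    return alpha * x_m / (alpha - 1) if alpha > 1 else math.inf


def pareto_variance(x_m, alpha):
    """Variance for alpha > 2; reported as infinite otherwise (a convention)."""
    if alpha <= 2:
        return math.inf
    return alpha * x_m**2 / ((alpha - 1) ** 2 * (alpha - 2))


def pareto_quantile(p, x_m, alpha):
    """Q(p) = x_m / (1 - p)**(1 / alpha) for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return x_m / (1.0 - p) ** (1.0 / alpha)


def pareto_median(x_m, alpha):
    """Median x_m * 2**(1 / alpha), i.e. the quantile at p = 0.5."""
    return pareto_quantile(0.5, x_m, alpha)


def pareto_sample(n, x_m, alpha, seed=0):
    """Draw n values by applying the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [pareto_quantile(rng.random(), x_m, alpha) for _ in range(n)]


draws = sorted(pareto_sample(20_000, x_m=1.0, alpha=2.0))
print(pareto_median(1.0, 2.0), draws[len(draws) // 2])
```

Python's standard library also provides `random.paretovariate(alpha)`, which samples with \(x_m = 1\).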

⚠️ Mean and variance may not exist

The Pareto distribution is unusual in that its moments can fail to exist:

  • \(\alpha \leq 1\): the mean is infinite. Sample averages do not converge; they tend to grow without bound as \(n\) increases.
  • \(1 < \alpha \leq 2\): the mean exists but the variance is infinite. The sample variance does not converge and tends to keep growing with sample size.
  • \(\alpha > 2\): both mean and variance exist.

In practice, \(\alpha\) for wealth distributions is often estimated around 1.5, meaning wealth has a finite mean but infinite variance. This is why “the average wealth” can be a misleading statistic.
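
These thresholds follow from the integral for the \(k\)-th raw moment. A short derivation, where \(k > 0\) is a symbol introduced here for illustration:

\[E(X^k) = \int_{x_m}^{\infty} x^k \, \frac{\alpha\, x_m^\alpha}{x^{\alpha+1}} \, dx = \alpha\, x_m^\alpha \int_{x_m}^{\infty} x^{k-\alpha-1} \, dx = \frac{\alpha\, x_m^k}{\alpha - k}, \quad \alpha > k\]

For \(\alpha \leq k\) the integrand decays no faster than \(1/x\), so the integral diverges. Setting \(k = 1\) gives the condition \(\alpha > 1\) for the mean, and \(k = 2\) gives \(\alpha > 2\) for the second moment and hence the variance.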

The 80/20 rule

The Pareto principle states that roughly 80% of effects come from 20% of causes. This is not a law of nature but a consequence of the Pareto distribution when \(\alpha = \log(5)/\log(4) \approx 1.161\).
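
This value of \(\alpha\) can be recovered in a few steps, assuming the standard result that the top fraction \(p\) of a Pareto population holds a share \(p^{(\alpha-1)/\alpha}\) of the total. Requiring the top 20% to hold 80%:

\[0.2^{(\alpha-1)/\alpha} = 0.8 \;\Longrightarrow\; \frac{\alpha-1}{\alpha} = \frac{\ln 0.8}{\ln 0.2} \;\Longrightarrow\; \alpha = \frac{\ln 0.2}{\ln 0.2 - \ln 0.8} = \frac{\ln 5}{\ln 4} \approx 1.161\]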

More generally, for a Pareto-distributed quantity with \(\alpha > 1\), the top fraction \(p\) of the population accounts for a fraction \(F\) of the total:

\[F = \left(\frac{x_m}{Q(1-p)}\right)^{\alpha-1} = p^{(\alpha-1)/\alpha}\]

The 80/20 rule in numbers

For \(\alpha \approx 1.161\) (the Pareto exponent that reproduces the 80/20 rule):

  • The top 20% of earners hold 80% of total income.
  • The top 4% hold 64% of total income (\(0.2^2 = 0.04\), \(0.8^2 = 0.64\)).
  • The top 1% hold about 53% of total income.

These numbers are not coincidences: they follow directly from the power-law structure of the Pareto distribution.
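
The shares in this list can be checked numerically. The sketch below assumes the top-share expression \(p^{(\alpha-1)/\alpha}\) for a Pareto-distributed quantity with \(\alpha > 1\); the function name `top_share` is a hypothetical helper.

```python
import math


def top_share(p, alpha):
    """Share of the total held by the top fraction p (needs alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the total is infinite for alpha <= 1")
    return p ** ((alpha - 1.0) / alpha)


# Exponent that makes the top 20% hold 80% of the total.
alpha_8020 = math.log(5) / math.log(4)

for p in (0.20, 0.04, 0.01):
    print(f"top {p:.0%} holds {top_share(p, alpha_8020):.1%}")
```

Because the share is a power of \(p\), squaring the population fraction squares the share, which is why 20% maps to 80% and 4% maps to 64%.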

Step-by-step example

Insurance claims above a minimum threshold of 10,000 USD follow a Pareto distribution with \(x_m = 10{,}000\) and \(\alpha = 2.5\).

Expected claim size:

\[E(X) = \frac{2.5 \times 10{,}000}{2.5 - 1} = \frac{25{,}000}{1.5} \approx 16{,}667 \text{ USD}\]

Probability of a claim exceeding 50,000 USD:

\[P(X > 50{,}000) = \left(\frac{10{,}000}{50{,}000}\right)^{2.5} = (0.2)^{2.5} \approx 0.0179\]

About 1.8% of claims exceed 50,000 USD.

90th percentile (the claim size that only 10% of claims exceed):

\[Q(0.90) = \frac{10{,}000}{(1-0.90)^{1/2.5}} = \frac{10{,}000}{0.1^{0.4}} \approx \frac{10{,}000}{0.3981} \approx 25{,}119 \text{ USD}\]

Variance:

\[\text{Var}(X) = \frac{2.5 \times 10{,}000^2}{(1.5)^2 \times (0.5)} = \frac{250{,}000{,}000}{1.125} \approx 222{,}222{,}222 \text{ USD}^2\]

Standard deviation \(\approx 14{,}907\) USD, which is almost as large as the mean.
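
The quantities in this example can be recomputed with a few lines of Python. This is only a numerical check of the closed-form expressions for \(x_m = 10{,}000\) and \(\alpha = 2.5\); the variable names are illustrative.

```python
import math

x_m, alpha = 10_000.0, 2.5

mean = alpha * x_m / (alpha - 1)                       # expected claim size
p_exceed = (x_m / 50_000.0) ** alpha                   # P(X > 50,000)
q90 = x_m / (1 - 0.90) ** (1 / alpha)                  # 90th percentile
variance = alpha * x_m**2 / ((alpha - 1) ** 2 * (alpha - 2))
std = math.sqrt(variance)

print(f"mean      = {mean:,.0f} USD")
print(f"P(X>50k)  = {p_exceed:.4f}")
print(f"Q(0.90)   = {q90:,.0f} USD")
print(f"std. dev. = {std:,.0f} USD")
```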

⚠️ Power-law tails vs exponential tails

The key difference between the Pareto and distributions like the exponential or normal is how fast the tail decays:

  • Exponential tail: \(P(X > x) = e^{-\lambda x}\). Decays extremely fast: doubling \(x\) squares the tail probability.
  • Power-law tail: \(P(X > x) = (x_m/x)^\alpha\). Decays slowly. An earthquake 10 times larger is only \(10^\alpha\) times rarer, not exponentially rarer.

This is why 100-year floods, billion-dollar insurance losses, and mega-viral social media posts exist: exponential-tailed models would predict them as essentially impossible, but power-law models assign them non-negligible probability.
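
To make the contrast concrete, the sketch below compares the two survival functions for a Pareto and an exponential distribution with the same mean. The parameter values (\(x_m = 1\), \(\alpha = 2.5\)) are arbitrary choices for illustration.

```python
import math


def pareto_sf(x, x_m, alpha):
    """Power-law tail: P(X > x) = (x_m / x)**alpha for x >= x_m."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


def exponential_sf(x, lam):
    """Exponential tail: P(X > x) = exp(-lam * x) for x >= 0."""
    return 1.0 if x < 0 else math.exp(-lam * x)


x_m, alpha = 1.0, 2.5
mean = alpha * x_m / (alpha - 1)   # Pareto mean, 5/3
lam = 1.0 / mean                   # exponential rate giving the same mean

for x in (10, 100, 1000):
    print(f"x = {x:>4}: Pareto {pareto_sf(x, x_m, alpha):.3e}, "
          f"exponential {exponential_sf(x, lam):.3e}")
```

At ten times the mean scale the two tails are still of similar size, but each further factor of 10 in \(x\) costs the Pareto only a factor of \(10^{2.5} \approx 316\) while the exponential tail collapses by many orders of magnitude.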

💡 Relationship with other distributions

  • Exponential: if \(X \sim \text{Pareto}(x_m, \alpha)\), then \(\ln(X/x_m)\) is exponentially distributed with rate \(\alpha\). The Pareto is therefore the exponential distribution on a logarithmic scale, with a power-law rather than exponential tail.
  • Lomax (Pareto Type II): \(Y = X - x_m\) where \(X \sim \text{Pareto}(x_m, \alpha)\) gives a Lomax distribution starting at 0.
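  • Uniform: if \(U\) is uniform on \((0, 1)\), then \(x_m\, U^{-1/\alpha}\) is \(\text{Pareto}(x_m, \alpha)\). This is the quantile function applied to \(1 - U\), the basis of inverse-transform sampling.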
  • Generalized Pareto: used in extreme value theory to model exceedances above a threshold, generalizing the standard Pareto.
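
Two of these relationships can be verified directly from the survival function, without simulation. The sketch below relies on two standard identities stated here as assumptions: \(\ln(X/x_m)\) has the exponential survival function \(e^{-\alpha t}\), and \(X - x_m\) has the Lomax survival function \((1 + y/x_m)^{-\alpha}\). The helper names are hypothetical.

```python
import math


def pareto_sf(x, x_m, alpha):
    """P(X > x) = (x_m / x)**alpha for x >= x_m, else 1."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


def log_ratio_sf(t, x_m, alpha):
    """P(ln(X / x_m) > t), computed as P(X > x_m * e**t)."""
    return pareto_sf(x_m * math.exp(t), x_m, alpha)


def shifted_sf(y, x_m, alpha):
    """P(X - x_m > y), computed as P(X > x_m + y)."""
    return pareto_sf(x_m + y, x_m, alpha)


x_m, alpha = 2.0, 3.0

# Compare with the exponential tail exp(-alpha * t).
for t in (0.1, 1.0, 2.5):
    print(t, log_ratio_sf(t, x_m, alpha), math.exp(-alpha * t))

# Compare with the Lomax tail (1 + y / x_m)**(-alpha).
for y in (0.0, 1.0, 10.0):
    print(y, shifted_sf(y, x_m, alpha), (1.0 + y / x_m) ** (-alpha))
```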