Lecture 4
Contents

1 Introduction
2 General notions
  2.1 Distributions
  2.2 Densities
  2.3 Quantiles
  2.4 Some useful math results
  2.5 Truncated distributions and moments
3 Lorenz curves
  3.1 A partial moment function
  3.2 Properties
  3.3 A mathematical characterisation
10 Exercises
  10.1 Empirics
  10.2 Gini coefficient
  10.3 LogNormal
  10.4 Uniform
  10.5 Singh-Maddala
  10.6 Logistic
  10.7 Weibull
A Hypergeometric series
  A.1 The family of the generalised Beta II
  A.2 The Burr system as a useful particular case
  A.3 Playing with the Burr system
1 Introduction
Some authors, like Sen (1976), prefer to use a discrete representation of income, which is based on the assumption that the population is finite. Atkinson (1970) and his followers prefer to suppose that income is a continuous variable. This implies that the population is implicitly infinite, even if the sample is finite. Discrete variables and finite populations are at first sight easier notions to grasp, while continuous variables and infinite populations are harder to accept. But as far as computations and derivations are concerned, continuous variables lead to integral calculus, which is an easy topic once we know a few elementary theorems. Considering a continuous random variable also opens the way to special parametric densities, such as the Pareto or the lognormal, which have played an important role in the study of the income distribution. Discrete mathematics, by contrast, quickly becomes complicated.
2 General notions
We are interested in the income distribution. Income is supposed to be a continuous random variable with distribution F(·).
2.1 Distributions
Definition 1 The distribution function F (x) gives the proportion of individuals of the population
having a standard of living below or equal to x.
F is a non-decreasing function of its argument x. We suppose that F(0) = 0 while F(∞) = 1. F(x) gives the proportion of individuals with an income below x. We usually call p that proportion.
A natural estimator is obtained for F(·) by considering
$$\hat F(x) = \frac{1}{n}\sum_{i=1}^{n} 1\mathrm{I}(x_i \le x),$$
where 1I(·) is the indicator function. This estimator is easy to implement. The resulting graph of the estimated distribution might seem discontinuous for very small sample sizes, but it gets rapidly smoother as soon as n > 30. So there is in general no need for non-parametric smoothing.
Let us now order the observations in increasing order, from the smallest to the largest, and call x_[j] the observation which has rank j. We can write the natural estimator of F as
$$\hat F(x_{[j]}) = j/n.$$
It is common to call x_[j] an order statistic. Order statistics will play an important role for estimation. We can give a short example written in R. We draw two samples from a normal distribution, of size n = 100 and then n = 1000.
n = 100
xr = rnorm(n)              # first sample of size 100
x = sort(xr)               # order statistics
y = seq(0,1,length=n)      # ordered index between 0 and 1
plot(x,y,type="l",xlim=c(-3,3),ylab="Cumulative",xlab="X")
n = 1000
xr = rnorm(n)              # second, larger sample
x = sort(xr)
y = seq(0,1,length=n)
lines(x,y,col=2)
Figure 1: Natural estimator of the cumulative distribution for a Gaussian random variable
The abscissae of the cumulative curve are obtained by ordering the draws, while the ordinates are simply an ordered index between 0 and 1. The curve in black corresponds to the sample of size 100; it is rather rough. The curve in red, corresponding to a sample of size 1000, looks perfectly smooth.
2.2 Densities
We shall suppose that F is continuously differentiable so that there exists a density defined by
$$f(x) = F'(x).$$
So, for a given x, the probability p that X ≤ x can be defined alternatively as
$$p = \int_0^x f(t)\,dt = F(x).$$
Densities are much more complicated to estimate. There exists no natural estimator as for distributions, simply because the natural estimator of F(·) is a step function and is not differentiable. Some kind of non-parametric smoothing is needed. Non-parametric density estimation will be detailed in Lecture 6, Modelling the Income Distribution.
If f(x) is the density, then the probability that the random variable X belongs to the interval [x_{k-1}, x_k] is given by
$$P(x_{k-1} < X < x_k) \simeq f(x_k)\,\Delta x_k.$$
Summing over a partition of [a, b] and letting the partition become finer, the sum converges to an integral:
$$\sum_{k=2}^{m} f(x_k)\,\Delta x_k \longrightarrow \int_a^b f(x)\,dx.$$
Of course this limit exists only if F(·) is sufficiently smooth, i.e. it has no jumps or kinks.
2.3 Quantiles
Once a distribution is given, it is always possible to compute its quantiles (this is not the case for moments, which exist only under specific conditions). Deciles are a convenient way of slicing a distribution into intervals of equal probability, each interval being of probability 1/10. More generally, a quantile is a function x = q(p) that gives the value of x such that F(x) = p. Quantiles are implicitly defined by the relation
$$x = q(p) = F^{-1}(p).$$
q(p) is thus the living standard level below which we find a proportion p of the population. The median of a population is the value of x such that half of the population is below and half above, that is x = q(0.50). Using quantiles is also a way to normalise the characteristics of a population between 0 and 1. This facilitates comparisons between two populations, ignoring scale problems.
Quantiles are rather easy to estimate once we know the order statistics. Suppose that we have an ordered sample of size n. The estimator of a quantile comes directly from the natural estimator of the distribution: the p-quantile is simply the observation that has rank [pn]. Quantiles are directly estimated in R using the instruction quantile(x,p), where x is a vector containing the sample and p the level of the quantile.
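As a small illustration (a sketch using only base R and an artificial lognormal sample), the p-quantile can be read directly from the order statistics and compared with the built-in quantile() instruction:

set.seed(1)
n = 1000
x = rlnorm(n)                  # artificial income sample
p = 0.9
xs = sort(x)                   # order statistics
q_rank = xs[ceiling(p*n)]      # observation of rank [p n]
q_R = quantile(x, p)           # built-in estimator (interpolated)
c(q_rank, q_R)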
Piketty (2000), in his book on the history of high incomes in France, makes an extensive use of quantiles to study the French income distribution, and particularly its right tail concerning high incomes. High incomes concern the last decile, that is incomes above q(0.90). That decile however covers a variety of situations where wages, mixed incomes and capital incomes have a varying importance. The interval [q(0.90), q(0.95)] concerns what he calls the middle class, formed mainly of salaried executives. The interval [q(0.95), q(0.99)] is the upper middle class, formed mainly of holders of intermediate incomes such as lawyers and doctors. The really rich persons correspond to the last centile, q(0.99) and over; they are mainly holders of capital income.
2.4 Some useful math results
We shall repeatedly use the following elementary results.
1. Integration by parts.
$$\int_a^b u(x)\,v'(x)\,dx = \big[u(x)\,v(x)\big]_a^b - \int_a^b u'(x)\,v(x)\,dx.$$
2. Compound derivatives (chain rule).
$$\frac{\partial}{\partial x} f(u(x)) = f'(u(x))\,u'(x).$$
3. Change of variable and densities. Let x ∼ f(x) and a transformation y = h(x) with inverse x = g(y). Then the density of y is given by
$$f_Y(y) = |J(x \to y)|\,f(g(y)),$$
where J(x → y) = ∂x/∂y = g'(y) is the Jacobian of the transformation and |·| its absolute value.
4. Change of variable and integrals. Consider the integral
$$\int_a^b f(x)\,dx$$
and the change of variable x = h(u) with reciprocal u = h^{-1}(x). Then the original integral can be expressed as
$$\int_a^b f(x)\,dx = \int_{h^{-1}(a)}^{h^{-1}(b)} f(h(u))\,h'(u)\,du.$$
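As a brief worked illustration of rule 3, which will be useful later for the lognormal: if $X \sim N(\mu, \sigma^2)$ and $Y = e^X$, then $x = g(y) = \log y$ with $g'(y) = 1/y$, so that
$$f_Y(y) = f_X(\log y)\,\frac{1}{y} = \frac{1}{y\,\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right), \qquad y > 0,$$
which is precisely the lognormal density studied below.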
2.5 Truncated distributions and moments
Consider the group of individuals with income between a and b. Its total income is X(a, b) = n∫_a^b x f(x) dx and its size is H(a, b) = n∫_a^b f(x) dx, where n is the size of the total population. The average standard of living in the total population is given by the total mean
$$\mu = \int_0^\infty x\,dF(x) = \int_0^\infty x f(x)\,dx,$$
while the average standard of living in the group is
$$\frac{X(a,b)}{H(a,b)} = \frac{\int_a^b x f(x)\,dx}{\int_a^b f(x)\,dx}.$$
When b tends to infinity and a to 0, we recover the average standard of living of the entire population.
We now consider a threshold z and the population which is below that threshold (and sometimes the population over that threshold). Starting from the previous expression X(a, b)/H(a, b), we can compute the average standard of living of the first group, the one which is below z. This is equivalent to the expectation of a truncated distribution:
$$\mu_1 = \frac{\int_0^z x f(x)\,dx}{F(z)}.$$
Using integration by parts with u = x and v' = f(x), we can rewrite the integral in the numerator as
$$\int_0^z x f(x)\,dx = \big[x F(x)\big]_0^z - \int_0^z F(x)\,dx = z F(z) - \int_0^z F(x)\,dx.$$
Consequently,
$$\mu_1 = \frac{\int_0^z x f(x)\,dx}{F(z)} = z - \int_0^z \frac{F(x)}{F(z)}\,dx.$$
Incidentally, if we now let z tend to infinity, we arrive at an alternative expression for the mean:
$$\mu = \int_0^\infty [1 - F(x)]\,dx.$$
Note also that another expression of the mean can be obtained using the quantiles. Start from
$$\mu = \int_0^\infty x f(x)\,dx$$
and apply the change of variable p = F(x):
$$\mu = \int_0^\infty x f(x)\,dx = \int_0^1 F^{-1}(p)\,dp = \int_0^1 q(p)\,dp.$$
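These three expressions of the mean are easy to check numerically. The following sketch (base R only, using a uniform distribution on [0, 2] purely for illustration) computes them by numerical integration:

f = function(x) dunif(x, 0, 2)          # density of U[0,2], mean 1
S = function(x) 1 - punif(x, 0, 2)      # survival function 1 - F(x)
q = function(p) qunif(p, 0, 2)          # quantile function
mu1 = integrate(function(x) x*f(x), 0, 2)$value
mu2 = integrate(S, 0, 2)$value
mu3 = integrate(q, 0, 1)$value
c(mu1, mu2, mu3)                        # all equal to 1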
3 Lorenz curves
The Lorenz curve is a graphical representation of the cumulative income distribution. It shows, for the bottom p1% of households, the percentage p2% of the total income that they hold. The percentage of households is plotted on the x-axis, the percentage of income on the y-axis. It was developed by Max O. Lorenz in 1905 for representing inequality in the wealth distribution. If p1 = p2 everywhere, the Lorenz curve is a straight line, which says for instance that 50% of the households have 50% of the total income. Thus the straight line represents perfect equality, and any departure from this 45° line represents inequality.
3.1 A partial moment function
Let z be the income level such that p = F(z) = ∫_0^z f(t) dt. The Lorenz curve can be written as
$$L(p) = \frac{1}{\mu}\int_0^{z} t\,f(t)\,dt.$$
So the Lorenz curve is an unscaled partial moment function: unscaled, because it is not divided by F(z).
A notation popularised by Gastwirth (1971) uses the fact that z = F^{-1}(p) to write the Lorenz curve in a direct way, using a change of variable:
$$L(p) = \frac{1}{\mu}\int_0^p q(t)\,dt = \frac{1}{\mu}\int_0^p F^{-1}(t)\,dt.$$
Alternatively, using the relation μ = ∫_0^1 q(t) dt,
$$L(p) = \frac{\int_0^p q(t)\,dt}{\int_0^1 q(t)\,dt}.$$
The numerator sums the incomes of the bottom p proportion of the population. The denominator
sums the incomes of all the population. L(p) thus indicates the cumulative percentage of total
income held by a cumulative proportion p of the population, when individuals are ordered in
increasing income values.
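A minimal sketch of this construction on an artificial sample (base R only; the ineq package used later provides the same thing through its Lc() function):

set.seed(2)
x = sort(rlnorm(500))          # ordered incomes
p = (1:500)/500                # population shares
L = cumsum(x)/sum(x)           # cumulative income shares of the bottom p
plot(p, L, type="l", xlab="p", ylab="L(p)")
lines(p, p, lty=2)             # 45 degree line of perfect equality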
3.2 Properties
The Lorenz curve has several interesting properties.
1. It is entirely contained in the unit square, because p is defined over [0,1] and L(p) also takes values in [0,1]. Both the x-axis and the y-axis are percentages.
2. The Lorenz curve is not defined if μ is either 0 or infinite.
3. If the underlying variable is positive and has a density, the Lorenz curve is a continuous function. It is always below the 45° line or equal to it.
4. L(p) is an increasing convex function of p. Its first derivative
$$\frac{dL(p)}{dp} = \frac{q(p)}{\mu} = \frac{x}{\mu}, \qquad \text{with } x = F^{-1}(p),$$
is always positive as incomes are positive. And so is its second order derivative (convexity). The Lorenz curve is convex in p, since as p increases, the new incomes that are being added up are greater than those that have already been counted. (Mathematically, a curve is convex when its second derivative is positive.)
5. The Lorenz curve is invariant with respect to positive scaling: X and cX have the same Lorenz curve.
6. The mean income in the population is found at the percentile at which the slope of L(p) equals 1, that is, where q(p) = μ and thus at percentile F(μ) (as shown on Figure 2). This can be shown easily because the first derivative of the Lorenz curve is equal to x/μ.
7. The median as a percentage of the mean is given by the slope of the Lorenz curve at p = 0.5. Since many income distributions are skewed to the right, the mean often exceeds the median and q(0.5)/μ will typically be less than one.
The convexity of the Lorenz curve is revealing of the density of incomes at various percentiles. The larger the density of income f(q(p)) at a quantile q(p), the less convex the Lorenz curve at L(p). On Figure 2, the density is thus visibly larger for lower values of p, since this is where the slope of L(p) changes less rapidly as p increases.
By observing the slope of the Lorenz curve at a particular value of p, we know the p-quantile relative to the mean, or, in other words, the income of an individual at rank p as a proportion of the mean income. An example of this can be seen on Figure 2 for p = 0.5. The slope of L(p) at that point is q(0.5)/μ, the ratio of the median to the mean. The slope of L(p) thus portrays the whole distribution of mean-normalised incomes.
3.3 A mathematical characterisation
Theorem 1 Suppose L(p) is defined and continuous on [0,1] with second derivative L''(p). The function L(p) is a Lorenz curve if and only if L(0) = 0, L(1) = 1, L'(0+) ≥ 0 and L''(p) ≥ 0 in (0,1).
If a curve is a Lorenz curve, it determines the distribution of X up to a scale factor, which is the mean μ. How could we find it? Let us take the definition of the Lorenz curve
$$L_X(p) = \frac{1}{\mu_X}\int_0^p F_X^{-1}(t)\,dt$$
and express it as
$$L(F(x)) = \frac{1}{\mu}\int_0^x y\,dF(y).$$
A useful companion result is that lim_{t→∞} t[1 − F(t)] = 0 whenever the mean exists, which simplifies greatly the computation of some integrals when considering an infinite bound.
4 The Gini coefficient
The next computations owe much to the survey of Yitzhaki (1998) and to that of Xu (2003). Twice the area between the diagonal and the Lorenz curve is
$$G = 2\int_0^1 (p - L(p))\,dp = 1 - 2\int_0^1 L(p)\,dp,$$
which is nothing but the usual Gini coefficient. Xu (2003) gives a good account of the algebra of the Gini index. We have given above an interpretation of the Gini index as a surface. The initial definition we gave in the previous chapter was in terms of a mean of absolute differences. There are other formulas too, and all of these formulas are equivalent; we have to prove this. A large survey of the literature can also be found in the article "Gini coefficient" of Wikipedia.
Start from G = 1 − 2∫_0^1 L(p) dp and integrate by parts:
$$G = 1 - 2\big[p\,L(p)\big]_0^1 + 2\int_0^1 p\,L'(p)\,dp = -1 + 2\int_0^1 p\,L'(p)\,dp.$$
We then apply the change of variable p = F(y) and use the fact proved above that L'(p) = y/μ. We have
$$G = -1 + \frac{2}{\mu}\int_0^\infty y\,F(y)\,f(y)\,dy = \frac{2}{\mu}\int_0^\infty y\left[F(y) - \frac12\right] f(y)\,dy.$$
This formula opens the way to an interpretation of the Gini coefficient in terms of a covariance, as
$$\mathrm{Cov}(y, F(y)) = E[y F(y)] - E[y]\,E[F(y)] = E[y F(y)] - \frac{\mu}{2}.$$
Using this definition, we have immediately that
$$G = \frac{2}{\mu}\,\mathrm{Cov}(y, F(y)),$$
which means that the Gini coefficient is proportional to the covariance between a variable and its rank. The covariance interpretation of the Gini coefficient opens the way to numerical evaluation using a regression.
Meanwhile, noting that Cov(y, F(y)) = ∫ y(F(y) − 1/2) dF(y), an integration by parts gives
$$\mathrm{Cov}(y, F(y)) = \frac12\int F(x)[1 - F(x)]\,dx,$$
so that we arrive at the integral form
$$G = \frac{1}{\mu}\int F(x)[1 - F(x)]\,dx.$$
We can remark that F(x)(1 − F(x)) is largest at F(x) = 0.5, which explains why the Gini index is often said to be most sensitive to changes in incomes occurring around the median income.
The above integral form can also be written as
$$G = 1 - \frac{1}{\mu}\int [1 - F(x)]^2\,dx.$$
We shall prove this equivalence by considering the last interpretation of the Gini which is the
scaled mean of absolute differences.
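Before moving on, here is a small numerical check (a sketch on an artificial lognormal sample with σ = 1) that the covariance form and the two integral forms above deliver the same number:

set.seed(3)
x = rlnorm(20000)                       # lognormal sample, sigma = 1
g_cov = 2/mean(x)*cov(x, ecdf(x)(x))    # G = (2/mu) Cov(y, F(y)), empirical
m = exp(1/2)                            # theoretical mean of the lognormal
g1 = integrate(function(t) plnorm(t)*(1-plnorm(t)), 0, Inf)$value / m
g2 = 1 - integrate(function(t) (1-plnorm(t))^2, 0, Inf)$value / m
c(g_cov, g1, g2, 2*pnorm(1/sqrt(2))-1)  # all close to 0.52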
4.3 S-Gini
We underlined that the Gini coefficient is very sensitive to changes in the middle of the income distribution. A generalisation of the Gini coefficient, obtained by adding an inequality-aversion parameter as in the Atkinson index, was proposed in the literature by Donaldson and Weymark (1980) and other papers following this contribution. Starting from
$$G = -2\,\mathrm{Cov}\!\left(\frac{y}{\mu},\, 1 - F(y)\right),$$
the S-Gini is found by introducing a parameter α that modifies the weighting scheme over the income distribution:
$$G(\alpha) = -\alpha\,\mathrm{Cov}\!\left(\frac{y}{\mu},\, (1 - F(y))^{\alpha-1}\right).$$
For α = 2, of course, we recover the usual Gini index. With a value of α greater than 2, a greater weight is attached to low incomes.
We can run a small experiment, generating n = 10000 observations from a lognormal distribution and then computing the Gini according to the above formula, with various values of α. We then compare the result to the Gini computed using the usual formula, which corresponds to α = 2.

  α     α-Gini       Usual Gini
  1.2   0.2077537
  2.0   0.5288477    0.5277905
  3.0   0.6692843
  4.0   0.7362263
library(ineq)                # for the Gini() function
n = 10000
x = sort(rlnorm(n))          # ordered lognormal sample
y = seq(0,1,length=n)        # approximate values of F(x)
for (alpha in c(1.2,2,3,4)){
  g = -alpha*cov(x/mean(x),(1-y)^(alpha-1))
  cat("Gini = ",g," alpha = ",alpha,"\n")}
Gini(x)                      # usual Gini from the ineq package
For α = 1, the modified Gini is equal to zero. For α = 2, this method based on the empirical covariance is only approximate. In small samples, the difference can be substantial: for n = 100, the covariance method gives G = 0.5413686, while the correct method gives G = 0.5305954.
We now prove the announced equivalence with the mean-of-absolute-differences definition. Writing the Gini in terms of the minimum of two independent draws X and Y from F, we have G = 1 − E[min(X, Y)]/μ with
$$E[\min(X,Y)] = -\int_0^\infty y\,d(1 - F(y))^2.$$
The last integral can be transformed using integration by parts with u = y and v = (1 − F(y))²:
$$\int_0^\infty y\,d(1-F(y))^2 = \big[y\,(1-F(y))^2\big]_0^\infty - \int_0^\infty (1-F(y))^2\,dy = -\int_0^\infty [1-F(y)]^2\,dy,$$
so that
$$I_G = 1 - \frac{1}{\mu}\int_0^\infty [1 - F(y)]^2\,dy,$$
which is the integral form obtained above.
Turning to estimation, a natural estimator of the Gini coefficient based on the order statistics is
$$\hat G = \frac{n+1}{n-1} - \frac{2}{n(n-1)\,\bar x}\sum_{i=1}^{n}(n+1-i)\,x_{[i]}.$$
Note that this formula points out that there are n(n−1) distinct pairs. Sen (1973) uses a slight simplification of this with
$$\hat G = \frac{n+1}{n} - \frac{2}{n^2\,\bar x}\sum_{i=1}^{n}(n+1-i)\,x_{[i]}.$$
The interpretation of the Gini coefficient in terms of the covariance between the variable and its rank implies that a simple routine can be used:
$$\hat G = \frac{2}{n\,\bar y}\,\widehat{\mathrm{Cov}}(y_{[i]}, i). \tag{1}$$
For the covariance approach, we note that the mean of the ranks is given by
$$\bar i = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2},$$
so that
$$\widehat{\mathrm{Cov}}(i, y_{[i]}) = \frac{1}{n}\sum_i (i - \bar i)\,y_{[i]} = \frac{1}{n}\sum_i i\,y_{[i]} - \frac{n+1}{2}\,\bar y,$$
and the Gini coefficient is obtained as
$$\hat G = \frac{2}{n^2\,\bar y}\sum_i i\,y_{[i]} - \frac{n+1}{n}, \tag{2}$$
where x_{[i]} is an order statistic and i its rank. Following Giles (2004), the expression in (2) can be obtained as the coefficient θ of an auxiliary least-squares regression, so that an appropriate standard error for the Gini coefficient is then
$$SE(\hat I_G) = \frac{2\,\sqrt{\widehat{\mathrm{Var}}(\hat\theta)}}{n}. \tag{3}$$
This estimation is biased because the usual regression assumptions are not verified in the above regression: for instance, the residuals are dependent.
Davidson (2009) gives an alternative expression for the variance of the Gini which is not based on a regression, but simply on the properties of the empirical estimate of F(x). If we note I_G the numerical evaluation of the sample Gini, we have
$$\widehat{\mathrm{Var}}(\hat I_G) = \frac{1}{(n\,\hat\mu)^2}\sum_i (\hat Z_i - \bar Z)^2, \tag{4}$$
where $\bar Z = (1/n)\sum_{i=1}^n \hat Z_i$ is an estimate of E(Z_i) and
$$\hat Z_i = -(\hat I_G + 1)\,x_{[i]} + \frac{2i-1}{n}\,x_{[i]} - \frac{2}{n}\sum_{j=1}^{i} x_{[j]}.$$
This is however an asymptotic result which in general gives lower values than those obtained with the regression method of Giles. Small-sample results can be obtained if we adjust a parametric density for y and use a Bayesian approach.
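As a simple numerical companion to these formulas (a sketch, not the regression method of Giles nor the formula of Davidson), the order-statistics estimator given above can be combined with a bootstrap standard error:

gini_os = function(x){                  # order-statistics formula given above
  n = length(x); xs = sort(x)
  (n+1)/(n-1) - 2/(n*(n-1)*mean(x)) * sum((n+1-(1:n))*xs)
}
set.seed(4)
x = rlnorm(100)
g = gini_os(x)
gb = replicate(500, gini_os(sample(x, replace=TRUE)))
c(g, sd(gb))                            # point estimate and bootstrap standard error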
This index is called the Schutz coefficient in Duclos and Araar (2006), but is also known under the name of the Pietra index. In a stricter mathematical framework and following Sarabia (2008), the Pietra index is defined as the maximal deviation between the Lorenz curve and the egalitarian line:
$$P_X = \max_{0 \le p \le 1}\,\{p - L_X(p)\}.$$
If we assume that F is strictly increasing on its support, the function p − L_X(p) will be differentiable everywhere on (0, 1) and its maximum will be reached when its first derivative in p,
$$1 - \frac{F^{-1}(p)}{\mu},$$
is zero, that is, when q(p) = μ, i.e. when p = F(μ). The value of p − L_X(p) at this point is given by
$$P_X = F(\mu) - L_X(F(\mu)) = \frac{1}{\mu}\int_0^{F(\mu)}\big[\mu - F^{-1}(t)\big]\,dt = \frac{1}{2\mu}\int |t - \mu|\,dF(t).$$
Consequently,
$$P_X = \frac{E|X - \mu|}{2\mu},$$
which is an alternative formula for the Pietra index.
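A quick check of the two characterisations of the Pietra index (a sketch on a lognormal with σ = 1, for which both should equal 2Φ(σ/2) − 1, see Table 5 below):

sigma = 1
mu = exp(sigma^2/2)                                # mean of the lognormal
p = seq(0.001, 0.999, by=0.001)
L = pnorm(qnorm(p) - sigma)                        # lognormal Lorenz curve
P1 = max(p - L)                                    # maximal gap p - L(p)
P2 = integrate(function(x) abs(x - mu)*dlnorm(x, 0, sigma), 0, Inf)$value/(2*mu)
c(P1, P2, 2*pnorm(sigma/2) - 1)                    # all close to 0.38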
The Atkinson index is defined as
$$I_A(\varepsilon) = 1 - \left[\int \left(\frac{x}{\mu}\right)^{1-\varepsilon} dF(x)\right]^{1/(1-\varepsilon)}, \qquad \varepsilon > 0,$$
where ε is the parameter that controls inequality aversion. The limiting case ε → 1 is
$$I_A(1) = 1 - \frac{1}{\mu}\exp\left(\int \log(x)\,dF(x)\right).$$
The generalised entropy index is defined for c ≠ 0, 1 as
$$GE(c) = \frac{1}{c(c-1)}\left[\int \left(\frac{x}{\mu}\right)^{c} dF(x) - 1\right],$$
the limiting case c = 0 being the mean logarithmic deviation $\int \log(\mu/x)\,dF(x)$.
These two indices can be written in terms of the Lorenz curve. We have for the Atkinson index
$$I_A(\varepsilon) = 1 - \left[\int_0^1 \big(L_X'(p)\big)^{1-\varepsilon}\,dp\right]^{1/(1-\varepsilon)}, \qquad \varepsilon > 0.$$
These formulas allow these indices to be obtained directly from the Lorenz curve, without the necessity of knowing the underlying cumulative distribution function.
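As a check (a sketch on the lognormal, with illustrative values of σ and ε), the Atkinson index computed from its definition coincides with the computation based on L'(p) = q(p)/μ and with the lognormal closed form:

sigma = 0.8; eps = 0.5
mu = exp(sigma^2/2)
a1 = 1 - integrate(function(x) (x/mu)^(1-eps)*dlnorm(x, 0, sigma),
                   0, Inf)$value^(1/(1-eps))       # from the definition
a2 = 1 - integrate(function(p) (qlnorm(p, 0, sigma)/mu)^(1-eps),
                   0, 1)$value^(1/(1-eps))         # from L'(p) = q(p)/mu
c(a1, a2, 1 - exp(-eps*sigma^2/2))                 # closed form for the lognormal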
The Pareto distribution is most easily defined through its survival function
$$\bar F(x) = P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x > x_m.$$
The use of the survival function comes from the intuitive characterisation of the Pareto. The cumulative distribution function is simply 1 − F̄, which implies
$$F(x) = P(X < x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x > x_m.$$
Moments are given in Table 2. We can already see that this density has a special shape.

Table 2: Moments of the Pareto distribution
  parameter   value                              domain
  scale       x_m                                x_m > 0
  shape       α                                  α > 0
  support     x ∈ [x_m, +∞)
  median      x_m 2^{1/α}
  mode        x_m
  mean        α x_m/(α − 1)                      α > 1
  variance    α x_m²/[(α − 1)²(α − 2)]           α > 2
The density is always decreasing, so the Pareto is valuable only to model high or medium incomes. Its moments exist only for certain values of α: this is the price to pay for its long tail. In Figure 3, we give the graph of the density for x_m = 1 and various plausible values of α. The Gini index (see Table 4 for its expression) is very sensitive to the value of α, as Table 3 illustrates.

Table 3: Gini and Pietra indices for the Pareto
Figure 3: Pareto densities for x_m = 1 and α = 1, 2, 3.
Table 4: Several indices for the Pareto distribution
  Coefficient of variation   1/√(α(α − 2))                                 α > 2
  Lorenz curve               L(p) = 1 − (1 − p)^{(α−1)/α}                  α > 1
  Pietra index               (α − 1)^{α−1}/α^{α}                           α > 1
  Gini index                 1/(2α − 1)                                    α > 1
  Atkinson                   1 − [(α−1)/α]·[α/(α − 1 + ε)]^{1/(1−ε)}       α > 1
  Generalised entropy        [(α−1)^c α^{1−c}/(α − c) − 1]/[c(c−1)]        α > c

Figure: Lorenz curves of the Pareto distribution for α = 1.2, 1.6, 2.2, 3.2, produced by the code below.
Lp = function(p,alpha) {1-(1-p)^((alpha-1)/alpha)}   # Pareto Lorenz curve
p = seq(0,1,0.01)
plot(p,p,type="l")                                   # 45 degree line
lines(p,Lp(p,1.2),col=2)
lines(p,Lp(p,1.6),col=3)
lines(p,Lp(p,2.2),col=4)
lines(p,Lp(p,3.2),col=5)
text(0.8,0.15,"alpha=1.2",col=2)
text(0.8,0.35,"alpha=1.6",col=3)
text(0.8,0.48,"alpha=2.2",col=4)
text(0.8,0.58,"alpha=3.2",col=5)
The lognormal distribution is obtained by assuming that y = log x follows a normal distribution with parameters μ and σ². Applying the change of variable formula with dy/dx = 1/x gives the density
$$f(x; \mu, \sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \qquad x > 0.$$
The cumulative distribution function has no analytical form and requires an integral evaluation:
$$F_X(x;\mu,\sigma) = \frac12\,\mathrm{erfc}\!\left(-\frac{\ln x - \mu}{\sigma\sqrt 2}\right) = \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right),$$
where erfc is the complementary error function, and Φ is the standard normal cdf. However, these integrals are easy to evaluate on a computer and built-in functions are standard.
The moments are easily obtained as functions of μ and σ. If X is a lognormally distributed variable, its expected value, variance, and standard deviation are
$$E[X] = e^{\mu + \frac12\sigma^2}, \qquad \mathrm{Var}[X] = (e^{\sigma^2} - 1)\,e^{2\mu + \sigma^2}, \qquad \mathrm{s.d.}[X] = \sqrt{\mathrm{Var}[X]} = e^{\mu + \frac12\sigma^2}\sqrt{e^{\sigma^2} - 1}.$$
Equivalently, the parameters μ and σ can be obtained if the values of the mean and the variance are known:
$$\mu = \ln(E[X]) - \tfrac12\ln\!\left(1 + \frac{\mathrm{Var}[X]}{E[X]^2}\right), \qquad \sigma^2 = \ln\!\left(1 + \frac{\mathrm{Var}[X]}{E[X]^2}\right).$$
The mode is
$$\mathrm{Mode}[X] = e^{\mu - \sigma^2},$$
and the median is
$$\mathrm{Med}[X] = e^{\mu}.$$
Figure: two lognormal densities, with μ = 0, σ = 0.25 and σ = 1.

The above graph was made for μ = 0. The two densities have the same median, but of course not the same dispersion.
The log-normal has some nice properties.
1. Suppose that all incomes are changed proportionally by a random multiplicative factor, which is different for everybody and follows a Gaussian process. Then the distribution of the population income will converge to a log-normal, if the process is active for a long enough period.
2. The log-normal fits well many data sets.
3. Lorenz curves associated to the log-normal are symmetric around the line defined by the points corresponding to the mean of x. This is a good visual test to see if the log-normal fits well a data set.
4. Inequality depends on a single parameter σ which uniquely determines the shape of the Lorenz curves. The latter do not intersect. The Gini coefficient also depends uniquely on this parameter.
5. Closed form under certain transformations.
We know that if X ∼ N(μ, σ²), then Y = a + bX is also normal with Y ∼ N(a + bμ, b²σ²). Let us now consider a log-normal random variable X ∼ Λ(μ, σ²) and the transformation Y = aX^b. Then Y ∼ Λ(log(a) + bμ, b²σ²). There is a nice application of this property. It has been observed in many countries that the tax schedule can be approximated by
observed in many countries that the tax scheduled can be approximated by
t = x axb
The disposable income is given by
y = axb
So if the pre-tax income follows a log-normal, the disposable income will also follow a lognormal.
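A quick simulation check of this closure property (a sketch; the values of a and b are purely illustrative):

set.seed(5)
m = 1; s = 0.5; a = 0.8; b = 0.9        # illustrative values
x = rlnorm(100000, m, s)
y = a*x^b                               # e.g. disposable income when t = x - a x^b
c(mean(log(y)), log(a) + b*m)           # location parameter of log(y)
c(sd(log(y)), b*s)                      # scale parameter of log(y)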
The right tail of the lognormal density behaves very differently from the Pareto tail, just because the log-normal has all its moments while the Pareto has only a limited number of finite moments when α is too small. However, for large values of σ, the two distributions might have quite similar tails. This can be seen on a log-log graph. Let us take the log of the density:
$$\log f(x) = -\log x - \log(\sigma\sqrt{2\pi}) - \frac{(\log x - \mu)^2}{2\sigma^2}$$
$$= -\frac{\log^2 x}{2\sigma^2} + \left(\frac{\mu}{\sigma^2} - 1\right)\log x - \log(\sigma\sqrt{2\pi}) - \frac{\mu^2}{2\sigma^2}$$
$$\simeq \left(\frac{\mu}{\sigma^2} - 1\right)\log x - \log(\sigma\sqrt{2\pi}) - \frac{\mu^2}{2\sigma^2} \qquad \text{for large } \sigma.$$
So the log density behaves like a straight line in log x over a large range of x when σ is large enough, just as for the Pareto.
Figure: Lorenz curves of the lognormal distribution for σ = 0.25, 0.50, 1.00, 1.50, together with the 45° line.
library(ineq)
p = seq(0,1,0.01)
plot(p,Lc.lognorm(p, parameter=0.25),type="l",col="brown")
lines(p,Lc.lognorm(p, parameter=0.5),col="red")
lines(p,Lc.lognorm(p, parameter=1.0),col="blue")
lines(p,Lc.lognorm(p, parameter=1.5),col="green")
lines(p,p)
text(0.42,0.5,"45 line")
text(0.8,0.68,"0.25")
text(0.8,0.58,"0.50")
text(0.8,0.40,"1.00")
text(0.8,0.20,"1.50")
We can give some more details on this distribution, concerning the Gini coefficient and the Lorenz curve. Let us call Φ(x) the standard normal distribution function, Φ(x) = Prob(X < x). From Cowell (1995), we have Table 5. The Pietra index was found in Moothathua (1989).

Table 5: Various coefficients for the Log-Normal distribution
  Coefficient of variation   √(exp(σ²) − 1)
  Lorenz curve               L(p) = Φ(Φ^{-1}(p) − σ)
  Pietra index               2Φ(σ/2) − 1
  Gini index                 2Φ(σ/√2) − 1
  Atkinson                   1 − exp(−εσ²/2)
  Generalised entropy        [exp(c(c − 1)σ²/2) − 1]/[c(c − 1)]
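These closed forms are easy to verify by simulation (a sketch relying on the Gini() function of the ineq package):

library(ineq)
set.seed(6)
sigma = 1
x = rlnorm(100000, 0, sigma)
c(Gini(x), 2*pnorm(sigma/sqrt(2)) - 1)   # empirical versus closed form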
Lognormal distributions are usually generated by multiplicative models. The first explanation of this type was proposed by Gibrat (1930). We start with an initial value for income X_0. In the next period, this income can grow or diminish according to a multiplicative and positive random variable F_t:
$$X_t = F_t\,X_{t-1}.$$
Taking logs and using a recurrence, we have
$$\log X_t = \log(X_0) + \sum_{k=1}^{t}\log(F_k).$$
By the central limit theorem, we get a log-normal distribution. Note that the mechanism designed by Champernowne (1953) was very similar; we got a Pareto distribution only because a minimum value was imposed.
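A small simulation of Gibrat's multiplicative mechanism (a sketch; the choice of lognormal-type shocks with standard deviation 0.1 is purely illustrative):

set.seed(7)
n = 5000; Tt = 50                         # n incomes followed over Tt periods
logF = matrix(rnorm(n*Tt, 0, 0.1), n, Tt) # log growth factors
logX = log(10) + rowSums(logF)            # log X_T = log X_0 + sum of log F_t
qqnorm(logX); qqline(logX)                # nearly straight: log incomes look Gaussian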
The Singh-Maddala distribution has density
$$f_{SM}(x) = \frac{a_1 a_2 a_3\, x^{a_2 - 1}}{[1 + a_1 x^{a_2}]^{a_3 + 1}}.$$
Let us plot this density for various values of the parameters. First of all, a_1 is just a scale parameter and we set it equal to 1. Then we use the following code in R:
x = seq(0,5,0.1)
f_SM = function(x,a_2,a_3){
  a_2*a_3*x^(a_2-1)/(1+x^a_2)^(a_3+1)    # density with a_1 = 1
}
a_2 = 1
a_3 = 1
plot(x,f_SM(x,a_2,a_3),type="l",ylab="",xlab="")
a_2 = 2
lines(x,f_SM(x,a_2,a_3),col=2)
a_3 = 2
lines(x,f_SM(x,a_2,a_3),col=3)
The parameter α of the Pareto distribution could easily be estimated using a linear regression of log(1 − F̂) on log(x), where F̂ is the natural estimator of the cumulative distribution. Here, a non-linear regression has to be applied, minimising a sum of squared deviations between the empirical and the theoretical distribution functions.
The uncentred moments of order h and the Gini coefficient are expressed in terms of the Gamma function and can be found in McDonald and Ranson (1979) and McDonald (1984):
$$E(X^h) = b^h\,\frac{\Gamma(1 + h/a_2)\,\Gamma(a_3 - h/a_2)}{\Gamma(a_3)}.$$
Not all moments exist for this distribution: for a moment of order h to exist, we must have
$$a_3 > \frac{h}{a_2}.$$
The Lorenz curve of the Singh-Maddala distribution is
$$L(p) = I_z\big(1 + 1/a_2,\; a_3 - 1/a_2\big), \qquad z = 1 - (1-p)^{1/a_3},$$
where I_z(a, b) denotes the incomplete beta function ratio defined by
$$I_z(a,b) = \frac{\int_0^z t^{a-1}(1-t)^{b-1}\,dt}{\int_0^1 t^{a-1}(1-t)^{b-1}\,dt}.$$
27
1.0
0.6
0.4
LCsingh(p, a, 0.7)
0.8
1.0
0.8
0.6
0.4
a2=3.5
a2=2.5
a3=0.7
a2=2.0
0.0
0.2
a3=0.9
0.0
0.2
LCsingh(p, a, 0.7)
a3=2
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
The Singh-Maddala distribution admits two limiting cases, depending on the value of a_3. For a_3 = 1, we have the Fisk (1961) distribution. For a_3 → ∞, we have the Weibull distribution, to be detailed later on. So, depending on the value of a_3, the associated Lorenz curves are supposed to cover a wide range of shapes. In the left panel, we kept a_2 = 2 and let a_3 vary between 0.7 and 2. In the right panel, we kept a_3 = 0.7 and let a_2 vary between 2 and 3.5. The two black curves are identical. In one case the modification affects more the right part of the curve, and in the other case more the left part. However, we note that the flexibility is not very strong.
The corresponding code using R is:
LCsingh <- function(p,a,q){
  pbeta((1 - (1 - p)^(1/q)), (1 + 1/a), (q - 1/a))}   # Lorenz curve of the Singh-Maddala
p = seq(0,1,0.01)
a = 2
plot(p,LCsingh(p, a,0.7),type="l")
lines(p,LCsingh(p, a,0.9),type="l",col="red")
lines(p,LCsingh(p, a,2),type="l",col="blue")
lines(p,p)
text(0.8,0.24,"a3=0.7")
text(0.8,0.34,"a3=0.9",col="red")
text(0.8,0.48,"a3=2",col="blue")
Figure 9: Weibull densities for α = 0.5, 1.2 and 2.
The Weibull distribution is a nice two-parameter distribution where all moments exist. It is obtained as a special case of the three-parameter Singh-Maddala distribution, for a_3 → ∞. This relation explains that the cumulative distribution has an analytical form:
$$F(x) = 1 - \exp\big(-(kx)^{\alpha}\big).$$
By differentiation, we get the density
$$f(x) = \alpha k\,(kx)^{\alpha-1}\exp\big(-(kx)^{\alpha}\big).$$
We have a plot of this density in Figure 9. For α < 1, the density has the shape of the Pareto density, which means that it has no finite maximum. For α = 1, it cuts the y-axis. As α grows, there is less and less inequality and the density concentrates around its mean. Plausible values of α corresponding to usual income distributions are in [1.5, 2.5].
The h-th moments around zero are given by
$$\mu_h' = \frac{\Gamma(1 + h/\alpha)}{k^h}.$$
The coefficient of variation (the ratio between the standard deviation and the mean) is equal to
$$cv = \frac{\sqrt{\Gamma((\alpha+2)/\alpha) - \Gamma((\alpha+1)/\alpha)^2}}{\Gamma((\alpha+1)/\alpha)}.$$
As we have a direct expression of the distribution function, the Gini coefficient and the Lorenz curve are directly available. We find the expression of the Lorenz curve and of the Gini index for instance in Krause (2014):
$$LC(p) = 1 - \frac{\gamma\big(-\log(1-p),\, 1 + 1/\alpha\big)}{\Gamma(1 + 1/\alpha)},$$
where γ(x, α) is the incomplete Gamma function.
We regroup in Table 6 some of these results.

Table 6: Several indices for the Weibull distribution
  Coefficient of variation   √(Γ((α+2)/α) − Γ((α+1)/α)²) / Γ((α+1)/α)
  Lorenz curve               1 − γ(−log(1−p), 1 + 1/α)/Γ(1 + 1/α)
  Gini index                 1 − 2^{−1/α}
Note that there are various ways of writing the density of the Weibull, concerning the scale parameter k: either (kx)^α or (x/k)^α. For inference, it might even be convenient to consider k x^α. So be careful. In R, the density is available as dweibull(x, shape, scale = 1), using the parameterisation (x/k)^α.
The Weibull distribution shares with the Pareto and the Singh-Maddala distributions a common feature, which is to have an analytical cumulative distribution. If we rearrange its expression and take logs, we get
$$\log(-\log(1 - F)) = \alpha\log(kx).$$
So it is easy to check graphically whether a sample has a Weibull distribution, and this by the way gives a method to estimate the parameter α.
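A sketch of this graphical check and of the implied estimator of α, regressing log(−log(1 − F̂)) on log x on a simulated Weibull sample:

set.seed(8)
alpha = 2; k = 1
x = sort(rweibull(2000, shape=alpha, scale=1/k))
Fh = (1:length(x))/(length(x)+1)          # natural estimator, avoiding F = 1
fit = lm(log(-log(1-Fh)) ~ log(x))
coef(fit)                                 # slope close to alpha = 2
plot(log(x), log(-log(1-Fh))); abline(fit, col=2)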
The Gamma distribution has density
$$f(x; k, \theta) = \frac{x^{k-1}e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}.$$

Figure: Gamma densities for shape parameters (degrees of freedom) DF = 1 to 5.
Its cumulative distribution function involves the incomplete Gamma function:
$$F(x) = \int_0^x f(u; k, \theta)\,du = \frac{\gamma(k,\, x/\theta)}{\Gamma(k)}.$$
The skewness is equal to 2/√k; it depends only on the shape parameter k, and the distribution approaches a normal when k is large (approximately when k > 10). The mean is kθ and the variance kθ².
The Gamma distribution is rather easy to estimate, including by Bayesian inference. In R, the functions dgamma, pgamma, qgamma and rgamma use the same parameterisation.
A Pigou-Dalton transfer takes income from a richer individual at rank p_r and gives it to a poorer individual at rank p_p without reversing their ranking; such a transfer reduces inequality. This is because it does not affect the value of L(p) for all p up to p_p and for all p greater than p_r, but it increases L(p) for all p between p_p and p_r.
Let us consider two income distributions A and B, where distribution B is obtained by applying Pigou-Dalton transfers to A. Hence, the Lorenz curve L_B(p) of distribution B will be everywhere above the Lorenz curve L_A(p) of distribution A. Inequality indices which obey the principle of transfers will unambiguously indicate more inequality in A than in B. We will also say that if
$$L_B(p) - L_A(p) \ge 0 \quad \forall p,$$
then B Lorenz dominates A.
Figure 11: Lorenz curves of the three income distributions x_A, x_B and x_C.

Table 7: Mean and Gini coefficient of the three distributions
        mean   Gini
  x_A   1.76   0.55
  x_B   1.76   0.36
  x_C   1.70   0.39
We finally draw n values of x_C from a lognormal with σ = 1 having the same theoretical mean as x_A or x_B. In Table 7, we compute the mean and the Gini coefficient of each distribution. We illustrate these numbers in Figure 11, where we have drawn the Lorenz curve of x_A in black. It is the farthest away from the diagonal: inequality is rather large in this income distribution. Pigou-Dalton transfers do not change the mean, nor the ordering, but they greatly reduce the Gini coefficient. The Lorenz curve corresponding to x_B is in red. It does not intersect L_A, even if the distribution of x_B cannot be a lognormal. However, it intersects the Lorenz curve of x_C, which corresponds to a similar Gini coefficient but results from totally different transfers.
More formally, B Lorenz dominates A, noted B ⪰_L A, when L_B(p) ≥ L_A(p) for all p ∈ [0, 1]. If B ⪰_L A, then B exhibits less inequality than A in the Lorenz sense. Note that the Lorenz order is a partial order and is invariant with respect to scale transformations.
It is now possible to characterise Lorenz dominance by restrictions over the parameter space when the two random variables belong to the same class of distributions. For some parametric families the restrictions are very simple, and by the way imply rather simple parametric statistical tests. We present first the results for the Pareto and the log-normal.
Pareto: Let X_i ∼ P(α_i, x_{m_i}). Then
$$F_{X_1} \succeq_L F_{X_2} \iff \alpha_1 \ge \alpha_2.$$
Lognormal: Let X_i ∼ Λ(μ_i, σ_i²). Then
$$F_{X_1} \succeq_L F_{X_2} \iff \sigma_1 \le \sigma_2.$$
The generalised Lorenz curve (GLC) is defined as GLC(p) = μ L_X(p). Note that GLC(0) = 0 and GLC(1) = μ. A distribution with a dominating GLC provides greater welfare according to all concave increasing social welfare functions defined on individual incomes (Kakwani 1984 and Davies et al. 1998). On the other hand, the GLC is no longer scale-free and in consequence it determines any distribution with finite mean.
The order induced by the GLC is the second-order stochastic dominance that we shall study in a later chapter. This order is a new partial ordering, and sometimes it allows a bigger percentage of curves to be ordered than in the Lorenz ordering case.
The usual Lorenz curve is used when one focuses attention on inequality only. The generalised Lorenz curve mixes concerns for inequality and for the mean, so it is related to welfare comparisons.
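A minimal sketch of a generalised Lorenz comparison between two artificial samples (parameter values are purely illustrative):

set.seed(9)
xA = rlnorm(2000, 0, 0.8)
xB = rlnorm(2000, 0.2, 0.8)                        # same sigma, higher mean
glc = function(x){ cumsum(sort(x))/length(x) }     # GL(p) = mu * L(p), ends at the mean
p = (1:2000)/2000
plot(p, glc(xB), type="l", col=2, ylab="GL(p)")
lines(p, glc(xA))                                  # xB dominates xA: higher welfare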
For reference, the Lorenz curves and Gini coefficients of the main parametric distributions are:
  Pareto          L(p) = 1 − (1 − p)^{1 − 1/α}                            G = 1/(2α − 1)
  Lognormal       L(p) = Φ(Φ^{-1}(p) − σ)                                 G = 2Φ(σ/√2) − 1
  Singh-Maddala   L(p) = I_z(1 + 1/a, q − 1/a), z = 1 − (1 − p)^{1/q}     G = 1 − Γ(q)Γ(2q − 1/a)/[Γ(q − 1/a)Γ(2q)]
Parametric functional forms for Lorenz curves are surveyed in Sarabia (2008); we do not reproduce all the details here. The first parametric form given in the literature is
$$L(p) = p\,\exp(-\beta(1-p)).$$
A convenient alternative is obtained by raising the Pareto Lorenz curve to a power β:
$$L(p) = \big[1 - (1-p)^{1-1/\alpha}\big]^{\beta}.$$
If β = 1, we recover the asymmetric Lorenz curve of the Pareto. If β = 1/(1 − 1/α), we obtain a symmetric Lorenz curve, thus having a similar property to that of the lognormal. The underlying density to this Lorenz curve combines properties of the Pareto and of the lognormal. More general expressions are given in Sarabia (2008).
Let us explore these functional forms using R.
LCgen <- function(p,alpha,beta){
  smlc <- (1-(1-p)^(1-1/alpha))^beta
  smlc}
p = seq(0,1,0.01)
plot(p,LCgen(p, 1.5,1),type="l")
text(0.93,0.45,"1.5, 1.0")
lines(p,LCgen(p,3,1.5),type="l",col="red")
text(0.7,0.50,"3.0, 1.5")
lines(p,LCgen(p,4,2),type="l",col="blue")
text(0.5,0.10,"4.0, 2.0")
lines(p,p)
text(0.42,0.5,"45 line")
It is remarkable that by playing with two parameters, we can obtain very different shapes and in particular many points of intersection, in a much simpler way than with the Singh-Maddala distribution. The Gini coefficient has a simple expression and is equal to
$$G = 1 - \frac{2}{1 - 1/\alpha}\,B\!\left(\frac{1}{1 - 1/\alpha},\, \beta + 1\right).$$
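As a quick numerical check of this expression (a sketch with illustrative parameter values), we can compare it with the direct integration G = 1 − 2∫_0^1 L(p) dp:

alpha = 3; b = 1.5                        # illustrative parameter values
LC = function(p) (1 - (1-p)^(1-1/alpha))^b
g_num = 1 - 2*integrate(LC, 0, 1)$value
g_cf  = 1 - 2/(1-1/alpha) * beta(1/(1-1/alpha), b+1)
c(g_num, g_cf)                            # both give the same Gini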
Figure: Lorenz curves L(p) = [1 − (1 − p)^{1−1/α}]^β for (α, β) = (1.5, 1), (3, 1.5) and (4, 2), together with the 45° line.
10 Exercises
10.1 Empirics
Using the previous FES data set, the software R and the package ineq, compare the empirical
Lorenz curve to those obtained for the Pareto and Log-normal. Say which distribution would fit
the best. Redo the same exercise limiting the data to high incomes.
10.2 Gini coefficient
Starting from
$$\mathrm{Cov}(y, F(y)) = \frac12\int F(x)[1 - F(x)]\,dx,$$
give the corresponding form of the Gini. Give the value of F for which the Gini is maximum. What can you deduce from this result as a property of the Gini index?
10.3 LogNormal
Compute the value of the Generalised Entropy index for c = 0 and c = 1. Comment on your result. Does it hold in the general case of an arbitrary distribution? Do the same calculation for the Pareto density.
10.4 Uniform
The uniform density between 0 and x_m is sometimes used in theoretical economic papers to describe the income distribution. It writes
$$f(x) = \frac{1}{x_m}\,1\mathrm{I}(x \le x_m).$$
Compute its mean, its Lorenz curve, and its Gini coefficient, using both $G = 1 - 2\int_0^1 L(p)\,dp$ and $G = 1 - \frac{1}{\mu}\int_0^{x_m}[1 - F(t)]^2\,dt$.
10.5 Singh-Maddala
Find an example where two Lorenz curves associated to the Singh-Maddala distribution intersect.
Use the graphs produced by R for this. Mind that the parametrisation adopted in R for the
function Lc.singh is awkward. Use the function provided in the text.
10.6 Logistic
The logistic density is very close to the normal density, but it has nicer properties, in particular an analytical cumulative distribution. We have
$$f(x) = \frac{e^{-(x-\mu)/s}}{s\,\big(1 + e^{-(x-\mu)/s}\big)^2}, \qquad F(x) = \frac{1}{1 + e^{-(x-\mu)/s}},$$
with mean μ and variance s²π²/3. Find the log-logistic distribution using the adequate transformation. Find its Gini coefficient. This is the Fisk distribution.
10.7 Weibull
Show that when a_3 → ∞ in the Singh-Maddala distribution, we get the Weibull.
References
Arnold, B. C. (2008). Pareto and generalized Pareto distributions. In Chotikapanich, D., editor, Modeling Income Distributions and Lorenz Curves, volume 5 of Economic Studies in Equality, Social Exclusion and Well-Being, chapter 7, pages 119-145. Springer, New York.
Atkinson, A. (1970). The measurement of inequality. Journal of Economic Theory, 2:244-263.
Champernowne, D. (1953). A model of income distribution. Economic Journal, 63:318-351.
Chotikapanich, D. (2008). Modeling Income Distributions and Lorenz Curves, volume 5 of Economic Studies in Equality, Social Exclusion and Well-Being. Springer, New York.
Cowell, F. (1995). Measuring Inequality. LSE Handbooks on Economics Series. Prentice Hall, London.
Davidson, R. (2009). Reliable inference for the Gini index. Journal of Econometrics, 150:30-40.
Davies, J. B., Green, D. A., and Paarsch, H. J. (1998). Economic statistics and social welfare comparisons: A review. In Ullah, A. and Giles, D. E. A., editors, Handbook of Applied Economic Statistics, volume 155 of Statistics: Textbooks and Monographs, pages 1-38. Dekker, New York, Basel and Hong Kong.
Deaton, A. (1997). The Analysis of Household Surveys. The Johns Hopkins University Press, Baltimore and London.
Donaldson, D. and Weymark, J. (1980). A single-parameter generalization of the Gini indices of inequality. Journal of Economic Theory, 22(1):67-86.
Duclos, J.-Y. and Araar, A. (2006). Poverty and Equity: Measurement, Policy and Estimation with DAD. Springer, New York.
Fisk, P. (1961). The graduation of income distributions. Econometrica, 29:171-185.
Gibrat, R. (1930). Une loi des répartitions économiques: l'effet proportionnel. Bulletin de la Statistique Générale de la France, 19:469.
Giles, D. E. A. (2004). Calculating a standard error for the Gini coefficient: Some further results. Oxford Bulletin of Economics and Statistics, 66(3):425-433.
Kakwani, N. (1984). Welfare ranking of income distributions. In Basmann, R. and Rhodes, G., editors, Advances in Econometrics, volume 3, pages 191-213. JAI Press.
Krause, M. (2014). Parametric Lorenz curves and the modality of the income density function. Review of Income and Wealth, 60(4):905-929.
Lubrano, M. and Protopopescu, C. (2004). Density inference for ranking European research systems in the field of economics. Journal of Econometrics, 123(2):345-369.
McDonald, J. (1984). Some generalised functions for the size distribution of income. Econometrica, 52(3):647-663.
McDonald, J. B. and Ranson, M. R. (1979). Functional forms, estimation techniques and the distribution of income. Econometrica, 47(6):1513-1525.
Mitzenmacher, M. (2004). A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1(2):226-251.
A Hypergeometric series
We have seen that partial moments, as well as certain cumulative distributions, can be expressed in terms of hypergeometric series, whose evaluation and manipulation are difficult; see the paper by McDonald (1984). It turns out that there exists a relation between some of these hypergeometric series and the incomplete Gamma and Beta functions.
The incomplete Gamma function is given by
$$G(x; p) = \frac{1}{\Gamma(p)}\int_0^x t^{p-1}e^{-t}\,dt.$$
This formula can be found in the usual textbooks such as Johnson and Kotz. But in
http://mathworld.wolfram.com/IncompleteGammaFunction.html
we find
$$\gamma(\alpha, x) = \int_0^x t^{\alpha-1}e^{-t}\,dt = \alpha^{-1}x^{\alpha}e^{-x}\,{}_1F_1(1;\,1+\alpha;\,x) = \alpha^{-1}x^{\alpha}\,{}_1F_1(\alpha;\,1+\alpha;\,-x).$$
We thus note that it suffices to divide both sides by Γ(α) to recover the usual formulations.
Let us now turn to the incomplete Beta function:
$$IB(x, p, q) = \frac{1}{\mathrm{Beta}(p,q)}\int_0^x u^{p-1}(1-u)^{q-1}\,du.$$
We have that
$$IB(x, p, q) = \frac{x^{p}(1-x)^{q-1}}{p\,\mathrm{Beta}(p, q)}\;{}_2F_1\!\left(1-q,\,1;\;p+1;\;\frac{x}{x-1}\right).$$
We thus deduce that the cumulative distribution of the generalised Beta II and its partial moments can be expressed by means of the incomplete Beta function which, like the incomplete Gamma, is implemented in R.
A.1 The family of the generalised Beta II
The most general distribution which has been proposed in the literature for fitting income data is the generalised Beta II (GB2), introduced in McDonald (1984) and developed in McDonald and Xu (1995). Its density is
$$f_{GB2}(x|a, b, p, q) = \frac{a\,x^{ap-1}}{b^{ap}\,B(p,q)\,[1 + (x/b)^{a}]^{p+q}}, \tag{5}$$
where
$$B(p, q) = \int_0^{\infty}\frac{t^{p-1}}{(1+t)^{p+q}}\,dt, \qquad p, q > 0.$$
The generalised Beta II has the advantage that many densities can be obtained as particular cases, and thus it constitutes a nice framework for discussion. In particular, we have an analytical expression for its uncentred moments:
$$E(X^h) = \mu_h' = b^h\,\frac{B(p + h/a,\, q - h/a)}{B(p, q)} = b^h\,\frac{\Gamma(p + h/a)\,\Gamma(q - h/a)}{\Gamma(p)\,\Gamma(q)}. \tag{6}$$
Its cumulative distribution function can be written with the Gauss hypergeometric function as
$$F_{GB2}(x) = \frac{z^{p}}{p\,B(p,q)}\;{}_2F_1\!\left(p,\; 1-q;\; p+1;\; z\right), \qquad z = \frac{(x/b)^a}{1 + (x/b)^a},$$
which is nothing else than the incomplete beta function ratio I_z(p, q).
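Since I_z(p, q) is simply pbeta() in R, this can be checked by comparing the incomplete beta form with a direct numerical integration of the density (a sketch with illustrative parameter values):

a = 2; b = 1; p = 1.5; q = 2.5; x0 = 1.3                  # illustrative values
f_gb2 = function(x) a*x^(a*p-1)/(b^(a*p)*beta(p,q)*(1+(x/b)^a)^(p+q))
z = (x0/b)^a/(1+(x0/b)^a)
c(integrate(f_gb2, 0, x0)$value, pbeta(z, p, q))          # the two coincide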
A.2 The Burr system as a useful particular case
Many of the interesting particular cases we can consider when starting from the GB2 are related to the system of distributions introduced by Burr (1942). Burr built a system of distributions F(x) by reference to a differential equation dF/dx = A(F)g(x) which generalises Pearson's system. The object was to fit a distribution function to the data (rather than a density as in Pearson's system) and to obtain the density by differentiation. Burr reviews twelve different distributions. Two are of particular interest to us. They are now named Burr III and Burr XII, by reference to the number they occupy in the main table of Burr's original paper. Kleiber (1996) pointed out that Burr III was nothing but the Dagum (1977) distribution. It corresponds to the GB2 with q = 1. Burr XII is nothing but the distribution promoted some thirty years later by Singh and Maddala (1976) for fitting income data. It received considerable interest both in the statistical and in the economic literature. It corresponds to imposing p = 1 in the GB2. Many distributions can be found as restrictions of the Burr XII.
A.3 Playing with the Burr system
The Burr XII, also named the Singh-Maddala density function, is obtained as a first particular case, imposing p = 1 in the GB2:
$$f_{SM}(x|a, b, q) = f_{GB2}(x|a, b, p=1, q) = b^{-a}\,q\,a\,x^{a-1}\,[1 + (x/b)^a]^{-(q+1)}.$$
Its cumulative distribution has an analytical form:
$$F_{SM}(x) = 1 - \frac{1}{(1 + (x/b)^a)^{q}},$$
its moments are
$$E(X^h) = b^h\,\frac{\Gamma(1 + h/a)\,\Gamma(q - h/a)}{\Gamma(q)},$$
and its Gini coefficient is
$$G = 1 - \frac{\Gamma(q)\,\Gamma(2q - 1/a)}{\Gamma(q - 1/a)\,\Gamma(2q)}.$$
This distribution has been widely studied in the statistical literature; see e.g. Rodriguez (1977) for an interesting discussion.
The Pareto density function is obtained as a restriction of the GB2 with a = b = p = 1 and q = α (or more simply starting directly from the Burr XII):
$$f_{P2}(y|\alpha) = f_{GB2}(y|a=1, b=1, p=1, q=\alpha) = \alpha\,[1 + y]^{-(\alpha+1)}.$$
The obtained distribution is the Pareto of the second kind (or Lomax distribution). The usual Pareto is obtained after the change of variable x/σ = 1 + y (with Jacobian 1/σ), so that
$$f(x) = \alpha\,\sigma^{\alpha}\,x^{-(\alpha+1)}\,1\mathrm{I}(x > \sigma). \tag{7}$$
This is the notation of Arnold and Press (1983) (see also Johnson, Kotz and Balakrishnan 1994). The cumulative distribution function is
$$F(x) = 1 - (\sigma/x)^{\alpha}. \tag{8}$$
Its moments are
$$E(X^h) = \frac{\alpha\,\sigma^{h}}{\alpha - h}, \qquad \text{if } h < \alpha.$$
They are obtained as a particular case of the moments of the GB2 using the change of variable x/σ = 1 + y and a recursion over h, or they can be computed directly from the expression of the density (which is the simpler solution).
The Weibull distribution is obtained from the Burr XII for q → ∞; see Rodriguez (1977) for a derivation. Its density is
$$f_W(x|a, b) = \frac{a}{b}\,(x/b)^{a-1}\exp(-(x/b)^a),$$
and the cdf is
$$F_W(x) = 1 - \exp(-(x/b)^a).$$
The uncentred moments are
$$E(X^h) = b^h\,\Gamma(1 + h/a).$$
Rodriguez (1977) notes that the Burr type XII can be obtained as a smooth mixture of two-parameter Weibull distributions F_W(x) = 1 − exp(−λx^a), compounded with respect to λ, where λ has a standard Gamma distribution with parameter q. Note finally that the Weibull is connected to the exponential distribution p(y) = e^{−y}, because it can be obtained from it by the change of variable Y = (X/b)^a.
The Burr III is the Dagum (1977) distribution and can be derived from the GB2 by imposing the restriction q = 1 and rearranging the terms:
$$f_D(x|a, b, p) = f_{GB2}(x|a, b, p, q=1) = a\,p\,b^{a}\,x^{-(a+1)}\,[1 + (b/x)^a]^{-(p+1)},$$
with cumulative distribution
$$F_D(x) = [1 + (b/x)^a]^{-p}.$$
The uncentred moments are
$$E(X^h) = b^h\,\frac{\Gamma(p + h/a)\,\Gamma(1 - h/a)}{\Gamma(p)}.$$
Kleiber (1996) showed that if X ∼ SM(a, b, q), then 1/X ∼ D(a, 1/b, q).
The generalised gamma distribution is obtained from the GB2 as another limiting case. Its density is
$$f_{GG}(x|a, \beta, p) = \frac{a\,x^{ap-1}}{\beta^{ap}\,\Gamma(p)}\,\exp\big[-(x/\beta)^{a}\big].$$
Recalling that the Weibull distribution was obtained by considering the same limit for the Burr XII, the generalised gamma is thus a generalisation of the Weibull, which is obtained for p = 1. The cumulative distribution function is
$$F(x) = G\big[(x/\beta)^{a};\, p\big],$$
where G is the incomplete Gamma function
$$G(x; p) = \frac{1}{\Gamma(p)}\int_0^x t^{p-1}e^{-t}\,dt.$$
The uncentred moments are best derived using the properties of the gamma distribution:
$$E[X^h] = \beta^{h}\,\frac{\Gamma(p + h/a)}{\Gamma(p)}.$$
One common and alternative way to define the generalised gamma density is to start from a random variable Z with a standard gamma distribution and make the change of variable X = βZ^{1/a}, with Jacobian (a/β)(x/β)^{a-1}.