Testing For Convergence Clubs in Income Per Capita: A Predictive Density Approach
BY FABIO CANOVA1
The article proposes a technique, based on the predictive density of the data,
conditional on the parameters of the model, to jointly test for groups of unknown
size in a panel and to estimate the parameters of each group. The procedure is applied
to the problem of identifying convergence clubs in scaled income per capita
data. The steady-state distribution of European regional data clusters around four
poles of attraction with different economic features. The distribution of income
per capita of OECD countries has two poles of attraction, and each group has clearly
identifiable economic characteristics.
1. INTRODUCTION
Recent theories of growth and development have suggested that the distribution
of income per capita of countries and/or regions may display convergence clubs,
i.e., a tendency for the steady-state distribution to cluster around a small number
of poles of attraction (see e.g., Ben David, 1994; Quah, 1996a; Galor, 1996). This
tendency may be induced by several factors: the existence of some threshold level
in the endowment of strategic factors of production; nonconvexities or increasing
returns; similarities in preferences and technologies; and government policies,
which become more similar over time within certain groups (e.g., EU or East
Asian countries). Although there is anecdotal evidence supporting the view that
clustering is an important feature of world income, to the best of my knowledge,
only Durlauf and Johnson (1995), Paap and Van Dijk (1998), and Desdoigts (1998)
have attempted to formally document whether this tendency exists in the data.
This article proposes a new technique to formally examine whether the distri-
bution of income per capita displays convergence clubs. The approach is general,
determines the number of groups and the location of the break points when the
appropriate ordering of the units in the cross section is unknown, and, at the same
time, allows one to estimate the parameters of each group in a unified manner.
The approach is based on the predictive density (marginal likelihood) of the data
and has appealing features for both Bayesian and classical analysts.
The technique can be viewed as a natural extension of the standard approach
used to determine the number of heterogeneous groups in a cross section (see
e.g., the Goldfeld and Quandt test) when the number of groups, the location of
the breaks, and the ordering of units are unknown. However, instead of assuming
that the regression coefficients are the same for all units belonging to one group, I
allow for a further layer of heterogeneity within groups. This second layer of het-
erogeneity takes the form of a prior that restricts the coefficients of the units in a
group to have the same distribution, but allows the distribution of the coefficients
of units in different groups to differ. Such a restriction implies that the distribution
of steady states may display multiple basins of attraction. Since a similar restric-
tion on the coefficients of the entire cross section implies that the distribution of
steady states only has one attractor, testing for convergence clubs is equivalent to
checking which of these two assumptions is more appropriate.
Once the optimal ordering, the number of groups, and the location of the break
points in the cross section have been established, I provide a simple way to es-
timate the parameters of each group and to conduct inference. The approach I
employ lies within the Empirical Bayes (EB) tradition: I use predictive densities
to estimate the parameters and posterior analysis to draw conclusions about func-
tions of the coefficients of the model. Posterior inference is appealing because it
gives us a compact way to summarize both subjective and objective uncertainty
about economically interesting functions of the coefficients of the model (convergence
rates, long-run multipliers, the steady-state distribution, etc.). EB methods
are simpler than standard hierarchical Bayesian approaches, since they do not require
numerical integration, and are advantageous when there is interest in estimating
aspects of the prior distribution, which is precisely the case considered here.
The methodological contribution of this article is linked to a number of articles,
both in the classical and the Bayesian tradition, testing for the existence of an
unknown break point in time series (see e.g., Ploberger et al., 1989; Bai, 1997; Polasek
and Ren, 1997) and to the EB tradition of constructing posterior estimates of the
coefficients of a model by plugging in estimates of the parameters of the prior
(see Morris, 1983; Berger, 1985; Efron, 1996). The idea behind the grouping ap-
proach is related to Forni and Reichlin (1997), who attempt to estimate a reduced
number of common latent factors from large dynamic cross-sectional data, and
to Hansen (1999, 2000), who examines inferential problems in threshold models
for cross sections, time series, or static panels. However, while the latter author
characterizes the asymptotic distribution of the threshold parameter and studies
hypotheses testing, given an ordering of the cross section when there is only one
threshold, our interest is in designing a technique to find multiple thresholds when
the correct ordering is unknown. Contrary to Durlauf and Johnson (1995), we allow
for heterogeneity within groups (so that the steady-state distribution of a club
need not be degenerate) at the cost of imposing restrictions on the time series
properties of the data. Our approach is more formal than Quah’s (1996a, 1996b),
but it requires more stringent assumptions on the structure of the dynamic model
than his. Finally, the grouping procedure we employ has roots in the Bayesian
literature on mixture densities (see Titterington et al., 1985 or Paap and Van Dijk,
1998) and shares similarities with classification/cluster analyses (see e.g., Mardia
et al., 1980). Four features distinguish the proposed approach from existing ones:
the use of serially correlated data, the possibility that groups have different covariance
matrices, the lack of knowledge about the number of break points, and the
criteria used to assign units to groups (predictive ability vs. within group variance).
I employ European regional income per capita data from the NUTS2 data set
of Eurostat and OECD national income per capita from the Summers and Heston
data set to determine whether the income distribution shows any tendency toward
club convergence. Recent theories of economic growth have suggested a number
of indicators that may determine the club a unit will join: for example, the initial
conditions of income per capita and of the average human capital, the dispersion
of the distribution of income and education within units, and the geographical
location of a unit may be crucial to determine the pole of attraction around which
it will gravitate in the long run. Human capital and policy variables are not available
at the regional level. Therefore a search for clubs in these data is conducted
using initial conditions and geographical and threshold externalities measures
as grouping devices. At the country level, indicators for access to technologies,
government policies, human capital, geography, and threshold externalities are
available and all of them are used to search for clubs.
I find that the ordering based on the ranking of scaled income per capita in the
presample period is the one that maximizes the predictive power of the model
for both data sets. With this ordering, there is a natural clustering of units in
four groups of regional income per capita and two groups of national income per
capita. No further break is detected when other variables are used to reorder units
within these groups. In both cases clubs are characterized by different parameters
controlling the speed of adjustment to the steady state and the mean level of per
capita steady-state income relative to the average. More precisely, poor units con-
verge faster to their steady state than rich ones and they tend to cluster around a
pole of attraction that is substantially below the average (see also Quah, 1996b).
The dispersion of steady states around each basin of attraction is significant, sug-
gesting that clustering is more prevalent than convergence even within groups. I
show that even though groups have different long-run mobility indices, there is
substantial immobility in the ranking of units within groups, confirming the strong
persistence in inequality found by Canova and Marcet (1995). As a consequence
of the persistence of the initial income characteristics and of the immobility in
ranking, the steady-state distribution of income per capita will become polarized.
Since poor units are also those that feature low initial average human capital, have
more polarized distributions of income and education, and are geographically
located in the “South” or in the periphery of the industrialized world, the results
provide a bleak picture of the possibility of equalizing income per capita both
in EU and in OECD countries in the near future.
The rest of the article is organized as follows. The next section describes the
details of the testing approach. Section 3 provides a technique to estimate the
The starting point of the analysis is the a priori belief that there may be significant
heterogeneities in the cross section of a panel and a natural clustering of units
around certain poles of attraction, in the sense that the coefficients of the statistical
model are more similar within each group than across groups. For example, if units
i and j belong to a group, the vector of coefficients of the model for the two units
may have the same mean and the same dispersion. However, if units i and j do
not belong to the same group, the vector of coefficients of the two units may have
different means and different dispersions.
For the sake of generality, I assume that the ordering of cross-sectional units,
which naturally gives rise to clustering, is unknown. In practice, clustering in in-
come per capita may be linked to geographical, economic, or sociopolitical factors
and modern growth theory provides a restricted set of orderings, which are worth
examining. Let N be the size of the cross section, T the size of the time series,
and m = 1, 2, . . . N! the particular ordering of the units of the cross section. It is
assumed that there may be q = 1, 2, . . . Q break points in the cross section, Q being
given. Each of the resulting q + 1 groups is characterized by a statistical model of
the form
(3)  $\beta_i = \beta + \epsilon_i, \qquad i = 1, \ldots, N$
where $\epsilon_i \sim N(0, \Sigma)$. In other words, in the alternative β and Σ are the same for
all i, so that there is an exchangeable structure for all units of the cross section.
The limiting case of this alternative is a pooled model, which can be obtained by
setting $\epsilon_i = 0$, ∀i.
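The contrast between group-specific priors and the single exchangeable prior in (3) can be illustrated by simulation. The following is a minimal sketch, not the article's procedure: the function name `draw_coefficients`, the group sizes, and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def draw_coefficients(group_sizes, means, covs, seed=0):
    """Draw unit coefficients beta_i = mean_p + eps_i with eps_i ~ N(0, cov_p),
    one (mean_p, cov_p) pair per group (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    draws = [rng.multivariate_normal(mean, cov, size=n)
             for n, mean, cov in zip(group_sizes, means, covs)]
    return np.vstack(draws)

# Two clubs: coefficients share a distribution within a club, not across clubs.
clubs = draw_coefficients(
    group_sizes=[30, 70],
    means=[np.array([-0.2, 0.6]), np.array([0.1, 0.9])],
    covs=[0.01 * np.eye(2), 0.002 * np.eye(2)],
)

# Single exchangeable prior as in (3): one mean and one covariance for all units.
pooled = draw_coefficients([100], [np.array([0.0, 0.8])], [0.01 * np.eye(2)])
```

Under the first specification the implied steady states can cluster around two attractors; under the second there is only one.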
Within this general setup, I study two issues. First, I am interested in providing a
framework for verifying the hypothesis that there are heterogeneities in the cross
section in a situation where the number of groups, the location of the breaks, and
the permutation, which naturally give rise to the clustering, are unknown. Once
I have established the “submodel” of interest, i.e., the number of groups, the
location of the breaks, and the ordering of the cross section, I will be concerned,
at a second stage, with the problem of estimating the hyperparameters and $\sigma^2_{u_i}$ for
each i, which are unknown to the investigator and needed to construct posterior
estimates of important functions of the $\beta_i$.
Let Y be a (N ∗ T ∗ s) × 1 vector of the LHS variables in (1), ordered to have the
N cross sections for each t = 1, . . . , T, s times; X be a (N ∗ T ∗ s) × (N ∗ k) matrix
of the regressors, with k = s ∗ r + v ∗ d + 1; β be a (N ∗ k) × 1 vector of coefficients;
U be a (N ∗ T ∗ s) × 1 vector of disturbances; $\beta_0$ be a ((q + 1) ∗ k) × 1 vector of
means of β; and A be a (N ∗ k) × ((q + 1) ∗ k) matrix, $A = \mathrm{diag}\{A_p\}$, where $A_p$ has the
form ι ⊗ $I_k$, $I_k$ is a k × k identity matrix, and ι is a $n^p(m)$ × 1 vector of ones.
Given an ordering m, the number of groups q, and the location of the break points
$h^p(m)$, we can rewrite (1)–(2) as
(4)  $Y = X\beta + U, \qquad U \sim (0, \Sigma_u)$
(5)  $\beta = A\beta_0 + E, \qquad E \sim (0, \Sigma_E)$
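To make the role of the matrix A in (5) concrete, the sketch below builds A = diag{A_p} with A_p = ι ⊗ I_k for given group sizes and shows how it maps the stacked group means into a prior mean for every unit. The function name `build_A` and the numbers are illustrative assumptions.

```python
import numpy as np

def build_A(group_sizes, k):
    """Block-diagonal A = diag{A_p}, with A_p = iota (n_p x 1) kron I_k."""
    n_units, n_groups = sum(group_sizes), len(group_sizes)
    A = np.zeros((n_units * k, n_groups * k))
    row = 0
    for p, n_p in enumerate(group_sizes):
        A_p = np.kron(np.ones((n_p, 1)), np.eye(k))      # (n_p * k) x k block
        A[row:row + n_p * k, p * k:(p + 1) * k] = A_p
        row += n_p * k
    return A

# Two groups (one break) with 2 and 3 units, k = 2 coefficients per unit.
A = build_A([2, 3], k=2)
beta0 = np.array([1.0, 2.0, 10.0, 20.0])   # stacked group means (hypothetical)
prior_mean = A @ beta0                     # each unit inherits its group's mean
```

Every unit in a group receives the same prior mean, which is exactly the within-group exchangeability that (5) encodes.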
assumption that there are q break points. These predictive densities can be easily
obtained from (5) once distributional assumptions for the error term are made.
Define the following quantities:
• $L^{+}(Y \mid H_q, m) \equiv \sup_{i \in I^q} L(Y \mid H_q, i, m)$,
• $L^{\dagger}(Y \mid H_q) \equiv \sup_{j \in J} L^{+}(Y \mid H_q, j)$,
• $L^{A_q}(Y \mid H_q, m) \equiv \sum_{i \in I^q} \pi_i^p(m)\, L(Y \mid H_q, i, m)$,
where $\pi_i^p(m)$ is the prior probability that, for group p of ordering m, there is a
break at location i. The first expression gives the maximized value of the predictive
density with respect to the location of break points for each q and m; the second,
the maximized value of the predictive density, for each q, once the location of the
break point and the ordering of the data are chosen optimally. The last expression
gives the average predictive density under the assumption that there are q breaks:
The average is calculated over all possible locations of the break points, using the
prior probability that there is a break point in each location as weight. In general,
ignorance about the location of the break points leads us to assume that $\pi_i^p(m)$ is
uniform over each p, m.
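Given the log predictive density evaluated at each candidate break location, the maximized and the prior-averaged quantities defined above can be computed as in the following sketch. It assumes the log densities are already available in an array (the predictive density itself is not implemented here); the function name and the numerical values are hypothetical. Working in logs avoids overflow when the densities are of the order of e^{4900}.

```python
import numpy as np

def sup_and_average(log_dens, log_prior=None):
    """Return (argmax location, log sup, log prior-weighted average) of the
    predictive density over candidate break locations."""
    log_dens = np.asarray(log_dens, dtype=float)
    if log_prior is None:                       # uniform prior over locations
        log_prior = np.full(log_dens.shape, -np.log(log_dens.size))
    best = int(np.argmax(log_dens))
    z = log_dens + log_prior
    log_avg = z.max() + np.log(np.exp(z - z.max()).sum())   # stable log-sum-exp
    return best, log_dens[best], log_avg

log_dens = np.array([1000.0, 1001.0, 999.0])    # hypothetical values
best, log_sup, log_avg = sup_and_average(log_dens)
```

With more than one break, the candidate locations are vectors; the same computation applies once they are enumerated and flattened into one array.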
To examine the hypothesis that the dynamics of the cross section are group-
based we will use a posterior odds (PO) ratio.2 I consider first the null that there
are no break points against the alternative that there are at most Q breaks and
then, if the alternative is more likely, sequentially verify a series of hypotheses
where the null is that there are q − 1 break points and the alternative that there
are q break points, q = 1, . . . Q. Given m, the statistics to verify the first hypothesis
are
(7)  $PO(m) = \dfrac{\pi_0\, L(Y \mid H_0)}{\sum_{q=1}^{Q} \pi_q\, L^{A_q}(Y \mid H_q, m)\, P_1(N)}$
where $\pi_0$ is the prior probability that there are no breaks, $\pi_q$ is the prior
probability that there are q breaks, and $P_1(N)$ is a penalty function that accounts for
the fact that a model with Q breaks is more densely parametrized than a model
with no breaks. $H_0$ is preferred to $H_1$ when PO(m) > 1. The statistics for the
hypotheses that there are q − 1 versus q breaks in the cross section are
(9)  $PO(m, q-1^{*}) = \dfrac{\pi_{q-1}\, L^{+}(Y \mid H_{q-1}, m)}{\pi_q\, \pi_i^p\, L^{+}(Y \mid H_q, m)\, P_3(N)}$
When $\pi_q = \pi_{q-1} = 0.5$ and $P_3(N) = 1$, (9) is the PIC criterion of Phillips and
Ploberger (1994).
To put the testing problem in an alternative perspective, one can ask what is
the prior probability on the null hypothesis one must entertain so that his/her
beliefs will not be overturned by the data. For example, it may be of interest to
know how much confidence one should have in the hypothesis that the sample is
distributionally homogeneous, so that an overall exchangeable prior is sufficient
to characterize the data. This prior probability, which I call π̂, can be found for
any of the hypotheses considered by setting PO in (7)–(9) equal to 1 and solving
for $\hat{\pi}_0$, $\hat{\pi}_{q-1}$, $\hat{\pi}_{q-1}$, respectively.
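The posterior odds and the break-even prior can be computed in logs as sketched below. This is a simplified two-hypothesis illustration under stated assumptions: the prior over break locations is ignored, the prior on the alternative is one minus the prior on the null, and the penalty takes the form P(N) = exp{−0.5 ln N} used in the empirical section. Function names and input values are hypothetical.

```python
import math

def log_posterior_odds(log_l_null, log_l_alt, n_units,
                       prior_null=0.5, prior_alt=0.5):
    """Log of prior_null * L_null / (prior_alt * L_alt * P(N))."""
    log_penalty = -0.5 * math.log(n_units)      # assumed P(N) = exp(-0.5 ln N)
    return (math.log(prior_null) + log_l_null
            - math.log(prior_alt) - log_l_alt - log_penalty)

def break_even_prior(log_l_null, log_l_alt, n_units):
    """Prior probability on the null at which the posterior odds equal 1."""
    log_bf = log_posterior_odds(log_l_null, log_l_alt, n_units)  # equal priors
    return 1.0 / (1.0 + math.exp(log_bf))

# Hypothetical log predictive densities for a cross section of 50 units.
log_po = log_posterior_odds(-120.0, -115.0, n_units=50)
pi_hat = break_even_prior(-120.0, -115.0, n_units=50)
```

A negative `log_po` favors the alternative; `pi_hat` close to one means that only a very strong prior belief in the null would survive the data.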
2 As an alternative one could use a Wilks likelihood ratio (WL) criterion (see e.g., Efron, 1996) or
The testing procedure I have described leaves the value of Q unspecified. Fol-
lowing Hartigan’s (1975) rule of thumb, I set Q
(N/2).
To find the location of the break points, given that there are q breaks, I assign
units to groups so as to provide the highest total predictive density, i.e., I compute
$L^{+}(Y \mid H_q, m)$. Since there are m possible permutations of the cross section over
which to search for clustering, I take the optimal permutation rule of units in the
cross section to be the one that achieves $L^{\dagger}(Y \mid H_q)$.
Bai (1997) shows that proceeding sequentially in testing for breaks, i.e., test
first for one break against no breaks; then conditional on the results of the first
test, test for the existence of one break in each of the two subsamples and so
on, produces consistent estimates of the number and the location of the breaks.
However, when there are multiple groups and one tests for the presence of two
groups only, the estimated break point is consistent for any of the existing break
points and its location depends on which of the breaks is “stronger.” If this is the
case, Bai suggests refining the estimates of the break points. That is, if two breaks
are identified at $i_1$ and $i_2$, it is convenient to reestimate $i_1$ over [1, $i_2$] and $i_2$ over
[$i_1$, N]. Each refined estimator of the location of the break then has the same
properties as the estimator obtained when the sample has a single break point.
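The sequential search with refinement can be sketched as follows. The scoring function here is a toy stand-in (a Gaussian log-likelihood around the group mean with unit variance), not the predictive density of the article; it is used only so that the example runs. Indices are zero-based, and the sketch assumes there are enough units for each subsample to be split.

```python
import numpy as np

def group_score(x):
    """Toy stand-in for a group's log predictive density (constants dropped)."""
    x = np.asarray(x, dtype=float)
    return -0.5 * np.sum((x - x.mean()) ** 2)

def best_break(x, lo, hi, min_size=2):
    """Best single break in x[lo:hi]; groups are x[lo:i] and x[i:hi]."""
    cands = list(range(lo + min_size, hi - min_size + 1))
    if not cands:
        return None, -np.inf
    scores = [group_score(x[lo:i]) + group_score(x[i:hi]) for i in cands]
    j = int(np.argmax(scores))
    return cands[j], scores[j]

def two_breaks_refined(x):
    n = len(x)
    first, _ = best_break(x, 0, n)               # strongest break comes first
    left, s_left = best_break(x, 0, first)       # second break in each subsample
    right, s_right = best_break(x, first, n)
    tot_left = s_left + group_score(x[first:])
    tot_right = group_score(x[:first]) + s_right
    second = left if tot_left > tot_right else right
    i1, i2 = sorted((first, second))
    i1, _ = best_break(x, 0, i2)                 # re-estimate i1 given i2
    i2, _ = best_break(x, i1, n)                 # re-estimate i2 given i1
    return i1, i2

x = np.array([0.0] * 10 + [5.0] * 10 + [6.0] * 10)
breaks = two_breaks_refined(x)
```

In this noise-free example the first pass picks the larger jump (between 0 and 5), and the second pass recovers the smaller one.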
The major stumbling block to the application of the procedure I have described
is the dimensionality of the maximization problem. When no information is available
on the ordering of the units in the cross section and N is moderately large, the
maximization problem may constitute a formidable task. However, this is not a
binding constraint for many applications, since economic theory guides the search
for orderings and this considerably reduces the computational complexity of the
problem. Note also that even when economic theory is silent and one engages
in an unstructured search, the maximization problem requires considerably
fewer evaluations than N!, since many orderings are equivalent from
the point of view of the predictive density. That is, once a particular grouping is
found, the search for groups can be shrewdly conducted by reassigning units across
groups around this local maximum.3
3 Suppose the initial ordering is 1234 and two groups are found: 1 and 234. Then all permutations of 234
with unit 1 coming ahead, i.e., 1243, 1342, etc., give the same predictive density (see Appendix A for
a confirmation of this result in a Monte Carlo context). Similarly, permutations that leave unit 1 last
need not be examined, i.e., 2341, 2431, etc. This reduces the number of orderings to be examined
to 13. By trying another ordering, say 4213, and finding, for example, two groups: 42 and 13, we can
further eliminate all the orderings that consist of permutations of the elements of each group, i.e.,
4132, 2341, etc. It is easy to verify that once four carefully selected orderings have been tried and, say,
two groups found in each trial, we have exhausted all possible combinations, as far as the predictive
density is concerned. The example is rigged so that at each stage we find two groups. When this is not
the case, the number of orderings to be examined is larger, but it does not exceed hN, where h is the
maximum number of breaks found with any of the permutations.
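The count of 13 in the four-unit example can be checked by enumeration. The short sketch below assumes, as the footnote does, that every ordering with unit 1 first or last implies the same split into {1} and {2, 3, 4}; the original ordering is counted once as the one already tried.

```python
from itertools import permutations

units = (1, 2, 3, 4)
all_orderings = list(permutations(units))        # 4! = 24 orderings

# Orderings that place unit 1 first or last reproduce the grouping {1}, {2,3,4}.
redundant = [o for o in all_orderings if o[0] == 1 or o[-1] == 1]

# Orderings still to be examined, counting the one already tried.
remaining = len(all_orderings) - len(redundant) + 1
```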
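(10)  $\hat{\beta}^{p} = \dfrac{1}{n^p(m)} \sum_{j=1}^{n^p(m)} \beta^{j}_{ols}$
(11)  $\hat{\Sigma}_p = \dfrac{1}{n^p(m) - 1} \sum_{j=1}^{n^p(m)} (\beta^{j}_{ols} - \hat{\beta}^{p})(\beta^{j}_{ols} - \hat{\beta}^{p})' - \dfrac{1}{n^p(m)} \sum_{j=1}^{n^p(m)} (x_j' x_j)^{-1} \hat{\sigma}_j^2$
(12)  $\hat{\sigma}_j^2 = \dfrac{1}{T - k} \left( y_j' y_j - y_j' x_j \beta^{j}_{ols} \right)$
(14)  $\hat{\beta}^{p} = \dfrac{1}{n^p(m)} \sum_{j=1}^{n^p(m)} \beta^{*}_{j}$
(15)  $\hat{\Sigma}_p = \dfrac{1}{n^p(m) - k - 1} \left[ R + \sum_{j=1}^{n^p(m)} (\beta^{*}_{j} - \hat{\beta}^{p})(\beta^{*}_{j} - \hat{\beta}^{p})' \right]$
(16)  $\hat{\sigma}_j^2 = \dfrac{1}{T + 2} (y_j - x_j \beta^{*}_{j})'(y_j - x_j \beta^{*}_{j})$
(17)  $\beta^{*}_{j} = \left( \dfrac{1}{\hat{\sigma}_j^2} x_j' x_j + \hat{\Sigma}_p^{-1} \right)^{-1} \left( \dfrac{1}{\hat{\sigma}_j^2} x_j' x_j \beta^{j}_{ols} + \hat{\Sigma}_p^{-1} A_0 \hat{\beta}^{p} \right)$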
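A numerical sketch of the estimators in (10)–(12) and of the posterior mean in (17) for a single group is given below. It is an illustration under assumptions, not the article's implementation: the term $A_0 \hat{\beta}^{p}$ is taken to be the group mean itself, only one pass of (17) is performed, and the data are simulated with hypothetical parameter values. The moment estimator in (11) is not guaranteed to be positive definite in small samples, in which case the inversion below would not be meaningful.

```python
import numpy as np

def eb_group_estimates(ys, xs):
    """ys: list of (T,) arrays; xs: list of (T, k) arrays, one pair per unit."""
    n = len(ys)
    T, k = xs[0].shape
    xtx = [x.T @ x for x in xs]
    b_ols = [np.linalg.solve(a, x.T @ y) for a, x, y in zip(xtx, xs, ys)]
    # (12): unit-specific residual variance
    s2 = [(y @ y - y @ x @ b) / (T - k) for y, x, b in zip(ys, xs, b_ols)]
    # (10): group mean of the OLS estimates
    b_bar = np.mean(b_ols, axis=0)
    # (11): dispersion of OLS estimates minus average sampling variance
    dev = np.array(b_ols) - b_bar
    Sigma = dev.T @ dev / (n - 1) \
        - sum(s * np.linalg.inv(a) for s, a in zip(s2, xtx)) / n
    # (17): precision-weighted combination of OLS and the group mean
    Sinv = np.linalg.inv(Sigma)
    b_post = [np.linalg.solve(a / s + Sinv, a @ b / s + Sinv @ b_bar)
              for a, s, b in zip(xtx, s2, b_ols)]
    return b_bar, Sigma, np.array(s2), np.array(b_post)

# Simulated group of 20 units (hypothetical values).
rng = np.random.default_rng(0)
n, T, k = 20, 50, 2
true_b = rng.multivariate_normal([0.1, 0.7], 0.04 * np.eye(k), size=n)
xs = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(n)]
ys = [x @ b + 0.1 * rng.normal(size=T) for x, b in zip(xs, true_b)]
b_bar, Sigma, s2, b_post = eb_group_estimates(ys, xs)
```

The posterior means shrink each unit's OLS estimate toward the group mean, with more shrinkage for units whose coefficients are estimated less precisely.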
Modern growth theory has suggested many mechanisms that may lead to conver-
gence clubs. Galor (1996) provides a thoughtful and compact summary of the ma-
jor implications of various theoretical models, stressing which economic indicators
This section attempts to shed light on two issues. First, I would like to examine
whether income per capita data is consistent with the multiple steady-state version
of modern growth theory.4 Second, I would like to better understand the statistical
properties of income per capita data. In particular, I am interested in examining
the kind of heterogeneities the data displays: whether the average adjustment
properties to the steady state and the average steady state are group dependent;
and whether different groups display different persistence of inequalities, in the
sense that the relative ranking in the initial distribution is more important in
determining the relative ranking in the steady-state distribution for some groups
than for others. I study these issues using two different data sets: European regional
income per capita from the Eurostat database and OECD national income per
capita from the Summers and Heston database.
5.1. European Regional Income Per Capita. The data set used covers 144
European NUTS2 units5 and refers to the period 1980–1992. Given that N =
144 I allow, at most, six groups (i.e., Q = 5). Income per capita is scaled by the
European average to reduce both serial correlation and the effect of outliers, and
logarithms are taken. AIC and BIC selection procedures indicate that an AR(1)
with unit-specific parameters captures sufficiently well the dynamics of the scaled
data and makes the residuals well behaved. Hence, r = 1 and no Wt−1 variables are
used.
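The data transformation and the unit-specific AR(1) can be sketched as follows. The sketch assumes that the average used for scaling is the simple cross-sectional mean in each period (the article does not spell out the weighting here), and the function names and sample values are hypothetical.

```python
import numpy as np

def scaled_log_income(income):
    """income: (T, N) array of levels. Returns the log of each unit's income
    relative to the cross-sectional average of the same period."""
    income = np.asarray(income, dtype=float)
    return np.log(income / income.mean(axis=1, keepdims=True))

def fit_ar1(series):
    """OLS estimates (alpha, rho) of y_t = alpha + rho * y_{t-1} + u_t."""
    y, ylag = series[1:], series[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    alpha, rho = np.linalg.lstsq(X, y, rcond=None)[0]
    return alpha, rho

scaled = scaled_log_income([[1.0, 3.0], [2.0, 2.0]])

# Noise-free AR(1) path: the fit should recover alpha = 0.1 and rho = 0.5.
path = [1.0]
for _ in range(12):
    path.append(0.1 + 0.5 * path[-1])
alpha_hat, rho_hat = fit_ar1(np.array(path))
```

Applying `fit_ar1` column by column to the scaled data gives one (α, ρ) pair per unit, which is the input to the grouping procedure.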
For regional data there are few usable indicators to order units according to the
suggestions of recent growth theories. For example, no measures of the average
regional human capital (or its distribution) at the beginning of the sample are
available, nor do I have regional measures of dispersions of income per capita.
Furthermore, neither EU nor national expenditure (either for regional income
support or for regional infrastructure and capital formation) are available on a
consistent basis for all countries. Since the sample covers the 1980s and the regions
4 What I examine here is a somewhat strong version of the convergence club hypothesis. A weaker
version would predict the existence of convergence clubs in the distribution of growth rates of income
per capita (see, e.g., Boldrin and Canova, 2001).
5 Roughly speaking, the NUTS2 classification corresponds to regions. NUTS1 refers to larger ter-
ritorial units (the “North,” the “Centre,” and the “South”) whereas NUTS3 provides data at the
provincial level.
belong to the EU, I conjecture that differences along these dimensions are unlikely
to provide information for grouping units into convergence clubs.6
Given these limitations, I search for clubs ordering the cross section according
to: (i) the magnitude of per capita income relative to the European average in 1979,
with poor regions coming first; (ii) the magnitude of per capita income relative to
the national average in 1979, with poor regions coming first; (iii) the magnitude
of locally scaled income per capita in 1979 (Mediterranean regions and Ireland
are scaled by their average and other regions by their average), with poor regions
coming first; (iv) the magnitude of the average share of per capita income relative
to the European average in the sample, with poor regions coming first; and (v)
the magnitude of the average growth rate of per capita income in the sample, with
regions growing slower coming first.
The first ordering attempts to capture the effect that initial conditions may have
on the steady-state distribution of income per capita; the next two orderings try
to verify whether geographical externalities (either at the country or at the south–
north level) may be important to determine the basin of attraction of a unit; the last
two classifications attempt to study the importance of threshold externalities, here
proxied by the size of the share of income per capita in Europe or its growth rate. If
geographical externalities are important, any tendency toward convergence clubs
that may appear with (i) should be weakened or disappear with (ii) or (iii). Note
also that, because of immobility in the initial ranking, the results obtained for
orderings (i), (ii), and (iii) are insensitive to the choice of 1979 or earlier years as
presample date.
Ordering units according to the initial distribution of income per capita
maximizes the predictive density of the data. Given this ordering, I identify
three breaks, corresponding to units 15, 23, and 120, and, consequently, four
groups in the data. Within the first group there are 10 regions of Greece, 4 of
Portugal, and 1 of Spain; in the second group there are 4 regions of Greece, 3
of Spain, and 1 of Italy; finally, the last group includes regions from 9 different
countries but the majority are German (9) and Northern Italian (5). The exact
composition of each group is given in Appendix B. The fourth and fifth order-
ings produce four and three groups, whose composition is very similar to these
groups. Hence, the splitting produced is highly suggestive of the fact that Euro-
pean regions cluster into homogeneous groups along the poor–rich, south–north
dimensions.
Figure 1 provides graphical evidence of the existence of groups. Using the ini-
tial conditions of income per capita as the ordering device, I plot the log of the
predictive density as a function of the location of the break point, together with
the log-predictive density obtained assuming no breaks (the dotted line). The
first panel refers to the full sample, and the next two to the subsamples obtained
6 As an informal check of this conjecture, I separately examined the case of regions in Italy and
Spain, for which either educational data or government expenditure for infrastructures are available.
I find that Italian regional differences in average human capital and in the distribution of human
capital are small and typically unrelated to the time path of income per capita in the sample. Similarly,
differences in government expenditure for infrastructures in Spain are unimportant as a grouping
device.
FIGURE 1 [three panels: log predictive density (vertical axis) plotted against the location of the break point (horizontal axis)]
separating units according to the first optimal splitting. To interpret the graphs
note that the horizontal entries give the location of the break and the vertical
entries the value of the log predictive density. For example, entry 23 on the horizontal
axis in the first panel indicates that assigning units 1–23 to the first group
and units 24–144 to the second gives a value for the log of $L^{+}$ of 4943 (as compared
to log $L^{+}$ = 4863 when no breaks are allowed). Similarly, the second panel indicates
that splitting group 1 into two subgroups (1–15 and 16–23) could be beneficial
(this produces log $L^{+}$ = 679 as compared to log $L^{+}$ = 670 when no breaks are
allowed). Finally, splitting group 2 into two subgroups (24–120, 121–144) gives log
$L^{+}$ = 4325 (as compared to log $L^{+}$ = 4272 when no breaks are allowed).
Next, I examine whether these differences are significant using the posterior
odds ratio. In each case, I use a penalty function of the form P(N) =
exp{−0.5 ln(N)}, which resembles the one employed by the Schwarz approximation
to the PO ratio, and assign equal prior probability to the null and the alternative.
The results overwhelmingly suggest the presence of (at least) three breaks: those
corresponding to units 23 and 120 have PO ratios in excess of 100, whereas the
break at unit 15 has a PO ratio of 91.38. In general,
the fit of a model with three breaks is substantially better than that of one without
breaks: the log predictive density is an order of magnitude larger and the posterior
odds ratio decisively favors the hypothesis of heterogeneities. Hence, we
need very strong a priori expectations on the null for the data not to overturn our
convictions (prior odds need to be about 100 to 1). Also, these expectations do
not substantially change as the number of break points we are testing for increases.
Economic differences among the groups also appear to be relevant. I present
estimates of β p for the whole sample and for each of the four selected groups in
Table 1. It is clear that the four groups can be identified by both the value of the
intercept and of the slope of the model. For example, the first group displays very
low average persistence in relative income per capita (low ρ p ) and below average
mean intercept (low and negative α p ). At the opposite end, the last group fea-
tures higher average persistence and above average mean intercept (high ρ p and
positive α p ). Interestingly, the central group, which contains the largest number
of units, has a mean value for the persistence parameter that is higher than that
of the last group.
Dispersion measures are significant in three of the four groups, stressing the
need to control for residual within-group heterogeneity, but vary substantially
across groups. For example, differences in the persistence parameter are small in
the second group (0.04) but large in the last one (0.64). In three of the four groups
the dispersion is substantially smaller than the dispersion obtained by (weakly)
pooling together all units with an exchangeable prior, suggesting a reduction of the
residual heterogeneity once groups are identified. The last group, which includes
TABLE 1
ESTIMATES OF THE HYPERPARAMETERS AND OF STEADY STATES
NOTES: The columns labeled “Hyperparameter estimates” report estimates of the hyperparameters
obtained by maximizing the predictive density of the data, viewed as a function of the hyperparameters.
The steady state for each region is computed as $\lim_{T \to \infty} \alpha_i \dfrac{1 - \rho_i^{T+1}}{1 - \rho_i} + \rho_i^{T} y_{i0}$, where $\alpha_i$ and $\rho_i$ are
posterior estimates. The columns named “Mean SS” and “Dispersion SS” report the mean and the
standard deviation of steady states.
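For an AR(1) with |ρ| < 1 the limit in the notes reduces to α/(1 − ρ), because the weight on the initial condition vanishes. The sketch below checks this numerically by iterating the recursion; the function names and parameter values are hypothetical.

```python
def steady_state(alpha, rho):
    """Long-run value of y_t = alpha + rho * y_{t-1}; requires |rho| < 1."""
    if abs(rho) >= 1.0:
        raise ValueError("steady state requires |rho| < 1")
    return alpha / (1.0 - rho)

def iterate_ar1(alpha, rho, y0, periods):
    """Apply the deterministic recursion `periods` times starting from y0."""
    y = y0
    for _ in range(periods):
        y = alpha + rho * y
    return y

ss = steady_state(-0.05, 0.8)
y_far = iterate_ar1(-0.05, 0.8, y0=0.3, periods=500)
```

A negative α with positive ρ gives a steady state below the average (the scaled log income is zero at the average), which is the pattern reported for the poorest group.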
7 Changing the threshold from the mean to the median does not change the qualitative features of
the results.
FIGURE 2 [multi-panel plot; vertical axes labeled “Adjustment Rates”]
such a tendency is much stronger for units that start above the mean, whereas
poor regions tend to stay uniformly poor; busts are more probable than mira-
cles. The four groups display different mobility characteristics. In the first group
there is a strong tendency to stay in the low-income class and in the second group
there is complete immobility. The third group mirrors, with minor differences, the
tendencies of the whole sample, but 67 percent of the units starting above the av-
erage end up below it in the steady state. The fourth group also shows a tendency
to slump: About 50 percent of those who started above average are expected to
be below average in the steady state (curiously, most are French and German
regions!).
A few general economic conclusions can be drawn from the analysis. Among
the indicators suggested by theory, the distribution of income per capita at the
beginning of the sample seems to be the one with the highest information content.
TABLE 2
MOBILITY INDICES
NOTES: The M statistic is given by M = 1 − P11 − P22. P11 is the probability that the unit
starts below average and ends up below average in the steady state, P22 is the probability
that the unit starts above and ends up above average in the steady state, and P12 and P21 are
the probabilities that the unit transits from one state to the other. In case the group is
unbalanced, so that all units in the group are initially in one income class, the statistic
M is computed as M = 0.5 − Pii, where Pii is the diagonal value different from zero.
[FIGURE 3: log predictive density]
Canova, 2001) that the convergence process in Europe stopped at the beginning
of the 1980s. Furthermore, the majority of the 73 regions examined by these au-
thors belong to the identified third convergence club. Therefore, it is perhaps not
surprising that the two studies reach different conclusions. Second, and probably
more important, Barro and Sala-i-Martin do not use the information contained in
the panel. Instead, they take averages of growth rates and run a cross-sectional re-
gression on the initial conditions. Such an approach disregards those unit-specific
heterogeneities that this and other articles found to be very important even in a
group of regions with relatively similar institutional setups.
5.2. OECD National Income Per Capita. For this data set N = 21, time runs
from 1951 to 1985 and at most three groups are allowed (i.e., Q = 2). Following
Canova and Marcet (1995), an AR(1) with country-specific parameters is chosen for
the log of income per capita scaled by the OECD average. Contrary to the case of
regional data, useful information to order units is available at the country level.
Hence, I search for clubs ordering units according to (i) the magnitude of the per
capita GDP relative to the OECD average in 1950, with poor units coming first; (ii)
the magnitude of the average human capital in 1950, measured as in Barro and Lee
(1994), ordering units increasingly in their average endowment of human capital;
(iii) the magnitude of the government expenditure share in 1950 or on average in
the sample period; (iv) the dispersion of income distribution in 1950 (Gini index
from the Luxembourg Income Study), with units displaying high dispersions coming
first; (v) the dispersion of the distribution of human capital in 1950 (measured as
the sum of the percentage of the population with primary and university education
using Barro and Lee data), with units displaying high dispersions coming first; (vi) a
center–periphery classification of the world economy (G-3 first and then the rest);
(vii) a geographical criterion with European nations first and rest of the world
afterward and Mediterranean countries preceding northern European countries
in the order; and (viii) the average openness of the economies (measured as the
ratio of imports plus exports over GDP).8
When one break is allowed, the maximized values of log L+ for the eight
classifications are 2436, 2423, 2411, 2433, 2423, 2420, 2415, and 2430, respectively,
suggesting that the predictive power of the model is maximized when units are
ordered according to the initial conditions of income per capita. Therefore, con-
sistent with Durlauf and Johnson (1995), the procedure prefers initial output over
literacy rates as the most useful ordering device. Note, however, that differences
in L+ for three classifications are relatively small since the ordering of units is very
similar in these cases. That is, countries that have low initial income conditions
also have low average initial human capital, a distribution of income with high
dispersion, and are geographically located in the “South” of the developed world
and are less open to trade than average. The maximized value of log L+ obtained
using government expenditure shares as an ordering device is the lowest of all,
and insignificantly different from the one with no breaks (2408), indicating that
policy variables may have no role in shaping club convergence, at least for OECD
countries. Attempts to refine membership in these two initial groups failed. For
example, the predictive density obtained by reordering units within groups using
literacy rates or government variables is indistinguishable from the one obtained
in the baseline case.
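The comparison across orderings described above amounts to picking the classification with the largest maximized log predictive density and measuring each against the no-break value. A minimal sketch follows, assuming the reported values map to criteria (i) through (viii) in the order listed; the short labels are mine, not the article's.

```python
# Maximized log L+ with one break, in the order of criteria (i)-(viii).
# The dictionary keys are shorthand labels introduced here for illustration.
log_l_plus = {
    "initial income per capita": 2436,
    "average human capital": 2423,
    "government expenditure share": 2411,
    "income dispersion (Gini)": 2433,
    "human capital dispersion": 2423,
    "center-periphery": 2420,
    "geography": 2415,
    "openness": 2430,
}
LOG_L_NO_BREAK = 2408  # value reported for the model with no breaks

# Ordering device with the highest predictive power.
best_ordering = max(log_l_plus, key=log_l_plus.get)

# Gain of each ordering over the no-break benchmark, largest first.
gains = {
    name: value - LOG_L_NO_BREAK
    for name, value in sorted(log_l_plus.items(), key=lambda kv: -kv[1])
}

for name, gain in gains.items():
    print(f"{name:32s} gain over no break: {gain}")
```

The raw differences are shown only to make the ranking explicit; whether a given gain is significant depends on the posterior odds calculation, which this sketch does not attempt.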
Given this ordering, the posterior odds ratio establishes the presence of one
break in the cross section, with a value of 0.979, given equal prior probabilities on
the null and the alternative. In Figure 4, I plot log L+ as a function of the location
8 Substitution of the size of the population holding 50 percent of national wealth for Gini indices
and the sum of the inverse of the percentage of the population with primary and the inverse of the
percentage of population with secondary education for the percentage of the population holding
primary and university degrees does not change the results. The ordering obtained with these new
indices is practically identical to the ordering I use.
[FIGURE 4: log predictive density]
of the break point for the ordering based on initial income per capita together
with the predictive density obtained in the case of no breaks (dotted line). The
first group contains the five poorest units (Turkey, Portugal, Greece, Spain, and
Ireland) and the second group the rest.
Estimates of the hyperparameters for the two groups are β 1 = [−0.162, 0.824]
and β 2 = [0.0004, 0.958], suggesting a much faster a priori average rate of con-
vergence in the first group. The dispersion of estimates is small but nonnegligible
(in particular, the dispersion of estimates of the AR parameter is 0.02 in the first
group and 0.05 in the second group) indicating, once again, that clustering is more
prevalent than convergence even after optimally splitting the sample.
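To see why these hyperparameters point to faster convergence in the first group, one can translate each pair into an implied long-run level and a half-life of deviations. The sketch below assumes that each β is ordered as [intercept, AR(1) coefficient] and that the model is y_t = a + ρ y_{t−1} + e_t; both the ordering and the function name are assumptions made for illustration.

```python
import math


def ar1_summary(intercept: float, rho: float) -> tuple[float, float]:
    """Implied steady state and half-life for y_t = intercept + rho * y_{t-1}.

    Assumes 0 < rho < 1, so the process is stationary and deviations decay
    monotonically.
    """
    steady_state = intercept / (1.0 - rho)
    # Number of periods for a deviation from steady state to shrink by half.
    half_life = math.log(0.5) / math.log(rho)
    return steady_state, half_life


# Hyperparameter estimates reported in the text (assumed order: [a, rho]).
beta_1 = (-0.162, 0.824)
beta_2 = (0.0004, 0.958)

ss_1, hl_1 = ar1_summary(*beta_1)
ss_2, hl_2 = ar1_summary(*beta_2)
print(f"group 1: steady state {ss_1:.3f}, half-life {hl_1:.1f} periods")
print(f"group 2: steady state {ss_2:.3f}, half-life {hl_2:.1f} periods")
```

Under these assumptions the half-life is roughly 3.6 periods in the first group and about 16 in the second. These are properties of the prior means only and need not coincide with averages of unit-specific posterior estimates.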
The posterior characteristics of the two clubs differ. For example, the average
posterior estimate of the steady state in the first group is −0.7647 and in the second
group is 0.0498. This difference is statistically and economically large: It implies
that there will be a permanent discrepancy in the average per capita income of
units in the two groups of about 60 percent. The dispersion of estimated steady
states around these poles of attraction is smaller than the one obtained when all
units are (weakly) pooled together. However, differences of about 15–20 percent in
steady-state income per capita in each group are still possible. Finally, the mobility
characteristics of the two groups are similar: Apart from a few exceptions, the
ranking of units in the income distribution changes very little over time, and the
initially poor will still be the poorest in the steady state. What is interesting about
this last observation is the fact that there is no evidence that the economic boom
that took place in Ireland in the late 1990s and allowed the country to move up in
the OECD income distribution ladder was forthcoming.
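As a rough check on the size of the steady-state gap quoted above, the two average posterior steady states can be converted from logs into a ratio of income levels. The sketch assumes that the steady states are expressed as logs of income per capita relative to the OECD average, as in the model specification; the variable names are illustrative.

```python
import math

# Average posterior steady states reported in the text (assumed to be logs of
# income per capita relative to the OECD average).
steady_state_poor = -0.7647
steady_state_rich = 0.0498

# Difference in logs, then the implied ratio of income levels.
log_gap = steady_state_rich - steady_state_poor
income_ratio = math.exp(-log_gap)      # poor-group income relative to rich-group income
percent_shortfall = 100.0 * (1.0 - income_ratio)

print(f"log gap: {log_gap:.4f}")
print(f"poor/rich income ratio: {income_ratio:.3f}")
print(f"shortfall of poor group: {percent_shortfall:.1f} percent")
```

On this reading, average income in the first group settles at somewhat less than half of that in the second, which is of the same order as the figure of about 60 percent given in the text.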
In sum, in agreement with what Quah (1996b) and Durlauf and Johnson (1995)
have detected for a larger sample, I find that clustering along the poor–rich dimen-
sion is prevalent. Countries that were initially poor are also those having below
average initial human capital, large income and educational inequalities, and are
located in the “South” of the developed world. These characteristics are very per-
sistent and produce a polarization in the steady-state distribution of income. The
policy implications of these results are striking: Unless some major changes occur, the
initially poor will tend to cluster around a basin of attraction that is substantially
below the OECD average and policy can do little to improve the situation of
backward countries.
6. CONCLUSIONS
APPENDIX
[FIGURE A.1: panels "DGP With One Break," "DGP With Two Breaks," and "DGP With No Breaks"]
odds ratio for the hypothesis that there are three groups in the generated data
is 0.8 × 10^8.
Next, I conducted three experiments: First, I randomized the order of the units
within the two groups before estimation was undertaken. This did not change any of
the results, confirming that, absent any information on the appropriate ordering
of the data, the number of actual permutations to be tried is substantially less
than N!. Second, I reshuffled the entire cross section, taking the first 20 units of
the time series and putting them last. In this case the ordered data displays three
groups with breaks at i = 30 and i = 124. Estimating an AR(1) model on the data,
the posterior odds ratio finds two breaks, and the predictive density is maximized
[FIGURE A.2: value of the predictive density and location of the break, each plotted against permutation number]
though average estimates of the hyperparameters of the third group are more
biased, probably because of the small number of units in this group. Finally, when
the cross section is homogeneous, average estimates (across replications) are still
biased but by a smaller amount (average β = [0.3816, 0.7372]) whereas variances
of the estimated coefficients are similar to those obtained in the baseline case.
Overall, the results indicate that the testing procedure has reasonable size and
power properties against the particular alternative I consider. It also appears to be
able to identify multiple groups and the location of the breaks with sufficient pre-
cision, even when the data are not correctly ordered. However, since the posterior
odds ratio appears to be slightly biased when there are no heterogeneities and no
B. This appendix lists the NUTS2 regions belonging to the four groups we
have found:
REFERENCES
PLOBERGER, W., W. KRAMER, AND K. KONTRUS, “A New Test for the Structural Stability in
the Linear Regression Model,” Journal of Econometrics 40 (1989), 307–18.
QUAH, D., “Convergence Empirics across Economies with Some Capital Mobility,” LSE
Center for Economic Performance 257 (1996a).
——, “Regional Convergence Clusters across Europe,” CEPR working paper 1286 (1996b).
TITTERINGTON, J., R. MAKOV, AND J. SMITH, Statistical Analysis of Finite Mixture Distributions
(New York: Wiley, 1985).