Ethnicity and Conflict: An Empirical Study: American Economic Review 2012, 102 (4) : 1310-1342
http://dx.doi.org/10.1257/aer.102.4.1310
This paper examines the link between measures of ethnic distribution and social
conflict.
The influence of the Marxian paradigm is clearly seen in the traditional view that
income or wealth inequality is a major potential cause of conflict. Early empirical
studies emphasized indicators of income or wealth distribution as possible correlates
of conflict (see, e.g., Brockett 1992; Midlarski 1988; Muller and Seligson
1987; Muller et al. 1989; and Nagel 1974, among several others). As the survey
article by Lichbach (1989) concludes, however, the results obtained were generally
ambiguous, or statistically insignificant.
The emphasis on inequality as a driver of conflict is natural, in the sense that the
poor might be reasonably expected to harbor strong antagonisms against the rich.
Yet the existence of antagonisms is only part of the story. The prevalence of sustained
conflict requires those antagonisms to be channeled into organized action,
often a tall order when economic strengths are so disparate. The clear economic
demarcation across classes is a two-edged sword: while it breeds resentment, the
very poverty of the have-nots militates against a successful insurrection, and even
then the different skill and occupational niches occupied by capitalist and worker
make effective redistribution across classes a more indirect and difficult prospect.
* Esteban: Instituto de Análisis Económico (CSIC) and Barcelona GSE, 08193 Bellaterra, Barcelona, Spain (e-mail:
[email protected]); Mayoral: CSIC and Barcelona GSE, 08193 Bellaterra, Barcelona, Spain (e-mail: laura.
[email protected]); Ray: Department of Economics, New York University, New York, NY 10012 (email: debraj.
[email protected]). We gratefully acknowledge financial support from CICYT project ECO2011-25293 and Recercaixa.
Esteban and Mayoral are beneficiaries of a financial contribution from the AXA Research Fund. Ray’s research was
funded by National Science Foundation Grant SES-0962124 and a Fulbright-Nehru Fellowship from the Fulbright
Foundation. He thanks the Indian Statistical Institute for warm hospitality during a year of leave from NYU. We
are very grateful to James Fearon for giving us access to his dataset on ethnic groups, to Ignacio Ortuño-Ortín, who
computed linguistic indices for us based on the Ethnologue dataset, and to Michael Ross for his dataset on natural
resources. We are grateful to Oeindrila Dube, Kaivan Munshi, Natalija Novta, and Romain Wacziarg for helpful comments,
to Laura Cozma, Andrew Gianou, and Bořek Vašíček for superb research assistance, and to seminar participants
at various venues where this paper was presented. Finally, we acknowledge constructive comments by three anonymous
referees; the final version owes much to them.
†
To view additional materials, visit the article page at http://dx.doi.org/10.1257/aer.102.4.1310.
VOL. 102 NO. 4 Esteban et al.: Ethnicity and Conflict: an Empirical Study 1311
1
In addition, within-group economic disparities allow the complementary activities of conflict funding and con-
flict participation to take place. Esteban and Ray (2008a) base a theory of ethnic salience in conflict on this premise.
2
See Blattman and Miguel (2010) for an extensive survey that discusses these papers and related literature.
3
Measures of polarization were developed independently by Esteban and Ray (1994) and Wolfson (1994). The
measure MRQ use can be viewed as a special case of the one we deploy in this paper. It presumes that all intergroup
distances are “binary.” In contrast, we will draw in detail on alternative measures of intergroup distances. Fearon
(2003a) has already made the point that ethnolinguistic distances may potentially play a role in explaining ethnic
conflict and computed a measure based on dissimilarity between pairs of languages. Desmet, Ortuño-Ortín, and
Wacziarg (2012) examine this point in a different context, by studying the level of social transfers in ethnically het-
erogeneous societies. They find that the measures that include variation in distances outperform the ones that don’t.
4
Recall Horowitz (1985, p. 39): “A centrally focused system [with few groupings] possesses fewer cleavages
than a dispersed system, but those it possesses run through the whole society and are of greater magnitude.”
1312 THE AMERICAN ECONOMIC REVIEW june 2012
for such “public” and “private” prizes as well as different mixes of those prizes. We
review their approach briefly in Section I. They show that the equilibrium intensity of
conflict is linearly related to just three measures of distribution and no other: polariza-
tion (P), fractionalization (F), and a Greenberg-Gini index of ethnic difference (G), all
to be formally defined below (see Proposition 1). Moreover, the model tells us that the
weight of each of these indices in explaining conflict intensity depends on the particu-
lar nature of each conflict. Specifically, ethnic polarization will influence conflict if the
prize is public and group cohesion is high, and ethnic fractionalization will influence
conflict if the prize is private (and group cohesion, once again, is high). Finally, the
Greenberg-Gini difference index becomes relatively important in explaining conflict
if group cohesion is low.
The purpose of the current paper is to bring these theoretical predictions to the
data. We study 138 countries over 1960–2008. We begin by implementing the idea
that the equilibrium level of conflict is linked to the three distributional measures
identified above. Across a variety of specifications and robustness checks (described
in Tables 1–8), the ethnic polarization measure is highly significant and positive,
the effect of fractionalization is equally large and positive, though somewhat less
significant, and the Greenberg-Gini, while significant, affects conflict negatively.
The fact that polarization is strongly significant suggests that disputes over public
goods, broadly defined, are an important feature of social conflicts. Such public goods
could be narrowly economic, such as access to a particular trade or a labor market,
or they could represent political power or cultural dominance, or plain animosity.
The fact that fractionalization is significant as well suggests that divisible pecuniary
benefits also play a role in conflict. Finally, the importance of polarization and
fractionalization, and the fact that G enters negatively, can together be interpreted, using
the theory, as an indicator that within-group cohesion in the contribution of conflict
resources is particularly high in situations of open conflict.
Recall that the relative importance of P and F in explaining conflict depends on the
extent to which payoffs are public rather than private. The previous exercise implicitly
assumes that this composition is the same across countries. In the remainder of the
paper, we take the analysis a step forward by constructing proxies for country-specific
values of relative publicness from ancillary data. To do so, we employ indicators for
privateness or publicness of the prize that vary across countries. We capture private-
ness by oil reserves, and publicness by different measures of autocracy (see Section V
for more details). With these two sets of indicators we construct an index of relative
publicness Λ. We use the structure of Proposition 1 to create the variables P × Λ
and F × (1 − Λ). These variables test for the ideas that the impact of polarization is
heightened by relative publicness Λ, while that of fractionalization is enhanced by
relative privateness 1 − Λ. Our second main result is that these assertions (see the first
three columns of Table 9) are supported to a remarkable degree.
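The construction of the two interaction regressors can be sketched in a few lines. This is only an illustration: the function name, the numeric inputs, and the assumption that Λ lies between 0 and 1 (so that 1 − Λ can be read as relative privateness) are ours, not taken from the paper's data.

```python
def interaction_terms(P, F, Lambda):
    """Return the pair (P * Lambda, F * (1 - Lambda)) for one country.

    Lambda is the index of relative publicness; it is assumed here
    to lie in [0, 1], so that 1 - Lambda measures relative privateness.
    """
    if not 0.0 <= Lambda <= 1.0:
        raise ValueError("relative publicness is assumed to lie in [0, 1]")
    return P * Lambda, F * (1.0 - Lambda)


# Hypothetical values, for illustration only (not from the dataset).
p_term, f_term = interaction_terms(P=0.10, F=0.50, Lambda=0.75)
```

With a high Λ the polarization term keeps most of its weight while the fractionalization term is scaled down, which is the pattern the two variables are meant to test.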
In the foregoing analysis, we continued to assume that the level of within-group
cohesion is the same across the countries in our sample. In a last step that exploits
fully the structure of the theory, we use indicators from the World Values Survey to
estimate group cohesion by country, and then enter all these variables into the regres-
sion exactly as specified by the model. This permits sharper tests that rely even more
deeply on the structure of the model. Once again, we invoke Proposition 1 to inform
the empirical specification, and once again our results are strongly supportive of the
theory; see the last three columns of Table 9. The three steps imply increasing faith
in the logical structure of the model. We do not take a particular stand on this issue,
and leave it to the reader to decide which approach (if any) she finds most convincing.
Section I summarizes the theory. Section II describes the data. Baseline empirical
results are presented in Section III, while Section IV examines robustness along sev-
eral dimensions. Section V extends the analysis to allow for intercountry variations
in relative publicness and cohesion. Section VI concludes.
I. Theory
The background for this paper is Esteban and Ray (2011) (hereafter, ER). ER
describe a theory of conflict incidence in which distributional measures play a cen-
tral role.5 There are m groups engaged in conflict, with Ni the number of individuals
in group i, and N the total population. The winner enjoys two sorts of prizes: one is
private and therefore excludable, and the other is public.
Examples of private payoffs include administrative or political positions, specific
tax breaks, bias in the allocation of public expenditure and infrastructures,
or access to rents from natural resources. Privateness has two properties. First, the
prize is divided among the winning group, so group size matters (Olson 1971).
Second, the identity of the winner is irrelevant to the losers.6 Let μ be the per capita
value of the private prize at stake.
In most conflicts, victory also yields a prize that is public in nature: its enjoyment is
independent of the population size. This includes political power, control over policy,
cultural values, religious dominance, and so on. The (population-normalized) magnitude
of such public payoffs, call it π, must depend on the extent to which existing
institutions permit the group in power to impose policies or values on the rest of
society. In general, other groups will derive payoffs from these choices, depending on
“how far” they are from the winner. Say that a member of group i enjoys payoff uij π if
the ideal policy of group j is chosen. This induces a notion of “distance” across i and j:
dij ≡ uii − uij, so that the per capita loss to i from j’s ideal policy is just πdij.
Individuals in each group expend resources r (time, effort, risk) to influence the
final outcome. Write the income equivalent cost to such expenditure as c(r) and
assume that c is increasing, smooth, and strictly convex, with c′(0) = 0. Add individual
contributions in group i to obtain group contribution Ri. We presume that the
probability of success for group i is given by pi = Ri/RN, where RN ≡ ∑i Ri.7 We
denote by ρ ≡ RN/N the per capita value of the resources expended in conflict.
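As a rough sketch of the contest technology just described, the following computes the win probabilities pi = Ri/RN and the per capita resources ρ = RN/N. The function names and the numbers are hypothetical; the equal-shares rule used when RN = 0 is merely one admissible choice, since the text only requires an arbitrary allocation in that case.

```python
def win_probabilities(group_contributions):
    """Return p_i = R_i / R_N for each group, with R_N the sum of contributions."""
    total = sum(group_contributions)
    m = len(group_contributions)
    if total == 0:
        # Any allocation is allowed when R_N = 0; equal shares are an assumption.
        return [1.0 / m] * m
    return [r / total for r in group_contributions]


def per_capita_resources(group_contributions, population):
    """Return rho = R_N / N, the per capita resources expended in conflict."""
    return sum(group_contributions) / population


# Hypothetical group contributions and population, for illustration only.
R = [30.0, 50.0, 20.0]
p = win_probabilities(R)
rho = per_capita_resources(R, population=1000)
```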
The payoff, then, to a person in group i who expends resources r is given by
$$\pi u_{ii} + p_i \frac{\mu}{n_i} - \sum_{j=1}^{m} p_j \pi d_{ij} - c(r), \tag{1}$$
5
It is simply presumed that society is in a state of (greater or lesser) turmoil. For explicit models of the decision
to enter into conflict, see Esteban and Ray (2008a, b) and Ray (2009).
6
To be sure, there could be differential degrees of resentment over the identity of the winner. Simply include this
component under the public type of the two prizes.
7
If RN = 0, use an arbitrary allocation of win probabilities.
Close the model by presuming that every individual has an “extended utility function”
(as in Sen 1966) that places weight 1 on personal payoffs, described in equation (1),
and weight α on the aggregate of all payoffs for other group members. As ER
observe, the weight α could be altruism, or some measure of the extent to which group
monitoring, possibly with promises and threats, overcomes the usual free-rider problem.
Indeed, α could exceed 1, the latter being the weight placed on the individual.
The main theoretical proposition to follow is based on three measures of ethnic
divisions that are all based on the same underlying parameters: the population
shares ni of each group, as well as the intergroup distances dij just defined. First, we
introduce a measure of polarization based on Duclos, Esteban, and Ray (2004) and
Esteban and Ray (2011):
$$P = \sum_{i=1}^{m} \sum_{j=1}^{m} n_i^2 \, n_j \, d_{ij}.$$
The distinction between P and G is superficial at first sight but it is of great conceptual
importance. The squaring of population shares in P forces group sizes to matter
over and above the mere counting of individual heads implicit in G.
Our last measure is ethnic fractionalization, which discards the intergroup distances
from the Gini-Greenberg and replaces them with 0–1 variables:
$$F = \sum_{i=1}^{m} \sum_{j \neq i} n_i n_j = \sum_{i=1}^{m} n_i (1 - n_i).$$
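To make the three indices concrete, here is a minimal sketch that computes them from a vector of population shares and a matrix of intergroup distances. The formula used for G (the sum of ni nj dij over all pairs) is our reading of the description above, in which F is obtained from the Gini-Greenberg index by replacing distances with 0–1 variables; the function name and the example shares are hypothetical.

```python
import numpy as np


def distributional_indices(shares, distances):
    """Return (P, F, G) for population shares n_i and distances d_ij.

    P = sum_i sum_j n_i^2 n_j d_ij   (polarization)
    F = sum_i n_i (1 - n_i)          (fractionalization)
    G = sum_i sum_j n_i n_j d_ij     (assumed form of the Greenberg-Gini index)
    """
    n = np.asarray(shares, dtype=float)
    d = np.asarray(distances, dtype=float)
    if d.shape != (n.size, n.size):
        raise ValueError("distances must be an m-by-m matrix")
    P = float(np.sum((n[:, None] ** 2) * n[None, :] * d))
    G = float(np.sum(n[:, None] * n[None, :] * d))
    F = float(np.sum(n * (1.0 - n)))
    return P, F, G


# Hypothetical three-group society with binary (0-1) distances.
shares = [0.5, 0.3, 0.2]
binary_distances = 1.0 - np.eye(3)
P, F, G = distributional_indices(shares, binary_distances)
```

With binary distances G coincides with F, while P differs because each group's share enters squared.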
We now describe the right-hand side of equation (2). The equilibrium level of con-
flict intensity depends on the exogenous data of the model: individual preferences,
group size, nature and size of the prize, and level of group cohesion. Equation (2)
tells us that all this information must be combined in a special way. In particular,
it suffices to aggregate all the information on preferences and group sizes into just
three indices—P, F, and G/N—with the weights on the three distributional measures
depending on the composition of the prize and on the level of group commitment. The
publicness of the prize reinforces the effect of polarization, the privateness of the prize
reinforces the effect of fractionalization, while high group cohesion enhances both
measures and simultaneously diminishes the effect of G/N. ER discuss these effects
in detail.
We study 138 countries over 1960–2008. The time period is divided into 5-year
subperiods for a total of 1,125 observations (in most cases).8 We start by considering
several indicators for the intensity of conflict and then we deal with the measurement
of group size and intergroup distances needed to compute the distributional indices.
The discussion of the measurement of the degree of publicness of the prize and of
the level of group commitment is postponed to Section V, when they will play a
role in our analysis. The Appendix contains detailed descriptions of all the variables
employed in the empirical analysis, as well as summary statistics pertaining to them.
A. Conflict
We measure intensity of conflict on the basis of the death toll. We use data on
battle deaths from the UCDP/PRIO dataset.9 Ideally we would like to have information
on the total number of deaths per year as a proper indicator for the intensity of
conflict as captured in the ER model presented in Section I. Unfortunately, available
information is quite limited and unreliable. This has led to the convention of measuring
conflict by a binary variable. The PRIO dataset offers a yearly binary indicator
of whether there is conflict or peace based on three threshold levels depending
on the number of deaths: “low” (prio25), “intermediate” (priocw), and “war”
(prio1000).10 In the current exercise, a country is recorded as having experienced a
conflict incidence at some level in a given period if, in any of the years within that
period, the corresponding threshold condition has been met.11
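The aggregation rule just stated, under which a period counts as a conflict period if the threshold is met in any of its years, can be sketched as follows. The function name, the dictionary input format, and the treatment of missing years as peace are assumptions made for the example.

```python
def period_incidence(yearly_flags, first_year=1960, last_year=2008, width=5):
    """Collapse yearly 0/1 conflict flags into period-level incidence.

    yearly_flags maps a year to 1 if the chosen threshold (e.g. prio25) was
    met in that year. Years missing from the mapping are treated as 0, which
    is an assumption of this sketch. Returns a dict keyed by period start year.
    """
    incidence = {}
    for start in range(first_year, last_year + 1, width):
        end = min(start + width - 1, last_year)  # the last period is shorter
        years = range(start, end + 1)
        incidence[start] = int(any(yearly_flags.get(y, 0) for y in years))
    return incidence


# Hypothetical country with conflict years in 1963 and 2008.
periods = period_incidence({1963: 1, 2008: 1})
```

Over 1960–2008 this yields ten periods, the last of which (2005–2008) spans only four years.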
We take as our baseline prio25, which reports all conflicts with 25 or more battle
deaths in a year. Higher thresholds remove the small—and intermediate—conflict
8
The last subdivision, 2005–2008, contains only four years.
9
This is a joint dataset of the Uppsala Conflict Data Program (UCDP) at the Department of Peace and Conflict
Research, Uppsala University, and the Centre for the Study of Civil War at the International Peace Research
Institute, Oslo (PRIO). It is available at http://www.prio.no/Data/. See Gleditsch et al. (2002) for a presentation of
the dataset and the relevant definitions. Correlates of War (COW) is an alternative dataset. It has been used by Doyle
and Sambanis (2003), Collier and Hoeffler (2002), and Fearon and Laitin (2003a).
10
PRIO considers a country to be in a state of conflict when one of the warring parties is the incumbent govern-
ment and the number of battle-related casualties goes beyond a threshold, as described in the main text.
11
We note with some misgivings that the PRIO thresholds are not normalized by the population of the country
in question, which undoubtedly biases civil wars in favor of large countries. The population control in our exercises
should take care of this problem.
events that the model also seeks to explain. At the same time, we are aware of the
need for alternative definitions, and report results on them. These include not just
the higher-threshold definitions used by PRIO, but also nonbinary alternatives based
both on PRIO and on other data sources. See Section IVA, in which these alterna-
tives are introduced.
So far, we’ve discussed measures of conflict incidence. Conflict onset is a sepa-
rate notion: it describes the start of a “fresh episode” of war or violence. Our theory
does not comfortably fit this particular concept as it is constructed to explain the
intensity of conflict and not the decision of triggering it. Nevertheless, in the inter-
est of robustness, we provide results in Section IVD for various definitions of onset
provided by PRIO.
B. Distributional Indices
Our core independent variables are the indices G, F, and P. In line with
Proposition 1, the index G enters the regression divided by total population, N,
expressed in millions. In order to compute these indices we need the relevant groups
for every country and a proxy for the “distance” in preferences across groups.
12
Attention is restricted to groups that account for over one percent of country population. The average number
of ethnic groups per country is around five, with half the countries housing three to five groups, though the African
average exceeds eight. About 70 percent of countries have a single ethnic group that is a majority, though in most
cases the largest minority is pretty large: only around 20 percent of all countries have a single group that accounts
for over 90 percent of the population.
Laitin (1999, 2000); Laitin (2000); Fearon (2003a); Desmet, Ortuño-Ortín, and
Weber (2009); and Desmet, Ortuño-Ortín, and Wacziarg (2012), and employ the
linguistic distance between two groups13 as an appropriate indicator for their differ-
ence in preferences over public goods.
The different languages spoken can be organized in a language tree capturing
their genealogy. All Indo-European languages, for instance, will belong to a com-
mon subtree. Subsequent splits create further “subsubtrees,” down to the current
language map.14 Ethnologue reports a maximum of 15 steps of branching, though
of course, not all modern language families hit this upper bound along their own
evolutionary branches.15 The distance between two “cultures” can be approximated
by lack of proximity on the language tree. Specifically, define the similarity between
two languages i and j, sij, as the ratio of the number of common branches to the maximum
possible number (15 for the entire tree).16 Then, following Fearon (2003a)
and Desmet, Ortuño-Ortín, and Wacziarg (2012), we define the distance between the
two languages, κij, as $\kappa_{ij} = 1 - s_{ij}^{\delta}$, for some parameter δ > 0.
Fearon computes distances using δ = 0.5 and Desmet, Ortuño-Ortín, and Weber
(2009) and Desmet, Ortuño-Ortín, and Wacziarg (2012) use δ = 0.05. We shall take
as a baseline the value δ = 0.05, for reasons that are discussed in greater detail in
Section IVC.
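A small sketch may help to see how the parameter δ shapes the distance. It implements the formula for κij given above with the 15-branch maximum; the function name is ours, and the branch counts in the example follow the Spanish, Catalan, and Basque illustration in the footnotes (seven shared nodes for Spanish and Catalan, none for Spanish and Basque).

```python
MAX_BRANCHES = 15  # maximum number of branching steps in the language tree


def linguistic_distance(common_branches, delta=0.05, max_branches=MAX_BRANCHES):
    """Return kappa = 1 - s**delta, with s = common_branches / max_branches."""
    if not 0 <= common_branches <= max_branches:
        raise ValueError("common_branches must lie between 0 and max_branches")
    similarity = common_branches / max_branches
    return 1.0 - similarity ** delta


# Spanish and Catalan share seven nodes; Spanish and Basque share none.
spanish_catalan_baseline = linguistic_distance(7, delta=0.05)
spanish_catalan_fearon = linguistic_distance(7, delta=0.5)
spanish_basque = linguistic_distance(0, delta=0.05)
same_language = linguistic_distance(MAX_BRANCHES)  # similarity set to 1
```

A small δ pushes the distance between any two related languages toward zero, so the index mostly separates unrelated languages from related ones; a larger δ spreads related languages further apart.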
C. Additional Variables
The literature uses a variety of controls, to a large extent depending on the specific
hypothesis being tested. We take as our baseline the set of controls used by MRQ,
that is: log population (pop); log GDP per capita (gdppc); a dummy for oil/diamond
production (oil/diam); percentage of mountainous terrain (mount); noncontiguity
of country territory (ncontig); and democracy (democ). In all the specifications, the
controls are measured in the first year of each period. As a robustness check, we also
use fewer controls in some specifications and additional controls in others, such as
governance variables from Polity IV and Freedom House.17 The online Appendix also
replicates our exercise with controls used in Fearon and Laitin (2003a) and Collier,
Hoeffler, and Rohner (2009). We also construct estimates of group concern from the
World Values Survey, as well as indices of relative publicness of the prize, based on data
from Polity IV and Freedom House, and of privateness of the prize, using data on oil
13
Because of the way they are constructed, Fearon’s groups may contain subgroups speaking different lan-
guages. In this case, we follow him in taking the language spoken by the dominant subgroup as representative of
the entire group.
14
For instance, Spanish and Basque diverge at the first branch, since they come from structurally unrelated lan-
guage families. By contrast, the Spanish and Catalan branches share their first seven nodes: Indo-European, Italic,
Romance, Italo-Western, Western, Gallo-Iberian, and Ibero-Romance languages.
15
The interested reader can find a detailed discussion of the language tree in Desmet, Ortuño-Ortín, and
Wacziarg (2012).
16
If two groups speak the same language, sij is set to 1.
17
More specifically, we consider the lack of executive constraints (excons) and the level of autocracy (autocr),
both from Polity IV, and the extent of suppression of civil liberties (civlib) and political rights (polrights) from
Freedom House. Following Besley and Persson (2010), we use time-invariant versions of these variables, since
short-run changes are likely to be correlated with the incidence of conflict. See Section V and the Appendix for more
details on the construction of these variables.
reserves from Haber and Menaldo (2011). These variables will serve as essential ingre-
dients for the analysis of Section V that exploits the structure of the model more deeply.
A more detailed definition of the control variables is included in the Appendix.
A. Specification
The goal of our exercise is to take equation (2) to the data as follows:
$$\sigma_{it} = X_{1it}\beta_1 + X_{2it}\beta_2 + \varepsilon_{it}; \quad i = 1, \ldots, C, \; t = 1, \ldots, T, \tag{3}$$
where X1it are the relevant distributional variables in the model, X2it is a collection
of controls, εit is an innovation, and C and T are the number of countries and time
periods, respectively. But we don’t observe the dependent variable as written.
Instead we will consider intensity of conflict as a latent variable that we infer from
the realizations of the PRIO binary variables, presuming that
where Xit = (X1it, X2it), W* is a threshold that becomes an intercept in H, β is the vector
of coefficients of interest, and H is the cumulative distribution function of εit with
symmetric probability density function (pdf).18 To begin with, the variables X1it are
the different distributional indices P, F, and G/N. This is our baseline specification,
which effectively presumes that cohesion and the importance of public goods are the
same across countries. Later, in Section V, we relax this restriction.
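The latent-variable reading can be illustrated with a short sketch. It assumes that the binary indicator equals 1 when the latent intensity exceeds the threshold W*, so that the conflict probability is H(Xβ − W*), and it takes H to be the logistic distribution, consistent with the logit estimates reported in the table notes. The exact expression used in the paper is not reproduced in this excerpt, and all numbers below are hypothetical.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function, a symmetric choice for H."""
    return 1.0 / (1.0 + math.exp(-z))


def conflict_probability(x, beta, threshold):
    """Return H(x . beta - W*), the probability that latent intensity exceeds W*."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return logistic_cdf(index - threshold)


# Hypothetical regressors (P, F, G/N) and coefficients, for illustration only.
prob = conflict_probability(x=[0.10, 0.50, 0.02], beta=[6.0, 1.0, -4.0], threshold=2.0)
```

Because the threshold enters additively, it is absorbed into the intercept when the model is estimated.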
B. Baseline
18
When using a nonbinary indicator, we shall have two thresholds associated with two intensity levels.
19
A country “has oil” if it produces at least $100 per person (in constant 2000 dollars) in rents from oil. A coun-
try “has diamonds” if diamonds are produced locally. We take this information from Ross (2011).
Notes: p-values are reported in parentheses. Robust standard errors adjusted for clustering have been employed to
compute z-statistics.
*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.
stage, the set of controls we use is in line with MRQ, and we use this as our baseline
control set. Column 7 adds in more political and governance controls.
Throughout, P, F, and G/N are significant. P has the expected positive coefficient
and is highly significant. The coefficient associated with F is also positive; G/N,
while also significant, has negative sign; we interpret this below. Moreover, lagged
conflict is highly significant and, in line with a common result in all the literature
on conflict with cross-country data, per capita income is significantly and negatively
correlated with conflict.20 Finally, we do not see a direct effect of natural resources,
20
We have also examined the effect of income inequality. Using the Gini of personal incomes as a regressor has
no effect either on the value of the coefficient corresponding to P or on its significance.
though when this is used to construct a measure of relative publicness and interacted
with distributional measures in the way suggested by the theory, it will have a highly
significant impact (see Table 9).
Using the theory summarized in Proposition 1 and the assumption that α and λ are
constant across countries, it is possible to provide an interpretation for the estimated
parameters. First, the fact that P is highly significant suggests that both the publicness
of the prize (λ) as well as the degree of group cohesion (α) are significant. Moreover,
our results for G/N suggest that α is close to or perhaps even larger than 1, indicating
that models of free-riding are perhaps less relevant than we make them out to be, at
least in the cases of civil conflict in the data.21 One obvious possibility is selection:
observed conflicts must have been successful in resolving the collective action prob-
lem, so that such conflicts must be associated with a high value of group cohesion.
As for the public component, whether it is economic (control of a labor or housing
market, or a trade), cultural (the establishment of some notion of ideological or reli-
gious superiority), or political (control of the state) is something we cannot identify.
All we can say is that it is central to conflict. At the same time, the significance of
F suggests that private components, such as the existence of natural resources, are
also important. While natural resources in and of themselves are not significant in
our regressions, we shall see in Section V that they come fully into their own when
interacted with the distributional variables as directed by the theory.
Our interpretation—that both public and private goods matter for conflict—
is relevant to the discussion on greed versus grievance as motivations for ethnic
conflict introduced by Collier and Hoeffler.22 While we are not sure of the utility
of this distinction, one possible interpretation is that “greed” corresponds to con-
flict over private goods, while “grievance” would come under the rubric of public
goods (political rights and freedoms, or religious dominance). Our exercise points
to the importance of both motives, and this will be enhanced further as we exploit
the model structure even more in Section V.
Finally, we point to the quantitative importance of polarization and fractional-
ization in conflict. Consider the baseline set of controls (Table 1, column 6). Our
estimated coefficients imply that if we move from the 20th percentile of polariza-
tion to the 80th percentile, holding all other variables at their means, the prob-
ability of conflict rises from approximately 13 percent to 29 percent. Performing
the same exercise for F takes us from 12 percent to 25 percent. These are similar
(and strong) effects.
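The percentile exercise described above can be sketched as follows: one regressor is moved from its 20th to its 80th percentile while every other regressor is held at its sample mean, and the predicted probabilities are compared. The data are randomly generated and the coefficients are made up, so the output does not reproduce the figures in the text; a logit link is assumed.

```python
import numpy as np


def percentile_effect(X, beta, intercept, column, low=20, high=80):
    """Predicted logit probabilities with one regressor at two percentiles.

    All other regressors are held at their sample means.
    Returns (probability at the low percentile, probability at the high one).
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    base = X.mean(axis=0)
    probabilities = []
    for q in (low, high):
        x = base.copy()
        x[column] = np.percentile(X[:, column], q)
        z = intercept + float(x @ beta)
        probabilities.append(1.0 / (1.0 + np.exp(-z)))
    return probabilities[0], probabilities[1]


# Synthetic data and hypothetical coefficients, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(200, 3))
beta_demo = np.array([2.0, 1.0, -0.5])
p_low, p_high = percentile_effect(X_demo, beta_demo, intercept=-2.0, column=0)
```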
21
The observation that α > 1 means that individuals might effectively be placing more weight on the group than
they do on themselves.
22
See Collier and Hoeffler (2004) and, more recently, Collier, Hoeffler, and Rohner (2009).
Notes: Panel A ranks the median fractionalization decile in increasing order of polarization.
Panel B ranks the median polarization decile in increasing order of fractionalization.
23
To explore whether our results are driven by a particular group of observations, we have employed several
tools to detect the influential observations in the sample. There are 78, 43, and 117 influential observations accord-
ing to the Pearson residual, deviance residual, and Pregibon leverage statistics, respectively. The online Appendix
reproduces column 6 in Table 1 once influential observations have been removed from the sample. The significance
of P and F remains unaffected.
[Figure: two panels plotting PRIO25 (residuals) on the vertical axis; see footnote 24 for the construction of the residuals.]
Our baseline uses the conflict binary variable prio25. PRIO reports other indicators,
and nonbinary alternatives are also possible. Table 3 reports on the use
of alternative dependent variables to proxy conflict. Column 1 repeats column 6
from the baseline specification for comparison (the same controls are used here).
Column 2 employs the intermediate notion priocw, which is prio25 augmented by
the requirement that the overall conflict must yield at least 1,000 deaths. Column 3
uses prio1000, the PRIO definition of civil war, which demands at least 1,000 deaths
per year. Column 4 reports on a nonbinary measure of intensity, prioint, based
on the PRIO dataset, that separates conflict episodes satisfying prio1000 from the
rest. “Peace” is assigned a value of 0, events satisfying prio25 that are not prio1000
are assigned 1, and events recorded as prio1000 are assigned 2.25 Finally, column 5
uses an alternative measure of conflict intensity: the continuous index of
social conflict (isc) as computed by the Cross-National Time-Series Data Archive
24
Conflict is the average of prio25 over the sample. Time-varying covariates are referred to 1960. The graphs
plot the residuals from the linear regressions: panel A, (1) P on all other covariates, (2) conflict on all other covari-
ates (excluding P); panel B, (1) F on all other covariates, (2) conflict on all other covariates (excluding F).
25
We do not use priocw in the definition because it relies on the overall number of deaths, and does not neces-
sarily imply a higher intensity in any particular year.
Notes: Columns 1–3, logit; column 4, ordered logit; column 5, ordinary least squares (OLS). p-values are reported
in parentheses. Robust standard errors adjusted for clustering have been employed to compute z-statistics.
*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.
similar effects: priocw goes from under 7 percent to 16 percent, while prio1000
increases from approximately 3 percent to 6 percent.
While our results are generally robust to the choice of other dependent variables,
we record our own preferences. The variable prio25 is generally useful, because
it serves as a repository of all conflicts. In contrast, prio1000 will fail to register
conflicts that run into hundreds of deaths per year. To be sure, the choice will depend
on the questions being asked, but there is no way in which our theory allows us to
eliminate (as examples) the Palestinian or Guatemalan conflicts, neither of which
receives a coding in any year of the PRIO dataset (up to 2008) as a prio1000 conflict.
Moreover, the cumulative deaths in all these cases are sizable.28 This motivates our
strong preference for PRIO’s own baseline definition of conflict using prio25, and
in what follows, we will not emphasize prio1000 any longer.
Another option, and one that we have a distinct preference for, is the use of
prioint, which places larger weight on prio1000 conflicts as described above. We
use prio25 only because it is standard, but prioint performs just as well in every
one of the regressions displayed for prio25. The online Appendix contains a full set
of estimations using prioint.
B. Alternative Groupings
28
There are many other examples. To choose a current one, the Indian government has described the ongoing Maoist
conflict in tribal areas as the greatest internal security threat to the country. Yet, while the conflict has been severe, with
many killings, the annual numbers have been in the hundreds, but below the prio1000 threshold, as of 2010.
29
The information from Ethnologue has already been used for the analysis of conflict by Alesina et al. (2009);
Desmet, Ortuño-Ortín, and Weber (2009); and Desmet, Ortuño-Ortín, and Wacziarg (2012).
30
For instance, in the case of Mexico, Ethnologue reports 291 living languages. In contrast, the number of ethnic
groups for this country in Fearon’s dataset is four (Mestizo, Amerindian, White, and Mayan).
31
An alternative approach, which we have examined with equal success for P, is to instrument for the distribu-
tional measures obtained from Fearon groupings using their counterparts from the Ethnologue classification. The
Table 4 exactly replicates Table 3 using the group sizes furnished by Ethnologue.
The behavior of the polarization index P, as well as that of G/N, is unchanged. In
particular, polarization continues to be as significant as in our previous exercise (even
the estimated coefficients are very close). But fractionalization is no longer significant.
This is not surprising. Indices that fail to take intergroup distances into account are less
robust to the definition of groupings. We return to this issue in Section IVC.
C. Group Distances
A separate concern is the robustness of our results with respect to the param-
eter δ for the distance variable. We use δ = 0.05, as do Desmet, Ortuño-Ortín, and
Wacziarg (2012). Fearon (2003a) uses δ = 0.5. None of these choices is satisfactorily
first-stage correlations are high: for instance, the correlation between the two polarization indices (when δ = 0.05
in both cases) is equal to 0.70. But we find it difficult to push the exclusion restriction that all linguistic sources
of conflict must of necessity transmit themselves via the Fearon grouping, which, too, is a step removed from the
groups directly engaged in conflict. We therefore restrict ourselves to simply reporting the reduced-form estimates
using Ethnologue, as a robustness exercise.
[Figure 2. Pseudo-likelihood (vertical axis) plotted against δ (horizontal axis). Panel A. Fearon groupings. Panel B. Ethnologue groupings.]
motivated. Yet the choice is important because it implicitly selects the levels of lin-
guistic (dis)similarity to be emphasized. Low values of δ will essentially separate
the languages that have very few branches in common from the rest. As we pro-
gressively increase δ, small differences acquire greater salience while the bigger
differences play a less than proportional role. In the limit as δ → ∞, the smallest
difference is identified as a complete difference, indistinguishable from deeper lin-
guistic cleavages. The polarization measure that corresponds to this limit would
only use binary 0–1 “distances,” and is at the heart of MRQ’s empirical study:
\[
R = \sum_{i=1}^{m} \sum_{j \neq i} n_i^2 n_j = \sum_{i=1}^{m} n_i^2 (1 - n_i).
\]
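As a hedged illustration of the binary index, the following Python sketch evaluates both forms of R for a vector of group shares, and shows how a distance of the form 1 − s^δ (the form used in the Appendix definition of P) approaches the binary 0–1 distance as δ grows. The function names, the example shares, and the similarity value 0.8 are hypothetical and are not taken from the paper's data.

```python
def binary_polarization_double_sum(shares):
    """R as the double sum over i and j != i of n_i^2 * n_j."""
    return sum(
        ni ** 2 * nj
        for i, ni in enumerate(shares)
        for j, nj in enumerate(shares)
        if i != j
    )


def binary_polarization(shares):
    """Single-sum form: sum over i of n_i^2 * (1 - n_i).

    The two forms agree only when the shares sum to one.
    """
    return sum(ni ** 2 * (1.0 - ni) for ni in shares)


def distance(similarity, delta):
    """Distance 1 - s^delta for a similarity s in [0, 1]."""
    return 1.0 - similarity ** delta


# Hypothetical shares for three groups (they must sum to one).
shares = [0.5, 0.3, 0.2]
r_double = binary_polarization_double_sum(shares)
r_single = binary_polarization(shares)

# For two distinct languages with (hypothetical) similarity 0.8, a larger
# delta pushes the distance toward 1, which is the binary case behind R.
distances = [distance(0.8, d) for d in (0.05, 0.5, 5.0, 50.0)]
```

With small δ the two languages in the example are treated as nearly identical, while with large δ they are treated as almost completely different, which is the sense in which R corresponds to the limit of P.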
32
Alternatively, one could reestimate the model with other values of δ to check robustness. See the online
Notes: Dependent variable is prio25. p-values are reported in parentheses. Robust standard errors adjusted for clus-
tering have been employed to compute z-statistics.
*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.
language difference as important; i.e., as we raise the value of δ; see panel B and
note that the axes in panels A and B have very different scales.
We illustrate this argument in the special case of the “binary” polarization mea-
sure R which, as we’ve noted, effectively sets δ = ∞. The correlation between
P (with δ = 0.05) and R is 0.45; see the online Appendix for the scatter plot. In
Table 5, column 1 reproduces the baseline estimates for prio25 from column 6 of
Table 1, using the Fearon groupings. Column 2 replaces P with the binary index R.
The parallels between the two are evident: R simply takes over from P. This column
can be viewed as a replication of the basic equation in MRQ. Column 3 of the table
puts together both P and R along with the other distributional equations into a single
equation. The comparison continues to yield symmetric outcomes: now P and R are
both significant, and on very similar terms. Thus, while P is a powerful explanatory
variable, so, it seems, is R.
There is a striking difference, however, once we employ classifications based
on completely ungrouped linguistic criteria. Now it is imperative to carry a
notion of distance, otherwise every pair of groups will appear equally distinct.
To see this, consider our specification using Ethnologue; we’ve reproduced col-
umn 1 from Table 4 as column 4 here. Once again, P is highly significant. But
this time the replacement of P by R in column 5 is not met with equal success, or
indeed with any success at all; R is entirely insignificant. Finally, the horse race
between P and R in column 6 is resolved unambiguously in favor of P: R plays
no role at all.
The reason why R might be problematic is simple. Ethnologue groupings are fully
linguistic in nature. It is reasonable to presume that conflict in society did not follow
every such linguistic division. Allowing this outcome to be tempered by a consid-
eration of intergroup distances (even the linguistic “distances” that we adopt in the
interest of exogeneity) helps enormously. Binary measures of polarization are too
coarse to achieve this modulation in any meaningful way. The data fully support
such an assertion.
Our baseline specification uses incidence rather than onset, because the theory is
silent on the initial decision to go to conflict.34 Moreover, the operational distinction
between onset and incidence depends on taking the PRIO thresholds quite literally.
Before the threshold is crossed, we might have several manifestations of serious con-
flict (a breakdown in negotiations, an insurgency, a crackdown). “Onset” as defined by
the PRIO threshold is far from a sharp concept: it is arguably no different from a year
of “incidence,” though to be sure, the factors that contribute to the outbreak of a con-
flict do not coincide with the ones that keep feeding it (Schneider and Wiesehomeier
2006). This is why we control for lagged conflict in our incidence regressions.
That said, in the interest of robustness, Table 6 provides some onset regressions.
The binary onset variable onset_n switches on in a particular year if the
incidence requirement is met (at the level of prio25), but not in n or more previous
years.35 We take three definitions of onset by setting n = 2, 5, and 8 (see footnote 36). The first
three columns report onset for Fearon groupings. The next three do the same for
Ethnologue groupings.37
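A minimal sketch of how such an onset indicator could be coded is given below. It assumes that onset is recorded in a year with conflict incidence when none of the n preceding years shows incidence, and that years before the start of the series count as peaceful. Both assumptions, the function name, and the example series are ours; the paper's exact coding is not reproduced here.

```python
def onset_indicator(incidence, n):
    """Return a 0/1 list marking onset years.

    incidence: list of 0/1 conflict indicators, one per year, in time order.
    n: number of preceding years that must be free of conflict.

    Assumption: years before the start of the series count as peaceful.
    """
    onset = []
    for t, active in enumerate(incidence):
        # The n years immediately before year t (fewer at the start).
        window = incidence[max(0, t - n):t]
        onset.append(1 if active and not any(window) else 0)
    return onset


# Hypothetical yearly prio25 incidence for one country.
incidence = [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
onset2 = onset_indicator(incidence, 2)
onset5 = onset_indicator(incidence, 5)
```

In this example the resumption of conflict after two quiet years counts as an onset when n = 2 but not when n = 5, which is why the choice of n matters for the definition.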
It is clear that nothing of importance changes in the results. In each of the speci-
fications, polarization is positive and (usually) highly significant. Fractionalization
and the Greenberg-Gini continue to be generally significant, though not as strongly.
We check whether the results are driven by particular regions that might be con-
sidered more (or less) conflictual. Table 7 reports on the findings. Column 1 is
just the baseline specification for easy comparison. Column 2 introduces regional
34
See Esteban and Ray (2008b) for a two-stage model of conflict onset.
Notes: p-values are reported in parentheses. Robust standard errors adjusted for clustering have been employed to
compute z-statistics.
*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.
dummies. Columns 3–5 eliminate a region each: Africa, Asia, and Latin America,
in that order. Column 6 returns to the baseline but this time with a full set of overall
time dummies, one for each period in the sample except for the first one. Column 7
controls for possibly different regional time trends. To this effect, we introduce
regional dummies interacted with a time trend.
It appears that the significance of polarization, overwhelmingly at around the one
percent level, is unshakeable. Fractionalization is positive and significant as well,
often highly so. The Greenberg-Gini coefficient is significant and it continues to
have the ubiquitous negative sign.
Notes: prio25 throughout; p-values are reported in parentheses. Robust standard errors adjusted for clustering have
been employed to compute z-statistics.
*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.
while data on all time-dependent variables come from 1960. Column 3 estimates
our baseline specification using annual data in Fearon (2005).38 Column 4 uses
rare events estimation to correct for the bias created in a logit model for the small
number of conflict observations relative to the total, as in King and Zeng (2001).
Column 5 presents the same results as column 1 using the linear probability model.
Finally, column 6 uses a linear specification and allows for the possibility that the
distributional coefficients for each country are random draws from a probability
distribution, while the other coefficients are held fixed as before. We return to this
particular specification in the next section.
It is fair to say that in the variations we study (and in others not reported here), our
conclusions are unaltered. The coefficients of F and P are generally positive and highly
significant. The variable G/N is negative and sometimes significant. Our overall inter-
pretation of these results, as discussed in the baseline specification, remains unchanged.
38
Fearon’s sample runs from 1960 to 1999. Since annual data on democracy are not available in this sample, the
value of democ corresponding to the first five-year period, 1960–1965, has been used instead.
Notes: prio25 throughout. p-values are reported in parentheses. Robust standard errors adjusted for clustering have
been employed to compute z-statistics. OLogit(CS): cross-sectional data estimated using ordered logit. Logit(Y):
yearly data estimated using Logit. ReLogit: Rare Events Logit estimator. OLS: Ordinary Least Squares in a linear
probability model (LPM). RC: Random coefficients in an LPM.
*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.
An important insight of equation (2) is that the effect of each distributional measure
is influenced in specific ways by the relative publicness of conflict payoffs, as well as
the extent of group cohesion. Recall equation (2), and note that λ ≡ π/(π + μ) mea-
sures the “relative publicness” of the prize. Observe that the impact of P is enhanced by
λ, and that of F by (1 − λ). In particular, when all conflicts are public, fractionalization
cannot matter, while if all conflicts are private, polarization cannot matter. But the pres-
ence of group cohesion strengthens both these variables; consult equation (2).
While we’ve discussed this interpretation using the baseline specification, it isn’t
hard to imagine that relative publicness is country-specific, and so is the extent of
group cohesion. That places stress on the cross-country specification so far, which
presumes—the presence of additive controls notwithstanding—that the coefficients
on the distributional variables are independent of country. The random-coefficients
specification in the very last column of Table 8 is of particular interest here. Using
a likelihood ratio test, we can indeed reject the hypothesis of constant coefficients.
Statistically, this is no surprise, as such a specification across countries is often
rejected anyway, though we return to this test below.39 Despite this, it is of interest
that the estimates of the coefficients in the OLS-LPM and RC specifications
(see columns 5 and 6 of Table 8) are very similar—these coefficients are compa-
rable since they’ve both been estimated in a linear specification. That suggests that
the qualitative and quantitative conclusions of the first part of the paper hold when
the assumption of constancy of the coefficients is dropped.
The goal of this section, then, is to construct and use country-by-country proxies
for relative publicness and cohesion.
We first construct indicators for π and μ, and use these to define a proxy for
the relative publicness of the prize. Begin with the indicator for the private payoff
μ. It seems natural to associate μ with rents that are easily appropriable. Because
appropriability is closely connected to the presence of resources, we approximate
the degree of “privateness” in the prize by asking if the country is rich in natural
resources. We proxy the abundance of natural resources by the per capita value of
oil reserves (oilresv).40 Next, we create an index of “publicness” (pub) by ask-
ing different questions about the degree of power afforded to those who run the
country, “more democratic” being regarded as correlated with “less power” and
consequently a lower valuation of the public payoff to conflict. We use four differ-
ent proxies to construct the index:41 (i) the lack of executive constraints (excons);
(ii) the level of autocracy (autocr); (iii) the degree to which political rights are
flouted (polrights); and (iv) the extent of suppression of civil liberties (civlib).
Our variable pub is constructed by looking at binary versions of these outcomes and
then averaging the indicators. Details are in the Appendix. The results are robust to
different modes of construction and indeed to the choice of a subset of these mea-
sures; see the online Appendix for some variants.
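Our proxy for the relative publicness of the prize is given by
\[
\Lambda(\gamma) \equiv \frac{\gamma \cdot \mathit{pub} \cdot \mathit{gdppc}}{\gamma \cdot \mathit{pub} \cdot \mathit{gdppc} + \mathit{oilresv}},
\]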
where we multiply the pub indicator by per capita GDP to convert the “poor governance”
variables into monetary equivalents (note that oil reserves are expressed in money values
per capita as well). The “conversion factor” γ makes the privateness and publicness vari-
ables comparable, and allows us to combine them to arrive at the ratio Λ(γ).
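The construction of the ratio can be sketched in Python as follows. The sketch assumes that γ scales the monetized publicness term (pub times per capita GDP), so that at γ = 1 it reduces to the Appendix definition Λ = pub/(pub + oilresv/gdp). The placement of γ, the function name, and the numerical inputs are assumptions made for illustration only.

```python
def relative_publicness(pub, gdp_per_capita, oil_reserves_per_capita, gamma=1.0):
    """Relative publicness of the prize.

    pub: publicness index in [0, 1].
    gdp_per_capita: per capita GDP, used to express pub in money terms.
    oil_reserves_per_capita: per capita value of oil reserves (private payoff).
    gamma: conversion factor (assumed here to scale the public term).
    """
    public_value = gamma * pub * gdp_per_capita
    total = public_value + oil_reserves_per_capita
    if total == 0:
        raise ValueError("ratio undefined when both components are zero")
    return public_value / total


# Hypothetical country: pub = 0.5, GDP per capita 4000, oil reserves 1000.
lam_1 = relative_publicness(0.5, 4000.0, 1000.0)
lam_2 = relative_publicness(0.5, 4000.0, 1000.0, gamma=2.0)

# At gamma = 1 this matches the Appendix form pub / (pub + oilresv / gdp).
appendix_form = 0.5 / (0.5 + 1000.0 / 4000.0)
```

A larger γ places more weight on the public component, so the ratio rises toward one; a country with no oil reserves has a ratio of one for any positive pub.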
We take γ = 1 in the main text. The online Appendix shows that the results are
robust to a wide range of values of γ. In addition, we can compute pseudo-likelihoods
for different values of γ, just as we did for the linguistic distance coefficient (see
Figure 2). Figure 3 displays the likelihood results for the dependent variable prio25,
39
The likelihood ratio test is made complicated by the fact that the constrained parameters being tested—the
variances of the random coefficients equal zero—lie on the boundary of the parameter space. A general distribution
theory for this test is not available, but Stram and Lee (1994) show that the tail probabilities of the distribution of
our test statistic are bounded above by those of the χ2 with the standard degrees of freedom, four in our case. The
test thus provides a sufficient condition for rejection using the χ2 (4) distribution.
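As a hedged illustration of this bound, the sketch below computes a likelihood ratio statistic from two log-likelihoods and the corresponding χ2 tail probability, using the closed form of the χ2 survival function that holds for even degrees of freedom. The log-likelihood values are hypothetical and are not the paper's estimates.

```python
import math


def chi2_sf_even_df(x, df):
    """Tail probability P(X > x) for a chi-squared variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def lr_statistic(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


# Hypothetical log-likelihoods: constant versus random coefficients.
stat = lr_statistic(-360.0, -350.0)
p_upper_bound = chi2_sf_even_df(stat, 4)
```

Because the true tail probabilities are bounded above by those of the χ2 (4), a small value of this upper bound is sufficient to reject constant coefficients.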
40
Data on oil reserves comes from Haber and Menaldo (2011). See the Appendix for details on the construction
of this variable.
41
As mentioned earlier, we use time-invariant versions of these variables since short-run changes are likely to be
correlated with the incidence of conflict.
[Figure 3. Pseudo-likelihood (vertical axis) plotted against γ (horizontal axis). Panel A. Without group cohesion. Panel B. With group cohesion.]
and for the two empirical specifications in Table 9 that use prio25. Clearly, γ = 1 is
a good choice under this criterion.
Table 9, columns 1–3, provides a variety of different specifications using the binary
variable prio25 as well as the multivalued indicators prioint and isc. Group cohe-
sion is held constant, and our main independent variables are P × Λ, F × (1 − Λ),
and G/N × Λ, where Λ ≡ Λ(1). This allows us to test whether the interacted indi-
ces of ethnic inequality, fractionalization, and polarization are significant. We also
include the noninteracted indices in order to examine whether their significance
truly comes from the interaction term.42 Polarization interacted with Λ is positive
and generally significant, and the same is true of fractionalization interacted with
1 − Λ. (The interacted Greenberg-Gini is not significant.)
The level terms P, F, and G are now no longer significant. Conditional on hav-
ing the true values of λ, this is precisely what the model would predict. After all,
the influence of polarization (say) should be zero when there are no public goods,
broadly defined to include primordial goals, at stake. The fact that our estimate Λ
happens to achieve the same goal is of interest, and possibly suggests that factors
such as pure primordialism have little to do with ethnic conflict.
We remark on the interpretation of interactions in nonlinear regressions (such as the
logit). It is well known that any significance of an interaction term cannot be attributed
fully to the true effect of that interaction term on the dependent variable (see, e.g., Ai
and Norton 2003).43 It does imply, however, that the effect on the underlying latent
variable—the cost of conflict, as proxied by the number of deaths—is indeed as esti-
mated. And this is what we are interested in from the viewpoint of the theory.
Next, we allow group cohesion to vary across countries. We estimate a proxy A for
the level of group cohesion α by exploiting the answers to a certain set of questions
asked in the 2005 wave of the World Values Survey. We use the latest wave available
because it covers the largest number of countries. One could argue that the answers
might be conditioned by the existence of previous or contemporary conflict. The
42
We do not include the level of Λ (there is nothing to suggest that it will have a level effect) but rather the
components of Λ, which are the oilresv indicator and the variable pub × gdppc constructed from the governance
variables.
43
A linear probability model, though problematic on other grounds, would not exhibit this problem.
Notes: Columns 1 and 4, logit; columns 2 and 5, ordered logit; columns 3 and 6, OLS; p-values are reported in
parentheses. Robust standard errors adjusted for clustering have been employed to compute z-statistics.
*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.
questions we have selected, described in detail in the Appendix, do not ask about
specific groups, but address issues like adherence to social norms, identification
with the local community, the importance of helping others, and so on.44 We con-
struct a summary proxy for α—call it A—from the scores on these questions.
44
To be sure, even such a procedure is vulnerable to charges of endogeneity, but hopefully less so.
45
We feel no particular compunction to include all possible level effects. Terms such as P × Λ × A can be viewed
as composite variables that the model predicts will matter, and not as interactions of (in this instance) three variables.
46
See Table 15 in the online Appendix and the discussion there.
population is large.) The theoretical structure therefore both disciplines our empiri-
cal specification and allows for interpretation of the estimated coefficients.
The first task of this paper is to implement the above measures; specifically, those
that employ intergroup “distances.” We do so using linguistic differences across
groups, in the spirit of Fearon (2003a) and Desmet, Ortuño-Ortín, and Wacziarg
(2012). Linguistic distance, as constructed by the cardinality of intervening nodes
on the language tree, is plausibly exogenous to conflict, while at the same time
it can be expected to drive, or at least influence, antagonisms across groups.47
We then proceed to an empirical analysis that closely parallels the theory. Our main
result is that ethnic polarization has a large and highly significant impact on conflict
across a number of different specifications. By and large, though with somewhat lesser
consistency, this is also true of fractionalization. These two findings suggest that public
and private components of conflict are generally both present, and that within-group
cohesion is strong during conflict.48 The numerical effects of the two measures are
large and quite similar. For instance, moving polarization from the 20th percentile to
the 80th percentile, holding all other variables at their means, approximately doubles
the chances of conflict, and the same is true of fractionalization.
Our results concerning polarization (and to a lesser degree, fractionalization) are
highly robust. They extend to a variety of different measures of conflicts (including
different binary measures of conflict incidence, as well as continuous indices), to alter-
native ways of calculating language distances, to different choices of groups (as long
as language is principally used in defining them), to the use of different regional dum-
mies or selections, and to the inclusion of overall or regional time trends.
A limitation of our cross-sectional approach is that the importance of public and
private components of conflict, as well as the extent of group cohesion, are presumed
not to vary over countries. We therefore pursue refinements of our specification by
constructing country-specific measures of “relative publicness” of the prize, using
data on natural resources (as a proxy for private payoffs) and data on autocracy (as
a proxy for public payoff to holding power). We combine these measures to create
an index of “relative publicness” of conflict payoffs, one that varies across countries.
When our measures are interacted with this index in exactly the way suggested by
the theory, the resulting coefficients are significant across a variety of specifications,
and are strongly supportive of the conceptual framework.
The exercise can be augmented still further by using a measure for within-group
cohesion, one that we construct by using information from the World Values Survey.
We can then directly address the model by imposing even more structure, by con-
structing variables that conform precisely to those predicted by the theory. The result-
ing outcomes, both for polarization and fractionalization, are highly significant.
This paper takes a step toward the establishment of a strong empirical relation-
ship between conflict and certain indicators of ethnic group distribution, one that is
47
Another possibility, infeasible at present, is to use measures of genetic distance across groups. This is not hard
to do across countries, but at this time there is too little within-country variation. In those cases in which more disag-
gregated data is available (see Cavalli-Sforza, Menozzi, and Piazza 1994), it has been prohibitively difficult to obtain
group share data. Perhaps in the near future, with more detailed genetic datasets, this approach will become feasible.
48
The Greenberg-Gini index (normalized by population, as required by the theory) is also significant in several
of the specifications, but usually with a negative sign. This finding supports our conclusion that group cohesion is
extremely important and present to a significant degree in times of conflict. Presumably, the very fact that a conflict
is observed implies that free-rider problems have been overcome to a large extent.
Appendix
We provide definitions of all major variables used in the paper, beginning with the
different measures of conflict.
prio25. “Armed conflict” from PRIO: a contested incompatibility that concerns
government and/or territory where the use of armed force between two parties, of
which at least one is the government of a state, results in at least 25 battle-related
deaths per year and per incompatibility. We consider only types 3 and 4 from the
database; these refer to internal armed conflict. If a country has experienced a prio25
conflict according to the PRIO dataset in any of the years of our five-year period,
this variable takes a value equal to 1.
priocw. “Intermediate armed conflict” from PRIO: includes all prio25 conflicts
that result in a minimum of 1,000 deaths over the course of the conflict. We consider
only types 3 and 4 (internal armed conflict). If a country has experienced a priocw
conflict according to the PRIO dataset in any of the years of our five-year period,
this variable takes a value equal to 1.
prio1000. “War” from PRIO: same definition as prio25 with a threshold of battle-
related deaths of at least 1,000 per year and per incompatibility. We consider only
types 3 and 4 (internal armed conflict). If a country has experienced a prio1000
conflict according to the PRIO dataset in any of the years of our five-year period,
this variable takes a value equal to 1.
prioint. “Conflict intensity” from PRIO: we assign a value of 0 if there is peace in
a given year, a value of 1 if there are events satisfying prio25 that are not prio1000,
and a value of 2 if there are events recorded as prio1000. The value of prioint is the
maximum conflict level experienced within the five-year period.
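The coding rules above can be sketched in Python as follows: a yearly intensity of 0, 1, or 2 is assigned from the prio25 and prio1000 flags, and the five-year value is the maximum over the period. The function names and the example period are hypothetical; the sketch does not read the PRIO dataset itself.

```python
def yearly_intensity(prio25, prio1000):
    """Yearly conflict level: 0 peace, 1 prio25 only, 2 prio1000."""
    if prio1000:
        return 2
    if prio25:
        return 1
    return 0


def period_prioint(years):
    """prioint for one period: the maximum yearly level.

    years: list of (prio25, prio1000) flags, one pair per year.
    """
    return max(yearly_intensity(p25, p1000) for p25, p1000 in years)


def period_binary(flags):
    """Binary period variable: 1 if the event occurs in any year."""
    return 1 if any(flags) else 0


# Hypothetical five-year period: two years above the 25-death threshold,
# none above the 1,000-death threshold.
period = [(0, 0), (1, 0), (1, 0), (0, 0), (0, 0)]
prioint_value = period_prioint(period)
prio25_value = period_binary([p25 for p25, _ in period])
prio1000_value = period_binary([p1000 for _, p1000 in period])
```

The example shows the case stressed in the text: a conflict that is recorded by prio25 and prioint but is invisible to prio1000.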
isc. Index of social conflict. The Cross-National Time-Series Data Archive
(CNTS) computes the isc index as the weighted average of eight variables related
to social unrest.49
49
These variables are (weights are provided in brackets): Assassinations (domestic1) [25]: Any politically motivated
murder or attempted murder of a high government official or politician. General Strikes (domestic2) [20]:
Any strike of 1,000 or more industrial or service workers that involves more than one employer and that is aimed
at national government policies or authority. Guerrilla Warfare (domestic3) [100]: Any armed activity, sabotage, or
bombings carried on by independent bands of citizens or irregular forces and aimed at the overthrow of the present
regime. Major Government Crises (domestic4) [20]: Any rapidly developing situation that threatens to bring
the downfall of the present regime, excluding situations of revolt aimed at such overthrow. Purges (domestic5)
[20]: Any systematic elimination by jailing or execution of political opposition within the ranks of the regime or
the opposition. Riots (domestic6) [25]: Any violent demonstration or clash of more than 100 citizens involving
share of group i and m is the number of groups. Data on group shares has been obtained
from Fearon (2003b) and the Ethnologue project (http://www.ethnologue.com).
P. Polarization, computed as $P = \sum_{i=1}^{m} \sum_{j=1}^{m} n_i^2 n_j \kappa_{ij}$, where $\kappa_{ij} = 1 - s_{ij}^{0.05}$ and
$s_{ij}$ is the degree of similarity between two languages i and j, given by the ratio of the
number of common branches to the maximum possible number (15 for the entire
tree).50 Group shares are constructed as above, for F; data on language and linguistic
distances come from Ethnologue.
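A minimal sketch of this computation is given below. It takes group shares and the number of common branches for each pair of languages, builds the similarity as the ratio to the maximum of 15 branches, and evaluates P with the exponent 0.05. The function names, the two-group example, and the choice of three common branches are hypothetical.

```python
MAX_BRANCHES = 15  # maximum number of branches in the language tree


def similarity(common_branches, same_language=False):
    """Similarity s_ij: share of common branches; 1 for the same language."""
    if same_language:
        return 1.0
    return common_branches / MAX_BRANCHES


def polarization(shares, sim, delta=0.05):
    """P = sum_i sum_j n_i^2 * n_j * (1 - s_ij^delta).

    shares: list of group population shares.
    sim: square matrix (list of lists) of similarities s_ij.
    """
    m = len(shares)
    total = 0.0
    for i in range(m):
        for j in range(m):
            kappa = 1.0 - sim[i][j] ** delta
            total += shares[i] ** 2 * shares[j] * kappa
    return total


# Hypothetical country with two groups whose languages share 3 branches.
shares = [0.6, 0.4]
s = similarity(3)
sim = [
    [similarity(0, same_language=True), s],
    [s, similarity(0, same_language=True)],
]
p_value = polarization(shares, sim)
```

Setting every off-diagonal similarity to zero makes all distances equal to one, in which case the same function returns the binary polarization measure.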
G. Greenberg-Gini index, defined as $G = \sum_{i=1}^{m} \sum_{j=1}^{m} n_i n_j \kappa_{ij}$. We use the same
the use of physical force. Revolutions (domestic7) [150]: Any illegal or forced change in the top government elite,
any attempt at such a change, or any successful or unsuccessful armed rebellion whose aim is independence from
the central government. Antigovernment Demonstrations (domestic8) [10]: Any peaceful public gathering of at least
100 people for the primary purpose of displaying or voicing their opposition to government policies or authority,
excluding demonstrations of a distinctly antiforeign nature. The calculation of the isc is performed as follows:
weighted sum of occurrences of each event divided by the number of types of variables, 8.
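The calculation described in this note can be sketched as follows, using the weights as listed above. The dictionary keys follow the variable names in the note; the function name and the example event counts are hypothetical.

```python
# Weights as listed in the note above (domestic1 to domestic8).
ISC_WEIGHTS = {
    "domestic1": 25,   # assassinations
    "domestic2": 20,   # general strikes
    "domestic3": 100,  # guerrilla warfare
    "domestic4": 20,   # major government crises
    "domestic5": 20,   # purges
    "domestic6": 25,   # riots
    "domestic7": 150,  # revolutions
    "domestic8": 10,   # antigovernment demonstrations
}


def isc(counts):
    """Weighted sum of event counts divided by the number of event types.

    counts: dict mapping variable names to numbers of occurrences;
    missing event types are treated as zero occurrences.
    """
    weighted = sum(w * counts.get(name, 0) for name, w in ISC_WEIGHTS.items())
    return weighted / len(ISC_WEIGHTS)


# Hypothetical year: one assassination, two riots, four demonstrations.
example = isc({"domestic1": 1, "domestic6": 2, "domestic8": 4})
```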
50
If two groups speak the same language, $s_{ij}$ is set to 1.
51
These questions are: V84: “It is important to this person to help the people nearby; to care for their well-
being.” V87: “It is important to this person to always behave properly; to avoid doing anything people would say is
wrong.” V89: “Tradition is important to this person; to follow the customs handed down by one’s religion or fam-
ily.” V7: “For each of the following, indicate how important it is in your life. Would you say it is: Politics.” V9: “For
each of the following, indicate how important it is in your life: Religion.” V211: “People have different views about
themselves and how they relate to the world. Using this card, would you tell me how strongly you agree or disagree
with each of the following statements about how you see yourself?: I see myself as part of my local community.”
All variables are normalized to a 1–4 scale.
a time-invariant dummy in the following way: first, the percentage of years in the
sample for which a country received a score greater than four is computed. Then, if
this percentage is smaller than 0.4, a country received a value of excons equal to 1 in
all the sample.
gdppc. Log of real GDP per capita corresponding to the first year of each five-year
period. Source: Maddison (2008).
mount. Percent mountainous terrain. The data source is Fearon and Laitin
(2003b), who use the codings of geographer A. J. Gerard.
N. Population, in millions. Source: Maddison (2011).
ncont. Noncontiguous states, referring to countries with territory holding at least
10,000 people and separated from the land area containing the capital city either by
land or by 100 kilometers of water. Source: Fearon and Laitin (2003b).
oil/diam. Oil/Diamond dummy, which takes the value 1 if the country is “rich in
oil” or produces (any positive quantity of) diamonds. A country is “rich in oil” if the
average value of its oil production in a period is larger than 100 US dollars in 2000
constant dollars. Source: Ross (2011).
oilresv. Per capita value of oil reserves at the beginning of the period. Data for
oil reserves comes from Haber and Menaldo (2011). To convert quantities into dol-
lars, we use the oil price data in Ross (2006), where prices are referred to the same
year as reserves. Finally, we divide by population to obtain a per capita measure.
polrights. (Lack of) political rights. The data source is Freedom House (2011),
which uses a 1–7 scale (1 indicates most free). We transform this variable into
a time-invariant dummy in the following way: first, the percentage of years in the
sample for which a country received a score smaller than four is computed. Then,
if this percentage is smaller than 40 percent, the country receives a value of
polrights equal to 1 throughout the sample.
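The time-invariant dummy described for polrights (and, with the inequality reversed, for excons) can be sketched as follows. This is an illustrative reading of the verbal definition, not the authors' code; the function and parameter names are hypothetical.

```python
def time_invariant_dummy(scores, threshold, count_above, cutoff=0.4):
    """Return 1 if the share of qualifying years is below `cutoff`, else 0.

    scores      : yearly scores for one country over the sample
    threshold   : score cut point (four in the definitions in the text)
    count_above : True counts years with score > threshold (excons);
                  False counts years with score < threshold (polrights)
    cutoff      : share of years below which the dummy equals 1 (40 percent)
    """
    if not scores:
        raise ValueError("need at least one yearly score")
    if count_above:
        hits = sum(1 for s in scores if s > threshold)
    else:
        hits = sum(1 for s in scores if s < threshold)
    share = hits / len(scores)
    return 1 if share < cutoff else 0
```

For example, a country whose Freedom House score is below four in only one of five sample years has a share of 20 percent and would be coded polrights = 1.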
pop. Log of population in the first year of each five-year period. Source: Maddison
(2008).
pub. Publicness index. It is defined as the simple average of excons, autocr,
polrights, and civlib.
Λ. Relative publicness of the prize, defined as Λ = pub/(pub + oilresv/gdp). It
corresponds to Λ(γ) in the text when the conversion factor γ is set equal to 1.
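As a minimal sketch, pub and Λ can be computed from the components defined above. The function names are hypothetical, and gdp is not defined in this list: the sketch assumes it is GDP per capita in levels, expressed in the same units as oilresv.

```python
def publicness_index(excons, autocr, polrights, civlib):
    """Simple average of the four institutional dummies (pub)."""
    return (excons + autocr + polrights + civlib) / 4


def relative_publicness(pub, oilresv, gdp):
    """Lambda = pub / (pub + oilresv / gdp), i.e. the case gamma = 1.

    `gdp` is assumed to be GDP per capita in levels (not logs), in the
    same units as `oilresv`.
    """
    if gdp <= 0:
        raise ValueError("gdp must be positive")
    denominator = pub + oilresv / gdp
    if denominator == 0:
        # Happens only when both pub and oilresv are zero.
        raise ValueError("Lambda is undefined when pub and oilresv are both zero")
    return pub / denominator
```

With two of the four dummies equal to 1, pub is 0.5; if oil reserves per capita are worth half of per capita GDP, the denominator is 1 and Λ equals 0.5.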
Table 10 presents the mean and the standard deviation of all the variables employed
in the empirical analysis.
REFERENCES
Ai, Chunrong, and Edward C. Norton. 2003. “Interaction Terms in Logit and Probit Models.” Economics
Letters 80 (1): 123–29.
Alesina, Alberto, Arnaud Devleeschauwer, William Easterly, Sergio Kurlat, and Romain Wacziarg.
2003. “Fractionalization.” Journal of Economic Growth 8 (2): 155–94.
Atlas Narodov Mira. 1964. Miklukho-Maklai Ethnological Institute at the Department of Geodesy and
Cartography of the State Geological Committee of the Soviet Union: Moscow.
Banks, Arthur S. 2008. “Cross-National Time-Series Data Archive (CNTS) 1815–2007.” Databanks
International, Jerusalem, Israel (accessed October 1, 2010).
Besley, Timothy, and Torsten Persson. 2010. “State Capacity, Conflict, and Development.” Econometrica
78 (1): 1–34.
Blattman, Christopher, and Edward Miguel. 2010. “Civil War.” Journal of Economic Literature 48
(1): 3–57.
Brockett, Charles D. 1992. “Measuring Political Violence and Land Inequality in Central America.”
American Political Science Review 86 (1): 169–76.
Brubaker, Rogers, and David D. Laitin. 1998. “Ethnic and Nationalist Violence.” Annual Review of
Sociology 24: 423–52.
Cavalli-Sforza, L. Luca, Paolo Menozzi, and Alberto Piazza. 1994. The History and Geography of
Human Genes. Princeton, NJ: Princeton University Press.
Collier, Paul, and Anke Hoeffler. 2002. “On the Incidence of Civil War in Africa.” Journal of Conflict
Resolution 46 (1): 13–28.
Collier, Paul, and Anke Hoeffler. 2004. “Greed and Grievance in Civil War.” Oxford Economic Papers
56 (4): 563–95.
Collier, Paul, Anke Hoeffler, and Dominic Rohner. 2009. “Beyond Greed and Grievance: Feasibility
and Civil War.” Oxford Economic Papers 61 (1): 1–27.
Desmet, Klaus, Ignacio Ortuño-Ortín, and Shlomo Weber. 2009. “Linguistic Diversity and Redistribution.”
Journal of the European Economic Association 7 (6): 1291–318.
Desmet, Klaus, Ignacio Ortuño-Ortín, and Romain Wacziarg. 2012. “The Political Economy of
Ethnolinguistic Cleavages.” Journal of Development Economics 97 (1): 322–32.
Doyle, Michael W., and Nicholas Sambanis. 2000. “International Peacebuilding: A Theoretical and
Quantitative Analysis.” American Political Science Review 94 (4): 779–801.
Duclos, Jean-Yves, Joan Esteban, and Debraj Ray. 2004. “Polarization: Concepts, Measurements,
Estimation.” Econometrica 72 (6): 1737–72.
Esteban, Joan, and Debraj Ray. 1994. “On the Measurement of Polarization.” Econometrica 62 (4):
819–51.
Esteban, Joan, and Debraj Ray. 1999. “Conflict and Distribution.” Journal of Economic Theory 87
(2): 379–415.
Esteban, Joan, and Debraj Ray. 2008a. “On the Salience of Ethnic Conflict.” American Economic
Review 98 (5): 2185–202.
Esteban, Joan, and Debraj Ray. 2008b. “Polarization, Fractionalization and Conflict.” Journal of Peace
Research 45 (2): 163–82.
Esteban, Joan, and Debraj Ray. 2011. “Linking Conflict to Inequality and Polarization.” American
Economic Review 101 (4): 1345–74.
Esteban, Joan, Laura Mayoral, and Debraj Ray. 2012. “Ethnicity and Conflict: An Empirical Study:
Dataset.” American Economic Review. http://dx.doi.org/10.1257/aer.102.4.1310.
Fearon, James D. 2003a. “Ethnic and Cultural Diversity by Country.” Journal of Economic Growth 8
(2): 195–222.
Fearon, James D. 2003b. “Ethnic and Cultural Diversity by Country: Dataset.” Journal of Economic
Growth (accessed October 1, 2010).
Fearon, James D. 2005. “Primary Commodity Exports and Civil War.” Journal of Conflict Resolution
49 (4): 483–507.
Fearon, James D., and David D. Laitin. 1999. “Weak States, Rough Terrain, and Large-Scale Ethnic
Violence since 1945.” Paper presented at the Annual Meeting of the American Political Science
Association, Atlanta, GA.
Fearon, James D., and David D. Laitin. 2000. “Violence and the Social Construction of Ethnic Identity:
Review Essay.” International Organization 54 (4): 845–77.
Fearon, James D., and David D. Laitin. 2003a. “Ethnicity, Insurgency, and Civil War.” American
Political Science Review 97 (1): 75–90.
Fearon, James D., and David D. Laitin. 2003b. “Ethnicity, Insurgency, and Civil War: Dataset.” American
Political Science Review (accessed October 1, 2010).
Freedom House. 2011. “Freedom in the World 1973–2012.” http://www.freedomhouse.org (accessed
January 1, 2011).
Gleditsch, Nils P., Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg, and Håvard Strand.
2002. “Armed Conflict 1946–2001: A New Dataset.” Journal of Peace Research 39 (5): 615–37
(accessed October 1, 2010).
Greenberg, Joseph H. 1956. “The Measurement of Linguistic Diversity.” Language 32 (1): 109–15.
Gurr, Ted Robert. 1996. “Minorities at Risk III Dataset: User’s Manual.” CIDCM, University of
Maryland. http://www.cidcm.umd.edu/inscr/mar/home.htm.
Haber, Stephen, and Victor Menaldo. 2011. “Do Natural Resources Fuel Authoritarianism? A
Reappraisal of the Resource Curse: Dataset.” American Political Science Review (accessed July 1, 2011).
Hansen, Bruce E. 1996. “Inference When a Nuisance Parameter Is Not Identified under the Null
Hypothesis.” Econometrica 64 (2): 413–30.
Horowitz, Donald L. 1985. Ethnic Groups in Conflict. Berkeley, CA: University of California Press.
King, Gary, and Langche Zeng. 2001. “Logistic Regression in Rare Events Data.” Political Analysis
9 (2): 137–63.
Laitin, David D. 2000. “What Is a Language Community?” American Journal of Political Science 44
(1): 142–55.
Lewis, M. Paul, ed. 2009. Ethnologue: Languages of the World. 16th ed. Dallas: SIL International.
http://www.ethnologue.com/ (accessed October 1, 2010).
Lichbach, Mark I. 1989. “An Evaluation of ‘Does Economic Inequality Breed Political Conflict?’
Studies.” World Politics 41 (4): 431–70.
Maddison, Angus. 2008. “Historical Statistics of the World Economy: 1–2008 AD: Dataset.”
http://www.ggdc.net/maddison/Maddison.htm (accessed October 1, 2010).
Midlarski, Manus I. 1988. “Rulers and the Ruled: Patterned Inequality and the Onset of Mass Political
Violence.” American Political Science Review 82 (2): 491–509.
Miguel, Edward, Shanker Satyanath, and Ernest Sergenti. 2004. “Economic Shocks and Civil
Conflict: An Instrumental Variables Approach.” Journal of Political Economy 112 (4): 725–53.
Montalvo, Jose G., and Marta Reynal-Querol. 2005. “Ethnic Polarization, Potential Conflict, and Civil
Wars.” American Economic Review 95 (3): 796–816.
Muller, Edward N., and Mitchell A. Seligson. 1987. “Inequality and Insurgency.” American Political
Science Review 81 (2): 425–52.
Muller, Edward N., Mitchell A. Seligson, Hung-der Fu, and Manus I. Midlarski. 1989. “Land
Inequality and Political Violence.” American Political Science Review 83 (2): 577–96.
Nagel, Jack. 1974. “Inequality and Discontent: A Nonlinear Hypothesis.” World Politics 26 (4):
453–72.
Olson, Mancur. 1971. The Logic of Collective Action. Cambridge, MA: Harvard University Press.
Polity IV. 1800–2009. “Polity IV Project: Political Regime Characteristics and Transitions.”
http://www.systemicpeace.org/polity/polity4.htm (accessed October 1, 2010).
Ray, Debraj. 2009. “Remarks On the Initiation of Costly Conflict.” Unpublished.
Ross, Michael. 2006. “A Closer Look at Oil, Diamonds and Civil War.” Annual Review of Political
Science 9: 265–300.
Ross, Michael. 2011. “Replication Data for: Oil and Gas Production and Value, 1932–2009.”
http://dvn.iq.harvard.edu/dvn/dv/mlross (accessed May 1, 2011).
Rummel, Rudolph J. 1963. “Dimensions of Conflict Behavior Within and Between Nations.” General
Systems Yearbook 8: 1–50.
Schneider, Gerald, and Nina Wiesehomeier. 2006. “Ethnic Polarization, Potential Conflict, and Civil
Wars: Comment.” Unpublished.
Sen, A. K. 1966. “Labour Allocation in a Cooperative Enterprise.” Review of Economic Studies 33:
361–71.
Stram, Daniel O., and Jae Won Lee. 1994. “Variance Components Testing in the Longitudinal Mixed
Effects Model.” Biometrics 50 (4): 1171–77.
Strand, Håvard. 2006. “Onset of Armed Conflict: A New List for the Period 1946–2004, with
Applications.” Unpublished.
Wolfson, Michael C. 1994. “When Inequalities Diverge.” American Economic Review 84 (2): 353–58.
World Values Survey. 2005. v.2009090901, 2009. World Values Survey Association.
www.worldvaluessurvey.org (accessed January 2011).