In order to describe the isonymic structure of Albania, the distribution of 3,068,447 surnames was studied in the 12
prefectures and their administrative subdivisions: the 36 districts and 321 communes. The number of different surnames
found was 37,184. The effective surname number for the entire country was 1327; the average was 653.3 ± 84.3 for prefectures, 365.9 ± 42.0 for districts and 122.6 ± 8.7 for communes. These values display a variation of inbreeding between
administrative levels in the Albanian population, which can be attributed to the previously published Prefecture effect.
Matrices of isonymic distances between units within administrative levels were tested for correlation with geographic
distances. The correlations were highest for prefectures (r = 0.71 ± 0.06 for Euclidean distance) and lowest for communes
(r = 0.37 ± 0.011 for Nei's distance).
The multivariate analyses (Principal component analysis and Multidimensional Scaling) of prefectures identify three main
clusters, one toward the North, the second in Central Albania, and the third in the South. This pattern is consistent
with important subclusters from districts and communes, which suggest that the country may have been colonised by
diffusion of groups in the North-South direction, and from Macedonia in the East, over a pre-existing Illyrian population.
Introduction

Albania has a long and complex history. It was populated by an Aryan people, the Illyrians, around 3000 BC. In historical times, it was conquered by the Macedons of Philip in 300–350 BC, coming under Greek power. Then, it became a Roman province first under the Republic and then under the Empire for about five centuries. After the split of the Empire, it stayed under the rule of the Byzantines until the 15th century, when it became part of the Ottoman Empire. When the Ottoman Empire dissolved in 1912, nationalism arose in Albania, and the country gained independence in 1920, and excluding the World War II parenthesis, it has been independent ever since.

The language spoken in Albania is a separate Indo-European branch spoken by more than 7 million persons, and has influences from Latin, Greek, and in modern times from Southern Slavic. The land is mountainous, and the Albanians call themselves Shqipetari, "children of the eagles". The present language is derived from the Toske dialect, which is spoken in the South of the country, as opposed to the Gege dialect in the North. Due to the relative isolation of the country and to minor settlements of invading armies over the course of centuries, its population seems of considerable interest for the study of population genetics.

However, studies of the genetic structure of the Albanian population are recent and few, and refer mainly to the frequencies of traditional blood group markers (Mikerezi et al., 1995; Susanne et al., 1996) and to the distribution of surnames (Mikerezi et al., 2003). In this work, we continue to

Corresponding author: Chiara Scapoli, Department of Life Sciences and Biotechnology, University of Ferrara, Via L. Borsari 46, I-44121 Ferrara, Italy. Tel: +39-0532-455744; Fax: +39-0532-249761; E-mail: [email protected]
I. Mikerezi et al.
In the following subsections, we briefly touch on and recall the definitions of some of the statistics derived from the surname distributions and their meaning in the study of microevolution in human groups (for an exhaustive review, see Relethford, 1988).

Isonymy within and between groups

The main statistics derived from surname distributions are: (1) isonymy within a group J, namely $I_{jj} = \sum_k p_{kj}^2$, where $p_{kj}$ is the relative frequency of surname k in group J, and the sum comprises all surnames; and (2) random isonymy between groups I and J, estimated as $I_{ij} = \sum_k p_{ki} p_{kj}$, where $p_{ki}$ and $p_{kj}$ are the relative frequencies of surname k in groups I and J, respectively, and the sum comprises all surnames.

The distribution of surnames between groups, in this case prefectures, districts, and communes, is useful for assessing their population similarities, under the limit hypothesis of common origin.

Fisher's alpha (α)

Fisher's α was estimated according to Barrai et al. (1996). It estimates the number of surnames having equal frequency, which would result in the same isonymy as that observed. It is exactly homologous to the allele effective number in a genetic system (Barrai et al., 2000). A small value of α would indicate large inbreeding and drift, whereas a large value would indicate migration and low inbreeding. It has been verified (Wright, 1951) that in the presence of a rate of migration (m), $F_{ST} = 1/(4Nm + 1)$; then $\alpha = Nm + 1/4$, since $F_{ST} = I/4$ (Crow & Mange, 1965) and $\alpha = 1/I$ for large samples (Rodriguez-Larralde et al., 1993). Then, for large N, α tends to Nm. This makes α a useful predictor of the evolutionary dynamics of a system, and a sufficient indicator of structure.

Isolation by distance

To detect isolation by distance, we calculate the linear correlation of surname distances (Lasker's, Euclidean and Nei's) between localities I and J with their geographic distances.

Lasker's distance (Rodriguez-Larralde et al., 1998) is defined as

$L = -\log(I_{ij})$.

Euclidean distance (Cavalli-Sforza & Edwards, 1967) is defined as

$E = \sqrt{1 - \sum_k \sqrt{p_{ki} p_{kj}}}$,

where the summation is over all surnames. Nei's distance (Nei, 1973) is

$N_d = -\log\left( I_{ij} / \sqrt{I_{ii} I_{jj}} \right)$.

Euclidean and Nei's distances have been developed for purely genetic data; however, they can be applied to the frequencies of surnames, since these simulate alleles at a locus in the recombining region of the Y chromosome (the daughters inherit the surname with the paternal X chromosome).

As geographical coordinates, we used the centroids of prefecture, district and commune areas obtained from the ArcGis (ESRI) map downloaded from Global Administrative Areas site (

The correlations of isonymic distances with the geographic ones give very similar results independently from the isonymic index used, and this is further indication that either of the isonymy measures can be used without loss of generality.

The significance of correlations was assessed with Mantel's test using 1000 permutations (Mantel, 1967; Smouse et al., 1986). For a graphic representation of the surname relationship between different prefectures, these were mapped on the first and second dimension of the Multidimensional Scaling (MDS) of Lasker's distance matrix. In order to detect the direction of surname diffusion, following Menozzi et al. (1978), the first three components from the Principal Component Analysis (PCA) of the same matrix were also projected individually on the Albania map, with the ArcGis® (ESRI) software package. To complement and clarify the clustering, we built dendrograms (Ward, 1963; Cavalli-Sforza & Edwards, 1967) of prefectures and of districts. These were obtained from the matrix of Lasker distances between administrative sections, using the agglomeration method of Ward (1963). They were considered only as a help to the clustering; we do not imply that the present situation was generated by subsequent splits of preexisting clusters.

Random kinship

Random kinship $\Phi_{IJ}(x)$ between any two localities I and J at distance x is given by

$\Phi_{IJ}(x) = K \exp(-Bx)$ (Malecot, 1955; Kimura, 1960)

where K is the average kinship at geographic distance x = 0, say average $F_{ST}$, and B is a function of average mutation rate and of the variance of x. Then, $\Phi_{IJ}(x)$ is always positive and is expected to decrease exponentially to 0 with increasing distance. Random kinship was defined as

$\Phi_{IJ}(x) = I_{IJ}(x)/4$

(Barrai et al., 2012) with average $F_{ST}$ as the average kinship at distance x = 0.
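The definitions in this section can be illustrated with a minimal Python sketch. It assumes surname data are available as plain `{surname: count}` mappings per locality; the function names, the toy data layout and the log-linear least-squares fit of the Malecot curve are illustrative assumptions, not the authors' implementation.

```python
import math

def frequencies(counts):
    """Turn a {surname: count} mapping into relative frequencies p_k."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def isonymy(p_i, p_j):
    """Random isonymy I_ij = sum_k p_ki * p_kj (gives I_ii when p_i is p_j)."""
    return sum(p * p_j.get(name, 0.0) for name, p in p_i.items())

def lasker(p_i, p_j):
    """Lasker's distance L = -log(I_ij); infinite when no surname is shared."""
    i_ij = isonymy(p_i, p_j)
    return math.inf if i_ij == 0 else -math.log(i_ij)

def nei(p_i, p_j):
    """Nei's distance Nd = -log(I_ij / sqrt(I_ii * I_jj))."""
    i_ij = isonymy(p_i, p_j)
    if i_ij == 0:
        return math.inf
    return -math.log(i_ij / math.sqrt(isonymy(p_i, p_i) * isonymy(p_j, p_j)))

def euclidean(p_i, p_j):
    """Euclidean distance E = sqrt(1 - sum_k sqrt(p_ki * p_kj))."""
    shared = sum(math.sqrt(p * p_j.get(name, 0.0)) for name, p in p_i.items())
    return math.sqrt(max(0.0, 1.0 - shared))  # guard against rounding below 0

def random_kinship(p_i, p_j):
    """Random kinship between two localities, I_ij / 4."""
    return isonymy(p_i, p_j) / 4

def fit_malecot(distances_km, kinships):
    """Estimate (K, B) in phi(x) = K * exp(-B * x).

    Assumption: ordinary least squares on log(phi) against x, which
    requires strictly positive kinship values.
    """
    ys = [math.log(k) for k in kinships]
    n = len(ys)
    mean_x = sum(distances_km) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_km)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_km, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope
```

Pairs of localities that share no surname have $I_{ij} = 0$, so Lasker's and Nei's distances are returned as infinite here; such pairs need separate handling before any multivariate analysis.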
© 2013 Blackwell Publishing Ltd/University College London Annals of Human Genetics (2013) 3
Results and Discussion

The Most Frequent Surnames

The distribution, by prefecture and district, of the surname numbers used in the analysis, with the main parameters derived from the isonymy theory, is given in Table 1. The data for communes and bashkias are presented in Table S1 available, as all further supplementary materials mentioned in this paper, at our website: alberto.carrieri/ricerca.htm.

In Figure S1, we give the distribution of the logarithm of the number of surnames over the logarithm of the number of times they occur (Fox & Lasker, 1983; Zipf, 1935. See this last reference for the meaning and uses of the log-log distribution). In this case, it is fairly linear (Fig. S1). It is called a typical rank-size distribution or Zipfian curve, and it is so named by glottologists (Adamic & Huberman, 2002), and here it indicates the number of instances (people) with a unique surname.

In Albania, surnames originated and have been established generally in the same way as in other European countries. The Albanian language belongs to the Indo-European group, and, despite several exchanges with other languages, it has preserved its own structure in its formative elements. According to Bidollari (2010), the language does not possess general rules, as other Indo-European languages do, for the patronymic formation, like the suffixes -adès, -eidès, -poulos in Greek, -ez in Spanish and Portuguese, -escu in Romanian, -ich in Slavic languages and so on. It does not possess suffix elements indicating lineage like -son, preferred frequently in the English and Swedish languages, or -sohn in German, and -sen in Danish. However, many Albanian surnames have been formed by the patronymisation process of the anthroponyms (first names), ethnonyms and toponyms in all the cases when it was necessary to indicate social or geographic origin.

Albania was for nearly five centuries under Turkish occupation. Therefore, several surnames, like Hoxha, Hoxhaj, Shehu, Shehaj, Dervishi and others have been introduced through the Muslim religion, indicating in such cases levels of the religious hierarchy. Some other surnames have been strongly influenced by the Turkish language, for example, surnames that have been formed by the introduction of suffixes like -llari, -xhi, -lli and -li.

Here, we deal with 3,068,447 persons and 37,184 surnames, so that the average number of instances (persons) per surname, the so-called "type-token ratio" of glottologists, is 82 (see further down our ratio Sample Size/Surnames in Table 2 and King and Jobling (2009) for other type-token ratios in Europe).

We studied in some detail the 100 most frequent surnames (Table S2). Overall, these surnames comprise 583,708 occurrences, equal to 19.0% of the total number of surnames used here. The most frequent surnames are Hoxha with 39,088 occurrences, Çela with 14,632, Marku with 13,852, Shehu with 12,348, and Muca with 12,236. After these, one finds Kola (11,443), Dervishi (10,953), Gjoka (10,191), Kurti (10,152) and in 10th place Koci (9533). Overall, the first 10 surnames comprise 144,428 individuals, or 4.7% of the total number of electors.

Surnames of clear Arabic origin are frequent in the North and the East of Albania. Dervishi (10,953), which is seventh in the general list, is the first surname of clear Arabic origin, followed by Elezi (8155), Sinani (6237), Hasani (4541), and Osmani (4103). The Turkish language was the main vehicle for other frequent surnames that were formed by first names of Arabic or Persian origin like Brahimaj (1684), Brahimi (2225), Elezaj (1970), Islami (1751), among several others.

Greek surnames, a result of the influence of the Christian orthodox religion, are frequent in the South of the Country, which borders with Greece. Short lists of the 30 most frequent Albanian surnames of Arabian and Greek origin are given in Tables S3 and S4. However, these lists are by far incomplete, since they are based on our knowledge of Arabic and Greek, which is very limited. In particular, for the Greek names, we list only those which start with "Papa" (which means priest, father) to avoid uncertainties. There are 9961 surnames beginning with Papa, which are joined with another name of Christian (or sometimes non-Christian) origin, like Papajani, Papajorgji, and Papanikolla. Note the curious Papazisi, which might be a translocation of the Arabic Aziz (which means strong) on the Greek Papa. So Papazisi might be "the father of the strong".

Isonymy Parameters in Albanian Prefectures, Districts, and Communes

Fisher's alpha and inbreeding by isonymy

Values of α and $F_{ST}$ are given in Table 1 for prefectures and districts and in Table S1 for communes. We recall that α, the effective surname number, is the inverse of isonymy I ($I = \sum p^2$ and $\alpha = 1/I$, Barrai et al., 1996), so that $F_{ST} = 1/(4\alpha)$ and then the meaning of α is exactly homologous to the effective allele number of genetic systems.

The effective surname number α, in Albania, was estimated at 1327 for the country, considered as a unit. The average for the 12 prefectures was 653.3 ± 84.3. For the 36 districts, it was 365.9 ± 42.0 and for the 321 communes it was 122.6 ± 8.7. The difference between the estimates of α, then of $F_{ST}$, in prefectures, districts, communes and for the country as a unit, is observed when different subdivisions of the same
Table 1 Prefecture, district, number of surnames N, number of different surnames S, Fisher's α, Karlin-McGregor ν, isonymy I, and $F_{ST}$ in Albania. Districts grouped by prefecture.
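The relations among isonymy I, the effective surname number α and $F_{ST}$ recalled in the text ($I = \sum p^2$, $\alpha = 1/I$, $F_{ST} = I/4 = 1/(4\alpha)$) can be checked with a short Python sketch. It uses the plain $\sum p^2$ form quoted in the text; the estimator actually applied by Barrai et al. (1996) may differ in detail, and the function and variable names are hypothetical.

```python
from collections import Counter

def isonymy_parameters(surnames):
    """Return (I, alpha, F_ST) for a list with one surname per person.

    Assumption: I is the plain sum of squared relative frequencies.
    """
    counts = Counter(surnames)
    n = sum(counts.values())
    isonymy = sum((c / n) ** 2 for c in counts.values())
    alpha = 1 / isonymy   # effective surname number
    fst = isonymy / 4     # random inbreeding by isonymy
    return isonymy, alpha, fst

def fst_from_alpha(alpha):
    """F_ST = 1 / (4 * alpha), equivalent to I / 4."""
    return 1 / (4 * alpha)

# Values of alpha reported in the text for each administrative level
# (country as a unit, and averages for prefectures, districts, communes).
reported_alpha = {
    "country": 1327,
    "prefectures": 653.3,
    "districts": 365.9,
    "communes": 122.6,
}
fst_by_level = {level: fst_from_alpha(a) for level, a in reported_alpha.items()}
```

With the reported averages, `fst_by_level` increases from the country as a unit to prefectures, districts and communes, since $F_{ST}$ is inversely proportional to α.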
Table 2 Comparison of isonymy parameters in nine European countries, in five South-American countries, in the United States and Texas, and in Yakutia. Overall, 122 million surnames were analysed.

Country          Sample size (millions)   Surnames   α      r      SS/S
Austria          1                        140,766    854    0.59   7.1
Albania          3.0                      37,184     123    0.71   82
Belgium          1.1                      137,442    997    0.74   8
France           6                        495,104    1615   0.69   12.1
Germany          5.2                      462,526    1596   0.51   11.2
Holland          2.4                      126,485    787    0.46   19
Italy            5.1                      215,623    1236   0.61   23.7
Switzerland^1    1.7                      166,116    891    0.72   10.2
Spain            3.6
  Paternal                                94,886     134    0.21   38
  Maternal                                110,034    144    0.26   33
Yakutia          0.5                      44,625     107    0.69   11.1
North America
United States    18                       899,585    1366   0.24   20
Texas            3.6                      235,740    734    0.42   15.3
South America
Argentina^3      22.6                     414,441    422    0.47   54.5
Venezuela^2      3.9                      68,665     122    0.78   56.8
Bolivia^4        23.2                     174,922    122    0.5    144.6
Paraguay^3       4.8                      39,047     108    0.42   122.9
area and population are considered. Very properly in the case of Albania, the difference constitutes the "Prefecture Effect", identified for $F_{ST}$ by Nei and Imaizumi (1966) in Japan, and so named by Scapoli et al. (2007). Nei and Imaizumi observed that, for the same area and population, small subdivisions have larger $F_{ST}$, and larger subdivisions have smaller $F_{ST}$. In their study, the effect was seen in towns and in the Japanese prefectures where the towns were located; hence the name. It could also be named a "geographic scale effect" that intervenes in many phenomena since it is just a question of heterogeneity increasing with population size. Of course, the prefecture effect is visible both on $F_{ST}$ and α. It appears that Albania is no exception, and, since α is inversely related with $F_{ST}$, the sequence

$F_{ST}^{\text{Prefecture}} < F_{ST}^{\text{District}} < F_{ST}^{\text{Commune}}$

is respected.

In Albania, the lowest levels of random inbreeding, indicated by $F_{ST}$, are expected and observed in the highly populated areas of the central part of the country, the area around the capital Tirana.

In the analysis, α is significantly and negatively correlated (r = −0.16) with latitude, possibly due to the average higher population density of southern communes. So, the largest values of α (the inverse of isonymy) were seen in the large towns, which are also capitals of prefectures. The highest values of α for communes were 1245 in the commune of Korçe, 1222 in Tirana, 990 in Durres, 748 in Vlore, and 720 in Shkoder. These large communes give the name to the prefectures where they are located. The lowest values observed in communes were α = 7 in Sheze, in the prefecture of Elbasan, α = 10 in Hysgjokaj and α = 11 in Ballagat, both communes in the prefecture of Fier, and α = 12 in Shtiqen and α = 13 in Surroj, both in Kukes. These communes are located in mountainous areas and have a small population.

Isolation by distance

We studied isolation by distance through the correlation of geographic with surname distances at the prefecture, district and commune levels. We found that Euclidean, Nei's and Lasker's distance between the 12 prefectures were
Figure 2 Variation of Lasker's distance between prefectures with geographic linear distance.
Figure 3 Variation of Lasker's distance (± s.d.) over kilometres between 321 communes in Albania.
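The correlation of isonymic with geographic distance matrices, and the permutation test used to assess its significance, can be sketched in Python as follows. This is a minimal one-sided Mantel test under stated assumptions: both matrices are square, symmetric lists of lists with finite entries, rows and columns of the first matrix are permuted together, and the p-value counts the observed arrangement as one of the permutations. The function names and the `seed` parameter are illustrative, not taken from the paper.

```python
import math
import random

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Linear (Pearson) correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mantel(dist_a, dist_b, permutations=1000, seed=0):
    """Return (r, p) for two distance matrices.

    r is the correlation of the off-diagonal entries; p is the share of
    permutations whose correlation is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_b = upper_triangle(dist_b)
    observed = pearson(upper_triangle(dist_a), flat_b)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the localities of the first matrix
        permuted = [[dist_a[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Permuting whole rows and columns, rather than individual cells, keeps the dependence among the pairwise distances of each locality intact, which is the reason a Mantel test is used instead of an ordinary significance test on r.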
and, possibly due to the large number of pairwise distances we had available, it is also applicable to a geographic structure, which, like Albania, is elongated from North to South but is poorly linear. We were not surprised when we observed the considerable agreement between the Malecot model and kinship decay in Chile, since this latter country is practically linear. Still, the agreement between the model and the observed decay of kinship over kilometric distance in Albania, which is elongated but far from linear, is indicative of a general validity of the model although originally it was derived only for a linear structure.

Relations between the Administrative Sections of Albania

In order to obtain a general idea on the movements of population groups in Albania, we conducted MDSs and PCAs on the matrix of Lasker's distances between prefectures, between districts and between communes. We report here and as supplementary material some of the results of these analyses.

Prefectures

The MDS projection on the first two dimensions of the matrix between prefectures (Fig. S6) differentiates a few clusters, which correspond to groups of neighbouring prefectures. In the resulting dendrogram (Fig. S7), a first large cluster composed mainly of the central prefectures is observed: Tirane, Durres, Elbasan, Diber, Fier and Berat. These last two form a subcluster within this cluster. Then, three prefectures in the South-East and the extreme South, namely Korçe, Vlore and Gjirokaster, form the next cluster. Finally, two prefectures of the North cluster together, Shkoder and Lezhe, whereas Kukes represents an exception because, despite being a mountainous prefecture of the North, it clusters together with the Central prefectures, possibly due to the emigration from the poorer areas toward the highly populated and richer areas around the capital Tirana.

From the MDS projection in Figure S6, some other minor but relevant points emerge, which complement the clustering of prefectures. In particular, Tirane, Durres and Elbasan stand alone at the centre of the bidimensional projection, removed from the other prefectures. Vlore is marginal as is Korçe.

Districts

The projection on the first two dimensions of the MDS tends to differentiate several clusters, which correspond fairly well to neighbouring districts (Fig. S8).

In the dendrogram (Fig. S9), the districts of Malesi e Madhe, Tropoje and Has, at the Northern border with Montenegro, cluster with Fier, Mallakaster and Vlore, which are in the South of the country. One district in the South, Tepelene, clusters with a Central-Northern belt of the seven districts of Durres, Kruje, Tirane, Mat, Bulqize, Diber and Kukes.

A second central main cluster, south of the former, includes in an East-West belt the districts of Kavaje, Lushnje, Peqin, Elbasan, Gramsh and Librazhd.

Then comes a Southern group of districts: Kucove, Berat, Skrapar, Korçe, Pogradec and Devoll. All these are adjacent also geographically. However, we underline that the clustering of Malesi e Madhe, Tropoje and Has in the North, with the Vlore cluster in the South, might indicate injection, between North and South of Albania, of eastern groups from Macedonia toward the Adriatic (Fig. S9).

From the projection, some other minor but relevant points emerge, which complement the clustering. In particular, the Tirane district stands at the centre of the bidimensional projection, with Durres. This might indicate that these districts, which together comprise almost one quarter of the Albanian population, possess most of the surnames of the nation.

Malesi-e-Madhe in Shkoder, and Mallakaster in Fier are marginal both on the projection and in Albanian geography, bordering, respectively, Montenegro, Kosovo at North and the limit of the Toske dialect in the South.

A visual indication of the isonymic proximity of districts is given by the maps of Figure 5 where the similarity of districts is indicated by the similar intensity of the same colour. It is appropriate at this point to indicate that recently new methods of identifying spatial concentration of surnames have been developed (e.g. Longley et al., 2011; Cheshire & Longley, 2012), which give specific examples on various ways of clustering and representing geographical dimensions of surname frequency data. Most interesting seem the developments which include forenames to detect ethnicity of groups (Mateos et al., 2011). This adds a further dimension to isonymy studies, which needs to be explored.

Communes

We found that, only at the commune level, there were 157 pairs of communes out of 51,360 which did not share surnames. Out of these 157 pairs, 49 included the commune of Liqenas in Korçe, which has a mainly Macedonian population. Also, 34 pairs included the commune of Lure in Diber, but we did not find a good reason for this last preference. Of course, there are various reasons why in Albania this absence of the same surname in small communes may occur. We believe that, among others, one reason is to be found in the complexity of the Albanian alphabet, which often results in the same name being written differently in different communes. However, there is also some effect of distance on the phenomenon. The average geographic distance between the 157 pairs having infinite Lasker's and Nei's distance is
Figure 5 Projection of Lasker's matrix of surname distances on districts in Albania by mapping (A) the first three PCA factors (I: Factor 1 = 42.8%; II: Factor 2 = 26.9%; III: Factor 3 = 11.5%) and (B) the first three MDS dimensions (I: Dimension 1; II: Dimension 2; III: Dimension 3. Stress 11.2%).
128.9 ± 14.7 km. The average distance for the other 51,203 pairs is 95.9 ± 0.06 km, and the difference is significant (t[∞] = 8.568, P << 0.0001). We bypassed the problem posed in the multivariate analysis of the distance matrices by the elements of infinite value, by substituting for the 157 infinite isonymic distances the nearest maximum observed. In this way, we met no complexities in the subsequent analysis of the distance matrices of Lasker and Nei. It is important to note that if the 157 infinite distances are excluded, the correlations for communes rise from 0.44 to 0.47 for Lasker, and from 0.37 to 0.39 for Nei (refer again to Fig. S3 for Lasker's distance between communes).

As noted briefly above, we estimated in the commune of Korçe, the capital of the homonymous prefecture, the highest value of α in the Country (1245). Relatively high α and low estimates of inbreeding are also observed in several communes of the central area in the Tirana region. This might explain the position of this prefecture relative to the other groups and might indicate recent immigration toward the main urban area of Albania. Low α (and high $F_{ST}$) are observed in the
communes of Elbasan (one commune with α = 7), Fier (one commune with α = 10 and a second one with α = 11), and Kukes (one commune with α = 12 and one with α = 13).

We do not present here either the projection of the first two dimensions of MDS for the 321 × 321 matrix of the communes or the dendrogram derived from it. Both are, however, given as Figures S10 and S11. Since the projections of the individual names of 321 elements are illegible, we decided to label with different symbols the communes North and South of the River Shkumbin in central Elbasan (152 and 169, respectively) to detect whether the two main groups of points depicted (Fig. S10) in the projection contain a majority of communes where Gege or Toske is spoken. In fact, the subdivision is sharp: a vast majority of the Gege-speaking communes cluster together, as do those speaking Toske. Then, the two main groups identified through surname distances are highly correlated with the two linguistic areas of Albania, the Gege area in the North and the Toske area in the South.

We used the same technique to visualise the clusters in the dendrogram, putting the labels G and T at the endpoints of the graph (Fig. S11). The resulting clusters correlate with latitude, but the North-South distribution of communes is not as clear as in the projection from the MDS.

Mapping of the first three components of Lasker's matrix

The structures revealed by the MDSs and the dendrograms are only partially indicative of the possible movements of the population. Therefore, to have a general idea of the direction, if any, of settlements in Albania, we mapped on the nation (following Menozzi et al., 1978) the first three components of the matrix of Lasker's distance, obtained from a PCA and from the MDS. We provide the PCA components because the relative importance of each component is given by the corresponding eigenvalue, while the MDS provides the value of the stress for a judgement of the overall fitting on the three dimensions. The resulting maps are given in Figure 5 (A for PCA and B for MDS, respectively).

The variation of the first component, which accounts for almost half of the variability (42.8%) in the North-South direction, indicates movement from the centre of the country toward North and South. This might mean that from a chronological point of view, immigration was in the East-West direction from Macedonia, establishing a centre of high density of migrants, which subsequently moved North and South. The third component (11.5%) gives the same indication, although with minor intensity. Then, the sense of movement may be hypothesised from the East toward the Adriatic Coast, since the entry of surnames from the sea in significant numbers is unlikely.

It appears that only the second component (26.9%) is somewhat directional; the deviations from the second axis appear to be ordered in the North-South direction. However, although this component indicates movement in the North-South direction, the sense of movement cannot be detected from it, unless we accept that the highest deviations are the most recent ones.

Overall, the three components account for 81.2% of the surname variation as obtained from Lasker's distance matrix.

The mappings of the first three dimensions of the MDS seem to us compatible with those obtained from the PCA. The indication of possible East-West movement seems clear enough for the first and second dimension, and less so for the third. So, this isonymic structure of Albania seems to be mainly due to ancient migration from the East toward the coast, with radiation toward the North and South, with subsequent isolation and drift, with drift and short-range migration playing a major role in the generation of the present geographical variation of surnames.

The methodology described in this paper was used to analyze the isonymic structure of several South American countries (Rodriguez-Larralde et al., 2000, 2011; Dipierri et al., 2005, 2011; Barrai et al., 2012). In these countries, 4 (Venezuela), 24 (Argentina), 23 (Bolivia), 4.5 (Paraguay) and 16.5 (Chile) million surnames from the registers of electors were used. In European countries and in the United States, we analysed surnames of telephone users (Barrai et al., 2001; Scapoli et al., 2005, 2007; Rodriguez-Larralde et al., 2007). In thinly populated Siberia, we used half a million surnames (Tarskaya et al., 2009). The average value of α for all the cities (or states, in the case of Venezuela and the United States, or districts, in the case of Argentina and Paraguay), and the isolation by distance measured by the correlation between isonymic and geographic distances, are given in Table 2 for the countries studied up to now. Several features emerge from the comparisons reported in Table 2. First, the general similarity among European nations in profusion of surnames as measured by α, and for isolation by distance, as measured by the linear correlation. Secondly, the relatively small value of α in Venezuela, Bolivia, Paraguay, Spain, Chile and now Albania; and thirdly, the practical absence of isolation by distance in the United States, excluding bilingual Texas (Rodriguez-Larralde et al., 2007). In Albania, the average number of persons having the same surname (measured by the ratio Sample Size/Surnames, given as the index SS/S in Table 2) is more similar (82) to that of Argentina, Bolivia and Venezuela than to that of other European countries. It may be of some interest to compare our Table 2 with King and Jobling's (2009) Table 1. There, they give the mean number of carriers per surname in 5538 households in 27 countries. Where applicable, their results are consistent with ours.
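The treatment of the 157 infinite commune-level distances before the multivariate analyses can be sketched in Python as follows. The phrase "the nearest maximum observed" is read here as the largest finite distance in the matrix, which is an interpretation rather than a detail confirmed by the text; the function name is hypothetical.

```python
import math

def cap_infinite(matrix):
    """Replace infinite distances with the largest finite distance observed.

    Assumption: the matrix is a list of rows and holds at least one
    finite entry. The input is left unchanged; a new matrix is returned.
    """
    largest = max(d for row in matrix for d in row if math.isfinite(d))
    return [[d if math.isfinite(d) else largest for d in row]
            for row in matrix]
```

Capping keeps every pair of communes in the matrix, so MDS, PCA and clustering can run on all 321 units. The correlations quoted in the text for the case where the infinite pairs are excluded show that the choice has a small but visible effect.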
Nei, M. (1973) The theory and estimation of genetic distance. In: Genetic structure of populations (ed. N. E. Morton). Hawaii: Hawaii University Press.
Nei, M. & Imaizumi, J. (1966) Genetic structure of human populations. I. Local differentiation of blood groups gene frequencies in Japan. Heredity 21, 9–36.
Relethford, J. H. (1988) Estimation of kinship and genetic distance from surnames. Hum Biol 60, 475–492.
Rodriguez-Larralde, A., Barrai, I. & Alfonzo, J. C. (1993) Isonymy structure of four Venezuelan states. Ann Hum Biol 20, 131–145.
Rodriguez-Larralde, A., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Barrai, I. (1998) Isonymy and the genetic structure of Switzerland. II. Isolation by distance. Ann Hum Biol 25, 533–540.
Rodriguez-Larralde, A., Morales, J. & Barrai, I. (2000) Surname frequency and the isonymy structure of Venezuela. Am J Hum Biol 12, 352–362.
Rodriguez-Larralde, A., Gonzalez-Martin, J., Scapoli, C. & Barrai, I. (2003) The names of Spain: A study of the isonymy structure of Spain. Am J Phys Anthropol 121, 280–292.
Rodriguez-Larralde, A., Scapoli, C., Mamolini E. & Barrai, I. (2007) Surnames in Texas: A population study through isonymy. Hum Biol 79, 215–239.
Rodriguez-Larralde, A., Dipierri, J., Alfaro, E., Scapoli, C., Mamolini, E., Salvatorelli, G., De Lorenzi, S., Carrieri, A. & Barrai, I. (2011) Surnames in Bolivia: A population study through isonymy. Am J Phys Anthropol 144, 177–184. doi: 10.1002/ajpa.21379.
Scapoli, C., Goebl, H., Sobota, S., Mamolini, E., Rodriguez-Larralde, A. & Barrai, I. (2005) Surnames and dialects in France: Population structure and cultural transmission. J Theor Biology 237,
Scapoli, C., Mamolini, E., Carrieri, A., Rodriguez-Larralde, A. & Barrai, I. (2007) Surnames in Western Europe: A comparison of the subcontinental populations through isonymy. Theor Popul Biol 71, 37–48.
Smouse, P. E., Long, J. C. & Sokal, R. R. (1986) Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst Zool 35, 627–632.
Susanne, C., Bajrami, Z., Kume, K. & Mikerezi, I. (1996) Gene differentiation at the ABO, MN and Rhesus loci among Albanians and their relation with other Balkan populations. Gene Geogr 10, 31–36.
Tarskaya, L., Elchinova, G. I., Scapoli, C., Mamolini, E., Carrieri, A. Rodriguez-Larralde, A. & Barrai, I. (2009) Surnames in Siberia. A study of the population of Yakutia through isonymy. Am J Phys Anthropol 138, 190–198.
Ward, J. H. (1963) Hierarchical grouping to optimize an objective function. J Am Statist Assoc 58, 236–244.

Supporting Information

Additional supporting information may be found in the online version of this article:

Table S1 Distribution of isonymy parameters.
Table S2 The 100 most frequent surnames in Albania.
Table S3 The most frequent names of Arabic origin in Albania.
Table S4 Surnames with the prefix Papa of clear Greek origin.
Figure S1 Variation of the number of occurrences in 3 million surnames in Albania.
Figure S2 Variation of Lasker's distance between 36 districts in Albania.
Figure S3 Variation of Lasker's distance between 321 communes in Albania.
Figure S4 Variation of Euclidean with geographic distance.
Figure S5 Variation of Nei's with geographic distance.
Figure S6 MDS on the matrix of Lasker's distances between Prefectures.
Figure S7 Dendrogram of Albania prefectures.
Figure S8 MDS of Lasker's distance matrix between districts.
Figure S9 Dendrogram of districts from the matrix of Lasker's distance.
Figure S10 Projection of the 321 communes of Albania on the first two dimensions of the matrix of Lasker's distances.
Figure S11 Dendrogram of communes.

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organised for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.
