doi: 10.1111/ahg.12015

Surnames in Albania: A Study of the Population of Albania through Isonymy
Ilia Mikerezi², Endrit Xhina², Chiara Scapoli¹, Guido Barbujani¹, Elisabetta Mamolini¹, Massimo Sandri¹, Alberto Carrieri¹, Alvaro Rodriguez-Larralde³ and Italo Barrai¹
¹ Department of Life Sciences and Biotechnology, University of Ferrara, 44121, Ferrara, Italy
² Department of Biology, Faculty of Natural Sciences, Tirana, Albania
³ Centro de Medicina Experimental, Laboratorio de Genetica Humana, IVIC, Apdo. 20632, Caracas 1020A, Venezuela

Summary
In order to describe the isonymic structure of Albania, the distribution of 3,068,447 surnames was studied in the 12 prefectures and their administrative subdivisions: the 36 districts and 321 communes. The number of different surnames found was 37,184. The effective surname number for the entire country was 1327; the average for prefectures was 653.3 ± 84.3, for districts 365.9 ± 42.0 and for communes 122.6 ± 8.7. These values display a variation of inbreeding between administrative levels in the Albanian population, which can be attributed to the previously published "Prefecture effect". Matrices of isonymic distances between units within administrative levels were tested for correlation with geographic distances. The correlations were highest for prefectures (r = 0.71 ± 0.06 for Euclidean distance) and lowest for communes (r = 0.37 ± 0.011 for Nei's distance).
The multivariate analyses (Principal Component Analysis and Multidimensional Scaling) of prefectures identify three main clusters, one toward the North, the second in Central Albania, and the third in the South. This pattern is consistent with important subclusters from districts and communes, which suggest that the country may have been colonised by diffusion of groups in the North-South direction, and from Macedonia in the East, over a pre-existing Illyrian population.
Keywords: Albania, population structure, isonymy, inbreeding, isolation by distance

Introduction
Albania has a long and complex history. It was populated by an Aryan people, the Illyrians, around 3000 BC. In historical times, it was conquered by the Macedonians of Philip in 300–350 BC, coming under Greek power. Then, it became a Roman province, first under the Republic and then under the Empire, for about five centuries. After the split of the Empire, it stayed under the rule of the Byzantines until the 15th century, when it became part of the Ottoman Empire. When the Ottoman Empire dissolved in 1912, nationalism arose in Albania, and the country gained independence in

Corresponding author: Chiara Scapoli, Department of Life Sciences and Biotechnology, University of Ferrara, Via L. Borsari 46,
I-44121 Ferrara, Italy. Tel: +39-0532-455744; Fax: +39-0532249761; E-mail: [email protected]

Annals of Human Genetics (2013) 77, 232–243

1920, and excluding the World War II parenthesis, it has been independent ever since.
The language spoken in Albania is a separate Indo-European branch spoken by more than 7 million persons, and has influences from Latin, Greek, and in modern times from Southern Slavic. The land is mountainous, and the Albanians call themselves "Shqipetari", children of the eagles. The present language is derived from the Toske dialect, which is spoken in the South of the country, as opposed to the Gege dialect in the North. Due to the relative isolation of the country and to minor settlements of invading armies over the course of centuries, its population seems of considerable interest for the study of population genetics.
However, studies of the genetic structure of the Albanian
population are recent and few, and refer mainly to the frequencies of traditional blood group markers (Mikerezi et al.,
1995; Susanne et al., 1996) and to the distribution of surnames (Mikerezi et al., 2003). In this work, we continue to


© 2013 Blackwell Publishing Ltd/University College London

investigate the Albanian population with the aim of detecting its structure through the isonymic methods as defined by
Crow and Mange (Crow & Mange, 1965) in the three administrative levels of the nation, namely: 12 prefectures, 36
districts and 321 communes. The data that were made available to us are the surnames of the electors of the 2009 general
elections database.
We report here how, in Albania, isonymic distance varies
with geography, as we observed in other European countries.
We obtained indications of the direction of migration, by
studying the geographic heterogeneity of surnames. For each
level, we studied the surname effective number, α, and the
value of random inbreeding, FST.
We recall that surnames are a weak marker for inbreeding
and a strong marker for migration. Two Bianchi in Italy
may be more or less distantly related, as two White in
Britain, but one Bianchi in Britain or one White in Italy
are indicative of migration, as clearly as an immunofluorescent
cell in a negative field. With this proviso, our aim in this work
was the study of the present isonymic structure of Albania
resulting from surname drift and population movements in an
area about 320 km long and on average about 90 km wide,
bordering with the Adriatic sea, South of Montenegro and
Kosovo, West of Macedonia and North of Greece.

Materials and Methods


Administrative Subdivisions of Albania
In 2011, one of the authors (IM) obtained from the Central Election Commission (CEC) of Albania the data suitable
for describing the isonymy structure of the country with the
methodologies developed by us. In the data that were made
available, a total of 3,068,447 individuals were distributed in
the 12 prefectures, the 36 districts, and in 373 communes. The
Albanian Administration classifies as communes 308 such
units which are prevalently agricultural, plus 65 bashkias
which are predominantly urban. However, several communes
are pooled for electoral purposes, so that we had available
321 lower units, some of them groups of smaller units. In
this analysis, we decided to use these hierarchical subdivisions
as statistical units, since the geography of all three levels is
well-defined, and all the individuals in the sample available
are classified accordingly, communes inside districts inside
prefectures inside Albania. Hence, for the analysis, we had
available 37,184 surnames of more than 3 million individuals,
all classified according to the administrative subdivisions.
The area studied covers the entire nation, about 28,000
square km, an area slightly larger than Sicily. The 12 prefectures differ in position, area, and population. The prefectures,
districts, and communes are indicated in Figure 1. There are six prefectures in the North, the northernmost being Shkoder, Kukes, and Lezhe, then southward the two prefectures of Diber and Durres, followed by the prefecture of the capital Tirane. Traditionally, the River Shkumbin in the central zone across the prefecture of Elbasan separates the North from the South and the two dialects of Albania, the Gege from the Toske. The South has six prefectures, namely Elbasan itself, Fier and Berat, and Korce, Vlore, and Gjirokaster. The last three are the southernmost and border with Greece.

Figure 1 Distribution of the 321 communes (dots) in the 12 prefectures and 36 districts as acquired from 2009 census data in Albania.

Differences in surnames due to the complexities of the 36-letter Albanian alphabet were maintained through the proper
ASCII codes.


In the following subsections, we briefly touch on and recall the definitions of some of the statistics derived from the
surname distributions and their meaning in the study of microevolution in human groups (for an exhaustive review, see
Relethford, 1988).

Isonymy within and between groups


The main statistics derived from surname distributions are:

(1) isonymy within a group J, namely

$$I_{jj} = \sum_k p_{kj}^2$$

where $p_{kj}$ is the relative frequency of surname k in group J, and the sum comprises all surnames; and

(2) random isonymy between groups I and J, estimated as

$$I_{ij} = \sum_k p_{ki}\, p_{kj}$$

where $p_{ki}$ and $p_{kj}$ are the relative frequencies of surname k in groups I and J, respectively, and the sum comprises all surnames.
The distribution of surnames between groups, in this case
prefectures, districts, and communes, is useful for assessing
their population similarities, under the limit hypothesis of
common origin.
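
As an illustration of the two definitions above, here is a minimal Python sketch. The function names and the two toy groups are hypothetical (they are not data from the study); the code only assumes that each group is available as a list with one surname per person.

```python
from collections import Counter


def relative_frequencies(surnames):
    """Map each surname k to its relative frequency p_k within one group."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def isonymy_within(p):
    """I_jj: sum over all surnames of the squared relative frequency."""
    return sum(f * f for f in p.values())


def isonymy_between(p_i, p_j):
    """I_ij: sum of p_ki * p_kj; only surnames present in both groups contribute."""
    shared = p_i.keys() & p_j.keys()
    return sum(p_i[k] * p_j[k] for k in shared)


# Hypothetical toy groups, for illustration only.
group_i = ["Hoxha", "Hoxha", "Shehu", "Kola"]
group_j = ["Hoxha", "Marku", "Marku", "Kola"]
p_i = relative_frequencies(group_i)
p_j = relative_frequencies(group_j)

print(isonymy_within(p_i))        # isonymy within group I
print(isonymy_between(p_i, p_j))  # random isonymy between I and J
```

Surnames absent from one of the two groups have frequency zero there, so restricting the sum to shared surnames gives the same result as summing over all surnames.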
Fisher's alpha (α)
Fisher's α was estimated according to Barrai et al. (1996). It estimates the number of surnames having equal frequency, which would result in the same isonymy as that observed. It is exactly homologous to the allele effective number in a genetic system (Barrai et al., 2000). A small value of α would indicate large inbreeding and drift, whereas a large value would indicate migration and low inbreeding. It has been verified (Wright, 1951) that in the presence of a rate of migration (m)

$$F_{ST} = \frac{1}{4Nm + 1};$$

then $\alpha = Nm + \tfrac{1}{4}$, since $F_{ST} = I/4$ (Crow & Mange, 1965) and $\alpha = 1/I$ for large samples (Rodriguez-Larralde et al., 1993). Then, for large N, α tends to Nm. This makes α a useful predictor of the evolutionary dynamics of a system, and a sufficient indicator
of structure.
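
The chain of relations in this subsection (α = 1/I, FST = I/4, FST = 1/(4Nm + 1), hence α = Nm + 1/4) can be checked numerically. The sketch below is only an arithmetic illustration; the function names and the input value of I are hypothetical.

```python
def alpha_from_isonymy(isonymy):
    """Effective surname number: alpha = 1 / I (large-sample approximation)."""
    return 1.0 / isonymy


def fst_from_isonymy(isonymy):
    """Random inbreeding from isonymy: FST = I / 4."""
    return isonymy / 4.0


def nm_from_alpha(alpha):
    """Invert alpha = Nm + 1/4 to obtain Nm."""
    return alpha - 0.25


def fst_from_nm(nm):
    """Relation under migration: FST = 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


# Hypothetical isonymy value, chosen only to make the arithmetic easy to follow.
isonymy = 0.002
alpha = alpha_from_isonymy(isonymy)
nm = nm_from_alpha(alpha)
print(alpha, fst_from_isonymy(isonymy), nm, fst_from_nm(nm))
```

Both routes to FST (through I/4 and through Nm) agree, which is what the identity α = Nm + 1/4 expresses.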
Isolation by distance
To detect isolation by distance, we calculate the linear correlation of surname distances (Lasker's, Euclidean and Nei's) between localities I and J with their geographic distances.
Lasker's distance (Rodriguez-Larralde et al., 1998) is defined as

$$L = -\log(I_{ij}).$$

Euclidean distance (Cavalli-Sforza & Edwards, 1967) is defined as

$$E = \sqrt{1 - \sum_k \sqrt{p_{ki}\, p_{kj}}}$$

where the summation is over all surnames. Nei's distance (Nei, 1973) is

$$N_d = -\log\left(\frac{I_{ij}}{\sqrt{I_{ii}\, I_{jj}}}\right).$$

Euclidean and Nei's distances have been developed for purely genetic data; however, they can be applied to the frequencies of surnames, since these simulate alleles at a locus in the recombining region of the Y chromosome (the daughters inherit the surname with the paternal X chromosome).
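
The three distances defined above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' software: the function names are hypothetical, each locality is assumed to be given as a dictionary of relative surname frequencies, and pairs sharing no surname are assigned an infinite Lasker's and Nei's distance.

```python
import math


def isonymy(p_i, p_j):
    """I_ij = sum over surnames of p_ki * p_kj (I_ii when p_i is p_j)."""
    return sum(f * p_j[k] for k, f in p_i.items() if k in p_j)


def lasker_distance(p_i, p_j):
    """L = -log(I_ij); infinite when the two localities share no surname."""
    i_ij = isonymy(p_i, p_j)
    return math.inf if i_ij == 0 else -math.log(i_ij)


def euclidean_distance(p_i, p_j):
    """E = sqrt(1 - sum_k sqrt(p_ki * p_kj)); clamped at 0 against rounding."""
    overlap = sum(math.sqrt(f * p_j[k]) for k, f in p_i.items() if k in p_j)
    return math.sqrt(max(0.0, 1.0 - overlap))


def nei_distance(p_i, p_j):
    """Nd = -log(I_ij / sqrt(I_ii * I_jj)); infinite when no surname is shared."""
    i_ij = isonymy(p_i, p_j)
    if i_ij == 0:
        return math.inf
    return -math.log(i_ij / math.sqrt(isonymy(p_i, p_i) * isonymy(p_j, p_j)))


# Hypothetical frequency tables, for illustration only.
a = {"Hoxha": 0.5, "Shehu": 0.5}
b = {"Marku": 1.0}
print(lasker_distance(a, a), euclidean_distance(a, a), nei_distance(a, a))
print(lasker_distance(a, b), euclidean_distance(a, b), nei_distance(a, b))
```

Note that Lasker's distance of a locality from itself is not zero (it equals −log I_ii), whereas Nei's and the Euclidean distance are zero in that case.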
As geographical coordinates, we used the centroids of prefecture, district and commune areas obtained from the ArcGIS (ESRI) map downloaded from the Global Administrative Areas site (http://gadm.org/).
The correlations of isonymic distances with the geographic ones give very similar results independently of the isonymic index used, and this is a further indication that any of the isonymy measures can be used without loss of generality.
The significance of correlations was assessed with the Mantel test using 1000 permutations (Mantel, 1967; Smouse et al., 1986). For a graphic representation of the surname relationship between different prefectures, these were mapped on the first and second dimension of the Multidimensional Scaling (MDS) of Lasker's distance matrix. In order to detect the direction of surname diffusion, following Menozzi et al. (1978), the first three components from the Principal Component Analysis (PCA) of the same matrix were also projected individually on the Albania map, with the ArcGIS (ESRI) software package. To complement and clarify the clustering, we built dendrograms (Ward, 1963; Cavalli-Sforza & Edwards, 1967) of prefectures and of districts. These were obtained from the matrix of Lasker distances between administrative sections, using the agglomeration method of Ward (1963). They were considered only as an aid to the clustering; we do not imply that the present situation was generated by subsequent splits of preexisting clusters.
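
For readers unfamiliar with the Mantel test mentioned above, the following is a minimal pure-Python sketch of a one-sided permutation version. It is not the implementation used in the study: the function names, the one-sided p-value convention (hits + 1)/(permutations + 1), and the toy matrices are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(iso, geo, n_perm=1000, seed=0):
    """Return (r, p): matrix correlation and its one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(iso)))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(upper_triangle(iso, identity), geo_vec)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute localities, i.e. rows and columns together
        if pearson(upper_triangle(iso, perm), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical localities on a line; isonymic distance grows with geographic distance.
positions = [0, 1, 3, 6, 10, 15, 21]
geo = [[abs(a - b) for b in positions] for a in positions]
iso = [[0.5 * d for d in row] for row in geo]
r, p = mantel(iso, geo, n_perm=999, seed=1)
print(r, p)
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent observations; this is the reason an ordinary significance test of r is not appropriate here.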

Random kinship
Random kinship $\Phi_{IJ}(x)$ between any two localities I and J at distance x is given by

$$\Phi_{IJ}(x) = K \exp(-Bx)$$ (Malecot, 1955; Kimura, 1960)

where K is the average kinship at geographic distance x = 0, say average FST, and B is a function of average mutation rate and of the variance of x. Then, $\Phi_{IJ}(x)$ is always positive and is expected to decrease exponentially to 0 with increasing distance. Random kinship was defined as

$$\Phi_{IJ}(x) = I_{IJ}(x)/4$$

(Barrai et al., 2012) with average FST as the average kinship
at distance x = 0.
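
One simple way to estimate K and B from pairwise data is a least-squares fit of the logarithm of kinship on distance. The sketch below is an assumption about how such a fit could be done, not the procedure used in the study (which the text does not specify); the function name and the synthetic data are hypothetical, and pairs with zero kinship are skipped because their logarithm is undefined.

```python
import math


def fit_exponential_decay(distances, kinships):
    """Fit ln(kinship) = ln(K) - B * x by ordinary least squares; return (K, B)."""
    pairs = [(x, math.log(phi)) for x, phi in zip(distances, kinships) if phi > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated from hypothetical K = 0.001 and B = 0.02 per km.
xs = [0, 50, 100, 150, 200]
phis = [0.001 * math.exp(-0.02 * x) for x in xs]
k_hat, b_hat = fit_exponential_decay(xs, phis)
print(k_hat, b_hat)
```

On noise-free input the fit returns the generating parameters; with real data, the intercept K would be compared with the average FST at distance 0.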


Results and Discussion


The Most Frequent Surnames
The distribution, by prefecture and district, of the surname
numbers used in the analysis with the main parameters derived from the isonymy theory, are given in Table 1. The
data for communes and bashkias are presented in Table S1
available, as all further supplementary materials mentioned
in this paper, at our website: http://web.unife.it/utenti/
alberto.carrieri/ricerca.htm.
In Figure S1, we give the distribution of the logarithm of the number of surnames over the logarithm of the number of times they occur (Fox & Lasker, 1983; Zipf, 1935; see this last reference for the meaning and uses of the log-log distribution). In this case, it is fairly linear (Fig. S1). Such a curve is called a typical rank-size distribution or Zipfian curve, and it is so named by glottologists (Adamic & Huberman, 2002); here it indicates the number of instances (people) with a unique surname.
In Albania, surnames originated and have been established
generally in the same way as in other European countries.
The Albanian language belongs to the Indo-European group,
and, despite several exchanges with other languages, it has
preserved its own structure in its formative elements. According to Bidollari (2010), the language does not possess general rules, as other Indo-European languages do, for patronymic formation, like the suffixes -adès, -eidès, -poulos in Greek, -ez in Spanish and Portuguese, -escu in Romanian, -ich in Slavic languages and so on. It does not possess suffix elements indicating lineage like -son, frequent in English and Swedish, -sohn in German, or -sen in Danish. However, many Albanian surnames have been formed by the patronymisation of anthroponyms (first names), ethnonyms and toponyms in all the cases when it was necessary to indicate social or geographic origin.
Albania was for nearly five centuries under Turkish occupation. Therefore, several surnames, like Hoxha, Hoxhaj,
Shehu, Shehaj, Dervishi and others have been introduced
through the Muslim religion indicating in such cases levels of the religious hierarchy. Some other surnames have been
strongly influenced by the Turkish language, for example, surnames that have been formed by the introduction of suffixes
like -llari, -xhi, -lli and -li.
Here, we deal with 3,068,447 persons and 37,184 surnames, so that the average number of instances (persons) per surname, the so-called "type-token" ratio of glottologists, is 82 (see further down our ratio Sample Size/Surnames in Table 2 and King and Jobling (2009) for other type-token ratios in Europe).
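
The type-token ratio quoted above follows directly from the two counts given in the text, as this short check shows (the variable names are ours).

```python
# Counts taken from the text: electors in the database and distinct surnames.
sample_size = 3_068_447
distinct_surnames = 37_184

# Average number of persons per surname (Sample Size / Surnames).
type_token_ratio = sample_size / distinct_surnames
print(round(type_token_ratio, 1))  # 82.5
```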
We studied in some detail the 100 most frequent surnames (Table S2). Overall, these surnames comprise 583,708 occurrences, equal to 19.0% of the total number of surnames used here. The most frequent surnames are Hoxha with 39,088 occurrences, Çela with 14,632, Marku with 13,852, Shehu with 12,348, and Muca with 12,236. After these, one finds Kola (11,443), Dervishi (10,953), Gjoka (10,191), Kurti (10,152) and in 10th place Koci (9533). Overall, the first 10 surnames comprise 144,428 individuals, or 4.7% of the total number of electors.
Surnames of clear Arabic origin are frequent in the North and the East of Albania. Dervishi (10,953), which is seventh in the general list, is the first surname of clear Arabic origin, followed by Elezi (8155), Sinani (6237), Hasani (4541), and Osmani (4103). The Turkish language was the main vehicle for other frequent surnames that were formed from first names of Arabic or Persian origin like Brahimaj (1684), Brahimi (2225), Elezaj (1970), Islami (1751), among several others.
Greek surnames, a result of the influence of the Christian Orthodox religion, are frequent in the South of the country, which borders with Greece. Short lists of the 30 most frequent Albanian surnames of Arabic and Greek origin are given in Tables S3 and S4. However, these lists are far from complete, since they are based on our knowledge of Arabic and Greek, which is very limited. In particular, for the Greek names, we list only those which start with "Papa" (which means priest, father) to avoid uncertainties. There are 9961 surnames beginning with "Papa", which are joined with another name of Christian (or sometimes non-Christian) origin, like Papajani, Papajorgji, and Papanikolla. Note the curious Papazisi, which might be a translocation of the Arabic Aziz (which means strong) on the Greek Papa. So Papazisi might be "the father of the strong".

Isonymy Parameters in Albanian Prefectures, Districts, and Communes
Fisher's alpha and inbreeding by isonymy
Values of α and FST are given in Table 1 for prefectures and districts and in Table S1 for communes. We recall that α, the effective surname number, is the inverse of isonymy I ($I = \sum p^2$ and $\alpha = 1/I$, Barrai et al., 1996), so that $F_{ST} = 1/(4\alpha)$, and the meaning of α is then exactly homologous to the effective allele number of genetic systems.
The effective surname number α, in Albania, was estimated at 1327 for the country, considered as a unit. The average for the 12 prefectures was 653.3 ± 84.3. For the 36 districts, it was 365.9 ± 42.0 and for the 321 communes it was 122.6 ± 8.7. The difference between the estimates of α, and hence of FST, in prefectures, districts, communes and for the country as a unit, is observed when different subdivisions of the same

Table 1 Prefecture, district, number of surnames N, number of different surnames S, Fisher's α, Karlin-McGregor ν, isonymy I, and FST in Albania. Districts grouped by prefecture.

Prefecture   District            N       S     α  ν        I        FST
Berat                      169,377    5276   496  0.00293  0.00201  0.000505
             Berat         112,084    4042   420  0.00374  0.00238  0.000597
             Kucove         35,894    2123   277  0.00767  0.0036   0.000907
             Skrapar        21,399    1314   273  0.01258  0.00366  0.000926
Diber                      120,994    2482   377  0.00312  0.00265  0.000664
             Diber          50,866    1216   298  0.00582  0.00335  0.000844
             Mat            42,669    1296   247  0.00575  0.00404  0.001017
             Bulqize        27,459     915   191  0.00691  0.00521  0.001312
Durres                     289,512    9698   775  0.00268  0.00129  0.000323
             Durres        236,662    9149   757  0.0032   0.00132  0.000331
             Kruje          52,850    1861   337  0.00633  0.00297  0.000746
Elbasan                    299,600    6555   457  0.00153  0.00219  0.000548
             Elbasan       197,185    5568   442  0.00224  0.00226  0.000566
             Gramsh         25,062     839   168  0.00663  0.00595  0.001497
             Peqin          26,185    1103   103  0.00396  0.00953  0.002392
             Librazhd       51,168    1309   186  0.00362  0.00536  0.001346
Fier                       352,352    7479   623  0.00177  0.00161  0.000402
             Fier          193,704    5379   510  0.00264  0.00196  0.000491
             Lushnje       128,406    3691   345  0.00268  0.0029   0.000726
             Mallakaster    30,242     998   147  0.00482  0.0068   0.001709
Gjirokaster                121,628    4544   910  0.00744  0.0011   0.000277
             Gjirokaster    66,969    3150   767  0.01133  0.0013   0.00033
             Tepelene       28,946    1621   273  0.00934  0.00365  0.000922
             Permet         25,713    1539   460  0.01756  0.00217  0.000553
Korce                      264,449    7860  1110  0.00419  0.0009   0.000226
             Korce         152,114    6250  1108  0.00724  0.0009   0.000227
             Kolonje        15,813    1232   453  0.02783  0.00221  0.000567
             Pogradec       64,452    2497   378  0.00583  0.00264  0.000664
             Devoll         32,070    1462   211  0.00653  0.00473  0.001191
Kukes                       72,875    1844   351  0.0048   0.00284  0.000714
             Kukes          39,510    1113   190  0.00479  0.00524  0.001317
             Has            13,247     270    84  0.00629  0.0118   0.00297
             Tropoje        20,118     886   198  0.00973  0.00504  0.001272
Lezhe                      148,395    4080   173  0.00117  0.00576  0.001442
             Lezhe          72,257    2617   133  0.00184  0.0075   0.001879
             Mirdite        26,750     778    67  0.00249  0.0148   0.003708
             Kurbin         49,389    2192   298  0.006    0.00335  0.000842
Shkoder                    239,312    7350   658  0.00275  0.00152  0.000381
             Shkoder       179,065    6642   637  0.00355  0.00157  0.000394
             Puke           21,712     892   123  0.00562  0.0081   0.002036
             Malesi madhe   38,535    1235   260  0.00671  0.00384  0.000965
Tirane                     712,068  19,057   997  0.00141  0.001    0.000251
             Tirane        631,027  18,415  1048  0.00167  0.00095  0.000239
             Kavaje         81,041    2743   282  0.00347  0.00354  0.000889
Vlore                      277,885    7335   913  0.00328  0.00109  0.000275
             Sarande        74,963    3534   470  0.00623  0.00213  0.000535
             Delvine        23,788    1504   339  0.01404  0.00295  0.000747
             Vlore         179,134    5327   694  0.00386  0.00144  0.000362



Table 2 Comparison of isonymy parameters in nine European countries, in five South-American countries, in the United States and Texas, and in Yakutia. Overall, 122 million surnames were analysed.

Country           SS (millions)   Surnames (S)   α (average)   Isolation by distance   Type-token (SS/S)
Europe
  Austria         1               140,766        854           0.59                    7.1
  Albania         3.0             37,184         123           0.71                    82
  Belgium         1.1             137,442        997           0.74                    8
  France          6               495,104        1615          0.69                    12.1
  Germany         5.2             462,526        1596          0.51                    11.2
  Holland         2.4             126,485        787           0.46                    19
  Italy           5.1             215,623        1236          0.61                    23.7
  Switzerland¹    1.7             166,116        891           0.72                    10.2
  Spain           3.6
    Paternal                      94,886         134           0.21                    38
    Maternal                      110,034        144           0.26                    33
Asia
  Yakutia         0.5             44,625         107           0.69                    11.1
North America
  United States   18              899,585        1366          0.24                    20
  Texas           3.6             235,740        734           0.42                    15.3
South America
  Argentina³      22.6            414,441        422           0.47                    54.5
  Venezuela²      3.9             68,665         122           0.78                    56.8
  Bolivia⁴        23.2            174,922        122           0.5                     144.6
  Paraguay³       4.8             39,047         108           0.42                    122.9

SS, sample size; S, number of different surnames.
¹ Cantons. ² States. ³ Districts. ⁴ Provinces.


area and population are considered. Very properly in the case of Albania, the difference constitutes the "Prefecture Effect", identified for FST by Nei and Imaizumi (1966) in Japan, and so named by Scapoli et al. (2007). Nei and Imaizumi observed that, for the same area and population, small subdivisions have larger FST, and larger subdivisions have smaller FST. In their study, the effect was seen in towns and in the Japanese prefectures where the towns were located; hence the name. It could also be named a geographic scale effect that intervenes in many phenomena, since it is just a question of heterogeneity increasing with population size. Of course, the prefecture effect is visible both on FST and α. It appears that Albania is no exception, and, since α is inversely related with FST, the sequence

FST Prefecture < FST District < FST Commune

is respected.
In Albania, the lowest levels of random inbreeding, indicated by FST, are expected and observed in the highly populated areas of the central part of the country, the area around the capital Tirana.
In the analysis, α is significantly and negatively correlated (r = −0.16) with latitude, possibly due to the average higher population density of southern communes. So, the largest values of α (the inverse of isonymy) were seen in the large towns, which are also capitals of prefectures. The highest values of α for communes were 1245 in the commune of Korce, 1222 in Tirana, 990 in Durres, 748 in Vlore, and 720 in Shkoder. These large communes give the name to the prefectures where they are located. The lowest values observed in communes were α = 7 in Sheze, in the prefecture of Elbasan, α = 10 in Hysgjokaj and α = 11 in Ballagat, both communes in the prefecture of Fier, and α = 12 in Shtiqen and α = 13 in Surroj, both in Kukes. These communes are located in mountainous areas and have a small population.
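
The ordering of FST across administrative levels can be illustrated from the average α values reported in the text, using FST = 1/(4α). This is only an indicative calculation: FST evaluated at the mean α of a level is not the same quantity as the mean of the unit-level FST values, so the numbers below are a sketch, not the study's estimates.

```python
# Average effective surname numbers (alpha) reported in the text for each level.
mean_alpha = {
    "country": 1327.0,
    "prefectures": 653.3,
    "districts": 365.9,
    "communes": 122.6,
}

# Indicative FST at the average alpha of each level: FST = 1 / (4 * alpha).
fst_at_mean_alpha = {level: 1.0 / (4.0 * a) for level, a in mean_alpha.items()}

for level, fst in fst_at_mean_alpha.items():
    print(f"{level:12s} alpha = {mean_alpha[level]:7.1f}  FST = {fst:.6f}")
```

The smaller the subdivision, the smaller α and the larger FST, which is the pattern described as the Prefecture Effect.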



Isolation by distance
We studied isolation by distance through the correlation of geographic with surname distances at the prefecture, district and commune levels. We found that Euclidean, Nei's and Lasker's distances between the 12 prefectures were considerably correlated with linear geographic distance, with r = 0.709 ± 0.062, r = 0.560 ± 0.079 and r = 0.621 ± 0.082, respectively. The same tendency was observed between the 36 districts, although the correlations in this case were smaller, r = 0.581 ± 0.029, r = 0.543 ± 0.033 and r = 0.584 ± 0.030, respectively. Similarly, between communes, we observed 0.47 ± 0.008, 0.37 ± 0.011 and 0.44 ± 0.011 for Euclidean, Nei's and Lasker's distances. As an example, the variation of Lasker's distance between prefectures is given in Figure 2 (see Fig. S2 for the distribution of Lasker's distances between districts and Fig. S3 for that of Lasker's distance between communes). Given the high correlation between the three measures of distance (for prefectures, r[Nei-Euclidean] = 0.85 ± 0.03; r[Nei-Lasker] = 0.74 ± 0.06 and r[Euclidean-Lasker] = 0.65 ± 0.08), for this analysis, we used mainly Lasker's distance.

Figure 2 Variation of Lasker's distance between prefectures with geographic linear distance.

The signal extracted from the scatter diagram of Lasker's distance over kilometres for communes is given in Figure 3. Linearity seems dominant: in Albania a clear tendency toward an asymptote is not observed, as it was in Spain, Bolivia and Chile (Rodriguez-Larralde et al., 2003, 2011; Barrai et al., 2012), where the relation between isonymic and geographic distance flattens at large distances. In Albania, there is a sharp increase of Lasker's distance up to 120 km, which gives indication of isolation and drift below that distance. After that, the increase in isonymic distance becomes minor, possibly indicating the effect of internal migration. The signal for Euclidean and Nei's distance is given in Figures S4 and S5, respectively. Note the rapid rise of Euclidean distance toward the asymptote, due to the sensitivity of this distance to the change of surnames and of their frequency with increasing geographic distance.

Figure 3 Variation of Lasker's distance (± s.d.) over kilometres between 321 communes in Albania.

Figure 4 Exponential decay of random kinship (± 1/2 s.d., to avoid intersection of the lower one with the abscissa) in Albania over geographic distance. Pairwise distances between communes.

Kinship
We plotted kinship between communes, as previously defined, as a function of geographic distance (Fig. 4). Note that at the commune level several pairs of communes (about 3 per thousand; 157 pairs out of 51,360) did not share surnames.
The decrease of kinship with distance is significantly exponential, as predicted by Malecot (1955) (see also Kimura, 1960). Strictly, the exponential decay should be characteristic of structures more linear than Albania, for example, as observed by us in Chile. However, there is considerable and significant agreement between Malecot theory and kinship decay in Albania. Thus, the Malecot model is very strong and, possibly due to the large number of pairwise distances we had available, it is also applicable to a geographic structure which, like Albania, is elongated from North to South but is poorly linear. We were not surprised when we observed the considerable agreement between the Malecot model and kinship decay in Chile, since this latter country is practically linear. Still, the agreement between the model and the observed decay of kinship over kilometric distance in Albania, which is elongated but far from linear, is indicative of a general validity of the model, although originally it was derived only for a linear structure.

Relations between the Administrative Sections of Albania
In order to obtain a general idea of the movements of population groups in Albania, we conducted MDS and PCA on the matrix of Lasker's distances between prefectures, between districts and between communes. We report here and as supplementary material some of the results of these analyses.

Prefectures
The MDS projection on the first two dimensions of the matrix between prefectures (Fig. S6) differentiates a few clusters, which correspond to groups of neighbouring prefectures. In the resulting dendrogram (Fig. S7), a first large cluster composed mainly of the central prefectures is observed: Tirane, Durres, Elbasan, Diber, Fier and Berat. These last two form a subcluster within this cluster. Then, three prefectures in the South-East and the extreme South, namely Korce, Vlore and Gjirokaster, form the next cluster. Finally, two prefectures of the North cluster together, Shkoder and Lezhe, whereas Kukes represents an exception because, despite being a mountainous prefecture of the North, it clusters together with the Central prefectures, possibly due to emigration from the poorer areas toward the highly populated and richer areas around the capital Tirana.
From the MDS projection in Figure S6, some other minor but relevant points emerge, which complement the clustering of prefectures. In particular, Tirane, Durres and Elbasan stand alone at the centre of the bidimensional projection, removed from the other prefectures. Vlore is marginal, as is Korce.
Districts
The projection on the first two dimensions of the MDS tends
to differentiate several clusters, which correspond fairly well
to neighbouring districts (Fig. S8).
In the dendrogram (Fig. S9), the districts of Malesi e Madhe, Tropoje and Has, at the Northern border with Montenegro, cluster with Fier, Mallakaster and Vlore, which are in the South of the country. One district in the South, Tepelene, clusters with a Central-Northern belt of the seven districts of Durres, Kruje, Tirane, Mat, Bulqize, Diber and Kukes.
A second central main cluster, south of the former, includes in an East-West belt the districts of Kavaje, Lushnje, Peqin, Elbasan, Gramsh and Librazhd.
Then comes a Southern group of districts: Kucove, Berat, Skrapar, Korce, Pogradec and Devoll. All these are also adjacent geographically. However, we underline that the clustering of Malesi e Madhe, Tropoje and Has in the North with the Vlore cluster in the South might indicate injection, between North and South of Albania, of eastern groups from Macedonia toward the Adriatic (Fig. S9).
From the projection, some other minor but relevant points emerge, which complement the clustering. In particular, the Tirane district stands at the centre of the bidimensional projection, with Durres. This might indicate that these districts, which together comprise almost one quarter of the Albanian population, possess most of the surnames of the nation. Malesi e Madhe in Shkoder and Mallakaster in Fier are marginal both on the projection and in Albanian geography, bordering, respectively, Montenegro and Kosovo in the North and the limit of the Toske dialect in the South.
A visual indication of the isonymic proximity of districts is
given by the maps of Figure 5 where the similarity of districts
is indicated by the similar intensity of the same colour. It is
appropriate at this point to indicate that recently new methods
of identifying spatial concentration of surnames have been developed (e.g. Longley et al., 2011; Cheshire & Longley, 2012),
which give specific examples on various ways of clustering and
representing geographical dimensions of surname frequency
data. Most interesting seem the developments which include
forenames to detect ethnicity of groups (Mateos et al., 2011).
This adds a further dimension to isonymy studies, which needs
to be explored.

Communes
We found that, only at the commune level, there were 157
pairs of communes out of 51,360, which did not share surnames. Out of these 157 pairs, 49 included the commune of
Liqenas in Korce , which has a mainly Macedonian population. Also, 34 pairs included the commune of Lure in Diber,
but we did not find a good reason for this last preference.
Of course, there are various reasons why in Albania this absence of the same surname in small communes may occur.
We believe that, among others, one reason is to be found
in the complexity of the Albanian alphabet, which often results in the same name being written differently in different
communes. However, there is also some effect of distance on
the phenomenon. The average geographic distance between
the 157 pairs having infinite Laskers and Neis distance is

Annals of Human Genetics (2013) 77, 232–243


Figure 5 Projection of Lasker's matrix of surname distances on districts in Albania by mapping (A) the first three
PCA factors (I: Factor 1 = 42.8%; II: Factor 2 = 26.9%; III: Factor 3 = 11.5%) and (B) the first three MDS dimensions
(I: Dimension 1; II: Dimension 2; III: Dimension 3; stress = 11.2%).

128.9 ± 14.7 km. The average distance for the other 51,203
pairs is 95.9 ± 0.06 km, and the difference is significant
(t[∞] = 8.568, P < 0.0001). In the multivariate analysis of the
distance matrices, we bypassed the problem posed by the
elements of infinite value by substituting the nearest maximum
observed distance for the 157 infinite isonymic distances. In this
way, we met no complications in the subsequent analysis of the
distance matrices of Lasker and Nei. It is important to note
that if the 157 infinite distances are excluded, the correlations
for communes rise from 0.44 to 0.47 for Lasker, and from 0.37
to 0.39 for Nei (refer again to Fig. S3 for Lasker's distance
between communes).
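The substitution described above can be illustrated with a minimal sketch. It assumes the definition of Lasker's distance commonly used in isonymy studies, L_ij = -ln(I_ij), where I_ij is the random isonymy between units i and j; the Methods section is not part of this excerpt, so the formula, the function names and the toy counts are assumptions made for illustration. "Nearest maximum observed" is read here as the largest finite distance in the matrix.

```python
import numpy as np

def lasker_distance(counts):
    """Lasker's distance between units, assuming L_ij = -ln(sum_k p_ik * p_jk).

    counts: array of shape (units, surnames) with surname counts per unit.
    Pairs sharing no surname have zero isonymy and hence infinite distance.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)   # relative frequencies
    isonymy = p @ p.T                                 # random isonymy I_ij
    with np.errstate(divide="ignore"):                # log(0) -> -inf is expected
        dist = -np.log(isonymy)
    np.fill_diagonal(dist, 0.0)                       # convention for a distance matrix
    return dist

def cap_infinite(dist):
    """Replace infinite entries with the largest finite distance observed."""
    finite_max = dist[np.isfinite(dist)].max()
    return np.where(np.isinf(dist), finite_max, dist)

# Number of distinct pairs among the 321 communes
n_communes = 321
n_pairs = n_communes * (n_communes - 1) // 2

# Hypothetical toy data: unit 2 shares no surname with units 0 and 1
toy_counts = np.array([[5, 3, 0, 0],
                       [2, 0, 4, 0],
                       [0, 0, 0, 6]])
raw = lasker_distance(toy_counts)
capped = cap_infinite(raw)
```

After capping, every entry is finite, so the matrix can be passed to PCA, MDS or a clustering routine without special handling.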
As noted briefly above, we estimated in the commune of
Korce, the capital of the homonymous prefecture, the highest
value of α in the country (1245). Relatively high α and low
estimates of inbreeding are also observed in several communes
of the central area in the Tirana region. This might explain
the position of this prefecture relative to the other groups and
might indicate recent immigration toward the main urban
area of Albania. Low α (and high FST) are observed in the


© 2013 Blackwell Publishing Ltd/University College London

communes of Elbasan (one commune with α = 7), Fier (one
commune with α = 10 and a second one with α = 11), and
Kukes (one commune with α = 12 and one with α = 13).
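To make the inverse relation between α and inbreeding concrete, the following sketch assumes the definitions commonly used in the isonymy literature: random isonymy within a unit I = Σ p_k², effective surname number α = 1/I, and FST = I/4. The Methods section is not part of this excerpt, so these formulas (which omit any small-sample correction), the function name and the toy counts are assumptions.

```python
import numpy as np

def isonymy_summary(counts):
    """Within-unit isonymy, effective surname number and FST for one unit.

    Assumes I = sum(p_k^2), alpha = 1 / I and FST = I / 4.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    i_within = float(np.sum(p ** 2))
    return {"I": i_within, "alpha": 1.0 / i_within, "FST": i_within / 4.0}

# Hypothetical commune with ten equally frequent surnames
even = isonymy_summary([10] * 10)
# Hypothetical commune of the same size dominated by a single surname
skewed = isonymy_summary([91] + [1] * 9)
```

Under these assumptions, a commune dominated by a few surnames has a low α and a correspondingly high FST, which is the pattern reported for the communes listed above.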
We do not present here either the projection of the first
two dimensions of MDS for the 321 × 321 matrix of the
communes or the dendrogram derived from it. Both are,
however, given as Figures S10 and S11. Since the projections of the individual names of 321 elements are illegible,
we decided to label with different symbols the communes
North and South of the River Shkumbin in central Elbasan (152 and
169, respectively) to detect whether the two main groups of
points depicted in the projection (Fig. S10) contain a majority of communes where Gege or Toske is spoken. In fact, the
subdivision is sharp: a vast majority of the Gege-speaking communes cluster together, as do those speaking Toske. Thus, the
two main groups identified through surname distances are
highly correlated with the two linguistic areas of Albania, the
Gege area in the North and the Toske area in the South.
We used the same technique to visualise the clusters in
the dendrogram, putting the labels G and T at the endpoints
of the graph (Fig. S11). The resulting clusters correlate with
latitude, but the North-South distribution of communes is
not as clear as in the projection from the MDS.

Mapping of the first three components of Lasker's matrix


The structures revealed by the MDS and the dendrograms
are only partially indicative of the possible movements of the
population. Therefore, to have a general idea of the direction, if any, of settlements in Albania, we mapped on the
nation (following Menozzi et al., 1978) the first three components of the matrix of Lasker's distance, obtained from a
PCA and from the MDS. We provide the PCA components
because the relative importance of each component is given by
the corresponding eigenvalue, while the MDS provides the
value of the stress for a judgement of the overall fit on the
three dimensions. The resulting maps are given in Figure 5
(A for PCA and B for MDS, respectively).
The variation of the first component, which accounts for
almost half of the variability (42.8%) in the North-South
direction, indicates movement from the centre of the country toward North and South. This might mean that, from a
chronological point of view, immigration was in the East-West direction from Macedonia, establishing a centre of high
density of migrants, which subsequently moved North and
South. The third component (11.5%) gives the same indication, although with lesser intensity. The sense of
movement may then be hypothesised as from the East toward the
Adriatic Coast, since the entry of surnames from the sea in
significant numbers is unlikely.
It appears that only the second component (26.9%) is somewhat directional; the deviations from the second axis appear to
be ordered in the North-South direction. However, although
this component indicates movement in the North-South direction, the sense of movement cannot be detected from it,
unless we accept that the highest deviations are the most recent ones.
Overall, the three components account for 81.2% of the
surname variation as obtained from Lasker's distance matrix.
The mappings of the first three dimensions of the MDS
seem to us compatible with those obtained from the PCA.
The indication of possible East-West movement seems clear
enough for the first and second dimensions, and less so for
the third. So, the isonymic structure of Albania seems to
be mainly due to ancient migration from the East toward
the coast, with radiation toward the North and South
followed by isolation, and with drift and short-range migration playing a major role in the generation of the present
geographical variation of surnames.
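The 81.2% quoted above is the sum of the shares of the first three PCA factors reported in Figure 5. The sketch below checks that arithmetic and shows how such shares follow from eigenvalues; the helper name and the toy eigenvalues are hypothetical and are not taken from the paper.

```python
import numpy as np

def explained_share(eigenvalues):
    """Percentage of total variance carried by each component."""
    ev = np.asarray(eigenvalues, dtype=float)
    return 100.0 * ev / ev.sum()

# Shares reported for the first three PCA factors (Figure 5A)
reported = [42.8, 26.9, 11.5]
cumulative = round(sum(reported), 1)

# Hypothetical eigenvalues, only to show how shares are derived
toy_shares = explained_share([4.0, 2.0, 1.0, 1.0])
```

The stress value reported for the MDS plays a different role: it measures the overall misfit of the three-dimensional configuration rather than the weight of any single dimension.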

Conclusions
The methodology described in this paper was used to analyze
the isonymic structure of several South American countries
(Rodriguez-Larralde et al., 2000, 2011; Dipierri et al., 2005,
2011; Barrai et al., 2012). In these countries, 4 (Venezuela),
24 (Argentina), 23 (Bolivia), 4.5 (Paraguay) and 16.5 (Chile)
million surnames from the registers of electors were used.
In European countries and in the United States, we analysed surnames of telephone users (Barrai et al., 2001; Scapoli
et al., 2005, 2007; Rodriguez-Larralde et al., 2007). In thinly
populated Siberia, we used half a million surnames (Tarskaya
et al., 2009). The average value of α for all the cities (or states,
in the case of Venezuela and the United States, or districts,
in the case of Argentina and Paraguay), and the isolation by
distance measured by the correlation between isonymic and
geographic distances, are given in Table 2 for the countries
studied up to now. Several features emerge from the comparisons reported in Table 2: first, the general similarity among
European nations in profusion of surnames, as measured by α,
and in isolation by distance, as measured by the linear correlation; secondly, the relatively small value of α in Venezuela,
Bolivia, Paraguay, Spain, Chile and now Albania; and thirdly,
the practical absence of isolation by distance in the United
States, excluding bilingual Texas (Rodriguez-Larralde et al.,
2007). In Albania, the average number of persons having the
same surname (measured by the ratio Sample Size/Surnames,
given as the index SS/S in Table 2) is more similar (82) to
that of Argentina, Bolivia and Venezuela than to that of other
European countries. It may be of some interest to compare
our Table 2 with Table 1 of King and Jobling (2009). There,
they give the mean number of carriers per surname in 5538
households in 27 countries. Where applicable, their results are
consistent with ours.
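As a quick check of the SS/S index for Albania, the ratio can be recomputed from the totals given in the Summary (3,068,447 surnames analysed and 37,184 different surnames). The variable names are illustrative only.

```python
# Totals taken from the Summary of the paper
sample_size = 3_068_447        # surnames (persons) analysed
distinct_surnames = 37_184     # different surnames found

# Average number of persons per surname (the SS/S index)
ss_over_s = sample_size / distinct_surnames
```

The result lies between 82 and 83 persons per surname, in agreement with the value of 82 quoted in the text.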

Albania is the only European country in which we had
near-census data (persons below 18 years of age were not included, our data being those of electors), as we had in South
America. The ratio in countries where we had only the surnames of telephone users is about 25% of the ratio observed
in countries where we had census data. We would like to label
this as a census effect, but at present it is more prudent to
attribute the phenomenon to a bias of the telephone directory. However, according to Lasker (1985), this should not
be a major problem in countries with high telephone penetration rates, since telephone lines are a good sample measure
of households in the country. In this context, 25% of the total population simply reflects four people per telephone line,
which may well approach the average household size. In any
case, we will wait to explore the effect further until more
data from national censuses become available because, for
the time being, barring Yakutia and Albania, the effect is confounded with the small number of different single surnames
in the Spanish language.
In Albania, Gege is spoken in the northern prefectures and
Toske in the southern ones. In Vlore and in Gjirokaster both
Greek and Toske are spoken. It is interesting to note that in
the map projection of MDS analysis of communes, a vast majority of the Gege-speaking communes cluster together, as do
those speaking Toske. Thus, the two main swarms identified
through surname distances are highly correlated with the two
linguistic areas of Albania; the Gege area in the North and
the Toske area in the South.
In this analysis, all inbreeding estimates were lower (and
α higher) in the highly populated central area, in the Tirana
region. At present, most internal migration seems to take
place toward the capital and the other main towns. Consequently, for the time being, we may conclude that the population structure of this country is the result of
the joint action of directional and short-range migration and
drift, with directional migration dominating over drift at short
distances, as suggested by the rapid rise of Lasker's distance with geographic distance below 120 km and by its flattening above that
distance.
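The association between isonymic and geographic distance matrices is usually assessed with a Mantel-type permutation test (Mantel, 1967; Smouse et al., 1986). The following is a minimal sketch of such a test, not the procedure actually run for this study; the function name, the number of permutations and the toy coordinates are assumptions.

```python
import numpy as np

def mantel(a, b, n_perm=999, seed=0):
    """Pearson correlation between two square distance matrices, with a
    one-sided permutation p-value obtained by jointly permuting the rows
    and columns of the first matrix."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    upper = np.triu_indices_from(a, k=1)            # each pair counted once
    r_obs = np.corrcoef(a[upper], b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(a.shape[0])
        permuted = a[perm][:, perm]                 # relabel the units
        r_perm = np.corrcoef(permuted[upper], b[upper])[0, 1]
        hits += int(r_perm >= r_obs)
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 8 localities with random coordinates
coords = np.random.default_rng(1).random((8, 2))
diff = coords[:, None, :] - coords[None, :, :]
geo = np.sqrt((diff ** 2).sum(axis=-1))             # geographic distances
iso = 2.0 * geo + 1.0                               # perfectly distance-driven
np.fill_diagonal(iso, 0.0)

r_obs, p_value = mantel(iso, geo)
```

In this toy case the isonymic distance is an exact linear function of geographic distance, so the correlation is essentially 1; real data would instead show the curve described above, rising at short distances and flattening beyond about 120 km.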

Acknowledgements
The authors are grateful to the CEC of Albania, which provided
the data. The authors are also particularly grateful to both
Referees, who gave valuable advice. The work was supported
by grants from the University of Ferrara to Chiara Scapoli.

References
Adamic, L. A. & Huberman, B. A. (2002) Zipf's law and the Internet.
Glottometrics 3, 143–150.
Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. &
Rodriguez-Larralde, A. (1996) Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Ann Hum
Biol 23, 431–455.
Barrai, I., Rodriguez-Larralde, A., Mamolini, E. & Scapoli, C.
(2000) Elements of the surname structure of Austria. Ann Hum
Biol 26, 1–15.
Barrai, I., Rodriguez-Larralde, A., Mamolini, E., Manni, F. &
Scapoli, C. (2001) Elements of the surname structure of the USA.
Am J Phys Anthropol 114, 109–123.
Barrai, I., Rodriguez-Larralde, A., Dipierri, J., Alfaro, E., Acevedo,
N., Mamolini, E., Sandri, M., Carrieri, A. & Scapoli, C.
(2012) Surnames in Chile. A study of the population of Chile
through isonymy. Am J Phys Anthropol 147, 380–388. doi:
10.1002/ajpa.22000.
Bidollari, C. (2010) Onomastic investigations. In Albanian. Tirane:
Botimet Kumi Editor.
Cavalli-Sforza, L. L. & Edwards, A. W. F. (1967) Phylogenetic analysis
models and estimation procedures. Am J Hum Genet 19, 233–257.
Chesire, J. A. & Longley, P. A. (2012) Identifying spatial concentrations of surnames. Int J Geogr Inform Sci 26, 309–325.
Crow, J. F. & Mange, A. (1965) Measurements of inbreeding from
the frequency of marriages between persons of the same surname.
Eugen Q 12, 199–203.
Dipierri, J. E., Alfaro, E., Scapoli, C., Mamolini, E., Rodriguez-Larralde, A. & Barrai, I. (2005) Surnames in Argentina. A population study through isonymy. Am J Phys Anthropol 128, 199–209.
Dipierri, J. E., Rodriguez-Larralde, A., Alfaro, E. L., Scapoli, C.,
Mamolini, E., Salvatorelli, G., De Lorenzi, S., Sandri, M., Carrieri, A. & Barrai, I. (2011) Surnames in Paraguay: A study of
the population of Paraguay through isonymy. Ann Hum Genet 75,
678–687. doi: 10.1111/j.1469-1809.2011.00676.x.
Fox, W. R. & Lasker, G. W. (1983) The distribution of surname
frequencies. Int Stat Rev 51, 81–87.
Kimura, M. (1960) Outline of population genetics (in Japanese). Tokyo:
Baifukan.
King, T. E. & Jobling, M. A. (2009) What's in a name? Y chromosomes, surnames and the genetic genealogy revolution. Trends
Genet 25(8), 351–360.
Lasker, G. W. (1985) Surnames and genetic structure. Cambridge: Cambridge University Press.
Longley, P. A., Chesire, J. A. & Mateos, P. (2011) Creating a regional
geography of Britain through the spatial analysis of surnames.
Geoforum 42, 506–516.
Malecot, G. (1955) Decrease of relationship with distance. Cold
Spring Harbour Symp 20, 52–53.
Mantel, N. (1967) The detection of disease clustering and a generalized regression approach. Cancer Res 27, 209–220.
Mateos, P., Longley, P. A. & O'Sullivan, D. (2011) Ethnicity and
population structure in personal naming networks. PLoS ONE 6,
e22943. doi:10.1371/journal.pone.0022943.
Menozzi, P., Piazza, A. & Cavalli-Sforza, L. L. (1978) Synthetic
maps of human gene frequencies in Europeans. Science 201,
786–792.
Mikerezi, I., Susanne, C., Bajrami, Z. & Kume, K. (1995) Differentiation of Albanian human populations and their relationships with
Balkanic ethnic groups according to gene frequencies at ABO,
MN and Rhesus loci. IUAES International Congress, April 20–21,
1995, Torino, Italia, p. 32.
Mikerezi, I., Pizzetti, P., Lucchetti, E. & Ekonomi, M. (2003)
Isonymy and the genetic structure of Albanian population. Coll
Antropol 27, 507–514.


Nei, M. (1973) The theory and estimation of genetic distance. In:
Genetic structure of populations (ed. N. E. Morton). Hawaii: Hawaii
University Press.
Nei, M. & Imaizumi, J. (1966) Genetic structure of human populations. I. Local differentiation of blood groups gene frequencies in
Japan. Heredity 21, 9–36.
Relethford, J. H. (1988) Estimation of kinship and genetic distance
from surnames. Hum Biol 60, 475–492.
Rodriguez-Larralde, A., Barrai, I. & Alfonzo, J. C. (1993) Isonymy
structure of four Venezuelan states. Ann Hum Biol 20, 131–145.
Rodriguez-Larralde, A., Scapoli, C., Beretta, M., Nesti, C.,
Mamolini, E. & Barrai, I. (1998) Isonymy and the genetic structure of Switzerland. II. Isolation by distance. Ann Hum Biol 25,
533–540.
Rodriguez-Larralde, A., Morales, J. & Barrai, I. (2000) Surname
frequency and the isonymy structure of Venezuela. Am J Hum
Biol 12, 352–362.
Rodriguez-Larralde, A., Gonzalez-Martin, J., Scapoli, C. & Barrai,
I. (2003) The names of Spain: A study of the isonymy structure
of Spain. Am J Phys Anthropol 121, 280–292.
Rodriguez-Larralde, A., Scapoli, C., Mamolini, E. & Barrai, I. (2007)
Surnames in Texas: A population study through isonymy. Hum
Biol 79, 215–239.
Rodriguez-Larralde, A., Dipierri, J., Alfaro, E., Scapoli, C.,
Mamolini, E., Salvatorelli, G., De Lorenzi, S., Carrieri, A.
& Barrai, I. (2011) Surnames in Bolivia: A population study
through isonymy. Am J Phys Anthropol 144, 177–184. doi:
10.1002/ajpa.21379.
Scapoli, C., Goebl, H., Sobota, S., Mamolini, E., Rodriguez-Larralde, A. & Barrai, I. (2005) Surnames and dialects in France:
Population structure and cultural transmission. J Theor Biology 237,
75–86.
Scapoli, C., Mamolini, E., Carrieri, A., Rodriguez-Larralde, A. &
Barrai, I. (2007) Surnames in Western Europe: A comparison of
the subcontinental populations through isonymy. Theor Popul Biol
71, 37–48.
Smouse, P. E., Long, J. C. & Sokal, R. R. (1986) Multiple regression and correlation extensions of the Mantel test of matrix
correspondence. Syst Zool 35, 627–632.
Susanne, C., Bajrami, Z., Kume, K. & Mikerezi, I. (1996) Gene
differentiation at the ABO, MN and Rhesus loci among Albanians
and their relation with other Balkan populations. Gene Geogr 10,
31–36.
Tarskaya, L., Elchinova, G. I., Scapoli, C., Mamolini, E., Carrieri, A.,
Rodriguez-Larralde, A. & Barrai, I. (2009) Surnames in Siberia.
A study of the population of Yakutia through isonymy. Am J Phys
Anthropol 138, 190–198.
Ward, J. H. (1963) Hierarchical grouping to optimize an objective
function. J Am Statist Assoc 58, 236–244.
Wright, S. (1951) The genetic structure of populations. Ann Eugen
15, 324–354.
Zipf, G. K. (1935) The psychobiology of language. Boston, MA:
Houghton-Mifflin.



Supporting Information
Additional supporting information may be found in the online
version of this article:
Table S1 Distribution of isonymy parameters.
Table S2 The 100 most frequent surnames in Albania.
Table S3 The most frequent names of Arabic origin in
Albania.
Table S4 Surnames with the prefix Papa of clear Greek
origin.
Figure S1 Variation of the number of occurrences in 3 million surnames in Albania.
Figure S2 Variation of Lasker's distance between 36 districts
in Albania.
Figure S3 Variation of Lasker's distance between 321 communes in Albania.
Figure S4 Variation of Euclidean distance with geographic distance.
Figure S5 Variation of Nei's distance with geographic distance.
Figure S6 MDS on the matrix of Lasker's distances between
Prefectures.
Figure S7 Dendrogram of Albania prefectures.
Figure S8 MDS of Lasker's distance matrix between districts.
Figure S9 Dendrogram of districts from the matrix of Lasker's
distance.
Figure S10 Projection of the 321 communes of Albania on
the first two dimensions of the matrix of Lasker's distances.
Figure S11 Dendrogram of communes.
As a service to our authors and readers, this journal provides
supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organised for online
delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than
missing files) should be addressed to the authors.
Received: 9 August 2012
Accepted: 18 November 2012
