
Structural indicators in citation networks

2011, Scientometrics

Abstract

New indicators, including the outgrow index, characterizing an article in its ego citation network are introduced. We take full advantage of the existing duality (cites–is cited by) in a citation network. Although algebraic aspects are emphasized, a first step towards their interpretation is attempted. Examples of their calculation and of future applications are provided.

Scientometrics (2012) 91:451–460
DOI 10.1007/s11192-011-0587-3

Xiaojun Hu • Ronald Rousseau • Jin Chen

Received: 3 December 2011 / Published online: 31 December 2011
© Akadémiai Kiadó, Budapest, Hungary 2011

Keywords Ego citation network · Structural indicators · Algebraic relations · Graph-theoretic analysis · Citation generations · Outgrow index

X. Hu
Medical Information Centre, Zhejiang University School of Medicine, Hangzhou 310058, China
e-mail: [email protected]

R. Rousseau (✉)
KHBO (Association K.U.Leuven), Faculty of Engineering Technology, Zeedijk 101, 8400 Oostende, Belgium
e-mail: [email protected]

R. Rousseau
Universiteit Antwerpen (UA), IBW, Stadscampus, Venusstraat 35, 2000 Antwerpen, Belgium

R. Rousseau
Department of Mathematics, K.U.Leuven, Celestijnenlaan 200B, 3000 Leuven, Belgium

J. Chen
College of Public Administration, Zhejiang University, Hangzhou 310027, China

Introduction

Informetrics is not only the study of regularities (the so-called informetric laws) or of citation counting and its consequences; it is also the study of related algebraic structures such as graphs or networks. In this contribution we introduce some new notions related to the structure of citation networks. These notions are described independently of any specific database. We recall that a citation network always contains two points of view: the ‘cites’ point of view, and the ‘is cited by’ view. These viewpoints are dual and both will play a role in our contribution.
The Science Citation Index was introduced by Garfield in 1963 (Garfield 1963, 1964), and although article citations naturally lead to networks, it took quite some time before library and information scientists began to consider citation analysis as a kind of network study. Garner (1967) was among the first to combine citation analysis and network (or graph) theory. The introduction of co-citation analysis by Small (1973) and Marshakova (1973) can be seen as one of the first applications of network thinking in the field. Nowadays, the graph or network approach to citation analysis is all-pervasive, and few would not see citation analysis as a form of applied graph theory (Boyack et al. 2005; Otte and Rousseau 2002; Persson et al. 2004; Wagner and Leydesdorff 2005; Yang and Ding 2009; Chen and Guan 2010).

Although most topics discussed in this article apply to any publication, we will study only the article citation network. These articles may be journal articles or articles published in conference proceedings, but we assume that no other types of references or citation sources are present in the network under study. This restriction is introduced only to simplify our treatment.

We begin with a purely algebraic, graph-theoretic description of the notions we introduce here, then we consider dynamical aspects, and we take a first step in the direction of attaching meaning to our constructs, i.e. we go beyond the purely algebraic construct. Next we present some real-world cases taken from Thomson Reuters’ Web of Science (WoS). The application of these indices is placed in the framework of generations of references and citations (Hu et al. 2011a). Relations between these structural indices are discussed and time series for the dynamic ones are drawn.
Methods

In order to show the meaning of these new indicators for characterizing the microstructure of citation networks, we perform ego-centred citation analyses for an article and give the mathematical definition of each structural indicator. Next we extend these analyses to include dynamical aspects.

Ego article citation network: the ‘cites’ relation

We consider an article citation network and focus on one specific source article: the ego, as it is called in network theory (Wasserman and Faust 1994), see Fig. 1. Our contribution follows the lead of Howard D. White, who was one of the first to perform ego-centred citation analyses (White 2000). We refer to this source article as A and consider A’s reference list, denoted as Ref(A). Recall that we assume that this reference list contains only articles. The length of A’s reference list, i.e. its number of references, is denoted as TRef(A). This approach implies that we follow the ‘cites’ relation (taking A’s point of view). Article A and all articles in its reference list form a set, denoted as ER(A) = Ref(A) ∪ {A}, where ER stands for the extended reference list (namely, extended by including A). We will attach a positive number to each element of ER(A). This will allow us to rank ER(A) and we will characterize the relative position of A in this ranked list. Note that references correspond to outlinks in the ‘cites’ network.

Fig. 1 Article A’s (the ego) citation network

For each element in ER(A) we determine the number of articles by which it is cited. In this contribution, this is the number we are interested in. Yet our basic framework, though not its interpretation, also applies to other numbers one may associate with an article, such as its age or the number of authors. Next we rank all elements in ER(A) according to their number of received citations.
Finally, the position of A in this list is characterized by its citations-of-references number CR(A), defined as:

$$\mathrm{CR}(A) = 1 - \frac{R(A)}{\mathrm{TRef}(A) + 1} \qquad (1)$$

where R(A) denotes the rank of A in this list. In case of ties we use an average rank. This notion was introduced by us in an earlier article and termed the outgrow index (Rousseau and Hu 2010). CR(A) is always a number between zero (included) and one (not included).

Articles that cite an article from A’s reference list form the set of all articles which are bibliographically coupled with A (depending on the application, this set may or may not be defined to include article A itself). Each of these articles already has a bibliographic coupling strength and a relative bibliographic coupling strength with A (Kessler 1963). This establishes a relation between the notions of bibliographic coupling and the outgrow or CR-index.

Instead of considering the number of citations received by each element in ER(A), we can also consider the number of references in each of these articles. This is yet another number associated with ER(A). In this way we obtain a new ranked list and we can determine a reference–reference index RR(A), defined as:

$$\mathrm{RR}(A) = 1 - \frac{R'(A)}{\mathrm{TRef}(A) + 1} \qquad (2)$$

where R'(A) is the rank of A in the list determined by the number of references. This approach uses references of references, hence two generations of references, establishing another relation between our new approach and notions studied in the field (Hu et al. 2011a).

Note that the set ER(A) can be studied for its own sake, and this in many different ways: new ways, as introduced here and in Rousseau and Hu (2010), but also well-known ways. Indeed, one may determine the average and median number of received citations (for each member); an I3-indicator can be calculated using an appropriate reference set (Leydesdorff and Bornmann 2011), while box plots (Egghe and Rousseau 1990, p. 25) can be used to compare ERs for different As.
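The rank-based definitions of CR(A) and RR(A) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `rank_index`, the dictionary layout and the toy counts are hypothetical. Ranking is taken in descending order (rank 1 for the largest count), with average ranks for ties as described above.

```python
def rank_index(counts, ego):
    """Return 1 - R/N for `ego`, where N = len(counts) and R is the
    average (tie-aware) rank of `ego` when items are ranked in
    descending order of their count (rank 1 = largest count)."""
    ego_count = counts[ego]
    larger = sum(1 for c in counts.values() if c > ego_count)
    tied = sum(1 for c in counts.values() if c == ego_count)  # includes ego
    # Tied items occupy ranks larger+1 .. larger+tied; take their mean.
    avg_rank = larger + (tied + 1) / 2
    return 1 - avg_rank / len(counts)


# Hypothetical toy data: ER(A) = Ref(A) ∪ {A} with TRef(A) = 4.
citations_received = {"A": 7, "r1": 12, "r2": 7, "r3": 3, "r4": 1}
reference_counts = {"A": 30, "r1": 10, "r2": 25, "r3": 40, "r4": 15}

cr = rank_index(citations_received, "A")  # CR(A): rank by citations received
rr = rank_index(reference_counts, "A")    # RR(A): rank by number of references
print(f"CR(A) = {cr:.2f}, RR(A) = {rr:.2f}")
```

In the toy data A shares its citation count with r2, so it receives the average rank 2.5 out of 5, which gives CR(A) = 0.5. Only r3 has a longer reference list than A, so A has rank 2 and RR(A) = 0.6.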
Ego article citation network: the ‘is cited by’ relation

Now we consider all articles that cite A and denote this set as Cit(A) [A is cited by all elements in Cit(A)]. The number of elements in Cit(A) is denoted as TCit(A). This means that, taking A’s point of view, we now follow the ‘is cited by’ relation. Article A and all citing articles form a set, denoted as EC(A) = Cit(A) ∪ {A}. Again we will attach a positive number to each element of EC(A), leading to a ranking of EC(A). As was the case for the ‘cites’ relation, a number between zero and one will be used to characterize the relative position of A in this ranked list.

First we determine for each element in EC(A) the number of articles by which it is cited. Next we rank all elements in EC(A) according to their number of received citations. Finally, the position of A in this list is characterized by its citation–citation number CC(A), defined as:

$$\mathrm{CC}(A) = 1 - \frac{R''(A)}{\mathrm{TCit}(A) + 1} \qquad (3)$$

where R''(A) denotes the rank of A in this new list. Note that, as for ER(A), the set EC(A) can also be studied for its own sake, and this again in many different ways. Note also that this approach uses citing articles of citing articles, hence two citation generations, as studied e.g. in Hu et al. (2011a).

Finally, instead of considering the number of citations received by each element in EC(A), we can also take the number of references in each of these articles. In this way we obtain another ranked list and determine a reference-citation index RC(A), defined as:

$$\mathrm{RC}(A) = 1 - \frac{R'''(A)}{\mathrm{TCit}(A) + 1} \qquad (4)$$

where R'''(A) is the rank of A in the list determined by the number of references. Articles that are cited by an article citing A form the set of all articles which are co-cited with A. Each of these articles has a co-citation strength and a relative co-citation strength with A (for a definition of these notions, we refer to Small 1973). Note that we have taken full advantage of the existing duality (Rousseau 2010) in a citation network.
Indeed, with every ‘cites’ relation there is an ‘is cited by’ relation. In this way the notions of bibliographic coupling and co-citation are dual notions. Returning to the ‘cites’ and ‘is cited by’ arrows originating from the source article, we point out that the number of references and the number of received citations are just the out-degree and the in-degree, respectively, of the source article in the ‘cites’ network. We also note that the CR- or outgrow index is determined by two sets: BC(A), the set of all articles that are bibliographically coupled with A, and Cit(A), the set of all articles that cite A. Similarly, the RC-index is determined by Ref(A) and CoCit(A), the set of all articles co-cited with A.

Dynamical aspects

We first point out which notions are static, i.e. do not change once the source article is introduced in the citation network, and which are dynamic, i.e. change or can change over time. The set of references of article A and ER(A) are static, and hence so is A’s out-degree in the ‘cites’ network. Consequently the references-of-references index RR(A) is also a static indicator. Once another article, say B, is published, the bibliographic coupling coefficient between A and B is fixed. Yet most notions mentioned above are dynamic, namely all those involving received citations. The sets of articles that are bibliographically coupled, BC(A), or co-cited, CoCit(A), with A also change over time. At the moment an article is published (introduced in the network) its RR-index is fixed, while its CR-index is usually zero, as each reference item is cited at least once, namely by A, while A is usually not cited. An exception might be the case that A is cited in the same journal issue as the one in which A is published. Such cases are from now on excluded. CC(A) and RC(A) are zero when A is published, as at that moment EC(A) = {A}. The three indices CR(A), RC(A) and CC(A) are dynamic ones.
Indeed, the relative position of A in ER(A) and EC(A) may change over time. When A receives its first citation (say by just one other article), then CR(A) (the outgrow index) may or may not increase. RC(A) stays 0 or becomes 0.5, and CC(A) becomes 0.5. Even if A always remains the most-cited article in EC(A), its CC(A) still increases. This follows from the definition of the CC-index: with A at rank 1, CC(A) = 1 − 1/(TCit(A) + 1), which grows with every new citing article. Dynamical aspects of the outgrow index were studied in Hu et al. (2011b).

Interpretation

For one source article a dynamic indicator seems more interesting than a static one. Yet static ones may also be of interest when comparing these indices for different As.

What is the meaning of the RR-index? This index characterizes the length of A’s reference list with respect to the reference lists of its own references. Assuming that an article’s reference list represents the knowledge on which it is built, we may observe that, on the one hand, an article with a short reference list may still, be it indirectly, be built on a rich amount of knowledge. In this case A’s RR-index will be low. On the other hand, an article with a long reference list may be built on many articles each studying a narrow topic (hence each with a short reference list). In that case A’s RR-index will be quite high. Given the fact that reference lists tend to increase (Persson et al. 2004; Althouse et al. 2009), one may expect the RR-index to be high rather than low, but this is just an observation based on average behaviour, independent of concrete cases.

The CR-index has already been studied (Rousseau and Hu 2010) and, because of its practical meaning, has been termed the outgrow index. It is indeed an index that characterizes to what extent an article outgrows (in terms of citations) the referenced items on which it is based. It is an indicator of the relation between the visibility of the source article and that of its references. One may expect that this index increases over time.
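The dynamic behaviour of the CR- and CC-indices can be made concrete with a small numerical sketch. The yearly snapshots, the article labels and the helper `rank_index` below are hypothetical and serve only as an illustration. The helper ranks in descending order with average ranks for ties, which is assumed here to be the tie rule for all indices.

```python
def rank_index(counts, ego):
    """1 - R/N, with R the average descending rank of `ego` among N items."""
    ego_count = counts[ego]
    larger = sum(1 for c in counts.values() if c > ego_count)
    tied = sum(1 for c in counts.values() if c == ego_count)  # includes ego
    return 1 - (larger + (tied + 1) / 2) / len(counts)


# Hypothetical cumulative citation counts per year for ER(A) = {A, r1, r2, r3}.
er_snapshots = {
    2000: {"A": 0, "r1": 5, "r2": 3, "r3": 1},  # A just published, uncited
    2001: {"A": 2, "r1": 6, "r2": 3, "r3": 1},
    2002: {"A": 7, "r1": 6, "r2": 4, "r3": 1},  # A has outgrown its references
}
cr_series = {year: rank_index(c, "A") for year, c in er_snapshots.items()}

# If A stays the most-cited item of EC(A), its rank is 1 and
# CC(A) = 1 - 1/(TCit(A) + 1), which still grows with TCit(A).
cc_if_top = [1 - 1 / (tcit + 1) for tcit in range(5)]

print(cr_series)
print(cc_if_top)
```

In this toy series CR(A) starts at zero in the publication year and rises as A overtakes its references. The second list starts at 0 for an uncited article and reaches 0.5 after the first citation, in line with the values stated in the text.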
If an article is highly cited and has a high CC-index, then it is a leading article in its environment, perhaps describing a really innovative idea. If after some years an article has a rather low CC-index, then it probably is not really important or is only used in a minor role.

If an article is a review article, then it probably has a high RC-index. If it is a normal article, then a low RC-index may show that the article is mainly mentioned in review articles, while a high RC-value may indicate that it has been used mainly in highly focused articles. Generally one may expect for normal articles that this index decreases over time.

Relations between the structural indices and citation generations

Structural indices in the first citation generation

According to the definition of the CR- or outgrow index (Rousseau and Hu 2010), article A and all articles in its reference list form a set, denoted as ER(A) = Ref(A) ∪ {A}, where Ref(A) denotes the references of article A, or all references taken into account (one may remove non-article references, or self-references). For CR(A) one generation of citations and one generation of references are used, see Fig. 2. The RC-index (references-of-citations) is defined in a similar way. One generation of citations and their references are traced in order to obtain formula (4), see Fig. 2.

Fig. 2 Structural indices of article A based on one citation generation

Structural indices in the second citation generation

In order to define A’s CC-index, the set EC(A) is used again. Instead of considering references, the citing articles of each of the articles in this set are now counted, leading to a citations-of-citations index. Articles in EC(A) are ranked according to the number of received citations. The position of A in this list is characterized by its citation–citation number CC(A). The CC-index uses citing articles of citing articles, hence two citation generations. Figure 3 illustrates this approach.
Finally, the references-of-references index, denoted as RR(A), uses the set ER(A) and ranks its elements according to the number of references. This approach considers references of references, hence two generations of references are tracked (Hu et al. 2011a), see again Fig. 3.

Fig. 3 Structural indices of article A, using two citation generations

Examples

In this section, we study two real examples using data from the WoS. We consider the first two articles published in the year 2000, volume 47, issue 1 of the journal Scientometrics. Bibliographic data are given in Table 1. Data were collected on 2010/12/25. Clearly the first one is rather well-cited, while the second one is moderately cited. We determine the four structural indices and study dynamical aspects using time series (Hu et al. 2011b).

Table 1 Bibliographic data of the two articles studied as examples

Article 1
  Title        Renormalized impact factor
  Author(s)    Ramirez AM, Garcia EO, Del Rio JA
  Source       SCIENTOMETRICS 2000, 47(1): 3–9
  Times cited  24

Article 2
  Title        Does peer review predict the performance of research projects in health sciences?
  Author(s)    Claveria LE, Guallar E, Cami J, Conde J, Pastor R, Ricoy JR, Rodriguez-Farre E, Ruiz-Palomo F, Munoz E
  Source       SCIENTOMETRICS 2000, 47(1): 11–23
  Times cited  6

In order to make the results more focused, self-citations and items that are not WoS source items were removed from citations as well as from references. Results are given in Tables 2 and 3 and illustrated in Figs. 4 and 5. The three curves in each figure show that the CC-index is much higher than the other indices, the reason being that, except for article A itself, each article in the set EC(A) is published later than article A. Therefore, in general, in the list of articles in EC(A) ranked according to the number of received citations, the position of A is higher than that of the others.
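For readers who want to try this kind of analysis on their own data, all four indices can be derived from a plain list of ‘cites’ pairs. The sketch below is a hypothetical illustration, not the procedure used for the two example articles: the function `four_indices`, the edge list and the article labels are assumptions, and counts are taken only within the supplied edge list, whereas a real study would use the full citation and reference counts from the database. Average ranks for ties are assumed for all four indices.

```python
from collections import defaultdict


def four_indices(cites_edges, ego):
    """Compute CR, RR, CC and RC for `ego` from (citing, cited) pairs."""
    refs = defaultdict(set)    # article -> articles it cites
    citers = defaultdict(set)  # article -> articles citing it
    for citing, cited in cites_edges:
        refs[citing].add(cited)
        citers[cited].add(citing)

    def index(members, count):
        # 1 - R/N with R the average descending rank of the ego.
        values = [count(m) for m in members]
        ego_value = count(ego)
        larger = sum(v > ego_value for v in values)
        tied = sum(v == ego_value for v in values)  # includes ego
        return 1 - (larger + (tied + 1) / 2) / len(values)

    er = refs[ego] | {ego}    # ER(A) = Ref(A) ∪ {A}
    ec = citers[ego] | {ego}  # EC(A) = Cit(A) ∪ {A}

    def n_cit(article):
        return len(citers[article])

    def n_ref(article):
        return len(refs[article])

    return {
        "CR": index(er, n_cit),
        "RR": index(er, n_ref),
        "CC": index(ec, n_cit),
        "RC": index(ec, n_ref),
    }


# Hypothetical toy network: A cites r1 and r2; c1 and c2 cite A.
edges = [("A", "r1"), ("A", "r2"), ("r1", "r2"),
         ("c1", "A"), ("c2", "A"), ("c2", "r1"), ("c3", "c1")]
result = four_indices(edges, "A")
print(result)
```

The two dictionaries `refs` and `citers` hold the same edges seen from the ‘cites’ and the ‘is cited by’ side, so the duality of the two viewpoints shows up directly in the data structure.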
It should also be noted that, in the early years after the publication of article A, the CR-index is always the lowest. The reason for this is that in the list of ER(A) articles ranked according to the number of received citations, each reference item is published earlier than article A. Therefore, the position of article A in the list starts at the last position. If, in subsequent years, article A receives more citations than the other articles in the list, then A’s CR-index may increase, possibly outgrowing the references on which A is based.

Table 2 Four structural indices and time series for Article 1

Index     2000   2001   2002   2003   2004   2005   2006   2007   2008   2009   2010
CR-index  0      0.25   0.25   0.25   0.25   0.31   0.44   0.63   0.63   0.63   0.69
CC-index  0      0.8    0.83   0.83   0.88   0.82   0.86   0.94   0.89   0.86   0.9
RC-index  0      0.6    0.5    0.5    0.375  0.27   0.21   0.17   0.16   0.18   0.16
RR-index  0.25

Table 3 Four structural indices and time series for Article 2

Index     2000   2001   2002   2003   2004   2005   2006   2007   2008   2009   2010
CR-index  0      0      0      0      0      0      0      0.07   0.03   0.07   0.23
CC-index  0      0      0      0      0      0.67   0.5    0.6    0.5    0.67   0.57
RC-index  0      0      0      0      0      0.33   0.33   0.2    0.2    0.33   0.29
RR-index  0.57

Fig. 4 Time series for Article 1

Fig. 5 Time series for Article 2

Conclusions and directions for further applications

This new look at a well-known network may provide new ways of studying small subfields. Could these indices and subsets reflect changes in interest and intellectual patterns? Can they point to co-occurrences of ideas? Are they correlated? One step further can be taken by determining, for the elements in ER(A) and EC(A), not just the number of citations received or references given, but also the number of scientific fields that give or receive citations. In this way the rankings (now based on fields used or reached) and the resulting indices connect our approach to diffusion theory (Liu and Rousseau 2010) and the study of interdisciplinarity (Rafols and Meyer 2010).
In which type of graphs can these ideas be applied? We see possible applications in other citation graphs, e.g. patent citation graphs (Singh 2005; Wang et al. 2010), but also in totally different networks such as food webs (ecology) and Hasse diagrams such as the partial order graph derived from a Lorenz curve (Nijssen et al. 1998). This article provides a simple analysis of structural indices in an ego citation network. The notion of citation generations is used to describe ego article citation networks in a graph-theoretic setting. Values of structural indices are calculated for some real-life examples. The results imply that the four indices perform a basic connective role in the evolution of citation generations. Except for the RR-index, the other indices, namely the CR-index, CC-index and RC-index are dynamical. We are convinced that many interesting findings will result from further investigations using these structural indices. Acknowledgments The authors thank Raf Guns (Antwerp University) for stimulating discussions related to the outgrow index and its possible applications. This research is supported by National Natural Science Foundation of China (NSFC Grant No. 71173185); Ronald Rousseau’s research is also supported by the National Natural Science Foundation of China grants (NSFC 7101017006 and 71173154). This article is an extended version of a paper presented at the 13th International Conference on Scientometrics and Informetrics, Durban (South Africa), 4–7 July 2011 (Rousseau 2011). References Althouse, B. M., West, J. D., Bergstrom, T., & Bergstrom, C. T. (2009). Differences in impact factor across fields and over time. Journal of the American Society for Information Science and Technology, 60(1), 27–34. Boyack, K. W., Klavans, R., & Borner, K. (2005). Mapping the backbone of science. Scientometrics, 64(3), 351–374. Chen, Z. F., & Guan, J. C. (2010). The impact of small world on innovation: An empirical study of 16 countries. 
Journal of Informetrics, 4(1), 97–106. Egghe, L., & Rousseau, R. (1990). Introduction to informetrics, quantitative methods in library, documentation and information science. Amsterdam: Elsevier. Garfield, E. (1963). Science Citation Index: An international interdisciplinary index to the literature of science. Philadelphia: Institute for Scientific Information. Garfield, E. (1964). Science Citation Index: New dimension in indexing. Science, 144(361), 649–654. Garner, R. (1967). A computer oriented, graph theoretic analysis of citation index structures. In B. Flood (Ed.), Three Drexel information science research studies (pp. 3–46). Philadelphia: Drexel Press. Hu, X. J., Rousseau, R., & Chen, J. (2011a). On the definition of forward and backward citation generations. Journal of Informetrics, 5(1), 27–36. Hu, X. J., Rousseau, R., & Chen, J. (2011b). Time series of outgrow indices. Journal of Informetrics, 5(3), 413–421. Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14(1), 10–25. Leydesdorff, L., & Bornmann, L. (2011). Integrated Impact Indicators compared with impact factors: An alternative research design with policy implications. Journal of the American Society for Information Science and Technology, 62(11), 2133–2146. Liu, Y. X., & Rousseau, R. (2010). Knowledge diffusion through publications and citations: A case study using ESI-fields as unit of diffusion. Journal of the American Society for Information Science and Technology, 61(2), 340–351. Marshakova, I. V. (1973). System of document connections based on references. Nauchno-Tekhnicheskaya Informatsiya series, 2(6), 3–8. (in Russian). Nijssen, D., Rousseau, R., & Van Hecke, P. (1998). The Lorenz curve: A graphical representation of evenness. Coenoses, 13(1), 33–38. Otte, E., & Rousseau, R. (2002). Social network analysis: A powerful strategy, also for the information sciences. Journal of Information Science, 28(6), 441–453.
Persson, O., Glänzel, W., & Danell, R. (2004). Inflationary bibliometric values: The role of scientific collaboration and the need for relative indicators in evaluative studies. Scientometrics, 60(3), 421–432. Rafols, I., & Meyer, M. (2010). Diversity and network coherence as indicators of interdisciplinarity: Case studies in bionanoscience. Scientometrics, 82(2), 263–287. Rousseau, R. (2010). Bibliographic coupling and co-citation as dual notions. In B. Larsen (Ed.), The Janus faced scholar. A Festschrift in honour of Peter Ingwersen (pp. 173–183). ISSI e-zine (special volume). Rousseau, R (2011). Algebraic structures in the ego article citation network. In E. Noyons, P. Ngulube, J. Leta (Eds.), Proceedings of ISSI 2011—The 13th International Conference on Scientometrics and Informetrics, Durban, South Africa, 4–7 July 2011 (pp. 737–741). Rousseau, R., & Hu, X. J. (2010). An outgrow index. Annals of Library and Information Studies, 57(3), 287–290. Singh, J. (2005). Collaborative networks as determinants of knowledge diffusion patterns. Management Science, 51(5), 756–770. Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265–269. Wagner, C. S., & Leydesdorff, L. (2005). Network structure, self-organization, and the growth of international collaboration in science. Research Policy, 34(10), 1608–1618. Wang, J. C., Chiang, C. H., & Lin, S. W. (2010). Network structure of innovation: Can brokerage or closure predict patent quality? Scientometrics, 84(3), 735–748. Wasserman, S., & Faust, K. (1994). Social network analysis. Cambridge: Cambridge University Press. White, H. D. (2000). Toward ego-centered citation analysis. In B. Cronin & H. B. Atkins (Eds.), The Web of knowledge. A Festschrift in honor of Eugene Garfield (pp. 475–496). Medford (NJ): Information Today. Yang, E. J., & Ding, Y. (2009). 
Applying centrality measures to impact analysis: A coauthorship network analysis. Journal of the American Society for Information Science and Technology, 60(10), 2107–2118.