2006, Proceedings of the 6th ACM/IEEE-CS joint …
In scholarly digital libraries, author disambiguation is an important task that attributes a scholarly work to specific authors. This is critical when individuals share the same name. We present an approach to this task that analyzes the results of automatically crafted web searches. A key observation is that pages from rare web sites are a stronger source of evidence than pages from common web sites, which we model as Inverse Host Frequency (IHF). Our system achieves an average accuracy of 0.836.
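The abstract does not give a formula for IHF, so the following is only a minimal sketch under the assumption that it mirrors inverse document frequency: a host that appears in the results of few queries receives a higher weight than a host that appears in many. The function names, the logarithmic form, and the example URLs are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def inverse_host_frequency(result_urls_per_query):
    """Return an IDF-style weight for each host (assumed form, not the paper's).

    result_urls_per_query: list of lists, one list of result URLs per web search.
    Weight = log(number of queries / number of queries whose results include the host).
    """
    n_queries = len(result_urls_per_query)
    host_counts = Counter()
    for urls in result_urls_per_query:
        # Count each host at most once per query.
        hosts = {urlparse(u).netloc.lower() for u in urls}
        host_counts.update(hosts)
    return {host: math.log(n_queries / count) for host, count in host_counts.items()}


def shared_host_evidence(urls_a, urls_b, ihf):
    """Sum the weights of hosts returned for both citations (hypothetical scoring)."""
    hosts_a = {urlparse(u).netloc.lower() for u in urls_a}
    hosts_b = {urlparse(u).netloc.lower() for u in urls_b}
    return sum(ihf.get(host, 0.0) for host in hosts_a & hosts_b)
```

Under this assumed weighting, a host returned for every query (for example a large bibliography site) contributes nothing, while a personal or lab page shared by two citations contributes strongly.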
2008
Today, bibliographic digital libraries play an important role in helping members of the academic community search for novel research. In particular, author disambiguation for citations is a major problem during the data integration and cleaning process, since author names are usually very ambiguous. To solve this problem, we propose two kinds of correlations between citations, namely Topic Correlation and Web Correlation, to exploit relationships between citations and identify whether two citations with the same author name refer to the same individual. The topic correlation measures the similarity between the research topics of two citations, while the Web correlation measures the number of co-occurrences in web pages. We employ a pair-wise grouping algorithm to group citations into clusters. Experimental results show that disambiguation accuracy improves greatly when topic correlation and Web correlation are used, and that Web correlation provides stronger evidence about the authors of citations.
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021
Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library applications such as search and citation analysis. While many AND algorithms have been proposed, comparing them is difficult because they often employ distinct features and are evaluated on different datasets. In response to this challenge, we present S2AND, a unified benchmark dataset for AND on scholarly papers, as well as an open-source reference model implementation. Our dataset harmonizes eight disparate AND datasets into a uniform format, with a single rich feature set drawn from the Semantic Scholar (S2) database. Our evaluation suite for S2AND reports performance split by facets like publication year and number of papers, allowing researchers to track both global performance and measures of fairness across facet values. Our experiments show that because previous datasets tend to cover idiosyncratic and biased slices of the literature, algorithms trained to perform well on one of them may generalize poorly to others. By contrast, we show how training on a union of datasets in S2AND results in more robust models that perform well even on datasets unseen in training. The resulting AND model also substantially improves over the production algorithm in S2, reducing error by over 50% in terms of B³ F1. We release our unified dataset, model code, trained models, and evaluation suite to the research community. Index Terms: Digital libraries, Author name disambiguation, Out-of-domain evaluation.
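For readers unfamiliar with the B³ F1 metric mentioned above, here is a minimal sketch of the standard B-cubed definition: per-item precision and recall are averaged over all items and then combined as a harmonic mean. The function name and input format are assumptions made for illustration; the S2AND evaluation suite may compute the metric differently in its details.

```python
from collections import defaultdict


def b_cubed(predicted, gold):
    """Compute B-cubed precision, recall, and F1.

    predicted, gold: dicts mapping each item id to a cluster label.
    Both dicts are assumed to cover the same set of items.
    """
    pred_clusters = defaultdict(set)
    gold_clusters = defaultdict(set)
    for item, label in predicted.items():
        pred_clusters[label].add(item)
    for item, label in gold.items():
        gold_clusters[label].add(item)

    precision_sum = 0.0
    recall_sum = 0.0
    for item in gold:
        pred_cluster = pred_clusters[predicted[item]]
        gold_cluster = gold_clusters[gold[item]]
        overlap = len(pred_cluster & gold_cluster)
        precision_sum += overlap / len(pred_cluster)  # purity of the item's predicted cluster
        recall_sum += overlap / len(gold_cluster)     # coverage of the item's true cluster

    n = len(gold)
    precision = precision_sum / n
    recall = recall_sum / n
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Merging every mention into one cluster yields perfect recall but low precision, while splitting every mention into its own cluster does the opposite, which is why the F1 combination is usually reported.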
2010
Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. Supervised methods that exploit training examples in order to distinguish ambiguous author names are among the most effective solutions for the problem, but they require skilled human annotators in a laborious and continuous process of manually labeling citations in order to provide enough training examples. Thus, such systems urgently need to address two issues: (i) automatic acquisition of examples and (ii) highly effective disambiguation even when only a few examples are available. In this paper, we propose a novel two-step disambiguation method, SAND (Self-training Associative Name Disambiguator), that deals with these two issues. The first step eliminates the need for any manual labeling effort by automatically acquiring examples using a clustering method that groups citation records based on the similarity among coauthor names. The second step uses a supervised disambiguation method that is able to detect unseen authors not included in any of the given training examples. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation (i.e., author names, work title and publication venue), demonstrated that our proposed method outperforms representative unsupervised disambiguation methods that exploit similarities between citation records and is as effective as, and in some cases superior to, supervised ones, without manually labeling any training example.
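The first step described above, grouping citation records by coauthor similarity, can be illustrated with a deliberately simplified sketch. It assumes that "similarity" means sharing at least one coauthor name after basic normalization and merges such records with a union-find structure. This is a stand-in chosen for illustration, not SAND's actual clustering procedure, and the function name and record format are hypothetical.

```python
def cluster_by_shared_coauthor(records):
    """Group citation records that share at least one coauthor name.

    records: list of collections of coauthor name strings, one per citation
             (excluding the ambiguous author name itself).
    Returns a sorted list of clusters, each a list of record indices.
    """
    parent = list(range(len(records)))

    def find(i):
        # Find the cluster representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # normalized coauthor name -> first record index containing it
    for idx, coauthors in enumerate(records):
        for name in coauthors:
            key = name.strip().lower()
            if key in first_seen:
                # Merge this record's cluster with the cluster that already has the name.
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    clusters = {}
    for idx in range(len(records)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```

Records with no coauthors stay in singleton clusters under this rule, which is one reason a second, supervised step is useful for assigning the remaining records.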
ACM/IEEE Joint Conference on Digital Libraries, 2009
In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We proposed here a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations
International Journal of Software Engineering and Its Applications, 2016
When using search engine services to search for scholarly articles, obtaining quick and accurate search results from a huge set of scholarly information is always important. However, most domestic and foreign search engine services for scholarly articles present a broad range of results for a query on a researcher's name. Such results lower search precision and require users to spend time and effort verifying the results and finding the necessary information. This problem is called "author ambiguity", and solving it is called "author disambiguation." An author disambiguation method assigns records of authors who share the same name to the actual persons they refer to. Resolving author ambiguity yields better search results, increasing the recall rate and accuracy when searching for scholarly articles. To resolve author ambiguity, in this paper we expand the co-author network and identify the author using the co-author network information and basic bibliographic information as features for a Support Vector Machine classifier. To examine the effectiveness of the proposed method, we test it on 92,100 IT-related scholarly records generated in Korea. Author disambiguation through the expansion of the co-author network is shown to achieve an F-1 measure of 94.79%. The result confirms that author disambiguation using the co-author network is effective.
ACM SIGMOD Record, 2012
Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. The challenges of dealing with author name ambiguity have led to a myriad of disambiguation methods. Generally speaking, the proposed methods either attempt to group citation records of the same author by finding some similarity among them or try to directly assign them to their respective authors. Both approaches may exploit either supervised or unsupervised techniques. In this article, we propose a taxonomy for characterizing the current author name disambiguation methods described in the literature, present a brief survey of the most representative ones and discuss several open challenges.
IEEE Access
Author Name Disambiguation (AND) has emerged as a significant challenge in the bibliometric context with the growing volume of scientific literature. Ambiguity arises when citations written by different authors carry the same name (polysemy, or homonyms) and when a single author appears under different names (synonyms, or name variants). In both cases it is difficult to associate a citation with the correct author. Polysemy and synonyms cause merging and splitting anomalies in the citations. These anomalies affect the quantification of an author's productivity (bibliometric analysis) and the reliability and quality of the information retrieved. Many techniques for AND have been proposed in the literature; most of them do not go beyond string matching or text matching, and most do not consider the context or semantics of the terms used in the citations. In this paper, the AND problem is resolved semantically using a deep learning technique on the PubMed dataset. The experimental results show that the proposed method achieves 11.72%, 12.5%, and 12.1% higher overall precision, recall, and F-measure, respectively, than the pairwise class classification.
Research, Innovation and …, 2007
Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document similarities and to find documents related to the same person. Some previous researchers have used the vector space model or have tried to extract common named entities for measuring similarities. We propose a new method that uses Web directories as a knowledge base to find shared contexts in document pairs and uses the measurement of shared contexts to determine similarities between document pairs. Experimental results show that our proposed method outperforms the vector space model method and the named entity recognition method.
2020
Entity resolution has been a challenging and active research area in the field of Information Systems over the last decade. Author Name Disambiguation (AND) in Bibliographic Databases (BD) like DBLP, Citeseer, and Scopus is a specialized field of entity resolution. Given many citations of underlying authors, the AND task is to find which citations belong to the same author. In this survey, we start with three basic AND problems, followed by the need for solutions and the challenges involved. A generic, five-step framework is provided for handling AND issues. These steps are: (1) preparation of the dataset, (2) selection of publication attributes, (3) selection of similarity metrics, (4) selection of models, and (5) clustering performance evaluation. Categorization and elaboration of similarity metrics and methods are also provided. Finally, future directions and recommendations are given for this dynamic area of research.
2019
In many applications, such as scientific literature management, researcher search, and social network analysis, Name Disambiguation (aiming at disambiguating WhoIsWho) has been a challenging problem. In addition, the growth of scientific literature makes the problem more difficult and urgent. Although name disambiguation has been extensively studied in academia and industry, the problem has not been solved well due to noisy data and the complexity of same-name scenarios. In this work, we aim to explore models that can perform the task of name disambiguation using the network structure that is intrinsic to the problem and present an analysis of the models.