Data quality is crucial in all information systems. As a key step in obtaining clean data, record linkage or entity resolution (ER) groups database records by the underlying real world entities. In this paper we give practical...
      Lattice Theory, Entity Resolution, Data Quality (Computer Science), Record Linkage
Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of...
      Data Mining, Fuzzy Logic, Entity Resolution, Information Integration
Authors listed include P. Christen and V.S. Verykios. Please cite this article as: D. Vatsalan, et al., A taxonomy of privacy-preserving record linkage techniques, Information Systems (2013), http://dx.
      Information Systems, Entity Resolution, Record Linkage, Data Quality
Entity resolution, also known as data matching or record linkage, is the task of identifying and matching records from several databases that refer to the same entities. Traditionally, entity resolution has been applied in batch-mode and...
      Entity Resolution, Indexing (Information Organisation), Entity Matching, Entity and Identity Resolution
Matching and merging of data from heterogeneous sources is a common need in various scenarios. Despite numerous algorithms proposed in the recent literature, there is a lack of general and complete solutions combining different dimensions...
      Redundancy Elimination, Entity Resolution, Context, Trust Management
Record linkage is the process of matching records from several databases that refer to the same entities. When applied on a single database, this process is known as deduplication. Increasingly, matched data are becoming important in many...
      Machine Learning, Entity Resolution, Data Matching, Experimental Evaluation
Record linkage is the process of matching records from several databases that refer to the same entities. When applied on a single database, this process is known as deduplication. Increasingly, matched data are becoming important in many...
      Machine Learning, Entity Resolution, Indexing, Data Matching
In scholarly digital libraries, author disambiguation is an important task that attributes a scholarly work with specific authors. This is critical when individuals share the same name. We present an approach to this task that analyzes...
      Information Systems, Computer Science, Information Retrieval, Entity Resolution
In recent years, Online Social Networks (OSNs) have essentially become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus and offers for particular services and functionalities. To take...
      Machine Learning, Entity Resolution, Social Network Analysis (SNA), Facebook
Entity resolution (ER) is an important and common problem in data cleaning. It is about identifying and merging records in a database that represent the same real-world entity. Recently, matching dependencies (MDs) have been introduced...
      Entity Resolution
In this paper, we present recent work which has been accomplished in the newly introduced research area of privacy preserving record linkage, and then, we present our L-fold redundant blocking scheme, that relies on the Locality-Sensitive...
      Entity Resolution, Record Linkage, Locality-sensitive hashing
One of the key challenges to realize automated processing of the information on the Web, which is the central goal of the Semantic Web, is related to the entity matching problem. There are a number of tools that reliably recognize named...
      Information Systems, Relational Database, Entity Resolution, Semantic Web
Crowdsourcing platforms adopt the new Labour as a Service model and allow for easy distribution of small tasks to a large number of workers. Crowdsourced systems introduce the open world model of databases. In the open world model, the...
      Entity Resolution, Crowdsourced Data
      Entity Resolution, Probabilistic Databases, Duplicate Detection, Record Linkage
This paper addresses the problem of entity identification in documents in which key identity attributes are missing. The most common approach is to take a single entity reference and determine the "best match" of its attributes to a set...
      Text Mining, Entity Resolution, Identity, Identity management
      Entity Resolution, Entity and Identity Resolution, Duplicate Detection, Record Linkage
Museums around the world have built databases with metadata about millions of objects, their history, the people who created them, and the entities they represent. This data is stored in proprietary databases and is not readily available...
      Cultural Heritage, Entity Resolution, Linked Data, Semantic Web
Damia is a lightweight enterprise data integration service where line of business users can create and catalog high value data feeds for consumption by situational applications. Damia is inspired by the Web 2.0 mashup phenomenon. It...
      Entity Resolution, User Interface, Data Integrity, Data Flow Graph
We introduce HIL, a high-level scripting language for entity resolution and integration. HIL aims at providing the core logic for complex data processing flows that aggregate facts from large collections of structured or unstructured data...
      Entity Resolution, Information Extraction, Arithmetic, Scripting Language
Entity matching and entity resolution are becoming more important disciplines in data management, driven by the increasing number of data sources that must be handled in an economy undergoing digital transformation,...
      Computer Science, Entity Resolution, Named Entity Recognition, Data Integration
Most real-world machine learning problems have both statistical and relational aspects. Thus learners need representations that combine probability and relational logic. Markov logic accomplishes this by attaching weights to first-order...
      Machine Learning, Entity Resolution, Information Extraction, Inductive Logic Programming
The continuous growth of the e-commerce industry has rendered the problem of product retrieval particularly important. As more enterprises move their activities on the Web, the volume and the diversity of the product-related information...
      Artificial Intelligence, Machine Learning, Clustering and Classification Methods, Entity Resolution
The task of linking multiple databases with the aim to identify records that refer to the same entity is occurring increasingly in many application areas. If unique identifiers for the entities are not available in all the databases to be...
      Entity Resolution, Scalability
Despite the huge amount of recent research efforts on entity resolution (matching) there has not yet been a comparative evaluation on the relative effectiveness and efficiency of alternate approaches. We therefore present such an...
      Machine Learning, Entity Resolution, Similarity Function
This paper describes the outcome of an attempt to implement the same transitive closure (TC) algorithm for Apache MapReduce running on different Apache Hadoop distributions. Apache MapReduce is a software framework used with Apache...
      Computer Science, Distributed Computing, Information Technology, Machine Learning
Matching dependencies were recently introduced as declarative rules for data cleaning and entity resolution. Enforcing a matching dependency on a database instance identifies the values of some attributes for two tuples, provided that the...
      Applied Mathematics, Distributed Computing, Entity Resolution, Data cleaning
Addressing the research opportunities we've identified could substantially broaden the spectrum of multilingual text-mining and its practicality for supporting global S&T knowledge management. These opportunities also share a common set...
      Cognitive Science, Knowledge Management, Text Mining, Competitive Intelligence
Web data repositories usually contain references to thousands of real-world entities from multiple sources. It is not uncommon that multiple entities share the same label (polysemes) and that distinct label variations are associated with...
      Information Systems, Entity Resolution, Library and Information Studies
Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel...
      Entity Resolution, Data Replication, Programming Model
Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods and their combination to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed...
      Information Systems, Entity Resolution, Data Integrity, Data Format
Matching dependencies (MDs) were introduced to specify the identification or matching of certain attribute values in pairs of database tuples when some similarity conditions are satisfied. Their enforcement can be seen as a natural...
      Computer Science, Entity Resolution, Data cleaning, Boolean Satisfiability
Entity resolution is a critical component of data integration where the goal is to reconcile database references corresponding to the same real-world entities. Given the abundance of publicly available databases that have unresolved...
      Computer Science, Entity Resolution, Real Time, Query processing
      Entity Resolution, Data Mining and Knowledge Discovery, Relational data
      Entity Resolution, Information Processing, Clustering, Classification
In many telecom and web applications, there is a need to identify whether data objects in the same source or different sources represent the same entity in the real-world. This problem arises for subscribers in multiple services,...
      Supply Chain Management, Complexity Theory, Entity Resolution, Clustering Algorithms
First order languages express properties of entities and their relationships in rich models of heterogeneous network phenomena. Markov logic is a set of techniques for estimating the probabilities of truth values of such properties. This...
      History, Computer Networks, Entity Resolution, Category Theory
Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel...
      Entity Resolution, Computer Software, Programming Model, High Efficiency
Most research into entity resolution (also known as record linkage or data matching) has concentrated on the quality of the matching results. In this paper, we focus on matching time and scalability, with the aim to achieve large-scale...
      Entity Resolution, Indexing (Information Organisation), Entity Matching, Entity and Identity Resolution
      Entity Resolution, Duplicate Detection, Record Linkage
Entity resolution is the process of determining if, in a specific context, two or more references correspond to the same entity. In this work, we address this problem in the context of references to persons as they are found in...
      Machine Learning, Entity Resolution, Data Warehouse, Machine
Approximate data matching is a central problem in several data management processes, such as data integration, data cleaning, approximate queries, similarity search and so on. An approximate matching process aims at defining whether two...
      Information Systems, Computer Science, Machine Learning, Entity Resolution
In many telecom and web applications, there is a need to identify whether data objects in the same source or different sources represent the same entity in the real-world. This problem arises for subscribers in multiple services,...
      Computer Science, Supply Chain Management, Complexity Theory, Entity Resolution
Entity resolution (ER) is an important and common problem in data cleaning. It is about identifying and merging records in a database that represent the same real-world entity. Recently, matching dependencies (MDs) have been introduced...
      Computer Science, Entity Resolution
Matching Dependencies (MDs) are a relatively recent proposal for declarative entity resolution. They are rules that specify, on the basis of similarities satisfied by values in a database, what values should be considered duplicates, and...
      Computer Science, Entity Resolution, Data cleaning, Query Answering
Matching dependencies (MDs) are used to declaratively specify the identification (or matching) of certain attribute values in pairs of database tuples when some similarity conditions are satisfied. Their enforcement can be seen as a...
      Computer Science, Entity Resolution, Data cleaning, Boolean Satisfiability