This paper presents a parsing method for entity extraction from open-source documents. A Web page of interest is first downloaded to a text file. The method then applies a set of patterns to the text file to extract fragments describing entities of interest. The patterns are currently designed specifically for obituary announcements. Once the entities are extracted, the next step is to identify them before they are loaded into a database, and an entity resolution process is presented to determine their actual identities. A case study illustrates the method, and its results are presented. Although the results show that the method is not yet technically effective or promising, they do help clarify how well, or how poorly, a quick parsing technique extracts entities of interest from obituaries on the Web. More effective techniques should be investigated to improve the extraction results.
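The pattern-matching step can be illustrated with a minimal sketch, assuming the patterns are regular expressions applied to text already read from the downloaded file. The pattern set, the field names (`name`, `age`, `city`, `date`, `relation`) and the function `extract_fragments` are hypothetical and are not the paper's actual patterns.

```python
import re
from typing import Dict, List

# Hypothetical obituary patterns for illustration only; the paper's own
# pattern set is not reproduced here.
PATTERNS: Dict[str, re.Pattern] = {
    # "John A. Smith, 84, of Springfield" -> name, age, city of residence
    "deceased": re.compile(
        r"(?P<name>[A-Z][a-z]+(?: [A-Z]\.)?(?: [A-Z][a-z]+)+), "
        r"(?P<age>\d{1,3}), of (?P<city>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
    ),
    # "died on March 3, 2010" -> date of death
    "death_date": re.compile(r"died (?:on )?(?P<date>[A-Z][a-z]+ \d{1,2}, \d{4})"),
    # "survived by his wife, Mary Smith" -> a relative and the relationship
    "survivor": re.compile(
        r"(?:survived by|preceded in death by) (?:his|her) "
        r"(?P<relation>wife|husband|son|daughter|brother|sister), "
        r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"
    ),
}


def extract_fragments(text: str) -> List[Dict[str, str]]:
    """Apply every pattern to the text and return one dict per match,
    tagged with the label of the pattern that produced it."""
    fragments: List[Dict[str, str]] = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            fragment = {"type": label}
            fragment.update(match.groupdict())
            fragments.append(fragment)
    return fragments


sample = ("John A. Smith, 84, of Springfield died on March 3, 2010. "
          "He is survived by his wife, Mary Smith.")
fragments = extract_fragments(sample)
```

The extracted fragments are only candidate mentions: a name such as "Mary Smith" may still match several people, which is why an entity resolution step is needed before anything is written to the database.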
This paper addresses the problem of entity identification in documents in which key identity attributes are missing. The most common approach is to take a single entity reference and determine the "best match" of its attributes to a set of candidate identities selected from an appropriate entity catalog. This paper describes a new multiple-reference, shared-relationship identity resolution technique that can be employed when a document references several entities that share a specific relationship, a situation that often occurs in published documents. It also describes the results of a recent test of this technique on obituary notices. The preliminary results show that the multiple-reference technique can provide higher-quality identification results than single-reference matching in cases where a shared relationship is asserted.
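The contrast between single-reference "best match" and multiple-reference, shared-relationship resolution can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the `Identity` record, the `family` field standing in for the shared relationship, the attribute-count `score`, and the functions `best_single` and `best_shared` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Identity:
    """Hypothetical catalog record; the real catalog schema is not given."""
    id: str
    first: str
    last: str
    city: str
    family: str  # hypothetical key standing in for the shared relationship


ATTRS = ("first", "last", "city")


def score(ref: Dict[str, str], cand: Identity) -> int:
    """Count attributes on which a reference agrees with a candidate.
    Attributes missing from the reference contribute nothing."""
    return sum(1 for a in ATTRS if ref.get(a) and ref[a] == getattr(cand, a))


def best_single(ref: Dict[str, str], catalog: List[Identity]) -> Identity:
    """Single-reference matching: pick the highest-scoring candidate.
    On a tie, max() returns the first candidate encountered."""
    return max(catalog, key=lambda c: score(ref, c))


def best_shared(refs: List[Dict[str, str]],
                catalog: List[Identity]) -> Optional[Tuple[Identity, ...]]:
    """Multiple-reference matching: choose one candidate per reference such
    that all chosen candidates share the asserted relationship and are
    distinct, maximizing the total score."""
    pools = [[c for c in catalog if score(r, c) > 0] for r in refs]
    best, best_total = None, -1
    for combo in product(*pools):  # exhaustive search over joint assignments
        same_family = len({c.family for c in combo}) == 1
        distinct = len({c.id for c in combo}) == len(combo)
        if not (same_family and distinct):
            continue
        total = sum(score(r, c) for r, c in zip(refs, combo))
        if total > best_total:
            best, best_total = combo, total
    return best


CATALOG = [
    Identity("P1", "John", "Smith", "Springfield", "F1"),
    Identity("P2", "John", "Smith", "Springfield", "F2"),
    Identity("P3", "Mary", "Smith", "Springfield", "F2"),
]
# References from one hypothetical notice: the deceased and a surviving
# spouse whose city is not stated.
REFS = [{"first": "John", "last": "Smith", "city": "Springfield"},
        {"first": "Mary", "last": "Smith"}]
```

Here single-reference matching cannot separate P1 from P2, which agree on every attribute, and simply returns whichever comes first; requiring the spouse to resolve into the same family leaves P2 and P3 as the best joint assignment. The exhaustive search over `itertools.product` grows exponentially with the number of references, so this sketch is only practical for small reference sets like the one shown.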
Papers by JaMia Moore