The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge, known informally as "common sense," that provides a context for newly learned information. Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is to automate the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.
Until very recently, populating the Cyc Knowledge Base (KB) was a manual process. However, Cyc now contains enough knowledge to make it feasible to attempt to acquire additional knowledge autonomously. This paper describes a system that can collect and validate formally represented, fully integrated knowledge about various entities of interest (e.g., famous people and organizations) from the Web or any other electronically available text corpus. Experimental results and lessons learned from their analysis are presented.
This paper describes a novel, unsupervised method of word sense disambiguation that is wholly semantic, drawing upon a complex, rich ontology and inference engine (the Cyc system). This method goes beyond more familiar semantic closeness approaches to disambiguation, which rely on string co-occurrence or on relative location in a taxonomy or concept map, by 1) exploiting a rich array of properties, including higher-order properties, that are not available in merely taxonomic (or other first-order) systems, and 2) appealing to the semantic contribution a word sense makes to the content of the target text. Experiments show that this method produces results markedly better than chance when disambiguating word senses in a corpus of topically unrelated documents.
In this paper we describe an ontological theory centered on information assets as a key invariant for Business Process Reengineering. We also provide example formulations from its application in a real-world reengineering project. The overall theory includes an ontology whose categories partition into information asset, data element, and business object. Information assets play an important role in our theory because, once identified, they are 1) decomposed into other information assets and data elements, the latter of which are usually linked to the current-state IT environment systems, 2) linked to business objects in order to construct an ontological model of the information that will be contained and consumed in the future-state IT environment, and 3) linked to both existing and future capabilities and business rules. The first item provides traceability to the existing systems as sources of information, and the latter two items provide a connection to the future d...
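The decomposition and traceability links described above can be pictured as a small data model. The following is a minimal Python sketch under our own assumptions: the class and field names (`InformationAsset`, `DataElement`, `BusinessObject`, `source_systems`) are hypothetical and are not the paper's formal ontology. It shows only how decomposing an asset into sub-assets and data elements yields traceability to current-state systems.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DataElement:
    name: str
    # Current-state system holding this element; optional because the
    # abstract says data elements are only *usually* linked to one.
    source_system: Optional[str] = None


@dataclass
class BusinessObject:
    name: str  # hypothetical future-state business object


@dataclass
class InformationAsset:
    name: str
    sub_assets: List["InformationAsset"] = field(default_factory=list)
    data_elements: List[DataElement] = field(default_factory=list)
    business_objects: List[BusinessObject] = field(default_factory=list)
    capabilities: List[str] = field(default_factory=list)

    def source_systems(self) -> Set[str]:
        """Trace this asset, through its whole decomposition, to the
        current-state systems its data elements come from."""
        systems = {de.source_system for de in self.data_elements
                   if de.source_system is not None}
        for sub in self.sub_assets:
            systems |= sub.source_systems()
        return systems


if __name__ == "__main__":
    contact = InformationAsset(
        "Contact Details",
        data_elements=[DataElement("email", "CRM"), DataElement("phone", "CRM")],
    )
    profile = InformationAsset(
        "Customer Profile",
        sub_assets=[contact],
        data_elements=[DataElement("credit_limit", "Billing")],
        business_objects=[BusinessObject("Customer")],
    )
    print(sorted(profile.source_systems()))
```

Keeping the decomposition recursive means traceability for a composite asset falls out of the traceability of its parts.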
This paper describes the TextLearner prototype, a knowledge-acquisition program that represents the culmination of the DARPA-IPTO-sponsored Reading Learning Comprehension seedling program, an effort to determine the feasibility of autonomous knowledge acquisition through the analysis of text. Built atop the Cyc Knowledge Base and implemented almost entirely in CycL, Cyc's formal representation language, TextLearner is an anomaly among Natural Language Understanding programs. The system operates by generating an information-rich model of its target document and uses that model to explore learning opportunities. In particular, TextLearner generates and evaluates hypotheses not only about the content of the target document but also about how to interpret unfamiliar natural language constructions. This paper focuses on this second capability and describes four algorithms TextLearner uses to acquire rules for interpreting text.
The Cyc KB has a rich pre-existing ontology for representing common sense knowledge. To clarify and enforce its terms' semantics and to improve inferential efficiency, the Cyc ontology contains substantial meta-level knowledge that provides definitional information about its terms, such as a type hierarchy. This paper introduces a method for converting that meta-knowledge into biases for ILP systems. The process has three stages. First, a "focal position" for the target predicate is selected, based on the induction goal. Second, the system determines type compatibility or conflicts among predicate argument positions and creates a compact, efficient representation that allows for syntactic processing. Finally, mode declarations are generated, taking advantage of the information produced during the first two stages.
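As a rough illustration of the three stages, the following Python sketch uses a hypothetical toy type hierarchy in place of Cyc's meta-knowledge. The focal position is passed in by hand rather than derived from an induction goal, compatibility is approximated as one type subsuming the other (ignoring Cyc's disjointness knowledge), and the output follows the Aleph/Progol `modeh`/`modeb` notation as an assumed target format; none of these choices is claimed to be the paper's method.

```python
# Toy type hierarchy: each type maps to its direct supertypes.
# These names are hypothetical and only loosely echo Cyc's #$genls.
GENLS = {
    "Person": ["Agent"],
    "Organization": ["Agent"],
    "Agent": ["Thing"],
    "City": ["Place"],
    "Place": ["Thing"],
    "Thing": [],
}

# Argument types for each predicate (hypothetical).
SIGNATURES = {
    "worksIn": ["Person", "City"],          # target predicate
    "birthPlace": ["Person", "City"],
    "employer": ["Person", "Organization"],
    "headquarters": ["Organization", "City"],
}


def supertypes(t):
    """Return t together with all of its transitive supertypes."""
    seen, stack = set(), [t]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(GENLS.get(cur, []))
    return seen


def compatibility_table(types):
    """Stage-2 analogue: precompute pairwise compatibility (one type
    subsumes the other) so later steps are lookups, not hierarchy walks."""
    ups = {t: supertypes(t) for t in types}
    return {(a, b): a in ups[b] or b in ups[a] for a in types for b in types}


def mode_declarations(target, signatures, focal):
    """Stage-3 analogue: emit Aleph-style mode declarations for `target`.

    The focal argument becomes the head's input (+), the rest outputs (-).
    A background argument is an input when its type is compatible with the
    focal type; predicates with no such argument are skipped.
    """
    types = {t for sig in signatures.values() for t in sig}
    table = compatibility_table(types)
    focal_type = signatures[target][focal]

    def fmt(pred, args):
        return f"{pred}({', '.join(args)})"

    head = [("+" if i == focal else "-") + t.lower()
            for i, t in enumerate(signatures[target])]
    modes = [f":- modeh(1, {fmt(target, head)})."]
    for pred, sig in signatures.items():
        if pred == target:
            continue
        flags = [table[(t, focal_type)] for t in sig]
        if not any(flags):
            continue  # nothing here can bind the focal variable
        body = [("+" if ok else "-") + t.lower() for ok, t in zip(flags, sig)]
        modes.append(f":- modeb(*, {fmt(pred, body)}).")
    return modes


if __name__ == "__main__":
    for line in mode_declarations("worksIn", SIGNATURES, focal=0):
        print(line)
```

Precomputing the table mirrors the idea of a compact representation that later steps can consult syntactically; a fuller treatment would also let body literals chain through one another's output variables.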
Proceedings of the 19th International Florida …, 2006
This paper describes a novel, unsupervised method of word sense disambiguation that is wholly semantic, drawing upon a complex, rich ontology and inference engine (the Cyc system). This method goes beyond more familiar semantic closeness approaches to disambiguation ...
This paper describes the hypothesis generation and evidence assembly capabilities of Cycorp's Noöscape application, which exploits the strengths of the Cyc knowledge base and inference engine. Noöscape allows analysts to compose questions using simple English templates, and tries to find answers via deductive reasoning. If deduction is fruitless, Noöscape resorts to abduction, filling in missing pieces of logical arguments with plausible conjectures to obtain answers that are only partly supported by the facts available to the inference engine. For each conjecture, Noöscape automatically generates an augmented information retrieval (IR) query for dispatch to the analyst's preferred search engines. If the IR query results allow the analyst to determine the truth or falsity of the conjecture, a single mouse button click communicates this knowledge to Noöscape and causes the answer (and its supporting argument) to be confirmed or rejected. Noöscape presents its answers and chains of reasoning in English and provides source references for every assertion, making these key textual elements readily available for inclusion in the analyst's reports.
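The deduction-then-abduction strategy can be sketched as propositional backward chaining. In this minimal Python sketch, ground atoms are plain strings with no variables or unification, the set of abducible atoms is given by hand, and the example facts and rule are invented for illustration; it is not Noöscape's implementation.

```python
# Invented example knowledge: ground atoms are opaque strings.
FACTS = {"worksAt(Smith, AcmeParis)", "locatedIn(AcmeParis, Paris)"}
RULES = {
    # head: list of alternative bodies (each body is a conjunction)
    "livesIn(Smith, Paris)": [
        ["worksAt(Smith, AcmeParis)", "locatedIn(AcmeParis, Paris)",
         "commutesLocally(Smith)"],
    ],
}
ABDUCIBLES = {"commutesLocally(Smith)"}  # atoms that may be conjectured


def explain(goal, facts, rules, abducibles, allow_abduction,
            visiting=frozenset()):
    """Backward-chain on `goal`.

    Returns the set of conjectures used to complete a proof (empty for a
    purely deductive proof) or None if no proof is found. At each goal it
    keeps the alternative with the fewest conjectures, a greedy choice
    that is not guaranteed to be globally minimal.
    """
    if goal in facts:
        return frozenset()
    if goal in visiting:  # guard against cyclic rules
        return None
    best = None
    for body in rules.get(goal, []):
        needed = frozenset()
        for sub in body:
            result = explain(sub, facts, rules, abducibles,
                             allow_abduction, visiting | {goal})
            if result is None:
                needed = None
                break
            needed = needed | result
        if needed is not None and (best is None or len(needed) < len(best)):
            best = needed
    # Only conjecture the goal itself when no rule-based proof exists.
    if best is None and allow_abduction and goal in abducibles:
        best = frozenset({goal})
    return best


def answer(goal, facts, rules, abducibles):
    """Try deduction first; fall back to abduction only if it fails."""
    if explain(goal, facts, rules, abducibles, allow_abduction=False) is not None:
        return "proved", frozenset()
    conjectures = explain(goal, facts, rules, abducibles, allow_abduction=True)
    if conjectures is None:
        return "unknown", frozenset()
    return "conjectured", conjectures


if __name__ == "__main__":
    print(answer("livesIn(Smith, Paris)", FACTS, RULES, ABDUCIBLES))
```

Confirming a conjecture corresponds here to adding it to the fact set, after which `answer` reports the goal as proved; each returned conjecture is the natural seed for an IR query.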
From the beginning, a primary goal of the Cyc project has been to build a large knowledge base containing a store of formalized background knowledge suitable for supporting reasoning in a variety of domains. In this paper, we discuss the portion of Cyc technology that has been released in open-source form as OpenCyc, provide examples of the content available in ResearchCyc, and discuss the utility of both for the future development of fully formalized knowledge bases.
This paper presents a novel method, based on the Cyc Knowledge Base and Inference Engine, of gathering, organizing, and sharing information about entities of interest (be they people, organizations, events, or some other type of entity). The formal representations used in the Fact Sheets let users easily share information with others and run automated queries against it, and they let the system attempt to gather and verify information automatically before presenting it to the analyst. The system automatically keeps track of provenance (both which document a fact came from and who interpreted the document). When gathering information automatically, the system produces a variety of search strings (using all known names for the entity) and then scours its sources for possible answers. Individual analysts can specify what types of information they are interested in for different types of entities, and can also specify additional patterns for finding that type of information. Once knowledge has been retrieved from the Web (or any other textual corpus) and ingested by the system, it is available to other analysts, both for their own queries and for inclusion in Fact Sheets of their own design.
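Search-string generation from an entity's known names can be illustrated with a short Python sketch. The `{name}` placeholder syntax, the example patterns, and the function name `make_search_strings` are assumptions made for illustration; the abstract does not specify how the system's patterns are written.

```python
from itertools import product


def make_search_strings(names, patterns):
    """Expand every (name, pattern) pair into a quoted search string.

    `patterns` use a hypothetical "{name}" placeholder. Duplicates are
    dropped while preserving first-seen order.
    """
    seen = set()
    queries = []
    for name, pattern in product(names, patterns):
        query = pattern.format(name=f'"{name}"')
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


if __name__ == "__main__":
    # Illustrative entity with several known names.
    names = ["Ada Lovelace", "Augusta Ada King", "Countess of Lovelace"]
    patterns = ["{name} was born in", "{name} died in"]
    for q in make_search_strings(names, patterns):
        print(q)
```

Quoting each name keeps multi-word names together as phrases; an analyst-supplied pattern simply becomes one more entry in `patterns`.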
Papers by John Cabral