Employing A Domain Specific Ontology To Perform Semantic Search
Abstract. Increasing the relevancy of Web search results has been a major research concern in recent years. Boolean search, metadata, natural language processing and various other techniques have been applied to improve the quality of the search results returned to a user. Ontology-based methods have been proposed to refine the information extraction process, but they have not yet achieved wide adoption by search engines, mainly because building an ontology is time consuming. An all-inclusive ontology for the entire World Wide Web might be difficult if not impossible to construct, but a domain-specific ontology can be built automatically using statistical and machine learning techniques, as done with our tool, SeseiOnto. In this paper, we describe how we adapted the SeseiOnto software to perform Web search on the Wikipedia page on climate change. By using conceptual graphs to represent natural language and an ontology to extract links between concepts, SeseiOnto manages to properly answer natural language queries about climate change. Our tests show that SeseiOnto has the potential to be used in domain-specific Web search as well as in corporate intranets.
Introduction
Succeeding in the management of information nowadays means coping with the tremendous amount of available knowledge. Huge corporations, small organizations and individuals are all confronted with an overload of data. Information retrieval is a young science, and methods to extract documents from the Web or from corpora are not flawless. Boolean search is still the preferred way to retrieve data. This approach, although efficient, is not easy to use for specific queries, since choosing the logical operators most relevant to the query is not straightforward [8]. To sort through the enormous amount of information available on the Web, researchers have proposed a semantic approach to the problem. Data on the Web and in corporate intranets, structured using HTML, could be stored together with a semantic description of its content. That way, information retrieval would be greatly facilitated [16]. Hence, a proposed solution is to rely on an ontology to extract concepts and their relations from these pages. The construction of an ontology is, however, time consuming, particularly with large document databases [5]. For a restricted field of knowledge, though, employing an ontology becomes feasible, since the core concepts and relations of a small domain are usually more constrained [4]. The use of an ontology allows the query language to accept natural language sentences. Moreover, the automatic creation and update of the ontology could provide a way to manage the information in an evolving corpus as well as improve domain-restricted search on the World Wide Web.
In this article, we present the SeseiOnto software, an information retrieval tool that uses natural language processing (NLP) as its search interface and an automatically generated ontology to obtain semantics about a domain. SeseiOnto uses conceptual graphs to process natural language and to evaluate the relation between a query and a document. We applied this method in the context of the 2008 ICCS Challenge. The goal of this challenge was to see how a tool that uses conceptual graphs could be used to support research on climate change. Consequently, the challenge required the tool to be evaluated using data taken from the Wikipedia page on climate change. We found that SeseiOnto can correctly pinpoint significant answers to natural language queries about climate change.
Section 2 presents approaches similar to SeseiOnto. In Section 3, we detail how the software works. In Section 4, we analyze the different results obtained by SeseiOnto in the context of the ICCS Challenge. Section 5 reviews the main strengths and weaknesses of SeseiOnto and introduces future work.
P. Eklund and O. Haemmerlé (Eds.): ICCS 2008, LNAI 5113, pp. 242–254, 2008. © Springer-Verlag Berlin Heidelberg 2008
Similar Approaches
There already exist different semantic information retrieval methods and systems, each one having its own advantages and limitations. In this section, we briefly present work similar to our own.
In [17], the authors present a system that sorts documents returned by Google using a dynamically created taxonomy. This taxonomy is built using the same documents that are returned by Google for a specific user query. This relates considerably to the method used by the Sesei software [15], the predecessor of SeseiOnto. The taxonomy is employed to improve the user's search experience by returning the documents that are the most significant with regard to the query. One of the drawbacks of this approach is that it may not be necessary to build an ontology dynamically for every query. A domain ontology, although possibly less oriented towards the user's query, could correctly provide an appropriate answer. Furthermore, documents returned from the Web will probably contain a lot of noise and information that is by no means related to the topic. In our opinion, a domain ontology contains enough semantics to cover a wide range of queries. Moreover, it needs to be updated only when the document base evolves, which happens much less frequently than with each query.
In [4], the author presents a method to build an ontology using expert-created sources containing similar information. The source for this hierarchy construction process is made of tables coming from Web pages. The ontology construction process presented in this work begins with a small human-built ontology requiring a thorough knowledge of the domain. Nevertheless, an interesting aspect of this research is the emphasis it puts on evolutionary data, which is particularly important in the context of corporate intranets where the content constantly evolves.
Another method, presented in [3], focuses on building a hierarchical representation of natural language sentences using a set of rewrite rules. These rules describe subsumption relations between various text representations. The hierarchy is then employed to determine whether the meaning of a given sentence entails that of another. The main similarity between this work and ours is that their analysis of text is based on a type of transformation rules. However, this approach could be time consuming if large corpora of texts were used.
In [2], a sophisticated question answering system used for passage retrieval is described. This approach employs a fuzzy relation matching technique to answer queries. Similar grammatical relations are identified between queries and passages to evaluate their degree of relevancy. This system was tested in the context of the Text REtrieval Conference (TREC). Their results indicate that sophisticated relation matching techniques seem to have a strong potential for natural language question answering. The inclusion of fuzzy CGs in our algorithm remains to be explored.
SeseiOnto
SeseiOnto [11][12] is a standalone application used to perform semantic search on a corpus of textual documents. It aims at being an alternative to traditional Boolean search engines by providing a means of integrating NLP-based querying and ontology extraction. Natural language is processed using the representation power of conceptual graphs (CGs), and ontologies are automatically built using the Text-To-Onto software. Figure 1 shows SeseiOnto's global process. Natural language queries, as well as the presumably relevant documents from the corpus, are processed by the Connexor syntactic analyzer [7]. Connexor's output is converted to CGs using a set of 76 ad hoc transformation rules [13]. The ontology is generated by applying the Text-To-Onto software to a subset of the documents from the corpus. Using CGs that represent both the query and parts of the documents, and employing a domain ontology, SeseiOnto tries to identify potential matches.
3.1 SeseiOnto's Search Process
SeseiOnto is mainly based on the Sesei software [15]. Sesei was built to answer natural language queries on the World Wide Web using an ontology specific to the user's query. Using definitions from WordNet [10], the user has to disambiguate the words composing his query. A type hierarchy is created using the definitions provided by the user and the concept hierarchy of WordNet.
As for SeseiOnto, it initially takes a user query as input. This query is then sent to the Connexor [7] syntactic analyzer. Prepositions and articles are removed and words are lemmatized. Words are matched to the ontology, which is viewed as a type hierarchy by SeseiOnto. If a word from the query is not identified in the ontology, it is added to it. The query is then converted to CGs using the set of 76 transformation rules. According to previous tests [14], this set of rules is broad enough to represent a sufficient number of semantic phenomena. An example of a rule is: when a noun(A) is the subject of a verb(B) in the active voice in a sentence, it should be converted to a CG stating that a concept of type A is the agent of a concept of type B. Afterwards, sentences from the documents in the corpus, that is, our resource documents, are converted to CGs using the same process as with the query.
The next step is to identify the quantity of information shared by the query and the resource documents in order to know which documents are the most relevant. This goal is achieved by calculating a semantic score between the query sentence and sentences from the resource documents. To obtain this semantic score, a set of generalizations of each concept and relation is created using the ontology. To compare two concepts, or two relations, SeseiOnto tries to find their most specific common generalization. The more specific the generalization, the higher the semantic score. An example of this process can be seen in Figure 2, extracted from [15]. The query is "Who offers a cure for cancer?" and the resource sentence is "a big company will market a sedative". A generalization is only evaluated by SeseiOnto if the concepts from the query graph and the resource graph are linked by relations of the same type.
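The subject-to-agent rule quoted above can be illustrated with a minimal sketch. The `Dependency` and `Relation` classes, the `"subj"` label and the function names are hypothetical and chosen for illustration only; they do not reproduce Connexor's output format or the actual 76 rules of SeseiOnto.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One labelled link of a dependency parse (hypothetical format)."""
    head: str              # lemma of the governing word
    head_pos: str          # part of speech of the head, e.g. "V"
    dependent: str         # lemma of the dependent word
    dep_pos: str           # part of speech of the dependent, e.g. "N"
    label: str             # grammatical function, e.g. "subj"
    voice: str = "active"  # voice of the head verb


@dataclass(frozen=True)
class Relation:
    """A binary CG relation: [source] -> (name) -> [target]."""
    source: str
    name: str
    target: str


def subject_to_agent(dep):
    """Noun A is the subject of active verb B => A is the agent of B."""
    if (dep.label == "subj" and dep.dep_pos == "N"
            and dep.head_pos == "V" and dep.voice == "active"):
        return Relation(source=dep.head, name="agent", target=dep.dependent)
    return None


def to_conceptual_graph(dependencies, rules=(subject_to_agent,)):
    """Apply every rule to every dependency and collect the relations."""
    relations = []
    for dep in dependencies:
        for rule in rules:
            relation = rule(dep)
            if relation is not None:
                relations.append(relation)
    return relations


# "a big company will market a sedative", reduced to two dependencies.
parse = [
    Dependency("market", "V", "company", "N", "subj"),
    Dependency("market", "V", "sedative", "N", "obj"),
]
graph = to_conceptual_graph(parse)
```

With only this one rule registered, the object dependency is ignored and `graph` holds a single relation stating that `company` is the agent of `market`. A fuller rule set would add one function per syntactic pattern.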
Fig. 2. A query graph, a resource graph and their common generalization. The generalization of "market" and "offer" is "market" in the type hierarchy: trade, merchandise → market → offer → ... The generalization of "company" and [...] is "company", by definition.
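The common-generalization search illustrated in Figure 2 can be sketched as follows. The toy `PARENT` hierarchy and the depth-ratio score are assumptions made for illustration; the scoring actually used by Sesei and SeseiOnto is described in [15] and is not reproduced here.

```python
# Hypothetical type hierarchy, stored as child -> parent; "entity" is the root.
PARENT = {
    "trade": "entity",
    "market": "trade",
    "offer": "market",
    "organization": "entity",
    "company": "organization",
    "drug": "entity",
    "sedative": "drug",
    "cure": "drug",
}


def ancestors(concept):
    """Return the concept followed by all of its generalizations, up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_generalization(a, b):
    """Most specific type that generalizes both a and b, or None."""
    b_chain = set(ancestors(b))
    for candidate in ancestors(a):  # walks from specific to general
        if candidate in b_chain:
            return candidate
    return None


def depth(concept):
    """Number of links between the concept and the root."""
    return len(ancestors(concept)) - 1


def concept_score(a, b):
    """Illustrative score in [0, 1]: deeper common generalizations score higher."""
    generalization = common_generalization(a, b)
    if generalization is None:
        return 0.0
    deepest = max(depth(a), depth(b))
    return 1.0 if deepest == 0 else depth(generalization) / deepest
```

In this toy hierarchy, `common_generalization("offer", "market")` returns `"market"`, matching the figure, and `concept_score("cure", "sedative")` is 0.5 because the two concepts only meet at `"drug"`. Concepts that meet only at the root score 0.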
Of course, the ontology needs to be extensive enough to make this common generalization search possible. For more details about this step, see [15]. Every document from the corpus is thus assigned a semantic score. The documents and sentences deemed the most relevant, that is, the sentences that seem related to the user's query, are then returned by the system. The most relevant sentences within documents are pinpointed, sorted by their semantic score and returned to the user.
Additionally, SeseiOnto needs a threshold to discriminate relevant from irrelevant sentences in the resource documents. This threshold is influenced by the domain ontology, and SeseiOnto must first set it. To define it, a set of queries already matched to relevant documents in the corpus needs to be available. Such a matching can be obtained through manual evaluation of queries and documents by domain experts. Queries are sent to SeseiOnto, which assigns a semantic score to each resource document. Knowing the relevancy of each document, the precision and recall of the output of SeseiOnto can be calculated for every query. This way, SeseiOnto can determine the semantic score threshold that maximizes, on average over all queries, the precision and recall. Using a training set and a test set, together with K-fold cross-validation, a threshold is established and is used for subsequent user queries. More details about SeseiOnto's search method can be found in [11] and [12].
Such a process proved to be quite effective for comparing the similarity between two sentences. The next step in our research was to replace the type hierarchy dynamically created using words from the query by an ontology. If queries were related to the same field of knowledge, the same ontology could probably be reused to perform the query-document matching. Instead of creating our own ontologies, we determined we could learn them automatically.
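The threshold selection described in this section can be sketched as a search over candidate thresholds. Combining precision and recall through the F-measure of Section 3.3 is an assumption of this sketch, as are the function names and the toy data; in a K-fold setting, `best_threshold` would be called once per training fold.

```python
def precision_recall(scores, relevant, threshold):
    """Precision and recall of the documents scoring at or above the threshold."""
    retrieved = {doc for doc, score in scores.items() if score >= threshold}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall, beta=0.5):
    """Weighted harmonic mean of precision and recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denominator


def best_threshold(training_queries, candidates, beta=0.5):
    """Candidate threshold with the highest mean F-measure over all queries."""
    def mean_f(threshold):
        values = [
            f_measure(*precision_recall(scores, relevant, threshold), beta)
            for scores, relevant in training_queries
        ]
        return sum(values) / len(values)

    return max(candidates, key=mean_f)


# Each training query: (semantic score per document, expert-judged relevant set).
training = [
    ({"d1": 0.9, "d2": 0.6, "d3": 0.2}, {"d1", "d2"}),
    ({"d4": 0.8, "d5": 0.5, "d6": 0.4}, {"d4"}),
]
threshold = best_threshold(training, candidates=[0.1, 0.3, 0.5, 0.7])
```

On this toy data the highest candidate, 0.7, wins: with a β of 0.5, losing one relevant document for the first query costs less than the irrelevant documents retrieved at lower thresholds.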
3.2 Ontology Learning
Building an ontology from scratch can easily become a time-consuming task. It can require experts from a specific domain and thorough, extensive reasoning to create the concepts and relations describing the field. The broader the domain, the harder it can get to assemble the initial ontology. Maintaining an ontology is also a difficult task. A possible solution to ontology creation might be to construct it automatically [8]. Researchers have employed different types of procedures to learn an ontology automatically. Most of them rely on learning an ontology from structured information such as databases, knowledge bases and dictionaries [17]. Others think that unstructured information such as Web pages can provide a powerful means of creating ontologies from scratch. To perform such a task, a system needs strong natural language processing capabilities. A valuable starting point for an ontology containing different types of relations between concepts is a taxonomy. A taxonomy is frequently defined as a hierarchical structure comprised of "is a" links between concepts that describes a specific environment.
The ontology is a crucial component of our search process. To build this ontology automatically, we employ the Text-To-Onto software [9]. The ontology is constructed from a text corpus containing documents pertaining to the same field of knowledge. Text-To-Onto allows the knowledge engineer to use a general ontology, like WordNet, and adapt it to the domain. Domain-specific concepts are added to the ontology while superfluous ones are pruned from it. To build this ontology automatically, Text-To-Onto uses linguistic patterns and machine learning methods. Text-To-Onto also allows the building of an ontology from scratch by using its TaxoBuilder module. TaxoBuilder also uses machine learning techniques in conjunction with linguistic patterns to create the ontology.
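To make the idea of a linguistic pattern concrete, here is a minimal sketch of one Hearst-style pattern [6], "X such as Y, Z and W", which yields "is a" links. It is an illustration only and not Text-To-Onto's implementation: the regular expressions and the function name are assumptions, and both the general term and the specific terms are restricted to single words, whereas a real extractor would work on noun phrases.

```python
import re

# "<general term> such as <specific>, <specific> and <specific>"
SUCH_AS = re.compile(
    r"(?P<hyper>\w+),? such as "
    r"(?P<hypos>\w+(?:(?:, and |, or |, | and | or )\w+)*)"
)
SEPARATOR = re.compile(r", and |, or |, | and | or ")


def hearst_such_as(sentence):
    """Return (specific term, general term) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        for hyponym in SEPARATOR.split(match.group("hypos")):
            pairs.append((hyponym, match.group("hyper")))
    return pairs


links = hearst_such_as("Emissions come from fuels such as coal, oil and gas.")
```

Here `links` contains one pair per enumerated term, each read as "coal is a fuels", "oil is a fuels" and so on; lemmatizing the general term before inserting it in the taxonomy would be a natural next step.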
The ontology building method depends mostly on whether the user favors recall or precision. Deeper ontologies, i.e., with more generalizations/specializations, usually require far more computing time and produce a better recall. Shallow ontologies, with many leaf nodes starting from the root, generally tend to produce better precision with a shorter processing time. Using this ontology learning approach, we assumed that an ontology built using documents from the corpus provides enough semantics to represent the information contained in unseen queries about the domain. SeseiOnto typically uses seven types of ontologies:
1. FCA ontology with lexicographer classes: Ontology built using Formal Concept Analysis and with the 15 WordNet verbal lexicographer classes [10] as root nodes.
2. FCA ontology without lexicographer classes: Ontology built using Formal Concept Analysis without any particular root nodes specified.
3. Vertical Relation Heuristics ontology: Ontology built using compound words found in documents from the corpus.
4. Hearst patterns ontology: Ontology built using Hearst linguistic patterns [6] to build "is a" relationships.
5. Combination of Vertical Relation Heuristics and Hearst patterns ontology: Ontology built using both previous methods.
6. Combination of Vertical Relation Heuristics, Hearst patterns and WordNet ontology: Ontology built using Vertical Relation Heuristics, Hearst patterns and WordNet.
7. Domain adapted WordNet: A modified version of WordNet where specific concepts from the corpus have been added and WordNet concepts that are too general have been removed.
For more details about the Text-To-Onto and TaxoBuilder ontology construction methods, see [1]. Once our method was clearly defined, we needed a proof of concept in a real environment. We therefore had to develop a strategy to analyze the potential of SeseiOnto.
3.3 Past Results
We had the opportunity to test SeseiOnto on one corpus in the past, the Cystic Fibrosis Database (CF Database) [18]. The CF Database contains a set of 1,239 documents together with a set of 100 queries, each one individually matched to the corresponding relevant documents. These documents are all abstracts of scientific papers about research on cystic fibrosis carried out during the 1970s. Domain experts performed the matching between queries and documents. We managed to achieve interesting results compared to the ones obtained by a classical Boolean search engine, Coveo (www.coveo.com). To evaluate SeseiOnto's performance, we used recall, precision and the F-Measure [19], which combines the two into a single metric. The following formula defines it:
$$F = \frac{(\beta^2 + 1) \cdot \mathit{Precision} \cdot \mathit{Recall}}{\beta^2 \cdot \mathit{Precision} + \mathit{Recall}}$$
The F-Measure can therefore be considered as the weighted harmonic mean of precision and recall. The weight given to either recall or precision in the formula is expressed with the symbol β. A β lower than 1 gives more importance to precision, while a β higher than 1 gives more importance to recall. In our tests, we used a β of 0.5 to emphasize the importance of precision over recall in our type of application domain. SeseiOnto managed to achieve a recall of 44% and a precision of 41%, which gives an F-Measure of 42%. As for Coveo, the search engine reached a recall of 11% and a precision of 35%, yielding an F-Measure of 25%. However, in terms of processing time, Coveo performs better than SeseiOnto. Coveo can answer a query in milliseconds while SeseiOnto can take up to five minutes. Nevertheless, SeseiOnto has the major advantage of being able to identify precise sentences within a document, showing the user exactly where the information he is looking for is located. SeseiOnto also remains a research prototype, and we are convinced that search time could easily be improved by using parallel computing, search indexing and preprocessing of the documents in the corpus.
Furthermore, SeseiOnto provides a natural language search interface, which is much more intuitive for a user than keywords and Boolean operators. SeseiOnto is able to link two different concepts without them necessarily being homographs, thus improving recall. A regular search will usually eliminate a document if it does not contain one of the keywords of the initial query. By taking into account the semantic structure of the sentences (through Connexor and the transformation rules), SeseiOnto manages to improve precision. Having seen that SeseiOnto had potential, we then started experimenting on other databases.
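As a quick check of the F-Measure defined in this section, the figure reported for SeseiOnto can be recomputed from its precision and recall. The function name and signature below are illustrative choices, not part of SeseiOnto.

```python
def f_measure(precision, recall, beta=0.5):
    """Weighted harmonic mean of precision and recall.

    A beta below 1 favors precision; a beta above 1 favors recall.
    """
    beta_squared = beta ** 2
    denominator = beta_squared * precision + recall
    if denominator == 0:
        return 0.0
    return (beta_squared + 1) * precision * recall / denominator


# SeseiOnto on the CF Database: precision 41%, recall 44%, beta = 0.5.
sesei_f = f_measure(precision=0.41, recall=0.44)
print(f"{sesei_f:.0%}")  # prints 42%
```

With a β of 1 the function reduces to the usual balanced harmonic mean, so `f_measure(0.5, 1.0, beta=1.0)` gives two thirds.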
To evaluate SeseiOnto in a new environment, we selected the 2008 ICCS Challenge as our test bed. We wanted to apply the SeseiOnto techniques to the Wikipedia page on climate change. All our tests were performed with the page that was available on Wikipedia on November 28th, 2007. Figure 3 shows SeseiOnto's workflow in the context of using the software in a restricted-domain Web environment. The user starts by sending a natural language query to SeseiOnto and selecting an ontology to perform his search. Afterwards, the query sentence is parsed using Connexor, query words are matched to the selected ontology and a CG is obtained using SeseiOnto's set of transformation rules. As for resource CGs, documents are obtained from the appropriate Web page (in our case, the Wikipedia page on climate change) and sentences are extracted from the resource document. Resource CGs are built using the same process as the query CGs. Resource and query CGs are then compared using the method presented in Section 3. The sentences with the highest semantic score are returned to the user.
To perform our tests, we applied an empirical approach. We used a finite set of queries on climate change taken from the World Wide Web and we manually evaluated how SeseiOnto could answer them. We compared different ontologies to see how each one performed at answering these specific queries. We also wanted to compare them with the approach of manually disambiguating the query words using the definitions from WordNet, i.e., the search approach of Sesei, SeseiOnto's ancestor. Below, we show examples of how SeseiOnto performed at answering some of these queries.
For the question "How could climate change affect us in the future?", SeseiOnto gave the following three sentences as the first three answers:
- An October 29, 2006 report by former Chief Economist and Senior Vice-President of the World Bank Nicholas Stern states that climate change could affect growth, which could be cut by one-fifth unless drastic action is taken
- A single eruption of the kind that occurs several times per century can affect climate causing cooling for a period of a few years
- In short climate change can be a self-perpetuating process because different aspects of the environment respond at different rates and in different ways to the fluctuations that inevitably occur
SeseiOnto correctly identified that climate change could affect economic growth. To the question "How could greenhouse gas emissions be reduced?", SeseiOnto answered:
- According to a 2006 United Nations report, Livestock's Long Shadow, livestock is responsible for 18% of the world's greenhouse gas emissions as measured in CO2 equivalents
- Similarly rising temperatures caused for example by anthropogenic emissions of greenhouse gases could lead to retreating snow lines revealing darker ground underneath and consequently result in more absorption of sunlight
- These principals can be observed as bubbles which rise in a pot of water heated on a stove or in a glass of cold beer allowed to sit at room temperature gases dissolved in liquids are released under certain circumstances
One can deduce that livestock and human activity (anthropogenic emissions) influence greenhouse gas emissions. To the question "Why should a few degrees of warming be a cause for concern?", SeseiOnto answered:
- According to these studies, the greenhouse effect, which is the warming produced as greenhouse gases trap heat, plays a key role in regulating Earth's temperature
- There are several examples of rapid changes in the concentrations of greenhouse gases in the Earth's atmosphere that do appear to correlate to strong warming, including the Paleocene–Eocene thermal maximum, the Permian–Triassic extinction event, and the end of the Varangian snowball earth event
- The biggest factor of present concern is the increase in CO2 levels due to emissions from fossil fuel combustion followed by aerosols matter in the which exerts a cooling effect and cement manufacture
With the first answer, we understand that greenhouse gases play a key role in regulating Earth's temperature. To the question "How do we know that the atmospheric build-up of greenhouse gases is due to human activity?", SeseiOnto answered:
- As far as is known the climate system is generally stable with respect to these feedbacks positive feedbacks do not
- Similarly rising temperatures caused for example by anthropogenic emissions of greenhouse gases could lead to retreating snow lines revealing darker ground underneath and consequently result in more absorption of sunlight
- According to a 2006 United Nations report, Livestock's Long Shadow, livestock is responsible for 18% of the world's greenhouse gas emissions as measured in CO2 equivalents.
Once again, livestock and human activity are identified as sources of greenhouse gases. Thus we can see that SeseiOnto has the potential to answer natural language queries that contain many different linguistic phenomena. The processing time for each query varies between two and three minutes.
The answers presented here were produced using the WordNet concept hierarchy. Therefore, we had to disambiguate the words composing the query using WordNet's definitions. We also ran some tests using other ontologies, generated by Text-To-Onto from the Wikipedia page on climate change.
We obtained small differences by using these ontologies. The answers were similar to the ones presented here. The main difference was the order in which they were presented to the user. The primary advantage of using a Text-To-Onto ontology with SeseiOnto is that the user does not need to disambiguate the words composing his query. This can be important because it eases the search process while sparing the user the hassle of selecting the correct definitions from WordNet. The difference between definitions is often subtle, and different users could choose separate definitions with the same concept meaning in mind. For that reason, we would recommend using a Text-To-Onto generated ontology to perform this type of Web search, but thorough testing is necessary to confirm this hypothesis.
The ontology used by SeseiOnto, whether it be WordNet or a Text-To-Onto ontology, should not be considered a thorough representation of the knowledge contained in the domain. It should be viewed as a sufficient semantic representation that allows the system to answer natural language queries. Certainly, an accurate and extensive human-built ontology would provide more information about the domain. However, we assumed that by using a WordNet or a Text-To-Onto ontology, we could more rapidly test our system with different corpora without needing to create an ontology by hand for each domain.
Consequently, we can suppose that SeseiOnto has the potential to be employed in the context of Web searches on a specific field of knowledge. SeseiOnto is able to answer the user's query by providing pointers to relevant sentences within a document. Past results have shown that SeseiOnto could also be used in the context of larger document collections [11].
Conclusion
With this research, we provide a unique analysis of how SeseiOnto can answer a query by selecting sentences within a relatively long document (more than 5,000 words), the Wikipedia page on climate change. This type of test had never been done with SeseiOnto in the past. We found that within the first three answers given by our system, at least one is relevant to the query in most cases. The main advantages of SeseiOnto are:
- it provides a natural language interface;
- it can pinpoint exact relevant sentences within a document to help the user answer his query;
- it provides an automatic ontology construction mechanism that can adapt to corpus updates;
- it seems to improve precision and recall for domain-specific corpora or restricted-domain Web search;
- it does not require expert intervention.
Its main weaknesses are:
- its processing time is relatively long (between two and three minutes);
- it needs a simple mechanism to select the correct ontology for a particular corpus;
- when WordNet is used, the disambiguation process can be confusing for the user.
Despite these drawbacks, we think that with additional testing and minor improvements, we could achieve even better results and apply the search methods of SeseiOnto in a professional environment.
5.1 Future Work
For the time being, SeseiOnto only functions as a standalone application. In the coming months, we intend to make it publicly available on the Web. We will start by making a system that can answer any query relating to the current Wikipedia page on climate change. The user will be able to select the method he wants to use to perform his query, i.e., selecting a particular ontology or using WordNet to disambiguate the words of his question.
Although we think that SeseiOnto was up to the task of the ICCS Challenge, extensive testing is still necessary to assess its full potential. In the case of Wikipedia, including the sub-pages that are referenced by a link on the climate change page could provide an interesting way of seeing whether SeseiOnto still performs well with additional documents. Testing SeseiOnto on new corpora would also provide interesting feedback about its possibilities.
Moreover, the development of Text-To-Onto has now stopped and the tool has been replaced by a new ontology creation framework, Text2Onto. It could be very interesting to use SeseiOnto with this new environment. Incorporating formal ontologies into our application could also be pertinent, since it would make it easier to evaluate the coherence of the ontologies generated with Text-To-Onto.
Improving SeseiOnto's processing time is very important if we ever want to make it publicly available. To do so, indexing the documents from the corpus and preconverting document sentences to CGs would assuredly reduce its search time.
In conclusion, SeseiOnto shows that conceptual graphs have an immense potential to represent natural language. Although completely hidden from the user in our software, they are a key element in converting the syntactic representation given by Connexor to a semantic one. The ontology used by SeseiOnto is either automatically constructed with Text-To-Onto or taken from WordNet following the disambiguation of the user's query words. With this ontology, we obtain a simple yet effective way of comparing a query with many sentences coming from the corpus being searched. SeseiOnto was assembled using tools coming from industry as well as from the open-source, research and CG communities.
This software is a concrete example of how conceptual structures can be used in an application and how their representation power can be employed to process information. With additional thorough testing, a tool such as SeseiOnto could probably be coupled with other information retrieval and information extraction applications. Such a coupling could provide a whole new range of possibilities in the Semantic Web context.
References
1. Bloehdorn, S., Cimiano, P., Hotho, A., Staab, S.: An ontology-based framework for text mining. GLDV-Journal for Computational Linguistics and Language Technology 20, 87–112 (2005)
2. Cui, H., Sun, R., Li, K., Kan, M.-Y., Chua, T.-S.: Question answering passage retrieval using dependency relations. In: International Conference on Research and Development in Information Retrieval (SIGIR), Salvador, Brazil, pp. 400–407. ACM Press, New York (2005)
3. de Salvo Braz, R., Girju, R., Punyakanok, V., Roth, D., Sammons, M.: An inference model for semantic entailment in natural language. In: Machine Learning Challenges Workshop, Pittsburgh, USA, pp. 261–286 (2005)
4. Embley, D.W.: Towards semantic understanding: an approach based on information extraction ontologies. In: Database Technologies 2004, Proceedings of the fifteenth Australasian database conference, Dunedin, New Zealand (2004)
5. Gómez-Pérez, A., Fernández-López, M., Corcho, O.: Ontological Engineering with examples from the areas of Knowledge Management, e-Commerce and the Semantic Web. Springer, Heidelberg (2004)
6. Hearst, M.: Automated discovery of WordNet relations. In: Fellbaum, C. (ed.) WordNet: An Electronic Lexical Database and Some of its Applications. MIT Press, Cambridge (1998)
7. Järvinen, T., Tapanainen, P.: Towards an implementable dependency grammar. CoRR cmp-lg/9809001 (1998)
8. Maedche, A.: Ontology Learning for the Semantic Web. Kluwer Academic Publishers, Norwell, USA (2002)
9. Maedche, A., Staab, S.: Mining ontologies from text. In: Dieng, R., Corby, O. (eds.) EKAW 2000. LNCS (LNAI), vol. 1937, pp. 189–202. Springer, Heidelberg (2000)
10. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: Introduction to WordNet: An on-line lexical database. Journal of Lexicography 3(4), 234–244 (1990), ftp://ftp.cogsci.princeton.edu/pub/wordnet/5papers.ps
11. Morneau, M., Mineau, G.W., Corbett, D.: SeseiOnto: Interfacing NLP and ontology extraction. In: WI 2006: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, Hong Kong, China, pp. 449–455. IEEE Computer Society Press, Los Alamitos (2006)
12. Morneau, M., Mineau, G.W., Corbett, D.: Using an automatically generated ontology to improve information retrieval. In: First Conceptual Structures Tool Interoperability Workshop (CS-TIW 2006), Aalborg, Denmark, pp. 119–134. Aalborg University Press (2006)
13. Nicolas, S.: Sesei: un filtre sémantique pour les moteurs de recherche conventionnels par comparaison de structures de connaissance extraites depuis des textes en langage naturel. Master's thesis, Département d'informatique et de génie logiciel, Université Laval (2003)
14. Nicolas, S., Mineau, G., Moulin, B.: Extracting conceptual structures from English texts using a lexical ontology and a grammatical parser. In: Sup. Proc. of 10th International Conference on Conceptual Structures, ICCS 2002, Borovets, Bulgaria (2002)
15. Nicolas, S., Moulin, B., Mineau, G.W.: Sesei: A CG-based filter for internet search engines. In: Conceptual Structures for Knowledge Creation and Communication. 11th International Conference on Conceptual Structures, Dresden, Germany, pp. 362–377. Springer, Heidelberg (2003)
16. Paliouras, G.: On the need to bootstrap ontology learning with extraction grammar learning. In: Dau, F., Mugnier, M.-L., Stumme, G. (eds.) ICCS 2005. LNCS (LNAI), vol. 3596, pp. 119–135. Springer, Heidelberg (2005)
17. Sánchez, D., Moreno, A.: Automatic generation of taxonomies from the WWW. In: Karagiannis, D., Reimer, U. (eds.) PAKM 2004. LNCS (LNAI), vol. 3336, pp. 208–219. Springer, Heidelberg (2004)
18. Shaw, W., Wood, J., Wood, R., Tibbo, H.: The cystic fibrosis database: Content and research opportunities. Library and Information Science Research 13, 347–366 (1991)
19. Yang, Y., Liu, X.: A re-examination of text categorization methods. In: SIGIR 1999: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, pp. 42–49. ACM Press, New York (1999)