Skip to main content

Michele Vindigni

Followers

10

Following

4

Co-authors

4

Public Views

Vrije Universiteit Amsterdam

University of Alicante / Universidad de Alicante

The University of Hong Kong

Armando Stellato

"Tor Vergata" University of Rome

Adam Kilgarriff

Lexical Computing Ltd, UK

Stanisław Szpakowicz

Rodolfo Delmonte

Vincenzo Pallotta

HEIG-VD

Interests

Uploads

Papers by Michele Vindigni

An empirical approach to Lexical Tuning

NLP systems crucially depend on the knowledge structures devoted to describing and representing w... more NLP systems crucially depend on the knowledge structures devoted to describing and representing word senses. Although automatic Word Sense Disambiguation (WSD) is now an established task within empirically-based computational approaches to NLP, the suitability of the available set (and granularity) of senses is still a problem. Application domains exhibit specic behaviors that cannot be fully predicted in advance. Suitable adaptation mechanisms have to be made available to NLP systems to tune existing large scale sense repositories to the practical needs of the target application, such as information extraction or machine translation. In this paper we describe a model of "lexical tuning" {the systematic adaptation of a lexicon to a corpus|that specializes the set of verb senses required for an NLP application, and builds inductively the corresponding lexical descriptions for those senses.

Look-ahead heuristics for the dynamic traveling purchaser problem

by Michele Vindigni and R. Mansini

Computers and Operations Research, Dec 1, 2011

ABSTRACT Given a set of products each with positive discrete demand, and a set of markets selling... more ABSTRACT Given a set of products each with positive discrete demand, and a set of markets selling products at given prices, the traveling purchaser problem (TPP) looks for a tour visiting a subset of markets such that products demand is satisfied at minimum purchasing and traveling costs. In this paper we analyze a dynamic variant of the problem, where quantities may decrease as time goes on. Complete information is assumed on current state of the world, i.e.decision maker knows quantities available for each product in each market at present time and is informed about any consumption event when it occurs. Nevertheless, planner does not have any information on future events. Two groups of heuristics are described and compared. The first group consists of simplified approaches deciding which market to visit next on the basis of some greedy criteria considering only one of the two objective costs. The second one includes heuristics based on a look-ahead approach taking into account both traveling and purchasing costs and inserting some future prediction. Heuristics behavior has been tested on a large set of randomly generated instances under different levels of dynamism.

Integrating Semantic Lexicons and Domain Ontologies

The cross-fertilization between advanced data modeling technologies (as aimed in the Semantic Web... more The cross-fertilization between advanced data modeling technologies (as aimed in the Semantic Web) and NLP is a very interesting research line. In this paper, we investigate a solution of a particular problem in this area: the intregration between a concept hierarchies (DCH) with a general-purpose linguistic knowledge base (LKB). The method we propose relies only on the taxonomical knowledge of

Tuning lexicons to new operational scenarios

In this paper the role of the lexicon within typical applicat ion tasks based on NLP is analysed.... more In this paper the role of the lexicon within typical applicat ion tasks based on NLP is analysed. A large scale semantic lexicon is studied within the framework of a NLP application. The coverage of the lexicon with respect the target domain and a (semi)automatic tuning approach have been evaluated. The impact of a corpus-driven inductive architecture aiming to compensate

Integrating Ontological and Linguistic Knowledge for E-Learning

In computer-aided and Web-based learning a critical role is played by the methods and frameworks ... more In computer-aided and Web-based learning a critical role is played by the methods and frameworks for Information Access. First of all, intelligent methods for retrieval and synthesis of information are crucial for making available timely and correct information during the training process. Moreover, the level of interaction between the target user and the retrieval subsystem constitutes an important quality factor

Purchasing the Web: an Agent based E-retail System with Multilingual Knowledge

The more than enthusiastic success obtained by e-commerce and the continuous growth of the WWW ha... more The more than enthusiastic success obtained by e-commerce and the continuous growth of the WWW has radically changed the way people look for and purchase commercial products. E-retail stores offer any sort of goods through evermore appealing web pages, sometimes even including their own search-engines to help customers find their loved products. With all this great mass of information available,

Adapting a sub-categorization lexicon to a domain

ALINAs: un'architettura multi-layer ad agenti per il supporto alla comunicazione linguistica

Workshop From Objects to Agents, 2002

Si presenta nel seguito ALINAs (Architecture for LINguistic Agents): una architettura multi-layer... more Si presenta nel seguito ALINAs (Architecture for LINguistic Agents): una architettura multi-layer ad agenti specializzata al supporto della comunicazione linguistica. Essa si compone di una infrastruttura nella quale sono implementati i meccanismi di Percezione-Ragionamento-Azione (PRA) di base oltre al supporto per la comunicazione comprendente i performativi più comuni ai diversi linguaggi di comunicazione tra agenti (ACL), e di un layer specifico dedicato alla comunicazione linguistica.

The Stochastic and Dynamic Traveling Purchaser Problem

by Michele Vindigni and R. Mansini

Transportation Science, 2016

Adaptive, Multilingual Named Entity Recognition in Web Pages

European Conference on Artificial Intelligence, 2004

Abstract. Most of the information on the Web today is in the form of HTML documents, which are de... more

Agent to Agent Talk: “Nobody There?” Supporting Agents Linguistic Communication

Whitestein Series in Software Agent Technologies, 2005

ABSTRACT World-Wide Web technologies and the vision of Semantic Web have pushed for adaptive SW a... more ABSTRACT World-Wide Web technologies and the vision of Semantic Web have pushed for adaptive SW applications to scale up information technologies to the Web, where information is organized following different underlying knowledge and/or presentation models. Interoperability among heterogeneous intelligent agents has become an important research topic in the context of distributed information systems. Communication among heterogeneous agents involves several dimensions. “Ontological commitment” on a shared knowledge model cannot be assumed as a default. To overcome this problem, we will describe in this article a communication model that bases on the use of natural language. We will argue on main topics involved in using natural language to achieve semantic agreement in agents communication. The model foresees a strong separation among terms and concepts, this difference being often undervalued in the literature, where terms play the ambiguous role of both concept labels and of communication lexicon. For agents communicating through the language, lexical information embodies instead the possibility to “express” the underlying conceptualizations thus agreeing to a shared representation. We will examine in details the different layers involved in agents communication and we will focus on a the different roles played by each element. A novel agent architecture able to tackle with possible linguistic ambiguities by focusing on the conversational level will be deeply described. Three different agent typologies will be presented: Resource agents, embodying the target knowledge, Service agents, providing basic skills to support complex activities and control agents, supplying the structural knowledge of the task, with coordination and control capabilities. NL communication is supported by two dedicated Service agents: a Mediator, that will handle conceptual mismatches arising during communication, and a Translator, dealing with lexical misalignments due to different languages/idioms.

Ontology integration in a multilingual e-retail system

The advent of e-commerce and the continuous growth of the WWW led to a new generation of e- retai... more The advent of e-commerce and the continuous growth of the WWW led to a new generation of e- retail stores. A number of commercial agent-based systems have been developed to help Internet shoppers decide what to buy and where to buy it from. In such systems, ontologies play a crucial role in supporting the exchange of business data, as they provide a formal vocabulary for the information and unify different views of a domain in a safe cognitive approach. Based on this assumption, inside CROSSMARC (a European research project supporting development of an agent-based multilingual information extraction system from web pages), an ontology architecture has been developed in order to organize the information provided by different resources in several languages. CROSSMARC ontology aims to support all the different activities carried on by the system's agents. The ontological architecture is based on three different layers: (1) a meta-layer that represents the common semantics that will...

Un Pool di Agenti a Supporto della Comunicazione Linguistica

Gli agenti rappresentano un paradigma per la realiz- zazione di applicazioni distribuite su larga... more Gli agenti rappresentano un paradigma per la realiz- zazione di applicazioni distribuite su larga scala che si focalizza sulle interazioni tra processi autonomi ed etero- genei. Il problema della comunicazione di informazioni tra due entitae legato ad una serie di dimensioni diverse per le quali non sempre esiste una standardizzazione "de fac- to" sufficiente a garantire la consistenza dell'informazione scambiata. In questo articolo si analizzano le problem- atiche riguardanti la trasmissione di informazione attraver- so il linguaggio naturale, nell'ipotesi che gli agenti coin- volti possano o meno condividere una stessa ontologia, e quindi si descrive un modello originale di architettura a supporto alla comunicazione basata su un insieme di agenti cooperanti finalizzati alla risoluzione delle specifiche necessit` a emergenti dall'interazione.

Combining Ontological Knowledge and Wrapper Induction techniques into an e-retail System 1

by Michele Vindigni and A. Stellato

E-commerce and the continuous growth of the WWW has seen the rising of a new generation of e-reta... more E-commerce and the continuous growth of the WWW has seen the rising of a new generation of e-retail sites. A number of commercia l agent-based systems has been developed to help Internet shoppers decide what to buy and wh ere to buy it from. In such systems, ontologies play a crucial role in supporting the ex change of business data, as they provide a formal vocabulary for the information and unify dif ferent views of a domain in a shared and safe cognitive approach. In CROSSMARC (a European research project supporting development of an agent-based multilingual/multi-domain system for information extraction (IE) from web pages), a knowledge based approach has been combined with machine learning techniques (in particular, wrapper induction based components) in order to design a robust system for extracting information from relevant web sites. In the ever-changing Web framework this hybrid approach supports adaptivity to new emerging concepts and a certain degree of independence from ...

AI/NLP Technologies Applied to Spacecraft Mission Design

Lecture Notes in Computer Science, 2005

In this paper we propose the model of a prototypical NLP architecture of an information access sy... more In this paper we propose the model of a prototypical NLP architecture of an information access system to support a team of experts in a scientific design task, in a shared and heterogeneous framework. Specifically, we believe AI/NLP can be helpful in several tasks, such as the extraction of implicit information needs enclosed in meeting minutes or other documents, analysis of explicit information needs expressed through Natural Language, processing and indexing of document collections, extraction of required information from documents, modeling of a common knowledge base, and, finally, identification of important concepts through the automatic extraction of terms. In particular, we envisioned this architecture in the specific and practical scenario of the Concurrent Design Facility (CDF) of the European Space Agency (ESA), in the framework of the SHUMI project (Support To HUman Machine Interaction) developed in collaboration with the ESA/ESTEC -ACT (Advanced Concept Team).

Corpus-driven unsupervised learning of verb subcategorization frames

Lecture Notes in Computer Science, 1997

Abstract. The behavior of verbs in sublanguages is highly specific and does not follow general pr... more

Agents Based Ontological Mediation in IE Systems

Lecture Notes in Computer Science, 2003

Building more adaptive SW applications is a crucial issue to scale up IE technology to the Web, w... more Building more adaptive SW applications is a crucial issue to scale up IE technology to the Web, where information is organized following different underlying knowledge and/or presentation models. Information agents are more and more being adopted to support extraction of relevant information from semi-structured web sources. To efficiently manage heterogeneous information sources they must be able to cooperate, to share

Integrating ontological and linguistic knowledge for conceptual information extraction

Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003), 2003

In computer-aided and Web-based learning a critical role is played by the methods and frameworks ... more In computer-aided and Web-based learning a critical role is played by the methods and frameworks for Information Access. First of all, intelligent methods for retrieval and synthesis of information are crucial for making available timely and correct information during the training process. Moreover, the level of interaction between the target user and the retrieval subsystem constitutes an important quality factor for the learning process. The abstraction supported by the linguistic generalization adopted in the interaction is an inherent component of the student development: the higher is the linguistic level, the faster and more effective is the training. In order to ensure the proper abstraction level, the retrieval component should make use of capabilities for generalization throughout a number of phases: indexing, retrieval, organisation and presentation. All these tasks require thus an underlying concept-oriented approach usually relying on ontological resources. In particular tasks like indexing and presentation are also faced (in both directions of input/recognition and output/production) with linguistic data: source texts and dialogue/interaction sessions, respectively. In both cases, i.e. text understading and natural language generation, a non trivial process of semantic recognition is involved. All the above implies that strong assumptions about the conceptualisation of the underlying knowledge domain are usually made in e-learnig. However, building domain conceptualisations from scratch is a very complex and timeconsuming task. Traditionally, the reuse of available domain resources, although not constituting always the best, has been applied as an accurate and cost effective solution. This paper presents a method to exploit sources of domain knowledge (e.g. a subject reference system as a controlled language for document indexing and classification) used to build a linguistically motivated domain concept hierarchy. Because in the specific perspective of supporting linguistic inference in Information Extraction and Retrieval (IE/IR) for e-learning, the use of domain taxonomies as ontological resources is not straightforward. We discuss here how a method for integrating the taxonomical domain knowledge and a general-purpose lexical knowledge base (like WordNet) can be used for improving the accuracy and flexibility of IE.

Multi-lingual XML-Based Named Entity

We describe the multi-lingual Named Entity Recognition and Classification (NERC) subpart of an in... more We describe the multi-lingual Named Entity Recognition and Classification (NERC) subpart of an information extraction system, which is currently under development as part of the EU-funded project CROSSMARC. The two main CROSSMARC goals are to develop commercial-strength technologies based on language processing methodologies for information extraction from web pages and to provide automated techniques for efficient customisation i.e. extension of the system to new product domains and languages. To achieve our goals we use XML as a common exchange format, we exploit a common ontology and the monolingual NERC components use a combination of rule-based and machine-learning techniques. It has been challenging to process web pages, which contain heavily structured data where text is intermingled with HTML and other code. Our evaluation results demonstrate the viability of our approach.

I rischi e la gestione della sicurezza informatica

An empirical approach to Lexical Tuning

NLP systems crucially depend on the knowledge structures devoted to describing and representing w... more NLP systems crucially depend on the knowledge structures devoted to describing and representing word senses. Although automatic Word Sense Disambiguation (WSD) is now an established task within empirically-based computational approaches to NLP, the suitability of the available set (and granularity) of senses is still a problem. Application domains exhibit specic behaviors that cannot be fully predicted in advance. Suitable adaptation mechanisms have to be made available to NLP systems to tune existing large scale sense repositories to the practical needs of the target application, such as information extraction or machine translation. In this paper we describe a model of "lexical tuning" {the systematic adaptation of a lexicon to a corpus|that specializes the set of verb senses required for an NLP application, and builds inductively the corresponding lexical descriptions for those senses.

Look-ahead heuristics for the dynamic traveling purchaser problem

by Michele Vindigni and R. Mansini

Computers and Operations Research, Dec 1, 2011

ABSTRACT Given a set of products each with positive discrete demand, and a set of markets selling... more ABSTRACT Given a set of products each with positive discrete demand, and a set of markets selling products at given prices, the traveling purchaser problem (TPP) looks for a tour visiting a subset of markets such that products demand is satisfied at minimum purchasing and traveling costs. In this paper we analyze a dynamic variant of the problem, where quantities may decrease as time goes on. Complete information is assumed on current state of the world, i.e.decision maker knows quantities available for each product in each market at present time and is informed about any consumption event when it occurs. Nevertheless, planner does not have any information on future events. Two groups of heuristics are described and compared. The first group consists of simplified approaches deciding which market to visit next on the basis of some greedy criteria considering only one of the two objective costs. The second one includes heuristics based on a look-ahead approach taking into account both traveling and purchasing costs and inserting some future prediction. Heuristics behavior has been tested on a large set of randomly generated instances under different levels of dynamism.

Integrating Semantic Lexicons and Domain Ontologies

The cross-fertilization between advanced data modeling technologies (as aimed in the Semantic Web... more The cross-fertilization between advanced data modeling technologies (as aimed in the Semantic Web) and NLP is a very interesting research line. In this paper, we investigate a solution of a particular problem in this area: the intregration between a concept hierarchies (DCH) with a general-purpose linguistic knowledge base (LKB). The method we propose relies only on the taxonomical knowledge of

Tuning lexicons to new operational scenarios

In this paper the role of the lexicon within typical applicat ion tasks based on NLP is analysed.... more In this paper the role of the lexicon within typical applicat ion tasks based on NLP is analysed. A large scale semantic lexicon is studied within the framework of a NLP application. The coverage of the lexicon with respect the target domain and a (semi)automatic tuning approach have been evaluated. The impact of a corpus-driven inductive architecture aiming to compensate

Integrating Ontological and Linguistic Knowledge for E-Learning

In computer-aided and Web-based learning a critical role is played by the methods and frameworks ... more In computer-aided and Web-based learning a critical role is played by the methods and frameworks for Information Access. First of all, intelligent methods for retrieval and synthesis of information are crucial for making available timely and correct information during the training process. Moreover, the level of interaction between the target user and the retrieval subsystem constitutes an important quality factor

Purchasing the Web: an Agent based E-retail System with Multilingual Knowledge

The more than enthusiastic success obtained by e-commerce and the continuous growth of the WWW ha... more The more than enthusiastic success obtained by e-commerce and the continuous growth of the WWW has radically changed the way people look for and purchase commercial products. E-retail stores offer any sort of goods through evermore appealing web pages, sometimes even including their own search-engines to help customers find their loved products. With all this great mass of information available,

Adapting a sub-categorization lexicon to a domain

ALINAs: un'architettura multi-layer ad agenti per il supporto alla comunicazione linguistica

Workshop From Objects to Agents, 2002

Si presenta nel seguito ALINAs (Architecture for LINguistic Agents): una architettura multi-layer... more Si presenta nel seguito ALINAs (Architecture for LINguistic Agents): una architettura multi-layer ad agenti specializzata al supporto della comunicazione linguistica. Essa si compone di una infrastruttura nella quale sono implementati i meccanismi di Percezione-Ragionamento-Azione (PRA) di base oltre al supporto per la comunicazione comprendente i performativi più comuni ai diversi linguaggi di comunicazione tra agenti (ACL), e di un layer specifico dedicato alla comunicazione linguistica.

The Stochastic and Dynamic Traveling Purchaser Problem

by Michele Vindigni and R. Mansini

Transportation Science, 2016

Adaptive, Multilingual Named Entity Recognition in Web Pages

European Conference on Artificial Intelligence, 2004

Abstract. Most of the information on the Web today is in the form of HTML documents, which are de... more

Agent to Agent Talk: “Nobody There?” Supporting Agents Linguistic Communication

Whitestein Series in Software Agent Technologies, 2005

ABSTRACT World-Wide Web technologies and the vision of Semantic Web have pushed for adaptive SW a... more ABSTRACT World-Wide Web technologies and the vision of Semantic Web have pushed for adaptive SW applications to scale up information technologies to the Web, where information is organized following different underlying knowledge and/or presentation models. Interoperability among heterogeneous intelligent agents has become an important research topic in the context of distributed information systems. Communication among heterogeneous agents involves several dimensions. “Ontological commitment” on a shared knowledge model cannot be assumed as a default. To overcome this problem, we will describe in this article a communication model that bases on the use of natural language. We will argue on main topics involved in using natural language to achieve semantic agreement in agents communication. The model foresees a strong separation among terms and concepts, this difference being often undervalued in the literature, where terms play the ambiguous role of both concept labels and of communication lexicon. For agents communicating through the language, lexical information embodies instead the possibility to “express” the underlying conceptualizations thus agreeing to a shared representation. We will examine in details the different layers involved in agents communication and we will focus on a the different roles played by each element. A novel agent architecture able to tackle with possible linguistic ambiguities by focusing on the conversational level will be deeply described. Three different agent typologies will be presented: Resource agents, embodying the target knowledge, Service agents, providing basic skills to support complex activities and control agents, supplying the structural knowledge of the task, with coordination and control capabilities. NL communication is supported by two dedicated Service agents: a Mediator, that will handle conceptual mismatches arising during communication, and a Translator, dealing with lexical misalignments due to different languages/idioms.

Ontology integration in a multilingual e-retail system

The advent of e-commerce and the continuous growth of the WWW led to a new generation of e- retai... more The advent of e-commerce and the continuous growth of the WWW led to a new generation of e- retail stores. A number of commercial agent-based systems have been developed to help Internet shoppers decide what to buy and where to buy it from. In such systems, ontologies play a crucial role in supporting the exchange of business data, as they provide a formal vocabulary for the information and unify different views of a domain in a safe cognitive approach. Based on this assumption, inside CROSSMARC (a European research project supporting development of an agent-based multilingual information extraction system from web pages), an ontology architecture has been developed in order to organize the information provided by different resources in several languages. CROSSMARC ontology aims to support all the different activities carried on by the system's agents. The ontological architecture is based on three different layers: (1) a meta-layer that represents the common semantics that will...

Un Pool di Agenti a Supporto della Comunicazione Linguistica

Gli agenti rappresentano un paradigma per la realiz- zazione di applicazioni distribuite su larga... more Gli agenti rappresentano un paradigma per la realiz- zazione di applicazioni distribuite su larga scala che si focalizza sulle interazioni tra processi autonomi ed etero- genei. Il problema della comunicazione di informazioni tra due entitae legato ad una serie di dimensioni diverse per le quali non sempre esiste una standardizzazione "de fac- to" sufficiente a garantire la consistenza dell'informazione scambiata. In questo articolo si analizzano le problem- atiche riguardanti la trasmissione di informazione attraver- so il linguaggio naturale, nell'ipotesi che gli agenti coin- volti possano o meno condividere una stessa ontologia, e quindi si descrive un modello originale di architettura a supporto alla comunicazione basata su un insieme di agenti cooperanti finalizzati alla risoluzione delle specifiche necessit` a emergenti dall'interazione.

Combining Ontological Knowledge and Wrapper Induction techniques into an e-retail System 1

by Michele Vindigni and A. Stellato

E-commerce and the continuous growth of the WWW has seen the rising of a new generation of e-reta... more E-commerce and the continuous growth of the WWW has seen the rising of a new generation of e-retail sites. A number of commercia l agent-based systems has been developed to help Internet shoppers decide what to buy and wh ere to buy it from. In such systems, ontologies play a crucial role in supporting the ex change of business data, as they provide a formal vocabulary for the information and unify dif ferent views of a domain in a shared and safe cognitive approach. In CROSSMARC (a European research project supporting development of an agent-based multilingual/multi-domain system for information extraction (IE) from web pages), a knowledge based approach has been combined with machine learning techniques (in particular, wrapper induction based components) in order to design a robust system for extracting information from relevant web sites. In the ever-changing Web framework this hybrid approach supports adaptivity to new emerging concepts and a certain degree of independence from ...

AI/NLP Technologies Applied to Spacecraft Mission Design

Lecture Notes in Computer Science, 2005

In this paper we propose the model of a prototypical NLP architecture of an information access sy... more In this paper we propose the model of a prototypical NLP architecture of an information access system to support a team of experts in a scientific design task, in a shared and heterogeneous framework. Specifically, we believe AI/NLP can be helpful in several tasks, such as the extraction of implicit information needs enclosed in meeting minutes or other documents, analysis of explicit information needs expressed through Natural Language, processing and indexing of document collections, extraction of required information from documents, modeling of a common knowledge base, and, finally, identification of important concepts through the automatic extraction of terms. In particular, we envisioned this architecture in the specific and practical scenario of the Concurrent Design Facility (CDF) of the European Space Agency (ESA), in the framework of the SHUMI project (Support To HUman Machine Interaction) developed in collaboration with the ESA/ESTEC -ACT (Advanced Concept Team).

Corpus-driven unsupervised learning of verb subcategorization frames

Lecture Notes in Computer Science, 1997

Abstract. The behavior of verbs in sublanguages is highly specific and does not follow general pr... more

Agents Based Ontological Mediation in IE Systems

Lecture Notes in Computer Science, 2003

Building more adaptive SW applications is a crucial issue to scale up IE technology to the Web, w... more Building more adaptive SW applications is a crucial issue to scale up IE technology to the Web, where information is organized following different underlying knowledge and/or presentation models. Information agents are more and more being adopted to support extraction of relevant information from semi-structured web sources. To efficiently manage heterogeneous information sources they must be able to cooperate, to share

Integrating ontological and linguistic knowledge for conceptual information extraction

Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003), 2003

In computer-aided and Web-based learning a critical role is played by the methods and frameworks ... more In computer-aided and Web-based learning a critical role is played by the methods and frameworks for Information Access. First of all, intelligent methods for retrieval and synthesis of information are crucial for making available timely and correct information during the training process. Moreover, the level of interaction between the target user and the retrieval subsystem constitutes an important quality factor for the learning process. The abstraction supported by the linguistic generalization adopted in the interaction is an inherent component of the student development: the higher is the linguistic level, the faster and more effective is the training. In order to ensure the proper abstraction level, the retrieval component should make use of capabilities for generalization throughout a number of phases: indexing, retrieval, organisation and presentation. All these tasks require thus an underlying concept-oriented approach usually relying on ontological resources. In particular tasks like indexing and presentation are also faced (in both directions of input/recognition and output/production) with linguistic data: source texts and dialogue/interaction sessions, respectively. In both cases, i.e. text understading and natural language generation, a non trivial process of semantic recognition is involved. All the above implies that strong assumptions about the conceptualisation of the underlying knowledge domain are usually made in e-learnig. However, building domain conceptualisations from scratch is a very complex and timeconsuming task. Traditionally, the reuse of available domain resources, although not constituting always the best, has been applied as an accurate and cost effective solution. This paper presents a method to exploit sources of domain knowledge (e.g. a subject reference system as a controlled language for document indexing and classification) used to build a linguistically motivated domain concept hierarchy. Because in the specific perspective of supporting linguistic inference in Information Extraction and Retrieval (IE/IR) for e-learning, the use of domain taxonomies as ontological resources is not straightforward. We discuss here how a method for integrating the taxonomical domain knowledge and a general-purpose lexical knowledge base (like WordNet) can be used for improving the accuracy and flexibility of IE.

Multi-lingual XML-Based Named Entity

We describe the multi-lingual Named Entity Recognition and Classification (NERC) subpart of an in... more We describe the multi-lingual Named Entity Recognition and Classification (NERC) subpart of an information extraction system, which is currently under development as part of the EU-funded project CROSSMARC. The two main CROSSMARC goals are to develop commercial-strength technologies based on language processing methodologies for information extraction from web pages and to provide automated techniques for efficient customisation i.e. extension of the system to new product domains and languages. To achieve our goals we use XML as a common exchange format, we exploit a common ontology and the monolingual NERC components use a combination of rule-based and machine-learning techniques. It has been challenging to process web pages, which contain heavily structured data where text is intermingled with HTML and other code. Our evaluation results demonstrate the viability of our approach.

I rischi e la gestione della sicurezza informatica