Papers by Martin Rajman

The size of digital libraries is increasing, making navigation and access to information more challenging. Improving a system by observing its users' activities can help provide better services to the users of very large digital libraries. In this paper we explain how the Invenio open-source software, used by the CERN Document Server (CDS), allows fine-grained logging of user behavior. In a first phase, the sequence of actions performed by users of CDS is captured; in a second phase, statistical data is computed offline. This paper explains these two steps and their results. Although the analyzed system focuses on the high-energy physics literature, the process could be applicable to other scientific communities with a large, international user base.
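As a rough illustration of the two-phase design described in this abstract, the following sketch logs one record per user action and computes a simple usage statistic offline. The record layout and function names are illustrative assumptions, not Invenio's actual logging API.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Phase 1 (online): append one timestamped record per user action.
# Field names are hypothetical, chosen only for this example.
def log_action(logfile, session_id, action, **details):
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session": session_id,
        "action": action,          # e.g. "search", "download", "view_record"
        "details": details,
    }
    logfile.write(json.dumps(record) + "\n")

# Phase 2 (offline): a simple statistic, the frequency of each action type.
def action_counts(path):
    with open(path) as f:
        return Counter(json.loads(line)["action"] for line in f)
```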
Applications of Natural Language to Data Bases, 2003
This paper describes an integrated system that enables the storage and retrieval of meeting transcripts (e.g. staff meetings). The system gives users who have not attended a meeting, or who want to review a particular point, enhanced access to an annotated version of the recorded data. This paper describes the various stages in the processing, storage and querying of the data. First, we put forward the idea of shallow dialogue processing, in order to extract significant features of the meeting transcripts for storage in a database, whose structure is briefly outlined. Low-level access to the database is provided as a Web service, which can be connected to several interfaces. A description of how multimodal input can be used with VoiceXML is also provided, thus offering an easy solution for voice- and web-based access to the dialogue data. The paper ends with considerations about the available data and its use in the current version of the system.
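The paper only outlines the database structure, so the sketch below is a guess at the kind of annotated segment record and low-level lookup the Web service could expose; every field name here is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass
class DialogueSegment:
    # Hypothetical record for one annotated stretch of a meeting transcript.
    meeting_id: str
    speaker: str
    start_s: float   # segment start time, in seconds
    end_s: float     # segment end time, in seconds
    text: str        # transcribed utterance
    topic: str       # shallow-dialogue annotation, e.g. the discussed topic

def segments_about(segments, topic):
    # Low-level query of the kind an interface could send to the Web service.
    return [s for s in segments if s.topic == topic]
```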
This paper reports on the main issues that arose during the development and testing of a coding scheme for the argumentative annotation of meeting discussions. A corpus of meeting discussions has been collected in the framework of a research project on multimodal dialogue analysis, and a coding scheme has been proposed. Annotations have been gathered from a set of annotators with different skills in argumentative discourse analysis, and the reliability of the coding scheme has been assessed against standard statistical measures.
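The abstract does not name the statistical measures used; Cohen's kappa is a standard choice for chance-corrected agreement between two annotators, so a minimal sketch of it is given here, with invented example labels.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Chance-corrected agreement between two annotators over the same items.
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

# Invented argumentative labels from two annotators:
a = ["accept", "reject", "propose", "accept", "propose"]
b = ["accept", "reject", "accept", "accept", "propose"]
print(round(cohens_kappa(a, b), 3))  # 0.688
```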
People are increasingly using provider services through the Internet. While a web site provides information about the contract terms and conditions that clients have to assent to in order to use its services, web services offer no comparable way of taking legal issues into account. There are some attempts to build machine-readable eContract languages that can express the contractual terms between the participants, but they are mainly designed to govern the distribution and use of electronic content. We propose an architecture for the definition of, and assent to, eContracts for Web Services.
Joint Conference on Knowledge-Based Software Engineering, Jun 29, 2008
Document ranking for scientific publications involves a variety of specialized resources (e.g. author or citation indexes) that are usually difficult to use within standard general-purpose search engines: such engines typically operate on large-scale heterogeneous document collections for which the required specialized resources are not available for all the documents present in the collections. Integrating such resources into specialized information retrieval engines is therefore important to cope with community-specific user expectations, which strongly influence the perception of relevance within the considered community. In this perspective, this paper extends the notion of ranking with various methods exploiting different types of bibliographic knowledge, a crucial resource for measuring the relevance of scientific publications. In our work, we experimentally evaluated the adequacy of two such ranking methods (one based on freshness, i.e. the publication date, and the other on a novel index, the download-Hirsch index, based on download frequencies) for information retrieval from the CERN scientific publication database in the domain of particle physics. Our experiments show that (i) the considered specialized ranking methods indeed represent promising candidates for extending the baseline ranking (relying on the download frequency), as they both lead to fairly small search-result overlaps; and (ii) extending the baseline ranking with the specialized ranking method based on freshness significantly improves the quality of the retrieval: a 16.2% relative increase for the Mean Reciprocal Rank (resp. a 5.1% relative increase for the Success@10, i.e. the estimated probability of finding at least one relevant document among the top ten retrieved) when a local rank sum is used for aggregation. We plan to further validate these results by carrying out additional experiments with the specialized ranking method based on the download-Hirsch index, to further improve the performance of our aggregative approach.
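To make the two ingredients concrete, the sketch below computes a download-Hirsch index, read here by analogy with the Hirsch index as the largest h such that h documents have at least h downloads each, and merges two rankings by a local rank sum. Both readings are assumptions based on this abstract, not the paper's exact definitions.

```python
def download_hirsch(download_counts):
    # Largest h such that at least h documents have >= h downloads each
    # (assumed analogue of the citation-based Hirsch index).
    counts = sorted(download_counts, reverse=True)
    h = 0
    for rank, downloads in enumerate(counts, start=1):
        if downloads >= rank:
            h = rank
    return h

def rank_sum_merge(ranking_a, ranking_b):
    # Aggregate two rankings of (roughly) the same documents by summing
    # their positions; documents missing from one list sink to the bottom.
    pos_a = {doc: r for r, doc in enumerate(ranking_a)}
    pos_b = {doc: r for r, doc in enumerate(ranking_b)}
    docs = pos_a.keys() | pos_b.keys()
    worst = len(docs)
    return sorted(docs, key=lambda d: pos_a.get(d, worst) + pos_b.get(d, worst))
```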
This paper presents a text classification procedure that has been developed in the context of an information extraction project. In the prototype developed for this project, newspaper advertisements are processed by three main modules: first, a classification module associates a category with the advertisement; then, a tagging module identifies textual information units that are related to the associated category; finally, a predefined form for that category is filled with the tagged text. The classification module, which is the main focus of this paper, consists of using a naive Bayes classifier and, at the same time, trying to fill all the predefined forms associated with all categories. The results of both methods (classification probabilities and filling scores) are then combined to produce a final classification decision. This mixed classification method is described and evaluated on the basis of concrete experiments carried out on real data. The purpose of the presented experiments is to precisely evaluate the impact of the information extraction step on classification accuracy. As one could reasonably expect, classification relying on information extraction alone does not perform very well, but when used as a complement to the statistical approach it significantly improves the classification results.
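The abstract does not give the exact combination rule, so the following sketch assumes a simple convex combination of the naive Bayes posterior and the form-filling score for each category; alpha is a hypothetical weighting parameter.

```python
def classify(nb_probs, fill_scores, alpha=0.5):
    # nb_probs:    {category: P(category | ad text)} from a naive Bayes classifier
    # fill_scores: {category: fraction of that category's form slots filled}
    # alpha:       assumed weight between the two evidence sources
    return max(nb_probs, key=lambda c:
               alpha * nb_probs[c] + (1 - alpha) * fill_scores.get(c, 0.0))

# Example: the filling score tips the decision toward "car_sale".
print(classify({"car_sale": 0.48, "real_estate": 0.52},
               {"car_sale": 0.9, "real_estate": 0.2}))
```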
This paper compares two techniques for robust parsing of extragrammatical natural language. Both are based on well-known approaches; one selects the optimal combination of partial analyses, the other relaxes grammar rules. Both techniques use a stochastic parser to select the "best" solution among multiple analyses. Experimental results show that, regardless of the grammar, the best results are obtained by sequentially combining the two techniques: first relaxing the rules and, only when that fails, selecting a combination of partial analyses.
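The sequential combination described above fits in a few lines; the parser interface (callables returning a full analysis or None) is an assumption made for illustration.

```python
def robust_parse(sentence, parse, parse_relaxed, combine_partials):
    # parse / parse_relaxed return a complete analysis or None;
    # combine_partials always returns the best combination of partial analyses.
    analysis = parse(sentence)
    if analysis is None:
        analysis = parse_relaxed(sentence)    # step 1: relax the grammar rules
    if analysis is None:
        analysis = combine_partials(sentence)  # step 2: fall back to partials
    return analysis
```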
Recent Advances in Natural Language Processing, 2001
Finding the most probable parse tree in the framework of Data-Oriented Parsing (DOP), a Stochastic Tree Substitution Parsing scheme developed by R. Bod (Bod 92), has proven to be NP-hard in the most general case (Sima'an 96a). However, introducing some a priori restrictions on the choice of the elementary trees (i.e. grammar rules) leads to interesting DOP instances with polynomial time complexity. The purpose of this paper is to present such an instance, based on the minimal-maximal selection principle, and to evaluate its performance on two different corpora.
In this paper we report on an experiment with an automated metric for analyzing the grammaticality of machine translation output. The approach (Rajman, Hartley, 2001) is based on the distribution of the linguistic information within a translated text, which is assumed to be similar between a learning corpus and the translation. This method is quite inexpensive, since it does not need any reference translation. First we describe the experimental method and the different tests we used. Then we show the promising results we obtained on the CESTA data, and how well they correlate with human judgments.
Jean-Cedric Chappelier and Martin Rajman. Parsing DOP with Monte-Carlo Techniques. 2001.
This paper gives an overview of the assessment and evaluation methods which have been used to determine the quality of the INSPIRE smart home system. The system allows different home appliances to be controlled via speech, and consists of speech and speaker recognition, speech understanding, dialogue management, and speech output components. The performance of these components is first assessed individually, and then the entire system is evaluated in an interaction experiment with test users. Initial results of the assessment and evaluation are given, in particular with respect to the impact of the transmission channel on speech and speaker recognition, and the assessment of speech output for different system metaphors.
In earlier work, we succeeded in automatically predicting the relative rankings of MT systems derived from human judgments on the Fluency, Adequacy or Informativeness of their output. In this paper, we present an experiment, using human evaluators and additional data, designed to test the robustness of our earlier results. These had yielded two promising automatically computable predictors: the D-score, based on semantic features of the MT output, and the X-score, based on syntactic features. We conclude that the X-score is indeed a robust and reliable predictor, even on new data for which it has not been specifically tuned.
In this paper, we report on the results of a full-size evaluation campaign of various MT systems. This campaign is novel compared to the classical DARPA/NIST MT evaluation campaigns in the sense that French is the target language, and that it includes a meta-evaluation experiment of various metrics claiming to better predict different attributes of translation quality. We first describe the campaign, its context, its protocol and the data we used. Then we summarise the results obtained by the participating systems and discuss the meta-evaluation of the metrics used.
The goal of this contribution is to present how the notion of dialogue management evaluation was integrated into the rapid dialogue prototyping methodology (RDPM), designed and experimented with by the authors in the framework of the InfoVox project. We first describe the proposed RDPM. The general idea of this methodology is to produce, for any given application, a quickly deployable dialogue-driven interface and to improve this interface through an iterative process based on Wizard-of-Oz experiments (i.e. dialogue simulations) ...