Papers by Rodolfo Delmonte
Studies in Computational Intelligence, 2011
In this paper we argue in favour of an integration between statistically and syntactically based parsing, where syntax is intended in terms of shallow parsing with elementary trees. None of the statistically based analyses produces an accuracy level comparable to the one obtained by means of linguistic rules (1). Of course their data refer strictly to English, with the
Emanuele Pianta, IRST, Povo (Trento)
In this paper we argue in favour of an integration between statistically and syntactically based parsing by presenting data from a study of a 500,000 word corpus of Italian. Most papers present approaches to tagging which are statistically based. None of the statistically based analyses, however, produces an accuracy level comparable to the one obtained by means of linguistic rules [1]. Of course their data refer strictly to English, with the exception of [2, 3, 4]. As to Italian, we argue that purely statistically based approaches are inefficient, basically due to the great sparsity of tag distribution, with 50% or less of unambiguous tags when punctuation is subtracted from the total count. In addition, the level of homography is also very high: readings per word are 1.7, compared to 1.07 computed for English by [2] with a similar tagset. The current work includes a syntactic shallow parser and an ATN-like grammatical function assigner that automatically classifies previously manually verified tagged corpora. In a preliminary experiment with an automatic tagger, we obtained 99.97% accuracy on the training set and 99.03% on the test set using combined approaches; accuracy derived from statistical tagging is well below 95% even on the training set, and the same applies to syntactic tagging. As to the shallow parser and GF-assigner, we report on a first preliminary experiment on a manually verified subset of 10,000 words.
Procesamiento Del Lenguaje Natural, Sep 1, 2005
The system for semantic evaluation VENSES (Venice Semantic Evaluation System) is organized as a pipeline of two subsystems: the first is a reduced version of GETARUN, our system for Text Understanding. The output of the system is a flat list of head-dependent structures (HDS) with Grammatical Relations (GRs) and Semantic Roles (SRs) labels. The evaluation system is made up of two main modules: the first is a sequence of linguistic rule-based subcalls; the second is a quantitatively based measurement of input structures. VENSES measures semantic similarity, which may range from identical linguistic items to synonymous or merely morphologically derivable ones. Both modules go through General Consistency checks, which target high-level semantic attributes such as the presence of modality, negation, and opacity operators, as well as temporal and spatial location checks. Results in cws, accuracy and precision are homogeneous for both the training and test corpus and are higher than 60%.
Ldv, 2000
We implemented in our parser four parsing strategies that obey LFG grammaticality conditions and follow the hypothesis that knowledge of language is used in a "modular" fashion. The parsing strategies are the following: Minimal Attachment (MA), Functional Preference (FP), Semantic Evaluation (SE), Referential Individuation (RI). From the way in which we experimented with them in our implementation, it appears that they are strongly interwoven. In particular, MA is dependent upon FP to satisfy argument/function interpretation principles; with semantically biased sentences, MA, FP and SE apply in hierarchical order to license a phrase as argument or adjunct. RI is required and activated every time a singular definite NP has to be computed and is dependent upon the presence of a discourse model. The parser shows garden path effects and concurrently produces a processing breakdown which is linguistically motivated. Our parser is a DCG implemented in Prolog and obeys a top-down, depth-first, deterministic parsing policy.
We present preliminary data for an automatic detector and corrector of spelling and grammatical errors in Italian, a morphologically rich language. We claim that morphology plays a fundamental role in building such tools, and we base our hypothesis on the results of the analysis of a corpus of contemporary Italian of about 1 million words. The analysis was produced in several phases, which we examine in detail, with a morphosyntactic analyser called Immortal (cf. Delmonte, Pianta 1996). The system ultimately produces a lexical tagging with 100 tags, a number close to those produced for other languages. The system automatically generates a lemmatization for each word and includes a spelling and grammar corrector. The automatic correction processes are based on the recognition of morphemes and of the syllabic structure of Italian; they work from a database of 4,000 real errors detected in the analysis of the corpus and from other sources, which have been classified so that errors of this type can be detected. In order to produce usable candidates, rather than false solutions, for correcting erroneous words, morphological and syllabic segmentation seemed to us a feasible solution. Grammatical correction was developed by augmenting a context-free grammar represented as a recursive transition network (RTN).
Proceedings of the ACL 2011 Workshop on Relational Models of Semantics, Jun 23, 2011
In this paper, we address the issue of automatically identifying null instantiated arguments in text. We refer to Fillmore's theory of pragmatically controlled zero anaphora (Fillmore, 1986), which accounts for the phenomenon of omissible arguments using a lexically-based approach, and we propose a strategy for identifying implicit arguments in a text and finding their antecedents, given the overtly expressed semantic roles in the form of frame elements. To this purpose, we primarily rely on linguistic knowledge enriched with role frequency information collected from a training corpus. We evaluate our approach using the test set developed for the SemEval task 10 and we highlight some issues of our approach. Besides, we also point out some open problems related to the task definition and to the general phenomenon of null instantiated arguments, which needs to be better investigated and described in order to be captured from a computational point of view.
Summarization of conversations is a very challenging task that requires full understanding of the dialog turns, their roles and relationships in the conversations. We present an efficient system, derived from a full-fledged text analysis system, that performs the necessary linguistic analysis of turns in conversations and provides useful argumentative labels to build synthetic abstractive summaries.
In this paper we present work carried out to scale up the system for text understanding called GETARUNS and to port it to dialogue understanding. We present the adjustments we made in order to cope with transcribed spoken dialogues like those produced in the ICSI Berkeley project. In a final section we present preliminary evaluation of
The VENEX corpus is a corpus of Italian annotated with information about anaphora and deixis, created in a joint project between the Università di Venezia and the University of Essex. The corpus includes both texts (articles from a financial newspaper) and dialogues (an Italian version of the MapTask corpus). The annotation scheme is an almost complete implementation of the scheme proposed in MATE, and the markup scheme is the simplified form of standoff adopted in the MMAX annotation tool.
Conference of the European Chapter of the Association for Computational Linguistics, 1985
A recognition grammar to supply information to a text-to-speech system for the synthesis of Italian must rely heavily upon lexical information in order to instantiate the appropriate grammatical relations. Italian is an almost free word order language which nonetheless adopts fairly analysable strategies to move major constituents: some of these can strongly affect the functioning of the phonological component. Two basic
We created an application specialized in prosodic tutoring, called the Prosodic Module (PM). The PM is composed of two different sets of Learning Activities, the first one dealing with prosodic problems at word syllabic level, the second one dealing with prosodic problems at phonological phrase and utterance level. The PM is able to detect significant deviations from a master's word/phrase/utterance