Rodolfo Delmonte

Big Data are not enough for language analytics. I have better ideas on how it works, and they are the result of some 30 years of linguistic research and over 200 publications.

Papers by Rodolfo Delmonte

Linguistically-Based Reranking of Google’s Snippets with GreG

Studies in Computational Intelligence, 2011

Deep Linguistic Processing with GETARUNS for spoken dialogue understanding

ROMAND 2004 Workshop on Robust Methods in Analysis of Natural Language Data

by Vincenzo Pallotta and Rodolfo Delmonte

Parsing Italian with a context-free recognizer

by Rodolfo Delmonte and Roberto Dolci

Computing linguistic knowledge for text-to-speech systems with PROSO

by Roberto Dolci and Rodolfo Delmonte

Elementary Trees for Syntactic and Statistical Disambiguation

In this paper we argue in favour of an integration between statistically and syntactically based parsing, where syntax is intended in terms of shallow parsing with elementary trees. None of the statistically based analyses produce an accuracy level comparable to the one obtained by means of linguistic rules (1). Of course their data are strictly referred to English, with the

IMMORTALE-Analizzatore Morfologico, Tagger e Lemmatizzatore per l'Italiano

Emanuele Pianta, IRST, Povo (Trento)

Linguistically Based QA by Dynamic LOD Access from Logical Forms

Shallow parsing and functional structure in Italian corpora

In this paper we argue in favour of an integration between statistically and syntactically based parsing by presenting data from a study of a 500,000-word corpus of Italian. Most papers present approaches to tagging which are statistically based. None of the statistically based analyses, however, produce an accuracy level comparable to the one obtained by means of linguistic rules [1]. Of course their data are strictly referred to English, with the exception of [2, 3, 4]. As to Italian, we argue that purely statistically based approaches are inefficient, basically due to the great sparsity of the tag distribution: 50% or less of the tags are unambiguous when punctuation is subtracted from the total count. In addition, the level of homography is also very high: readings per word are 1.7, compared to 1.07 computed for English by [2] with a similar tagset. The current work includes a syntactic shallow parser and an ATN-like grammatical function assigner that automatically classifies previously manually verified tagged corpora. In a preliminary experiment with an automatic tagger, we obtained 99.97% accuracy on the training set and 99.03% on the test set using combined approaches; accuracy obtained from statistical tagging alone is well below 95% even on the training set, and the same applies to syntactic tagging. As to the shallow parser and GF-assigner, we shall report on a first preliminary experiment on a manually verified subset of 10,000 words.

VENSES, a linguistically-based system for semantic evaluation

Procesamiento Del Lenguaje Natural, Sep 1, 2005

The system for semantic evaluation VENSES (Venice Semantic Evaluation System) is organized as a pipeline of two subsystems: the first is a reduced version of GETARUN, our system for Text Understanding. The output of the system is a flat list of head-dependent structures (HDS) with Grammatical Relations (GRs) and Semantic Roles (SRs) labels. The evaluation system is made up of two main modules: the first is a sequence of linguistic rule-based subcalls; the second is a quantitatively based measurement of input structures. VENSES measures semantic similarity, which may range from identical linguistic items to synonymous or just morphologically derivable ones. Both modules go through General Consistency checks which are targeted at high-level semantic attributes like presence of modality, negation, and opacity operators, and temporal and spatial location checks. Results in cws, accuracy and precision are homogeneous for both the training and test corpus and are higher than 60%.

Computing linguistic knowledge for text-to-speech systems with PROSO

Interspeech, 1991

Parsing Preferences and Linguistic Strategies

Ldv, 2000

We implemented in our parser four parsing strategies that obey LFG grammaticality conditions and follow the hypothesis that knowledge of language is used in a "modular" fashion. The parsing strategies are the following: Minimal Attachment (MA), Functional Preference (FP), Semantic Evaluation (SE), Referential Individuation (RI). From the way in which we experimented with them in our implementation it appears that they are strongly interwoven. In particular, MA is dependent upon FP to satisfy argument/function interpretation principles; with semantically biased sentences, MA, FP and SE apply in hierarchical order to license a phrase as argument or adjunct. RI is required and activated every time a singular definite NP has to be computed and is dependent upon the presence of a discourse model. The parser shows garden path effects and concurrently produces a processing breakdown which is linguistically motivated. Our parser is a DCG implemented in Prolog and obeys a top-down depth-first deterministic parsing policy.

Immortal: How to Detect Misspelled from Unknown Words

We present preliminary data for an automatic detector and corrector of spelling and grammatical errors in Italian, a morphologically rich language. We claim that morphology plays a fundamental role in building such tools, and we base our hypothesis on the results of the analysis of a corpus of contemporary Italian amounting to about 1 million words. The analysis was produced in several phases, which we examine in detail, with a morphosyntactic analyzer called Immortal (cf. Delmonte, Pianta 1996). The system ultimately produces a lexical tagging with 100 tags, a number close to those produced for other languages. The system automatically generates a lemmatization for each word and includes a spelling and grammar corrector. The automatic correction processes are based on the recognition of morphemes and of the syllabic structure of Italian; the corrector works from a database of 4000 real errors detected during the analysis of the corpus and from other sources, which have been classified so that errors of this type can be detected. In order to produce usable candidates, rather than false solutions, for correcting erroneous words, morphological and syllabic segmentation seemed to us a feasible solution. Grammatical correction was developed by augmenting a context-free grammar represented as a recursive transition network (RTN).

Desperately seeking implicit arguments in text

Proceedings of the ACL 2011 Workshop on Relational Models of Semantics, Jun 23, 2011

In this paper, we address the issue of automatically identifying null instantiated arguments in text. We refer to Fillmore's theory of pragmatically controlled zero anaphora (Fillmore, 1986), which accounts for the phenomenon of omissible arguments using a lexically-based approach, and we propose a strategy for identifying implicit arguments in a text and finding their antecedents, given the overtly expressed semantic roles in the form of frame elements. To this purpose, we primarily rely on linguistic knowledge enriched with role frequency information collected from a training corpus. We evaluate our approach using the test set developed for the SemEval task 10 and we highlight some issues of our approach. Besides, we also point out some open problems related to the task definition and to the general phenomenon of null instantiated arguments, which needs to be better investigated and described in order to be captured from a computational point of view.

Abstractive Summarization of Voice Communications

Summarization of conversations is a very challenging task that requires full understanding of the dialog turns, their roles and relationships in the conversations. We present an efficient system, derived from a full-fledged text analysis system, that performs the necessary linguistic analysis of turns in conversations and provides useful argumentative labels to build synthetic abstractive summaries.

Scaling up a NLU system from text to dialogue understanding

In this paper we will present work carried out to scale up the system for text understanding called GETARUNS, and port it to be used in dialogue understanding. We will present the adjustments we made in order to cope with transcribed spoken dialogues like those produced in the ICSI Berkeley project. In a final section we present preliminary evaluation of

Deep Linguistic Processing with GETARUNS for Spoken Dialogue Understanding

LREC, 2010

The VENEX corpus of anaphora and deixis in spoken and written Italian

The VENEX corpus is a corpus of Italian annotated with information about anaphora and deixis, created in a joint project between the Università di Venezia and the University of Essex. The corpus includes both texts (articles from a financial newspaper) and dialogues (an Italian version of the MapTask corpus). The annotation scheme is an almost complete implementation of the scheme proposed in MATE, and the markup scheme is the simplified form of standoff adopted in the MMAX annotation tool.

Parsing Difficulties & Phonological Processing in Italian

Conference of the European Chapter of the Association for Computational Linguistics, 1985

A recognition grammar to supply information to a text-to-speech system for the synthesis of Italian must rely heavily upon lexical information, in order to instantiate the appropriate grammatical relations. Italian is an almost free word order language which nonetheless adopts fairly analysable strategies to move major constituents: some of these can strongly affect the functioning of the phonological component. Two basic

A Prosodic Module for Self-Learning Activities

We created an application specialized in prosodic tutoring, called the Prosodic Module (PM). The PM is composed of two different sets of Learning Activities, the first one dealing with prosodic problems at the word and syllable level, the second one dealing with prosodic problems at the phonological phrase and utterance level. The PM is able to detect significant deviations from a master's word/phrase/utterance
