Papers by Stan Szpakowicz
The second edition of "Semantic Relations Between Nominals" (by Vivi Nastase, Stan Szpakowicz, Preslav Nakov and Diarmuid Ó Séaghdha) will be published by Morgan & Claypool. A new Chapter 5 of the book discusses relation classification/extraction in the deep-learning paradigm which arose after the first edition appeared. This is a preview of Chapter 5, made public by the kind permission of Morgan & Claypool.
We present an approach to Computer-Assisted Assessment of free-text material based on symbolic analysis of student input. The theory that underlies this approach arises from previous work on DidaLect, a tutoring system for second-language reading skill enhancement. The theory enables the processing of free-text segments for assessment to operate without pre-encoded reference material. A study based on a corpus of 48 student answers to several types of questions has justified our approach, helped define a methodology and design a prototype. Preliminaries In the field of Computer-Assisted Assessment (CAA), automated processing of free-text material received from students is becoming a necessity. The range of such material may run from single sentences to whole essays. Even as seemingly small a problem as student answers to open-ended questions poses a variety of serious Natural Language Processing (NLP) challenges. It
Previous research has shown that the meaning of many noun-noun compounds N1 N2 can be approximated reasonably well by paraphrasing clauses of the form ‘N2 that ... N1’, where ‘...’ stands for a verb with or without a preposition. For example, malaria mosquito is a ‘mosquito that carries malaria’. Evaluating the quality of such paraphrases is the theme of Task 9 at SemEval-2010. This paper describes some background, the task definition, the process of data collection and the task results. We also venture a few general conclusions before the participating teams present their systems at the SemEval-2010 workshop. Five teams submitted seven systems.
We present a brief overview of the main challenges in understanding the semantics of noun compounds and consider some known methods. We introduce a new task to be part of SemEval-2010: the interpretation of noun compounds using paraphrasing verbs and prepositions. The task is meant to provide a standard testbed for future research on noun compound semantics. It should also promote paraphrase-based approaches to the problem, which can benefit many NLP applications.
1. Department of Linguistics, University of Illinois at Urbana-Champaign
2. School of Information, University of California, Berkeley
3. Department of Electrical Engineering and Computer Science, University of California, Berkeley
4. School of Information Technology and Engineering, University of Ottawa
5. School of Information Technology and Engineering, University of Ottawa
6. Institute for Information Technology, National Research Council of Canada
7. Department of Computer Engineering, Koç University
It took us nearly ten years to get from no wordnet for Polish to the largest wordnet ever built. We started small but quickly learned to dream big. Now we are about to release plWordNet 3.0-emo – complete with sentiment and emotions annotated – and a domestic version of Princeton WordNet, larger than WordNet 3.1 by nearly ten thousand newly added words. The paper retraces the road we travelled and talks a little about the future.
ArXiv, 2020
The second edition of "Semantic Relations Between Nominals" (by Vivi Nastase, Stan Szpakowicz, Preslav Nakov and Diarmuid Ó Séaghdha) will be published by Morgan & Claypool. A new Chapter 5 of the book discusses relation classification/extraction in the deep-learning paradigm which arose after the first edition appeared. This is a preview of Chapter 5, made public by the kind permission of Morgan & Claypool.
We have released plWordNet 3.0, a very large wordnet for Polish. In addition to what is expected in wordnets – richly interrelated synsets – it contains sentiment and emotion annotations, a large set of multi-word expressions, and a mapping onto WordNet 3.1. Part of the release is enWordNet 1.0, a substantially enlarged copy of WordNet 3.1, with material added to allow for a more complete mapping. The paper discusses the design principles of plWordNet, its content, its statistical portrait, a comparison with similar resources, and a partial list of applications.
The applications of plWordNet, a very large wordnet for Polish, do not yet include work on sentiment and emotions. We present a pilot project to annotate plWordNet manually with sentiment polarity values and basic emotion values. We work with lexical units (LUs), plWordNet’s basic building blocks. So far, we have annotated about 30,000 nominal and adjectival LUs. The resulting lexicon is already one of the largest sentiment and emotion resources, in particular among those based on wordnets. We opted for manual annotation to ensure high accuracy, and to provide a reliable starting point for future semi-automated expansion. The paper lists the principal assumptions, outlines the annotation process, and introduces the resulting resource, plWordNet-emo. We discuss the selection of the material for the pilot study, show the distribution of annotations across the wordnet, and consider the statistics, including inter-annotator agreement and the resolution of disagreement.
Book recommender systems (RSs) are useful in libraries, schools and e-commerce applications. To our knowledge, no book RS exploits social networks other than book-cataloguing websites. We propose a recommendation component that learns the user’s interests from social media data and recommends books accordingly. Our new method of modelling users’ interests acquires a user’s distinctive topics using tf-idf and represents them as word embeddings. Even though the system is designed to complement other systems, we evaluated it against a content-based RS, a traditional book RS, and obtained similar performance. So, a new user of the system would receive recommendations as accurate as those for current users.
Multi-word expressions evade a closed definition. Linguists and computational linguists rely on intuition or build lists of MWE types; while practical, that is scientifically and aesthetically unsatisfying. Without presuming to solve a daunting theoretical problem, we propose a decision procedure which steers a lexicographer toward acceptance or rejection of an N-gram as a lexical unit: a decision tree classifies N-grams as MWE or not MWE. It will succeed if it agrees with the native speakers’ judgment. We need a small, linguistically credible set of features, to contend with the multiplicity of adequate trees. Decision tree induction works with a fixed set of annotated classification examples, but the lexical material for MWE recognition is too large to make annotation feasible. We rely on small-scale statistically significant sampling, and on intuition. Of a few decision trees produced by informed trial and error, we select one we consider best in our circumstances. That tree, dep...
Adverbs are seldom well represented in wordnets. Princeton WordNet, for example, derives from adjectives practically all its adverbs and whatever involvement they have. GermaNet stays away from this part of speech. Adverbs in plWordNet will be emphatically present in all their semantic and syntactic distinctness. We briefly discuss the linguistic background of the lexical system of Polish adverbs. We describe an automated generator of accurate candidate adverbs, and introduce the lexicographic procedures which will ensure high consistency of wordnet editors’ decisions about adverbs.
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Metaphor is indispensable in poetry. It showcases the poet's creativity, and contributes to the overall emotional pertinence of the poem while honing its specific rhetorical impact. Previous work on metaphor detection relies on either rule-based or statistical models, none of them applied to poetry. Our method focuses on metaphor detection in a poetry corpus. It combines rule-based and statistical models (word embeddings) to develop a new classification system. Our system has achieved a precision of 0.759 and a recall of 0.804 in identifying one type of metaphor in poetry.
Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings, pages 1–9, Vancouver, BC, August 4, 2017. © 2017 Association for Computational Linguistics, 2017
Metaphor is indispensable in poetry. It showcases the poet's creativity, and contributes to the overall emotional pertinence of the poem while honing its specific rhetorical impact. Previous work on metaphor detection relies on either rule-based or statistical models, none of them applied to poetry. Our method focuses on metaphor detection in a poetry corpus. It combines rule-based and statistical models (word embeddings) to develop a new classification system. Our system has achieved a precision of 0.759 and a recall of 0.804 in identifying one type of metaphor in poetry.
E-commerce "localizes global markets" by opening remote markets to retail and to small companies. Newly developed e-commerce tools allow individual and organizational buyers to search for suppliers anywhere and make deals electronically. We propose a software agent that interacts with a buyer, elicits information about the buyer's criteria, preferences, and limitations, and conducts business negotiation on behalf of the buyer. The agent has been implemented and tested in Negoplan, a software system that supports the simulation of decision processes. Results of several negotiation simulations are presented.
Cognitive Studies | Études cognitives, 2015
The System of Register Labels in plWordNet
Stylistic registers influence word usage. Both traditional dictionaries and wordnets assign lexical units to registers, and there is a wide range of solutions. A system of register labels can be flat or hierarchical, with few labels or many, homogeneous or decomposed into sets of elementary features. We review the register label systems in lexicography, and then discuss our model, designed for plWordNet, a large wordnet for Polish. There follows a detailed comparative analysis of several register systems in Polish lexical resources. We also present the practical effect of the adoption of our flat, small and homogeneous system: a relatively high consistency of register assignment in plWordNet, as measured by inter-annotator agreement on a manageable sample. Large-scale conclusions for the whole plWordNet remain to be made once the annotation has been completed, but the experience half-way through this labour-intensive exercise is very encoura...
Cognitive Studies | Études cognitives, 2015
Semantic relations among adjectives in Polish WordNet 2.0: a new relation set, discussion and evaluation
Adjectives in wordnets are often neglected: there are many fewer of them than nouns, and relations among them are sometimes not as varied as those among nouns or verbs. Polish WordNet 1.0 was no exception. Version 2.0 aims to correct that. We present an overview of a much larger set of lexical-semantic relations which connect adjectives to the other parts of the network. Our choice of relations has been motivated by linguistic considerations, especially the concerns of Polish lexical semantics, and by pragmatic reasons. The discussion includes detailed substitution tests, meant to ensure consistency among wordnet editors.
Studies in Computational Intelligence, 2011