Papers by Giulio Napolitano
Nature Climate Change, 2015
This is a repository copy of Linguistic analysis of IPCC summaries for policymakers and associated coverage.
Lecture Notes in Computer Science, 2018
Question answering (QA) systems often consist of several components such as Named Entity Disambiguation (NED), Relation Extraction (RE), and Query Generation (QG). In this paper, we focus on the QG process of a QA pipeline on a large-scale Knowledge Base (KB), with noisy annotations and complex sentence structures. We therefore propose SQG, a SPARQL Query Generator with a modular architecture, enabling easy integration with other components for the construction of a fully functional QA pipeline. SQG can be used on large open-domain KBs and handles noisy inputs by discovering a minimal subgraph based on the uncertain inputs it receives from the NED and RE components. This ability allows SQG to consider a set of candidate entities/relations, as opposed to only the most probable ones, which leads to a significant boost in the performance of the QG component. The captured subgraph covers multiple candidate walks, which correspond to SPARQL queries. To enhance accuracy, we present a ranking model based on Tree-LSTM that takes into account the syntactic structure of the question and the tree representation of the candidate queries to find the one that represents the correct intention behind the question. SQG outperforms the baseline systems and achieves a macro F1-measure of 75% on the LC-QuAD dataset.
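The idea of keeping several candidate entities/relations and turning supported walks into queries can be sketched as follows. This is a minimal illustration, not SQG itself: it assumes a toy in-memory KB, hypothetical DBpedia-style URIs and hypothetical confidence scores from upstream NED/RE components, considers only one-hop walks, and ranks by the product of input confidences as a simple stand-in for SQG's Tree-LSTM ranker.

```python
from itertools import product

# Hypothetical toy KB of (subject, predicate, object) triples
KB = {
    ("dbr:Barack_Obama", "dbo:spouse", "dbr:Michelle_Obama"),
    ("dbr:Barack_Obama", "dbo:birthPlace", "dbr:Honolulu"),
    ("dbr:Obama_family", "dbo:wikiPageWikiLink", "dbr:Barack_Obama"),
}

def candidate_walks(entities, relations, kb):
    """Yield (score, sparql) for every one-hop walk that exists in the KB.

    entities, relations: lists of (uri, confidence) pairs (hypothetical NED/RE output).
    All candidate pairs are tried, not only the top-1 entity and relation.
    """
    for (ent, e_score), (rel, r_score) in product(entities, relations):
        score = e_score * r_score
        # Walk ent --rel--> ?x (entity as subject)
        if any(s == ent and p == rel for s, p, _ in kb):
            yield score, f"SELECT ?x WHERE {{ {ent} {rel} ?x }}"
        # Walk ?x --rel--> ent (entity as object)
        if any(p == rel and o == ent for _, p, o in kb):
            yield score, f"SELECT ?x WHERE {{ ?x {rel} {ent} }}"

def rank(entities, relations, kb):
    # Placeholder ranking by joint confidence; SQG uses a learned Tree-LSTM model instead
    return sorted(candidate_walks(entities, relations, kb), key=lambda t: t[0], reverse=True)

# "Who is the wife of Obama?": the top-1 entity (Obama_family) supports no walk,
# but keeping the second candidate recovers a valid query.
for score, query in rank([("dbr:Obama_family", 0.6), ("dbr:Barack_Obama", 0.4)],
                         [("dbo:spouse", 0.7), ("dbo:birthPlace", 0.2)], KB):
    print(f"{score:.2f}  {query}")
```

The example shows why considering a candidate set helps: committing to the single most probable entity would have produced no query at all.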
arXiv (Cornell University), Oct 30, 2017
Named Entity Recognition (NER) is an important subtask of information extraction that seeks to locate and recognise named entities. Despite recent achievements, we still face limitations in correctly detecting and classifying entities, particularly in short and noisy text such as Twitter posts. An important drawback of most NER approaches is their heavy dependence on hand-crafted features and domain-specific knowledge, which are necessary to achieve state-of-the-art results. Thus, devising models that deal with such linguistically complex contexts is still challenging. In this paper, we propose a novel multi-level architecture that does not rely on any specific linguistic resource or encoded rule. Unlike traditional approaches, we use features extracted from images and text to classify named entities. Experimental tests against state-of-the-art NER for Twitter on the Ritter dataset show competitive results (0.59 F-measure), indicating that this approach may lead towards better NER models.
Lecture Notes in Computer Science, 2015
Springer eBooks, 2020
The recent deployment of semantic web tools and the expansion of available linked datasets have given users the opportunity to build increasingly complex applications. These emerging use cases often require queries containing mathematical formulas such as Euclidean distances or unit conversions. Currently, the latest SPARQL standard (version 1.1) includes only basic math operators. To address this shortcoming, some popular SPARQL evaluators provide built-in tools to cover specific needs; however, such tools are not yet standard. To offer users a more generic solution, we propose and share MINDS, a translator of mathematical expressions into SPARQL-compliant bindings that can be understood by any evaluator. MINDS thereby facilitates query design whenever mathematical computations are needed in a SPARQL query.
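To illustrate what "SPARQL-compliant bindings" for a formula can look like, the sketch below emits chained `BIND` clauses for a Euclidean distance using only `+`, `-`, `*` and `/`. SPARQL 1.1 has no square-root function, so the root is approximated by unrolling a fixed number of Newton iterations. This is an assumed technique chosen for illustration, not necessarily how MINDS works; the function names and the `?dist_*` variable naming are hypothetical.

```python
def euclidean_binds(p, q, out="dist", iterations=6):
    """Emit SPARQL 1.1 BIND clauses computing the Euclidean distance between
    points p and q, each given as a tuple of variable names without '?'.

    sqrt is approximated by `iterations` unrolled Newton steps that use only
    +, -, * and /, so any SPARQL 1.1 evaluator can run the result.
    """
    sq = f"?{out}_sq"
    # Sum of squared coordinate differences
    terms = [f"(?{a} - ?{b}) * (?{a} - ?{b})" for a, b in zip(p, q)]
    lines = [f"BIND({' + '.join(terms)} AS {sq})"]
    # Initial guess (x + 1) / 2 >= sqrt(x) by AM-GM, so the iterates decrease towards sqrt(x)
    prev = f"?{out}_0"
    lines.append(f"BIND(({sq} + 1) / 2 AS {prev})")
    for k in range(1, iterations + 1):
        cur = f"?{out}_{k}"  # fresh variable: BIND cannot rebind an in-scope variable
        # Newton step for y * y = x:  y <- (y + x / y) / 2
        lines.append(f"BIND(({prev} + {sq} / {prev}) / 2 AS {cur})")
        prev = cur
    lines.append(f"BIND({prev} AS ?{out})")
    return "\n".join(lines)

def newton_sqrt(x, iterations=6):
    """Python mirror of the emitted BIND chain, useful for checking accuracy."""
    y = (x + 1) / 2
    for _ in range(iterations):
        y = (y + x / y) / 2
    return y

print(euclidean_binds(("x1", "y1"), ("x2", "y2"), iterations=2))
```

The fixed iteration count keeps the generated query simple but limits accuracy for large squared distances, where the `(x + 1) / 2` starting guess is far from the root; `newton_sqrt` can be used to pick a sufficient count for the expected value range.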
The general goal of semantic question answering systems is to provide correct answers to natural language queries, given a number of structured datasets. The increasingly broad deployment of question answering (QA) systems in everyday life requires a comparable and reliable rating of how well QA systems perform and how scalable they are. To achieve this, we developed a massive dataset of more than 2 million natural language questions and their SPARQL queries for the DBpedia dataset. We combined natural language processing and linked open data to automatically generate this large number of valid question-query pairs. Our aim is to assist the benchmarking or scoring of QA systems in terms of answering questions in a range of languages, retrieving answers from heterogeneous sources, or answering massive numbers of questions within a limited time. This dataset is an ideal choice for stress-testing systems' scalability, speed and correctness. As such, it has already been included in the Large-scale QA task of the Question Answering over Linked Data (QALD) Challenge and the HOBBIT project Question Answering Benchmark.
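One common way to generate question-query pairs at scale is to instantiate paired natural-language and SPARQL templates with entities and labels drawn from the knowledge base; with T templates and N entities this yields up to T × N pairs. The sketch below shows that idea under stated assumptions: the templates, the `dbo:` properties they use and the entity list are hypothetical, PREFIX declarations are omitted, and this is not presented as the authors' actual generation pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    question: str  # natural-language pattern with a {label} slot
    query: str     # SPARQL pattern with a {uri} slot ({{ }} are literal braces)
    domain: str    # rdf:type an entity must have for the template to apply

# Hypothetical templates for illustration only
TEMPLATES = [
    Template("Who is the spouse of {label}?",
             "SELECT ?x WHERE {{ <{uri}> dbo:spouse ?x }}", "dbo:Person"),
    Template("Where was {label} born?",
             "SELECT ?x WHERE {{ <{uri}> dbo:birthPlace ?x }}", "dbo:Person"),
    Template("What is the capital of {label}?",
             "SELECT ?x WHERE {{ <{uri}> dbo:capital ?x }}", "dbo:Country"),
]

def generate_pairs(entities, templates=TEMPLATES):
    """entities: iterable of (uri, label, rdf_type) tuples taken from the KB."""
    pairs = []
    for uri, label, rdf_type in entities:
        for t in templates:
            if t.domain == rdf_type:  # only instantiate type-compatible templates
                pairs.append((t.question.format(label=label), t.query.format(uri=uri)))
    return pairs

for question, query in generate_pairs([
    ("http://dbpedia.org/resource/Ada_Lovelace", "Ada Lovelace", "dbo:Person"),
    ("http://dbpedia.org/resource/Italy", "Italy", "dbo:Country"),
]):
    print(question, "->", query)
```

A real generator would also need to check that each instantiated query actually returns an answer before keeping the pair, which is what makes the pairs "valid".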
Lecture Notes in Computer Science, 2019
We study question answering systems over knowledge graphs that map an input natural language question into candidate formal queries. Often, a ranking mechanism is used to identify the queries with higher similarity to the given question. Given the intrinsic complexity of natural language, finding the most accurate formal counterpart is a challenging task. In our recent paper [1], we leveraged Tree-LSTM to exploit the syntactic structure of the input question as well as of the candidate formal queries to compute the similarities. An empirical study shows that taking the structural information of the input question and candidate query into account enhances performance compared to the baseline system.
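For readers unfamiliar with Tree-LSTMs, the sketch below shows how tree-structured encodings can yield a question/query similarity score. It assumes the Child-Sum Tree-LSTM variant, random untrained weights, a hypothetical per-token embedding (`embed`), toy trees written as `(token, [subtrees])`, and cosine similarity between root hidden states; the paper [1] may use a different variant, tree encoding and similarity function. With untrained weights the resulting ranking is arbitrary; only training on question/correct-query pairs makes the scores meaningful.

```python
import math
import random

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

def add(*vs):
    # Element-wise sum of one or more equal-length vectors
    return [sum(xs) for xs in zip(*vs)]

def mul(u, v):
    # Element-wise (Hadamard) product
    return [a * b for a, b in zip(u, v)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def embed(token, dim):
    # Hypothetical embedding: a fixed pseudo-random vector per token
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

class ChildSumTreeLSTM:
    """Forward pass only; weights are random and untrained."""

    def __init__(self, in_dim=4, hid_dim=3, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.in_dim, self.hid_dim = in_dim, hid_dim
        # One (W, U, b) set per gate: i = input, f = forget, o = output, u = update
        self.W = {g: rand(hid_dim, in_dim) for g in "ifou"}
        self.U = {g: rand(hid_dim, hid_dim) for g in "ifou"}
        self.b = {g: [0.0] * hid_dim for g in "ifou"}

    def _gate(self, g, x, h):
        return add(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])

    def node(self, x, children):
        """Combine input x with child states [(h_k, c_k), ...]; returns (h, c)."""
        h_sum = add([0.0] * self.hid_dim, *[h for h, _ in children])  # sum of child h
        i = sigmoid(self._gate("i", x, h_sum))
        o = sigmoid(self._gate("o", x, h_sum))
        u = tanh(self._gate("u", x, h_sum))
        c = mul(i, u)
        for h_k, c_k in children:
            f_k = sigmoid(self._gate("f", x, h_k))  # separate forget gate per child
            c = add(c, mul(f_k, c_k))
        return mul(o, tanh(c)), c

    def encode(self, tree):
        """tree = (token, [subtrees]); returns the root (h, c), computed bottom-up."""
        token, subtrees = tree
        children = [self.encode(t) for t in subtrees]
        return self.node(embed(token, self.in_dim), children)

model = ChildSumTreeLSTM()
question = ("born", [("where", []), ("Obama", [])])
candidates = {
    "birthPlace": ("dbo:birthPlace", [("dbr:Barack_Obama", []), ("?x", [])]),
    "spouse": ("dbo:spouse", [("dbr:Barack_Obama", []), ("?x", [])]),
}
q_h, _ = model.encode(question)
ranking = sorted(candidates, key=lambda k: cosine(q_h, model.encode(candidates[k])[0]), reverse=True)
print(ranking)
```

The Child-Sum variant sums the children's hidden states, so it does not depend on child order or count, which suits dependency trees of questions and tree representations of queries alike.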
Sensors
Since life expectancy has increased significantly over the past century, society is being forced to discover innovative ways to support active aging and elderly care. The e-VITA project, which receives funding from both the European Union and Japan, is built on a cutting-edge method of virtual coaching that focuses on the key areas of active and healthy aging. The requirements for the virtual coach were ascertained through a process of participatory design in workshops, focus groups, and living laboratories in Germany, France, Italy, and Japan. Several use cases were then chosen for development utilising the open-source Rasa framework. The system uses common representations such as Knowledge Bases and Knowledge Graphs to enable the integration of context, subject expertise, and multimodal data, and is available in English, German, French, Italian, and Japanese.
Modern medical research and clinical practice are more dependent than ever on multi-factorial data sets originating from various sources, such as medical imaging, DNA analysis, patient health records and contextual factors. This data drives research, facilitates correct diagnoses and ultimately helps to develop and select the appropriate treatments. The volume and impact of this data has increased tremendously through technological developments such as high-throughput genomics and high-resolution medical imaging techniques. Additionally, the availability and popularity of different wearable health care devices has allowed the collection and monitoring of fine-grained personal health care data. The fusion and combination of these heterogeneous data sources has already led to many breakthroughs in health research and shows high potential for the development of methods that will push current reactive practices towards predictive, personalized and preventive health care. This potential ...
Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services, 2020
Inflammatory bowel disease (IBD) is a chronic disease characterized by numerous, hard-to-predict periods of relapse and remission. "Digital twin" approaches, leveraging personalized predictive models, would significantly enhance therapeutic decision-making and cost-effectiveness. However, the associated computational and statistical methods require high-quality data from a large population of patients. Such a comprehensive repository is very challenging to build, though, and none is available for IBD. To overcome this, a promising approach is to employ a knowledge graph, which is built from the available data and would help predict IBD episodes and deliver more relevant personalized therapy at the lowest cost. In this research, we present a knowledge graph developed on the basis of patient records collected from one of the largest German gastroenterological outpatient clinics. First, we designed an IBD ontology that encompasses the vocabulary, specifications and characteristics that physicians associate with IBD patients, such as disease classification schemas (e.g., the Montreal Classification of IBD), the status of disease activity, and medications. Next, we defined the mappings between ontology entities and database variables. Physicians and project members participating in the Fraunhofer MED2ICIN project validated the ontology and the knowledge graph. Furthermore, the knowledge graph has been validated against the competency questions compiled by physicians.
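The step of defining mappings between ontology entities and database variables can be pictured as a column-to-property table applied to each patient record to produce RDF triples. The sketch below assumes a hypothetical namespace, hypothetical class/property names and hypothetical column names; it is not the MED2ICIN ontology, and a production system would more likely express such mappings in a standard language such as W3C R2RML.

```python
# Hypothetical namespace and vocabulary; not the project's actual ontology
EX = "http://example.org/ibd#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping: database column -> (ontology property, object kind)
COLUMN_MAP = {
    "montreal_class": (EX + "hasMontrealClassification", "literal"),
    "disease_activity": (EX + "hasDiseaseActivity", "literal"),
    "medication": (EX + "receivesMedication", "iri"),  # assumes IRI-safe medication codes
}

def row_to_triples(row: dict) -> list[str]:
    """Turn one patient record (column -> value) into N-Triples lines."""
    subject = f"<{EX}patient/{row['patient_id']}>"
    triples = [f"{subject} <{RDF_TYPE}> <{EX}Patient> ."]
    for column, (prop, kind) in COLUMN_MAP.items():
        value = row.get(column)
        if value is None:  # missing values produce no triple
            continue
        if kind == "iri":
            obj = f"<{EX}{value}>"
        else:
            # Escape backslashes and quotes for an N-Triples string literal
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        triples.append(f"{subject} <{prop}> {obj} .")
    return triples

record = {"patient_id": 42, "montreal_class": "L3",
          "disease_activity": "remission", "medication": "Infliximab"}
print("\n".join(row_to_triples(record)))
```

Keeping the mapping as data (`COLUMN_MAP`) rather than code mirrors the separation the abstract describes: physicians can review the ontology and the mapping independently of the conversion logic.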
Recent years have seen a growing amount of research on question answering (QA) over Semantic Web data, shaping an interaction paradigm that allows end users to profit from the expressive power of Semantic Web standards while hiding their complexity behind an intuitive and easy-to-use interface. However, the growing amount of data has led to a heterogeneous data landscape where QA systems struggle to keep up with the volume, variety and veracity of the underlying knowledge. The Question Answering over Linked Data (QALD) challenge aims to provide an up-to-date benchmark for assessing and comparing state-of-the-art systems that mediate between a user, expressing his or her information need in natural language, and RDF data. In the past few years, more than 38 research groups have taken part in the last eight QALD challenges. The QALD challenge targets all researchers and practitioners working on querying Linked Data, natural language processing for question answering, ...
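Benchmarks of this kind typically score a system by comparing each returned answer set with a gold answer set. The sketch below computes per-question precision, recall and F1 and then macro-averages them. It is simplified: the handling of empty gold or empty system answers is an assumed convention, and since macro F1 is defined in more than one way, both the mean of per-question F1 and the harmonic mean of macro precision and recall are returned. The official QALD evaluation may differ in its details.

```python
def prf(gold: set, system: set) -> tuple[float, float, float]:
    """Per-question precision, recall and F1 over answer sets."""
    if not gold and not system:
        return 1.0, 1.0, 1.0  # assumed convention: correctly returning no answer
    if not gold or not system:
        return 0.0, 0.0, 0.0  # assumed convention: one side empty scores zero
    hits = len(gold & system)
    p, r = hits / len(system), hits / len(gold)
    f = 2 * p * r / (p + r) if hits else 0.0
    return p, r, f

def macro_scores(pairs):
    """pairs: list of (gold_set, system_set), one entry per question."""
    scores = [prf(g, s) for g, s in pairs]
    n = len(scores)
    macro_p = sum(s[0] for s in scores) / n
    macro_r = sum(s[1] for s in scores) / n
    return {
        "macro_p": macro_p,
        "macro_r": macro_r,
        "macro_f1": sum(s[2] for s in scores) / n,  # mean of per-question F1
        "f1_of_macro_pr": (2 * macro_p * macro_r / (macro_p + macro_r)
                           if macro_p + macro_r else 0.0),
    }

print(macro_scores([({"a", "b"}, {"a"}), ({"c"}, {"c"}), ({"d"}, set())]))
```

Macro averaging gives every question equal weight regardless of how many answers it has, which is why it is a natural fit for comparing systems across heterogeneous question sets.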
We study question answering systems over knowledge graphs that map an input natural language question into candidate formal queries. Often, a ranking mechanism is used to identify the queries with higher similarity to the given question. Given the intrinsic complexity of natural language, finding the most accurate formal counterpart is a challenging task. In our recent paper [1], we leveraged Tree-LSTM to exploit the syntactic structure of the input question as well as of the candidate formal queries to compute the similarities. An empirical study shows that taking the structural information of the input question and candidate query into account enhances performance compared to the baseline system. Code related to this paper is available at: https://github.com/AskNowQA/SQG.
Business Strategy and the Environment, 2021
Drawing on the literature on framing, we explore the emotional framing differences in radical and reformative NGOs over time. We analyse the sentiment of a sample of 5880 press releases issued by five NGOs positioned differently on the reformative‐radical spectrum and examine how they address large companies. Our findings reveal an increasing polarisation of sentiment in these NGOs' framing, with individual NGOs gravitating towards ideal‐type radical or reformative positions, respectively. In alignment with the differences in their framing, we observe differences in their approaches to cross‐sector partnerships. Policymakers need to note the implications of the observed polarisation for the effectiveness and credibility of cross‐sector partnerships and multi‐stakeholder initiatives more generally, given the risk of co‐optation (for reformative NGOs) as well as the risk of foregoing significant funding and governance opportunities (for radical NGOs).
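The abstract does not specify the sentiment method, so the following is only a minimal lexicon-based sketch of how per-release sentiment can be aggregated by organisation and year to look for polarisation over time. The tiny lexicon, the NGO names and the sample texts are hypothetical illustrations, not data or methods from the study, and the "spread" measure (gap between the most positive and most negative NGO mean in a year) is one assumed way to quantify polarisation.

```python
import re
from collections import defaultdict
from statistics import mean

# Tiny hypothetical lexicon; real studies use validated dictionaries or trained models
POSITIVE = {"welcome", "praise", "partnership", "progress", "commend"}
NEGATIVE = {"condemn", "failure", "greenwash", "destroy", "scandal"}

def net_sentiment(text: str) -> float:
    """(positive - negative) / (positive + negative); 0.0 if no lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0

def yearly_means(releases):
    """releases: iterable of (ngo, year, text). Returns {(ngo, year): mean score}."""
    buckets = defaultdict(list)
    for ngo, year, text in releases:
        buckets[(ngo, year)].append(net_sentiment(text))
    return {key: mean(scores) for key, scores in buckets.items()}

def yearly_spread(means):
    """Per year, the gap between the most positive and most negative NGO mean."""
    by_year = defaultdict(list)
    for (_, year), m in means.items():
        by_year[year].append(m)
    return {year: max(v) - min(v) for year, v in sorted(by_year.items())}

# Hypothetical toy input
releases = [
    ("NGO-A", 2005, "We welcome progress but condemn greenwash."),
    ("NGO-B", 2005, "A failure, but we praise the partnership."),
    ("NGO-A", 2010, "We welcome the partnership and commend progress."),
    ("NGO-B", 2010, "We condemn this failure."),
]
print(yearly_spread(yearly_means(releases)))
```

A widening spread across years would be consistent with the kind of polarisation the abstract reports, although a real analysis would also need a validated sentiment instrument and statistical testing.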
The Semantic Web, 2018
Question answering (QA) systems often consist of several components such as Named Entity Disambiguation (NED), Relation Extraction (RE), and Query Generation (QG). In this paper, we focus on the QG process of a QA pipeline on a large-scale Knowledge Base (KB), with noisy annotations and complex sentence structures. We therefore propose SQG, a SPARQL Query Generator with a modular architecture, enabling easy integration with other components for the construction of a fully functional QA pipeline. SQG can be used on large open-domain KBs and handles noisy inputs by discovering a minimal subgraph based on the uncertain inputs it receives from the NED and RE components. This ability allows SQG to consider a set of candidate entities/relations, as opposed to only the most probable ones, which leads to a significant boost in the performance of the QG component. The captured subgraph covers multiple candidate walks, which correspond to SPARQL queries. To enhance accuracy, we present a ranking model based on Tree-LSTM that takes into account the syntactic structure of the question and the tree representation of the candidate queries to find the one that represents the correct intention behind the question. SQG outperforms the baseline systems and achieves a macro F1-measure of 75% on the LC-QuAD dataset.
Lecture Notes in Computer Science, 2015
Accounting Forum, 2014
This paper examines the question of whether corporate sustainability reports can serve as accurate and fair representations of corporate sustainability performance. It presents the results of a sentiment analysis of CEO statements in corporate sustainability reports and corporate financial reports between 2001 and 2010. Making an analogy with corporate financial reporting, it is expected that if corporate sustainability reports accurately reflect sustainability performance, then this should be reflected in the rhetoric used. The analysis shows that the rhetoric in the CEO statements of sustainability reports is indicative of impression management rather than accountability, despite increasing standardization of sustainability reporting.
Environmental Communication, 2016