Davide Lanti

Free University of Bozen-Bolzano, Computer Science, PhD Student

Papers by Davide Lanti

Conceptually-grounded mapping patterns for Virtual Knowledge Graphs

Data and Knowledge Engineering, May 1, 2023

Knowledge Graphs (KGs) have been gaining momentum recently in both academia and industry, due to the flexibility of their data model, allowing one to access and integrate collections of data of different forms. Virtual Knowledge Graphs (VKGs), a variant of KGs originating from the field of Ontology-based Data Access (OBDA), are a promising paradigm for integrating and accessing legacy data sources. The main idea of VKGs is that the KG remains virtual: the end-user interacts with a KG, but queries are reformulated on-the-fly as queries over the data source(s). To enable the paradigm, one needs to define declarative mappings specifying the link between the data sources and the elements in the VKG. In this work, we investigate common patterns that arise when specifying such mappings, building on well-established methodologies from the area of conceptual modeling and database design.
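
The idea of a virtual graph defined by declarative mappings can be illustrated with a minimal sketch. It is not taken from the paper: the table `emp`, the terms `:Employee` and `:worksFor`, and the helper functions are all hypothetical, and a real VKG system handles full SPARQL and standard mapping languages rather than single patterns.

```python
import sqlite3

# Hypothetical mappings: each ontology term is defined by an SQL query
# over the source. Nothing is materialized; the graph stays virtual.
MAPPINGS = {
    ":Employee": "SELECT id AS s FROM emp",
    ":worksFor": "SELECT id AS s, dept AS o FROM emp WHERE dept IS NOT NULL",
}

def rewrite_class_query(cls):
    """Reformulate the pattern '?x a cls' as an SQL query over the source."""
    return f"SELECT DISTINCT m.s FROM ({MAPPINGS[cls]}) AS m"

def rewrite_property_query(prop):
    """Reformulate the pattern '?x prop ?y' as an SQL query over the source."""
    return f"SELECT DISTINCT m.s, m.o FROM ({MAPPINGS[prop]}) AS m"

def demo():
    # Toy relational source (an assumption made for illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [(1, "R&D"), (2, None), (3, "Sales")],
    )
    # Queries posed over the ontology terms are answered by the database.
    employees = sorted(r[0] for r in conn.execute(rewrite_class_query(":Employee")))
    works_for = sorted(conn.execute(rewrite_property_query(":worksFor")))
    conn.close()
    return employees, works_for
```

Calling `demo()` answers both patterns directly from the relational table, which is the sense in which the end-user sees a graph while the data remains in the source.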

ADaMaP: Automatic Alignment of Relational Data Sources using Mapping Patterns (Abstract)

Description Logics, 2021

Counting Query Answers over a DL-Lite Knowledge Base

Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the LOGSPACE case, we have devised a novel query rewriting technique into first-order logic with counting.

The 5th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS

Semantic Web reasoning systems are confronted with the task of processing growing amounts of distributed, dynamic resources. This paper presents a novel way of approaching the challenge by RDF graph traversal, exploiting the advantages of swarm intelligence. The nature-inspired and index-free methodology is realised by self-organising swarms of autonomous, light-weight entities that traverse RDF graphs by following paths, aiming to instantiate pattern-based inference rules. The method is evaluated on the basis of a series of simulation experiments with regard to desirable properties of Semantic Web reasoning, focussing on anytime behaviour, adaptiveness and scalability.

Sigmod Record, Jan 31, 2022

A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE, an end-to-end data exploration system that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.

Ontology-based Data Federation

Proceedings of the 11th International Joint Conference on Knowledge Graphs

Ontology-based data access (OBDA) is a well-established approach to information management which facilitates the access to a (single) relational data source through the mediation of a high-level ontology, and the use of a declarative mapping linking the data layer to the ontology. We formally introduce here the notion of ontology-based data federation (OBDF) to denote a framework that combines OBDA with a data federation layer where multiple, possibly heterogeneous sources are virtually exposed as a single relational database. We discuss opportunities and challenges of OBDF, and provide techniques to deliver efficient query answering in an OBDF setting. Such techniques are validated through an extensive experimental evaluation based on the Berlin SPARQL Benchmark.

Fast and Simple Data Scaling for OBDA Benchmarks

BLINK@ISWC, 2016

In this paper we describe VIG, a data scaler for OBDA benchmarks. Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling an input data instance to n times its size, while preserving certain application-specific characteristics. The advantages of the scaling approach are that the same generator is general, in the sense that it can be re-used on different database schemas, and that users are not required to manually input the data characteristics. In the VIG system, we lift the scaling approach from the pure database level to the OBDA level, where the domain information of ontologies and mappings has to be taken into account as well. VIG is efficient; notably, each tuple is generated in constant time. VIG has been successfully used in the NPD benchmark, but it provides a general approach that can be re-used to scale any data instance in any OBDA setting.
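
As a rough illustration of what scaling an instance to n times its size while preserving a characteristic can mean, the sketch below scales a single column and keeps its ratio of distinct values to rows. This is not VIG's algorithm: the function `scale_column`, the choice of the distinct-value ratio as the preserved characteristic, and the synthetic value names are assumptions made for the example.

```python
def scale_column(values, n):
    """Return a synthetic column n times as long as `values`,
    with roughly the same ratio of distinct values to rows."""
    rows = len(values) * n
    # Scale the number of distinct values by the same factor n.
    distinct = max(1, round(len(set(values)) * n))
    # The i-th value depends on i alone, so each one is produced in
    # constant time, with no lookup into previously generated data.
    return [f"v{i % distinct}" for i in range(rows)]
```

For a column with 4 rows and 3 distinct values, `scale_column(col, 3)` yields 12 rows with 9 distinct values, so the ratio of 3/4 is unchanged. A scaler working at the OBDA level would additionally have to respect constraints induced by the ontology and the mappings, which this sketch ignores.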

ADaMaP: Automatic Alignment of Relational Data Sources Using Mapping Patterns

Advanced Information Systems Engineering, 2021

The Virtual Knowledge Graph System Ontop

Lecture Notes in Computer Science, 2020

Ontop is a popular open-source virtual knowledge graph system that can expose heterogeneous data sources as a unified knowledge graph. Ontop has been widely used in a variety of research and industrial projects. In this paper, we describe the challenges, design choices, and new features of the latest release of Ontop v4, summarizing the development efforts of the last 4 years.

A Scalable Benchmark for OBDA Systems: Preliminary Report

In ontology-based data access (OBDA), the aim is to provide a high-level conceptual view over potentially very large (relational) data sources by means of a mediating ontology. The ontology is connected to the data sources through a declarative specification given in terms of mappings that relate each (class and property) symbol in the ontology to an (SQL) view over the data. Although prototype OBDA systems providing the ability to answer SPARQL queries over the ontology are available, a significant challenge remains: performance. To properly evaluate OBDA systems, benchmarks tailored towards the requirements in this setting are needed. OWL benchmarks, which have been developed to test the performance of generic SPARQL query engines, however, fail at 1) exhibiting a complex real-world ontology, 2) providing challenging real-world queries, 3) providing large amounts of real-world data, and the possibility to test a system over data of increasing size, and 4) capturing important OBDA-specific measures related to the rewriting-based query answering approach in OBDA. In this work, we propose a novel benchmark for OBDA systems based on a real-world use case adopted in the EU project Optique. We validate our benchmark on the system Ontop, showing that it is more adequate than previous benchmarks not tailored for OBDA.

A systematic overview of data federation systems

Semantic Web

Data federation addresses the problem of uniformly accessing multiple, possibly heterogeneous data sources, by mapping them into a unified schema, such as an RDF(S)/OWL ontology or a relational schema, and by supporting the execution of queries, like SPARQL or SQL queries, over that unified schema. Data explosion in volume and variety has made data federation increasingly popular in many application domains. Hence, many data federation systems have been developed in industry and academia, and it has become challenging for users to select suitable systems to achieve their objectives. In order to systematically analyze and compare these systems, we propose an evaluation framework comprising four dimensions: (i) federation capabilities, i.e., query language, data source, and federation techniques; (ii) data security, i.e., authentication, authorization, auditing, encryption, and data masking; (iii) interface, i.e., graphical interface, command line interface, and application programmin...

ADaMaP: Automatic Alignment of Data Sources using Mapping Patterns

We propose a method for automatically extracting semantics from data sources. The availability of multiple data sources on the one hand and the lack of proper semantic documentation of such data sources on the other hand call for new strategies in integrating data sources by extracting semantics from the data source itself rather than from its documentation. We observe that relational databases are created from semantically-rich designs such as ER diagrams, which are often not conveyed together with the database itself. While the relational model may be semantically-poor with respect to ontological models, the original semantically-rich design of the application domain leaves recognizable footprints that can be converted into ontology mapping patterns. In this work, we offer an algorithm to automatically detect and map a relational schema to ontology mapping patterns and offer an empirical evaluation using two benchmark datasets.

OBDA Constraints for Effective Query Answering (Extended Version)

ArXiv, 2016

In Ontology Based Data Access (OBDA) users pose SPARQL queries over an ontology that lies on top of relational data sources. These queries are translated on-the-fly into SQL queries by OBDA systems. Standard SPARQL-to-SQL translation techniques in OBDA often produce SQL queries containing redundant joins and unions, even after a number of semantic and structural optimizations. These redundancies are detrimental to the performance of query answering, especially in complex industrial OBDA scenarios with large enterprise databases. To address this issue, we introduce two novel notions of OBDA constraints and show how to exploit them for efficient query answering. We conduct an extensive set of experiments on large datasets using real-world data and queries, showing that these techniques strongly improve the performance of query answering, up to orders of magnitude.

Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions

Description Logics, 2020

We propose a query rewriting algorithm for a restricted class of conjunctive queries evaluated under count semantics over a DL-Lite knowledge base. The target query language is an extension of relational algebra with aggregation and arithmetic functions, which can be translated into SQL. The algorithm supports number restrictions on the RHS of axioms in the input TBox, which can be used to encode statistics. The size of the output query remains linear in the binary encoding of these numbers, which improves upon previously proposed approaches.

Enriching Ontology-based Data Access with Provenance

Ontology-based data access (OBDA) is a popular paradigm for querying heterogeneous data sources by connecting them through mappings to an ontology. In OBDA, it is often difficult to reconstruct why a tuple occurs in the answer of a query. We address this challenge by enriching OBDA with provenance semirings, taking inspiration from database theory. In particular, we investigate the problems of (i) deciding whether a provenance-annotated OBDA instance entails a provenance-annotated conjunctive query, and (ii) computing a polynomial representing the provenance of a query entailed by a provenance-annotated OBDA instance. Unlike in pure databases, in our case these polynomials may be infinite. To regain finiteness, we consider idempotent semirings, and study the complexity in the case of DL-Lite_R ontologies. We implement Task (ii) in a state-of-the-art OBDA system and show the practical feasibility of the approach through an extensive evaluation against two popular benchmarks.

The Virtual Knowledge Graph System Ontop (Extended Abstract)

Description Logics, 2020

VKG. The Virtual Knowledge Graph (VKG) approach, also known in the literature as Ontology-Based Data Access (OBDA) [8,12], has become a popular paradigm for accessing and integrating data sources [13]. In such an approach, the data sources, which are normally relational databases, are virtualized through a mapping and an ontology, and presented as a unified knowledge graph, which can be queried by end-users through a vocabulary they are familiar with. At query time, a VKG system translates user queries over the ontology to SQL queries over the database. This approach frees end-users from the low-level details of data organization, so that they can concentrate on their high-level tasks. As it is gaining more importance, the VKG paradigm has been implemented in several systems [1,2,9,11] and adopted in a wide range of use cases. Here, we present the latest major release, Ontop v4, of a popular VKG system. Ontop v1. The development of Ontop has spanned the past decade. Developing such a s...

Counting Query Answers over a DL-Lite Knowledge Base (extended version)

ArXiv, 2020

Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with c...

An Evaluation of VIG with the BSBM Benchmark

We present an experimental evaluation of VIG, a data scaler for OBDA benchmarks. Data scaling is a relatively recent approach, proposed in the database community, that allows for scaling an input data instance to s times its size, while preserving certain application-specific characteristics. A data scaler is a “general” generator, in the sense that it can be re-used on different database schemas, and that users are not required to manually input the data characteristics. VIG lifts the scaling approach from the database level to the OBDA level, where the domain information of ontologies and mappings has to be taken into account as well. To evaluate VIG, in this paper we use it to generate data for the Berlin SPARQL Benchmark (BSBM), and compare it with the official BSBM data generator.

INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]

A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE, an end-to-end data exploration system that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus...

Conceptually-grounded mapping patterns for Virtual Knowledge Graphs

Data and Knowledge Engineering, May 1, 2023

Knowledge Graphs (KGs) have been gaining momentum recently in both academia and industry, due to ... more Knowledge Graphs (KGs) have been gaining momentum recently in both academia and industry, due to the flexibility of their data model, allowing one to access and integrate collections of data of different forms. Virtual Knowledge Graphs (VKGs), a variant of KGs originating from the field of Ontology-based Data Access (OBDA), are a promising paradigm for integrating and accessing legacy data sources. The main idea of VKGs is that the KG remains virtual: the end-user interacts with a KG, but queries are reformulated on-the-fly as queries over the data source(s). To enable the paradigm, one needs to define declarative mappings specifying the link between the data sources and the elements in the VKG. In this work, we try to investigate common patterns that arise when specifying such mappings, building on well-established methodologies from the area of conceptual modeling and database design.

ADaMaP: Automatic Alignment of Relational Data Sources using Mapping Patterns (Abstract)

Description Logics, 2021

Counting Query Answers over a DL-Lite Knowledge Base

Counting answers to a query is an operation supported by virtually all database management system... more Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing P and coNP lower bounds, and upper bounds in P and L S. For the L S case, we have devised a novel query rewriting technique into first-order logic with counting.

The 5 th International Workshop on Scalable Semantic Web Knowledge Base Systems ( SSWS

Semantic Web reasoning systems are confronted with the task to process growing amounts of distrib... more Semantic Web reasoning systems are confronted with the task to process growing amounts of distributed, dynamic resources. This paper presents a novel way of approaching the challenge by RDF graph traversal, exploiting the advantages of swarm intelligence. The natureinspired and index-free methodology is realised by self-organising swarms of autonomous, light-weight entities that traverse RDF graphs by following paths, aiming to instantiate pattern-based inference rules. The method is evaluated on the basis of a series of simulation experiments with regard to desirable properties of Semantic Web reasoning, focussing on anytime behaviour, adaptiveness and scalability.

Sigmod Record, Jan 31, 2022

A full-fledged data exploration system must combine different access modalities with a powerful c... more A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE-an end-to-end data exploration system-that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.

Ontology-based Data Federation

Proceedings of the 11th International Joint Conference on Knowledge Graphs

Ontology-based data access (OBDA) is a well-established approach to information management which ... more Ontology-based data access (OBDA) is a well-established approach to information management which facilitates the access to a (single) relational data source through the mediation of a high-level ontology, and the use of a declarative mapping linking the data layer to the ontology. We formally introduce here the notion of ontology-based data federation (OBDF) to denote a framework that combines OBDA with a data federation layer where multiple, possibly heterogeneous sources are virtually exposed as a single relational database. We discuss opportunities and challenges of OBDF, and provide techniques to deliver efficient query answering in an OBDF setting. Such techniques are validated through an extensive experimental evaluation based on the Berlin SPARQL Benchmark.

Fast and Simple Data Scaling for OBDA Benchmarks

BLINK@ISWC, 2016

In this paper we describe VIG, a data scaler for OBDA benchmarks. Data scaling is a relatively re... more In this paper we describe VIG, a data scaler for OBDA benchmarks. Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling an input data instance to n times its size, while preserving certain application-specific characteristics. The advantages of the scaling approach are that the same generator is general, in the sense that it can be re-used on different database schemas, and that users are not required to manually input the data characteristics. In the VIG system, we lift the scaling approach from the pure database level to the OBDA level, where the domain information of ontologies and mappings has to be taken into account as well. VIG is efficient and notably each tuple is generated in constant time. VIG has been successfully used in the NPD benchmark, but it provides a general approach that can be re-used to scale any data instance in any OBDA setting.

ADaMaP: Automatic Alignment of Relational Data Sources Using Mapping Patterns

Advanced Information Systems Engineering, 2021

The Virtual Knowledge Graph System Ontop

Lecture Notes in Computer Science, 2020

Ontop is a popular open-source virtual knowledge graph system that can expose heterogeneous data ... more Ontop is a popular open-source virtual knowledge graph system that can expose heterogeneous data sources as a unified knowledge graph. Ontop has been widely used in a variety of research and industrial projects. In this paper, we describe the challenges, design choices, new features of the latest release of Ontop v4, summarizing the development efforts of the last 4 years.

A Scalable Benchmark for OBDA Systems: Preliminary Report⋆

In ontology-based data access (OBDA), the aim is to provide a highlevel conceptual view over pote... more In ontology-based data access (OBDA), the aim is to provide a highlevel conceptual view over potentially very large (relational) data sources by means of a mediating ontology. The ontology is connected to the data sources through a declarative specification given in terms of mappings that relate each (class and property) symbol in the ontology to an (SQL) view over the data. Although prototype OBDA systems providing the ability to answer SPARQL queries over the ontology are available, a significant challenge remains: performance. To properly evaluate OBDA systems, benchmarks tailored towards the requirements in this setting are needed. OWL benchmarks, which have been developed to test the performance of generic SPARQL query engines, however, fail at 1) exhibiting a complex real-world ontology, 2) providing challenging real world queries, 3) providing large amounts of real-world data, and the possibility to test a system over data of increasing size, and 4) capturing important OBDA-specific measures related to the rewriting-based query answering approach in OBDA. In this work, we propose a novel benchmark for OBDA systems based on a real world use-case adopted in the EU project Optique. We validate our benchmark on the system Ontop, showing that it is more adequate than previous benchmarks not tailored for OBDA.

A systematic overview of data federation systems

Semantic Web

Data federation addresses the problem of uniformly accessing multiple, possibly heterogeneous dat... more Data federation addresses the problem of uniformly accessing multiple, possibly heterogeneous data sources, by mapping them into a unified schema, such as an RDF(S)/OWL ontology or a relational schema, and by supporting the execution of queries, like SPARQL or SQL queries, over that unified schema. Data explosion in volume and variety has made data federation increasingly popular in many application domains. Hence, many data federation systems have been developed in industry and academia, and it has become challenging for users to select suitable systems to achieve their objectives. In order to systematically analyze and compare these systems, we propose an evaluation framework comprising four dimensions: (i) federation capabilities, i.e., query language, data source, and federation techniques; (ii) data security, i.e., authentication, authorization, auditing, encryption, and data masking; (iii) interface, i.e., graphical interface, command line interface, and application programmin...

ADaMaP: Automatic Alignment of Data Sources using Mapping Patterns

We propose a method for automatically extracting semantics from data sources. The availability of... more We propose a method for automatically extracting semantics from data sources. The availability of multiple data sources on the one hand and the lack of proper semantic documentation of such data sources on the other hand call for new strategies in integrating data sources by extracting semantics from the data source itself rather than from its documentation. We observe that relational databases are created from semantically-rich designs such as ER diagrams, which are often not conveyed together with the database itself. While the relational model may be semantically-poor with respect to ontological models, the original semanticallyrich design of the application domain leaves recognizable footprints that can be converted into ontology mapping patterns. In this work, we offer an algorithm to automatically detect and map a relational schema to ontology mapping patterns and offer an empirical evaluation using two benchmark datasets.

OBDA Constraints for Effective Query Answering (Extended Version)

ArXiv, 2016

In Ontology Based Data Access (OBDA) users pose SPARQL queries over an ontology that lies on top ... more In Ontology Based Data Access (OBDA) users pose SPARQL queries over an ontology that lies on top of relational datasources. These queries are translated on-the-fly into SQL queries by OBDA systems. Standard SPARQL-to-SQL translation techniques in OBDA often produce SQL queries containing redundant joins and unions, even after a number of semantic and structural optimizations. These redundancies are detrimental to the performance of query answering, especially in complex industrial OBDA scenarios with large enterprise databases. To address this issue, we introduce two novel notions of OBDA constraints and show how to exploit them for efficient query answering. We conduct an extensive set of experiments on large datasets using real world data and queries, showing that these techniques strongly improve the performance of query answering up to orders of magnitude.

Fast and Simple Data Scaling for OBDA Benchmarks

In this paper we describe VIG, a data scaler for OBDA benchmarks. Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling an input data instance to n times its size, while preserving certain application-specific characteristics. The advantages of the scaling approach are that the same generator is general, in the sense that it can be re-used on different database schemas, and that users are not required to manually input the data characteristics. In the VIG system, we lift the scaling approach from the pure database level to the OBDA level, where the domain information of ontologies and mappings has to be taken into account as well. VIG is efficient and notably each tuple is generated in constant time. VIG has been successfully used in the NPD benchmark, but it provides a general approach that can be re-used to scale any data instance in any OBDA setting.
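As a rough illustration of what scaling a data instance to n times its size can look like, the sketch below grows a single column while keeping its ratio of distinct values to rows. This is not VIG's algorithm: the function name `scale_column`, the choice of the distinct-value ratio as the preserved characteristic, and the synthetic `v<i>` values are all assumptions made for the example. Each generated value comes from one modulo operation, so the work per tuple is constant.

```python
def scale_column(values, n):
    """Return a synthetic column n times as long as `values`.

    Hypothetical helper, not part of VIG. The only characteristic it
    preserves is the ratio of distinct values to rows.
    """
    total_rows = len(values) * n
    distinct_target = len(set(values)) * n
    # One modulo per row: each value is produced in constant time.
    return [f"v{i % distinct_target}" for i in range(total_rows)]


original = ["a", "b", "a", "c"]      # 4 rows, 3 distinct values
scaled = scale_column(original, 3)   # 12 rows, 9 distinct values
```

A scaler working at the OBDA level would also have to respect what the ontology and mappings say about the data, which this single-column sketch ignores.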

Rewriting Count Queries over DL-Lite TBoxes with Number Restrictions

Description Logics, 2020

We propose a query rewriting algorithm for a restricted class of conjunctive queries evaluated under count semantics over a DL-Lite knowledge base. The target query language is an extension of relational algebra with aggregation and arithmetic functions, which can be translated into SQL. The algorithm supports number restrictions on the RHS of axioms in the input TBox, which can be used to encode statistics. The size of the output query remains linear in the binary encoding of these numbers, which improves upon previously proposed approaches.

Enriching Ontology-based Data Access with Provenance

Ontology-based data access (OBDA) is a popular paradigm for querying heterogeneous data sources by connecting them through mappings to an ontology. In OBDA, it is often difficult to reconstruct why a tuple occurs in the answer of a query. We address this challenge by enriching OBDA with provenance semirings, taking inspiration from database theory. In particular, we investigate the problems of (i) deciding whether a provenance annotated OBDA instance entails a provenance annotated conjunctive query, and (ii) computing a polynomial representing the provenance of a query entailed by a provenance annotated OBDA instance. Differently from pure databases, in our case, these polynomials may be infinite. To regain finiteness, we consider idempotent semirings, and study the complexity in the case of DL-LiteR ontologies. We implement Task (ii) in a state-of-the-art OBDA system and show the practical feasibility of the approach through an extensive evaluation against two popular benchmarks.
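To make the idea of a provenance polynomial concrete, here is a minimal sketch over plain annotated relations, with no ontology or mappings involved. It is an illustration under stated assumptions, not the paper's construction: the relations `works_for` and `located_in`, the tags `t1` to `t5`, and the helper names are hypothetical. An annotation is modelled as a set of witnesses, each witness being the set of source tuples used together. Addition is set union, so it is idempotent, and a single witness multiplied by itself gives that witness back; which notion of idempotency the paper relies on is not stated in the abstract.

```python
from itertools import product


def tag(name):
    """Annotation of a source tuple: one witness holding just its own tag."""
    return {frozenset([name])}


def plus(a, b):
    """Alternative derivations: union of witness sets (idempotent)."""
    return a | b


def times(a, b):
    """Joint use of tuples: combine every witness of a with every witness of b."""
    return {w1 | w2 for w1, w2 in product(a, b)}


# Hypothetical annotated relations.
works_for = {
    ("ann", "acme"): tag("t1"),
    ("ann", "initech"): tag("t2"),
    ("bob", "acme"): tag("t3"),
}
located_in = {
    ("acme", "bolzano"): tag("t4"),
    ("initech", "bolzano"): tag("t5"),
}


def answer_provenance(r, s):
    """Provenance of q(x) :- r(x, y), s(y, z), grouped by answer x."""
    result = {}
    for (x, y), a in r.items():
        for (y2, _z), b in s.items():
            if y == y2:
                result[x] = plus(result.get(x, set()), times(a, b))
    return result


prov = answer_provenance(works_for, located_in)
# prov["ann"] holds two witnesses, {t1, t4} and {t2, t5},
# which corresponds to the polynomial t1*t4 + t2*t5.
```

Because addition is a set union, deriving the same witness twice does not change the annotation, which is the kind of collapse that keeps such representations finite.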

The Virtual Knowledge Graph System Ontop (Extended Abstract)

Description Logics, 2020

VKG. The Virtual Knowledge Graph (VKG) approach, also known in the literature as Ontology-Based Data Access (OBDA) [8,12], has become a popular paradigm for accessing and integrating data sources [13]. In such an approach, the data sources, which are normally relational databases, are virtualized through a mapping and an ontology, and presented as a unified knowledge graph, which can be queried by end-users through a vocabulary they are familiar with. At query time, a VKG system translates user queries over the ontology to SQL queries over the database. This approach frees end-users from the low-level details of data organization, so that they can concentrate on their high-level tasks. As it is gaining more importance, the VKG paradigm has been implemented in several systems [1,2,9,11] and adopted in a wide range of use cases. Here, we present the latest major release, Ontop v4, of a popular VKG system. Ontop v1. The development of Ontop has spanned the past decade. Developing such a s...

Counting Query Answers over a DL-Lite Knowledge Base (extended version)

ArXiv, 2020

Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the latter case, we define a novel query rewriting technique into first-order logic with c...

An Evaluation of VIG with the BSBM Benchmark

We present an experimental evaluation of VIG, a data scaler for OBDA benchmarks. Data scaling is a relatively recent approach, proposed in the database community, that allows for scaling an input data instance to s times its size, while preserving certain application-specific characteristics. A data scaler is a “general” generator, in the sense that it can be re-used on different database schemas, and that users are not required to manually input the data characteristics. VIG lifts the scaling approach from the database level to the OBDA level, where the domain information of ontologies and mappings has to be taken into account as well. To evaluate VIG, in this paper we use it to generate data for the Berlin SPARQL Benchmark (BSBM), and compare it with the official BSBM data generator.

INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]

A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE, an end-to-end data exploration system that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus...