The large number of linked datasets on the Web, and their diversity in terms of schema representation, has led to a fragmented dataset landscape. Querying and addressing information needs that span disparate datasets requires the alignment of such schemas. The majority of schema and ontology alignment approaches focus exclusively on class alignment. Relation alignment has not been fully addressed, and existing approaches fall short in handling the dynamics of datasets and their size. In this work, we address the problem of relation alignment across disparate linked datasets. Our approach focuses on two main aspects. First, online relation alignment, where we do not require full access to the data and instead sample a minimal subset of it. We thus address the main limitation of existing work in dealing with the large scale of linked datasets, as well as cases where the datasets provide only query access. Second, we learn supervised machine learning models for which we employ various features, or matchers, that account for the diversity of linked datasets at the instance level. We perform an experimental evaluation on the real-world linked datasets DBpedia, YAGO, and Freebase. The results show superior performance against state-of-the-art approaches in schema matching, with an average relation alignment accuracy of 84%. In addition, we show that relation alignment can be performed efficiently at scale.
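A minimal sketch of the kind of pipeline this abstract describes: sampling instance pairs for a relation through a public SPARQL endpoint and training a supervised model over simple instance-level matcher scores. The endpoint URLs, the matcher set, and the helper names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: sampling-based relation alignment over SPARQL endpoints.
# Endpoint URLs, the feature set, and the labelled pairs are placeholder assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON
from sklearn.ensemble import RandomForestClassifier

def sample_pairs(endpoint_url, relation, limit=200):
    """Sample (subject, object) instance pairs for one relation via query access only."""
    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(f"SELECT ?s ?o WHERE {{ ?s <{relation}> ?o }} LIMIT {limit}")
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return {(r["s"]["value"], r["o"]["value"]) for r in rows}

def matcher_features(pairs_a, pairs_b):
    """Simple instance-level matchers: pair overlap, subject overlap, object overlap."""
    subj_a, subj_b = {s for s, _ in pairs_a}, {s for s, _ in pairs_b}
    obj_a, obj_b = {o for _, o in pairs_a}, {o for _, o in pairs_b}
    def jaccard(x, y):
        return len(x & y) / max(1, len(x | y))
    return [jaccard(pairs_a, pairs_b), jaccard(subj_a, subj_b), jaccard(obj_a, obj_b)]

def train_alignment_model(labelled, endpoint_a, endpoint_b):
    """Fit a supervised model over matcher scores; `labelled` holds (rel_a, rel_b, is_aligned)."""
    X, y = [], []
    for rel_a, rel_b, is_aligned in labelled:
        X.append(matcher_features(sample_pairs(endpoint_a, rel_a),
                                  sample_pairs(endpoint_b, rel_b)))
        y.append(int(is_aligned))
    return RandomForestClassifier(n_estimators=100).fit(X, y)
```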
HAL (Le Centre pour la Communication Scientifique Directe), 2009
The first part of this deliverable presents the logical model for ROSES streams (notions of time, data, and streams), as well as the logical algebra of stream operators (filter, map, union, join, etc.) that allows continuous queries over ROSES streams to be expressed. The logical model makes it possible to establish equivalences between algebraic expressions, which are used for query optimization; the main equivalences are also presented in this first part. The second part presents an event-oriented physical model, closer to the implementation, and a physical algebra into which the logical operators can be translated. The rules for rewriting logical algebra expressions into the physical algebra are also presented. Finally, this part specifies an event-driven implementation of the physical algebra, which will serve as the basis for a continuous query evaluator in the ROSES system.
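To make the notion of a logical stream algebra concrete, here is a small sketch of filter, map, and windowed-join operators over timestamped items. It is only an illustration in the spirit of the deliverable, with assumed data structures; it is not the ROSES algebra or its event-based evaluator.

```python
# Minimal sketch of logical stream operators (filter, map, windowed join) over
# timestamped items; an illustration only, not the ROSES algebra or implementation.
from collections import namedtuple

Item = namedtuple("Item", ["timestamp", "value"])

def stream_filter(stream, predicate):
    """Keep only the items whose value satisfies the predicate."""
    return (item for item in stream if predicate(item.value))

def stream_map(stream, fn):
    """Apply a transformation to every item, preserving timestamps."""
    return (Item(item.timestamp, fn(item.value)) for item in stream)

def window_join(stream_a, stream_b, window, key):
    """Join items of two streams whose keys match and timestamps differ by at most `window`."""
    items_b = list(stream_b)  # materialise the second stream for this simple sketch
    for a in stream_a:
        for b in items_b:
            if key(a.value) == key(b.value) and abs(a.timestamp - b.timestamp) <= window:
                yield Item(max(a.timestamp, b.timestamp), (a.value, b.value))

# One algebraic equivalence usable for optimisation: a filter can be pushed below a map
# whenever the predicate only depends on attributes preserved by the map.
```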
HAL (Le Centre pour la Communication Scientifique Directe), Jul 11, 2005
Open-source software communities currently face an increasing complexity in managing software content among their developers and contributors. This is mainly due to the continuously growing size of the software, the high frequency of updates, and the heterogeneity of the participants. We propose a distribution system that tackles two main issues in software content management: efficient content dissemination through a P2P system architecture, and advanced information system capabilities, using a distributed index for resource location.
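As an illustration of what a distributed index for resource location can look like, the sketch below assigns resources to peers with consistent hashing. The peer names and the API are assumptions made for illustration and do not reflect the system proposed in the paper.

```python
# Illustrative sketch of a distributed index for resource location, using consistent
# hashing to assign resource keys to peers; not the system described in the paper.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class ConsistentHashIndex:
    def __init__(self, peers, replicas=50):
        # Place each peer at several positions on the hash ring for better balance.
        self._ring = sorted((_hash(f"{p}#{i}"), p) for p in peers for i in range(replicas))
        self._keys = [h for h, _ in self._ring]

    def locate(self, resource_name: str) -> str:
        """Return the peer responsible for indexing this resource."""
        idx = bisect.bisect(self._keys, _hash(resource_name)) % len(self._ring)
        return self._ring[idx][1]

# Usage with placeholder peer and resource names.
index = ConsistentHashIndex(["peer-a", "peer-b", "peer-c"])
print(index.locate("linux-kernel-5.4.tar.gz"))  # prints whichever peer owns that key
```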
We consider the problem of adding diversity requirements to the results of continuous top-k queries in a large-scale social network, while preserving efficient, continuous processing. We propose the DA-SANTA algorithm, which smoothly adds content diversity to the continuous processing of top-k queries at social network scale. The experimental study demonstrates the very good effectiveness and efficiency of this algorithm.
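The abstract does not detail DA-SANTA, so the sketch below only illustrates the general idea of adding content diversity to a top-k result through a greedy, MMR-style re-ranking; the relevance and similarity functions and the trade-off parameter are assumptions.

```python
# Generic diversity-aware top-k selection (MMR-style greedy re-ranking); this only
# illustrates the notion of content diversity, it is not the DA-SANTA algorithm.
def diverse_top_k(candidates, relevance, similarity, k, trade_off=0.7):
    """Greedily pick k items, maximising relevance while penalising similarity to prior picks."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance(item) - (1 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```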
Information streams are today a prevalent way of publishing and consuming content on the Web, especially due to the great success of social networks. Top-k queries over the streams of interest limit results to the most relevant content, and continuous processing of such queries is the most effective approach in large-scale systems. However, current systems fail to combine continuous top-k processing with rich scoring models that include social network criteria. We present the SANTA algorithm, able to handle scoring functions combining content similarity with social network criteria and events in a continuous processing of top-k queries. We propose a variant (SANTA+) that accelerates the processing of interaction events in social networks. We compare SANTA/SANTA+ with an extension of a state-of-the-art algorithm and report a rich experimental study of our approach.
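As a rough illustration of continuous top-k processing with a composite score mixing content similarity and a social criterion, here is a minimal sketch; the scoring terms, weights, and data structures are assumptions, and this is not the SANTA/SANTA+ algorithm.

```python
# Minimal sketch of continuous top-k maintenance over a stream, with a composite score
# mixing content similarity and a social criterion; not the SANTA/SANTA+ algorithm.
import heapq

def composite_score(item, query_terms, social_weight=0.3):
    """Weighted mix of term overlap with the query and a precomputed social score in [0, 1]."""
    content = len(query_terms & item["terms"]) / max(1, len(query_terms))
    return (1 - social_weight) * content + social_weight * item["social"]

class ContinuousTopK:
    def __init__(self, query_terms, k):
        self.query_terms, self.k = set(query_terms), k
        self._heap = []      # min-heap of (score, tie-breaker, item): the current top-k
        self._counter = 0

    def on_new_item(self, item):
        """Process one stream item, keeping only the k best seen so far."""
        score = composite_score(item, self.query_terms)
        self._counter += 1
        entry = (score, self._counter, item)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the current minimum

    def top_k(self):
        return [entry[2] for entry in sorted(self._heap, reverse=True)]
```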
Communications in Computer and Information Science, 2016
With the huge popularity of social networks, publishing and consuming content through information streams is nowadays at the heart of the new Web. Top-k queries over the streams of interest limit results to relevant content, and continuous processing of such queries is the most effective approach in large-scale systems. Current systems fail to combine continuous top-k processing with rich scoring models that include social network criteria. We present in this paper our vision of the possible features of a social network of information streams, with a rich scoring model compatible with continuous top-k processing.
The evolution of user requirements and of enabling technologies will have a significant impact on how online search for multimedia content is performed. A major demand from users is to avoid the “keyword bottleneck”, in part through the inclusion of content-based search criteria. But indexing the content of images, videos, or music is very different from indexing hypertext, both because of the nature and volume of the data and because of specific rights issues. To highlight some research challenges that we deem important in this context, we start by taking a closer look at two major types of existing or potential multimedia content providers: the general public and large institutional archives. With widespread digital imaging and cheap high-capacity storage, end users became creators and potential providers of multimedia content. According to the prevailing paradigm for access to user-generated content (UGC), producers upload their content to the central servers of a provider that makes...
Le Centre pour la Communication Scientifique Directe - HAL - Université de Nantes, Oct 27, 2018
Abstracts of doctoral students' papers. 7.1 Privacy and efficiency guarantees on crowdsourcing platforms. 2 Invited speakers. 2.1 Ontology Mining by exploiting Machine Learning for Semantic Data Management, Claudia d'Amato. Bio: Claudia d'Amato obtained her PhD in 2007 from the University of Bari, defending the thesis titled "Similarity Based Learning Methods for the Semantic Web", for which she was nominated as author of one of the Best Italian PhD Theses in Artificial Intelligence by the Italian Artificial Intelligence Commission for the AI*IA award 2007. She pioneered research on developing Machine Learning methods for ontology mining, which still represents her main research interest. Her research activity has been disseminated through 19 journal papers, 12 book chapters, 55 papers in international collections, 27 papers in international workshop proceedings, and 13 articles in national conference and workshop proceedings. She has edited 27 books and proceedings and 3 journal special issues. During her research activity she has also won several best paper awards. Claudia d'Amato served/is serving as
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2022
Over the last decade, a large number of digital documentation projects have demonstrated the potential of image-based modelling of heritage objects in the context of documentation, conservation, and restoration. The inclusion of these emerging methods in the daily monitoring of the activities of a heritage restoration site, a context in which hundreds of photographs per day can be acquired by multiple actors according to several observation and analysis needs, raises new questions at the intersection of big data management, analysis, semantic enrichment, and, more generally, the automatic structuring of this data. In this article we propose a data model developed around these questions and identify the main challenges in structuring massive collections of photographs, through a review of the available literature on similarity metrics used to organise pictures based on their content or metadata. This work is carried out in the context of the restoration site of the Notre-Dame de Paris cathedral, which serves as the main case study.
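To illustrate one family of content-based similarity metrics mentioned above, the sketch below computes a tiny perceptual (average) hash and a Hamming similarity between two photographs. It is not the data model or the metrics retained in the article, and the file names are placeholders.

```python
# Illustrative sketch of one content-based similarity metric (a perceptual average hash)
# that could help group site photographs; not the article's data model, paths are placeholders.
from PIL import Image

def average_hash(path, hash_size=8):
    """Tiny perceptual hash: greyscale, downscale, threshold each pixel at the mean."""
    pixels = list(Image.open(path).convert("L").resize((hash_size, hash_size)).getdata())
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming_similarity(hash_a, hash_b):
    """Fraction of matching bits; close to 1.0 for near-duplicate pictures."""
    return sum(a == b for a, b in zip(hash_a, hash_b)) / len(hash_a)

# Usage with placeholder file names: group two photographs if similarity exceeds a threshold.
# sim = hamming_similarity(average_hash("nave_2021-04-12_001.jpg"),
#                          average_hash("nave_2021-04-12_002.jpg"))
# same_view = sim > 0.9
```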