Alex Nicolau, University of California, Irvine, USA; Alexey Lastovetsky, University College Dublin, Ireland; Alper Şen, Boğaziçi University, Turkey; Andreas Knüpfer, TU Dresden, Germany; Bertil Folliot, University Paris 6, France; Can Özturan, Boğaziçi University, Turkey; Christine Morin, IRISA, Rennes, France; Dana Petcu, Western University of Timisoara, Romania; Daniel Grosu, Wayne State University, Detroit, USA; Denis Trystram, IMAG, France; Dieter Kranzlmüller, Ludwig-Maximilians-Universität München; Domenico Talia, Universita ...
The Grid is mainly used today for supporting high-performance, compute-intensive applications. However, it can also be effectively exploited for deploying data-driven and knowledge discovery applications. To support these classes of applications, high-level tools and services are vital. The Knowledge Grid is a high-level system for providing Grid-based knowledge discovery services.
Computational Grids are powerful platforms gathering computational power and storage space from thousands of geographically distributed resources. The applications running on such platforms need to efficiently and reliably access the heterogeneous distributed resources they offer. This can be achieved by using metadata describing all available resources. It is therefore crucial to provide efficient metadata management architectures and frameworks.
The continuous increase of data volumes available from many sources raises new challenges for their effective understanding. Knowledge discovery in large data repositories involves processes and activities that are computationally intensive, collaborative, and distributed in nature. The Grid is a profitable infrastructure that can be effectively exploited for handling distributed data mining and knowledge discovery.
Grid computing has been the subject of many large national and international IT projects. However, not all goals of these projects have been achieved. In particular, the number of users lags behind the initial forecasts laid out by proponents of grid technologies. This underachievement may have led to claims that the grid concept as a whole is on its way to being replaced by Cloud computing and various X-as-a-Service approaches.
Service grids and desktop grids are great solutions for solving the available compute power problem and helping to balance loads across network systems. However, to support new scientific communities that need extremely large numbers of resources, the solution could be to interconnect these two kinds of Grid systems into an integrated Service Grid-Desktop Grid (SG-DG) infrastructure.
In this paper we describe an interactive, visual knowledge discovery tool for analyzing numerical data sets. The tool combines a visual clustering method, to hypothesize meaningful structures in the data, and a classification machine learning algorithm, to validate the hypothesized structures. A two-dimensional representation of the available data allows users to partition the search space by choosing shape or density according to the criteria they deem optimal.
The Grid has rapidly moved from a toolkit-centered approach, composed of a set of middleware tools, toward a more application-oriented Service Oriented Architecture in which resources are exposed as services. The soaring number of available services calls for distributed and semantic-based discovery architectures. Distribution promotes scalability and fault tolerance, whereas semantics is required to provide meaningful descriptions of services and support their efficient retrieval.
Knowledge discovery is a compute- and data-intensive process that allows for finding patterns, trends, and models in large datasets. The grid can be effectively exploited for deploying knowledge discovery applications because of the high performance it can offer and its distributed infrastructure. For effective use of grids in knowledge discovery, the development of middleware is critical to support data management, data transfer, data mining, and knowledge representation.
The bioremediation of contaminated soils is one of the main strategies for site clean-up. The most important principle of bioremediation is that microorganisms (mainly bacteria) can be used to destroy hazardous contaminants or transform them into a less harmful form. Currently, we are facing this problem in the CABOTO project within the PCI ESPRIT framework.
Research and development work in the area of knowledge discovery and data mining concerns the study and definition of techniques, methods, and tools for the extraction of novel, useful, and not explicitly available patterns from large volumes of data. Data mining techniques originated from the use of statistical analysis and machine learning techniques for mining patterns from databases. In the last few years, new techniques and algorithms have been designed.
The Grid is an integrated infrastructure for coordinated resource sharing and problem solving in distributed environments. A main factor that will drive the development and evolution of the Grid will be the necessity to face the enormous amount of data that any field of human activity is producing at a rate never seen before. This position paper attempts to forecast the ongoing evolution of computational Grids towards what we name next-generation Grids.
The number of available Internet services increases every day. This trend demands distributed models and architectures to support scalability as well as semantics to enable efficient publication and retrieval of services. Two common approaches toward this goal are semantic overlay networks (SONs) and distributed hash tables (DHTs) with semantic extensions.
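The DHT side of this approach can be sketched with a toy example. The following is illustrative only, not the architecture proposed in the paper: it maps service keywords onto nodes via consistent hashing, so publication and retrieval of a keyword always reach the same node. All node and service names are invented for the example.

```python
import hashlib
from bisect import bisect_left

def h(s: str, space: int = 2**16) -> int:
    """Hash a string into the DHT identifier space."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % space

class TinyDHT:
    def __init__(self, nodes):
        # Sort nodes by their position on the identifier ring.
        self.ring = sorted((h(n), n) for n in nodes)
        self.store = {n: {} for n in nodes}

    def _successor(self, key_id: int) -> str:
        """First node clockwise from key_id (wrapping around the ring)."""
        ids = [nid for nid, _ in self.ring]
        i = bisect_left(ids, key_id) % len(self.ring)
        return self.ring[i][1]

    def put(self, keyword: str, service: str):
        # Publication: the keyword's successor node stores the mapping.
        self.store[self._successor(h(keyword))][keyword] = service

    def get(self, keyword: str):
        # Retrieval: the same hash routes the lookup to the same node.
        return self.store[self._successor(h(keyword))].get(keyword)

dht = TinyDHT(["node-a", "node-b", "node-c"])
dht.put("storage", "svc://replica-manager")
assert dht.get("storage") == "svc://replica-manager"
```

Semantic extensions typically enter at the keyword level, e.g. by hashing concept identifiers from an ontology instead of raw strings, so that semantically related services land on predictable nodes.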
Dynamic Querying (DQ) is a technique adopted in unstructured Peer-to-Peer (P2P) networks to minimize the number of peers that must be visited to reach the desired number of results. In this paper we introduce the use of the DQ technique in structured P2P networks. In particular, we present a P2P search algorithm, named DQ-DHT (Dynamic Querying over a Distributed Hash Table), to perform DQ-like searches over DHT-based overlays.
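The core dynamic-querying idea can be sketched as follows. This is a hypothetical simulation of the general DQ strategy, not the DQ-DHT protocol itself: probe a small subset of peers first, estimate from the observed hit rate how common matching results are, and widen the search only as far as needed to reach the desired result count. Peer contents are synthetic.

```python
import random

def dynamic_query(peers, predicate, desired, probe_size=8):
    """Visit peers in growing batches until `desired` matches are found."""
    order = list(peers)
    random.shuffle(order)          # no structure assumed among peers
    results, visited, batch = [], 0, probe_size
    while visited < len(order) and len(results) < desired:
        for peer in order[visited:visited + batch]:
            results.extend(item for item in peer if predicate(item))
        visited += batch
        # Size the next batch from the popularity observed so far:
        # few hits per visited peer -> query a larger batch next round.
        hit_rate = max(len(results) / visited, 1e-9)
        batch = min(len(order) - visited,
                    int((desired - len(results)) / hit_rate) + 1)
    return results[:desired], visited

# Synthetic network: every 10th peer holds one matching item.
peers = [[f"item-{i}"] if i % 10 == 0 else [] for i in range(1000)]
found, contacted = dynamic_query(peers, lambda x: x.startswith("item"), desired=5)
assert len(found) == 5
```

For a popular query the search stops after a few small batches; for a rare one the batches grow until enough of the network has been covered, which is the cost/latency trade-off DQ is designed around.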
Cloud computing systems provide large-scale infrastructures for high-performance computing that are "elastic" since they are able to adapt to user and application needs. Clouds are used through a service-oriented interface that implements the *-as-a-service paradigm to offer Cloud services on demand.
Several systems adopting Peer-to-Peer (P2P) solutions for resource discovery in Grids have recently been proposed. This report looks at a P2P resource discovery framework aiming to manage various Grid resources and complex queries. Following the discussion on characteristics of Grid resources and related query requirements, a DHT-based framework leveraging different P2P resource discovery techniques is proposed.
We survey parallel programming models and languages using six criteria to assess their suitability for realistic portable parallel programming. We argue that an ideal model should be easy to program, should have a software development methodology, should be architecture-independent, should be easy to understand, should guarantee performance, and should provide accurate information about the cost of programs.
Analysis of data is a complex process that often involves remote resources (computers, software, databases, files, etc.) and people (analysts, professionals, end users). Recently, distributed data mining techniques have been used to analyze dispersed data sets. An advancement in this research area comes from the use of mobile computing technology to support new data analysis techniques and new ways to discover knowledge from every place in which people operate.
This paper describes the design and implementation on MIMD parallel machines of P-AutoClass, a parallel version of the AutoClass system based upon the Bayesian method for determining optimal classes in large datasets. The P-AutoClass implementation divides the clustering task among the processors of a multicomputer so that they work on their own partition and exchange their intermediate results.
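The partition-and-exchange pattern described above can be shown with a much simpler statistic than the Bayesian class parameters P-AutoClass actually computes; the sketch below is illustrative only. Each worker summarizes its own data partition locally, then the partial results are exchanged and merged into a global summary, with one synchronization per round.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(partition):
    """Per-processor work: local sums needed for a global mean."""
    return (sum(partition), len(partition))

def global_mean(data, workers=4):
    # Partition the data among the workers (round-robin for balance).
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))  # local phase
    total, count = map(sum, zip(*partials))               # exchange/merge phase
    return total / count

assert global_mean([1.0, 2.0, 3.0, 4.0]) == 2.5
```

In P-AutoClass the merged quantities are class membership statistics rather than sums, and the exchange happens via message passing among the multicomputer's processors, but the structure (independent local passes plus a cheap global reduction per iteration) is the same.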
Papers by Domenico Talia