Alex Nicolau, University of California, Irvine, USA; Alexey Lastovetsky, University College Dublin, Ireland; Alper Şen, Boğaziçi University, Turkey; Andreas Knüpfer, TU Dresden, Germany; Bertil Folliot, University Paris 6, France; Can Özturan, Boğaziçi University, Turkey; Christine Morin, IRISA, Rennes, France; Dana Petcu, Western University of Timisoara, Romania; Daniel Grosu, Wayne State University, Detroit, USA; Denis Trystram, IMAG, France; Dieter Kranzlmüller, Ludwig-Maximilians-Universität München; Domenico Talia, Universita ...
The Grid is mainly used today for supporting high-performance, compute-intensive applications. However, it can also be effectively exploited for deploying data-driven and knowledge discovery applications. To support these classes of applications, high-level tools and services are vital. The Knowledge Grid is a high-level system for providing Grid-based knowledge discovery services.
Computational Grids are powerful platforms gathering computational power and storage space from thousands of geographically distributed resources. The applications running on such platforms need to efficiently and reliably access the heterogeneous distributed resources they offer. This can be achieved by using metadata describing all available resources. It is therefore crucial to provide efficient metadata management architectures and frameworks.
The continuous increase of data volumes available from many sources raises new challenges for their effective understanding. Knowledge discovery in large data repositories involves processes and activities that are computationally intensive, collaborative, and distributed in nature. The Grid is a profitable infrastructure that can be effectively exploited for handling distributed data mining and knowledge discovery.
Grid computing has been the subject of many large national and international IT projects. However, not all goals of these projects have been achieved. In particular, the number of users lags behind the initial forecasts laid out by proponents of grid technologies. This underachievement may have led to claims that the grid concept as a whole is on its way to being replaced by Cloud computing and various X-as-a-Service approaches.
Service grids and desktop grids are great solutions for solving the available compute power problem and helping to balance loads across network systems. However, to support new scientific communities that need extremely large numbers of resources, the solution could be to interconnect these two kinds of Grid systems into an integrated Service Grid-Desktop Grid (SG-DG) infrastructure.
In this paper we describe an interactive, visual knowledge discovery tool for analyzing numerical data sets. The tool combines a visual clustering method, to hypothesize meaningful structures in the data, and a classification machine learning algorithm, to validate the hypothesized structures. A two-dimensional representation of the available data allows users to partition the search space by choosing shape or density according to the criteria they deem optimal.
The Grid has rapidly moved from a toolkit-centered approach, composed of a set of middleware tools, toward a more application-oriented Service Oriented Architecture in which resources are exposed as services. The soaring number of available services calls for distributed and semantic-based discovery architectures. Distribution promotes scalability and fault tolerance, whereas semantics is required to provide meaningful descriptions of services and support their efficient retrieval.
Knowledge discovery is a compute- and data-intensive process that allows for finding patterns, trends, and models in large datasets. The grid can be effectively exploited for deploying knowledge discovery applications because of the high performance it can offer and its distributed infrastructure. For effective use of grids in knowledge discovery, the development of middleware is critical to support data management, data transfer, data mining, and knowledge representation.
The bioremediation of contaminated soils is one of the main strategies for site clean-up. The most important principle of bioremediation is that microorganisms (mainly bacteria) can be used to destroy hazardous contaminants or transform them into a less harmful form. Currently, we are facing this problem in the CABOTO project within the PCI ESPRIT framework.
Research and development work in the area of knowledge discovery and data mining concerns the study and definition of techniques, methods, and tools for the extraction of novel, useful, and not explicitly available patterns from large volumes of data. Data mining techniques originated from the use of statistical analysis and machine learning techniques for mining patterns from databases. In the last few years, new techniques and algorithms have been designed.
The Grid is an integrated infrastructure for coordinated resource sharing and problem solving in distributed environments. A main factor that will drive the development and evolution of the Grid will be the necessity to face the enormous amount of data that any field of human activity is producing at a rate never seen before. This position paper attempts to forecast the ongoing evolution of computational Grids towards what we name next-generation Grids.
The number of available Internet services increases every day. This trend demands distributed models and architectures to support scalability as well as semantics to enable efficient publication and retrieval of services. Two common approaches toward this goal are semantic overlay networks (SONs) and distributed hash tables (DHTs) with semantic extensions.
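The DHT side of this approach can be sketched with a toy example. The following is illustrative only, not the architecture proposed in the paper: it maps service keywords onto nodes via consistent hashing, so publication and retrieval of a keyword always reach the same node. All node and service names are invented for the example.

```python
import hashlib
from bisect import bisect_left

def h(s: str, space: int = 2**16) -> int:
    """Hash a string into the DHT identifier space."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % space

class TinyDHT:
    def __init__(self, nodes):
        # Sort nodes by their position on the identifier ring.
        self.ring = sorted((h(n), n) for n in nodes)
        self.store = {n: {} for n in nodes}

    def _successor(self, key_id: int) -> str:
        """First node clockwise from key_id (wrapping around the ring)."""
        ids = [nid for nid, _ in self.ring]
        i = bisect_left(ids, key_id) % len(self.ring)
        return self.ring[i][1]

    def put(self, keyword: str, service: str):
        # Publication: the keyword's successor node stores the mapping.
        self.store[self._successor(h(keyword))][keyword] = service

    def get(self, keyword: str):
        # Retrieval: the same hash routes the lookup to the same node.
        return self.store[self._successor(h(keyword))].get(keyword)

dht = TinyDHT(["node-a", "node-b", "node-c"])
dht.put("storage", "svc://replica-manager")
assert dht.get("storage") == "svc://replica-manager"
```

Semantic extensions typically enter at the keyword level, e.g. by hashing concept identifiers from an ontology instead of raw strings, so that semantically related services land on predictable nodes.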
Dynamic Querying (DQ) is a technique adopted in unstructured Peer-to-Peer (P2P) networks to minimize the number of peers that must be visited to reach the desired number of results. In this paper we introduce the use of the DQ technique in structured P2P networks. In particular, we present a P2P search algorithm, named DQ-DHT (Dynamic Querying over a Distributed Hash Table), to perform DQ-like searches over DHT-based overlays.
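The core dynamic-querying idea can be sketched as follows. This is a hypothetical simulation of the general DQ strategy, not the DQ-DHT protocol itself: probe a small subset of peers first, estimate from the observed hit rate how common matching results are, and widen the search only as far as needed to reach the desired result count. Peer contents are synthetic.

```python
import random

def dynamic_query(peers, predicate, desired, probe_size=8):
    """Visit peers in growing batches until `desired` matches are found."""
    order = list(peers)
    random.shuffle(order)          # no structure assumed among peers
    results, visited, batch = [], 0, probe_size
    while visited < len(order) and len(results) < desired:
        for peer in order[visited:visited + batch]:
            results.extend(item for item in peer if predicate(item))
        visited += batch
        # Size the next batch from the popularity observed so far:
        # few hits per visited peer -> query a larger batch next round.
        hit_rate = max(len(results) / visited, 1e-9)
        batch = min(len(order) - visited,
                    int((desired - len(results)) / hit_rate) + 1)
    return results[:desired], visited

# Synthetic network: every 10th peer holds one matching item.
peers = [[f"item-{i}"] if i % 10 == 0 else [] for i in range(1000)]
found, contacted = dynamic_query(peers, lambda x: x.startswith("item"), desired=5)
assert len(found) == 5
```

For a popular query the search stops after a few small batches; for a rare one the batches grow until enough of the network has been covered, which is the cost/latency trade-off DQ is designed around.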
Cloud computing systems provide large-scale infrastructures for high-performance computing that are "elastic" since they are able to adapt to user and application needs. Clouds are used through a service-oriented interface that implements the *-as-a-service paradigm to offer Cloud services on demand.
Several systems adopting Peer-to-Peer (P2P) solutions for resource discovery in Grids have recently been proposed. This report looks at a P2P resource discovery framework aiming to manage various Grid resources and complex queries. Following the discussion on characteristics of Grid resources and related query requirements, a DHT-based framework leveraging different P2P resource discovery techniques is proposed.
We survey parallel programming models and languages using six criteria to assess their suitability for realistic portable parallel programming. We argue that an ideal model should be easy to program, should have a software development methodology, should be architecture-independent, should be easy to understand, should guarantee performance, and should provide accurate information about the cost of programs.
Analysis of data is a complex process that often involves remote resources (computers, software, databases, files, etc.) and people (analysts, professionals, end users). Recently, distributed data mining techniques have been used to analyze dispersed data sets. An advancement in this research area comes from the use of mobile computing technology to support new data analysis techniques and new ways to discover knowledge from every place in which people operate.
This paper describes the design and implementation on MIMD parallel machines of P-AutoClass, a parallel version of the AutoClass system based upon the Bayesian method for determining optimal classes in large datasets. The P-AutoClass implementation divides the clustering task among the processors of a multicomputer so that they work on their own partition and exchange their intermediate results.
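The partition-and-exchange pattern described above can be shown with a much simpler statistic than the Bayesian class parameters P-AutoClass actually computes; the sketch below is illustrative only. Each worker summarizes its own data partition locally, then the partial results are exchanged and merged into a global summary, with one synchronization per round.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(partition):
    """Per-processor work: local sums needed for a global mean."""
    return (sum(partition), len(partition))

def global_mean(data, workers=4):
    # Partition the data among the workers (round-robin for balance).
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))  # local phase
    total, count = map(sum, zip(*partials))               # exchange/merge phase
    return total / count

assert global_mean([1.0, 2.0, 3.0, 4.0]) == 2.5
```

In P-AutoClass the merged quantities are class membership statistics rather than sums, and the exchange happens via message passing among the multicomputer's processors, but the structure (independent local passes plus a cheap global reduction per iteration) is the same.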
Papers by Domenico Talia