
Ontology-Based Meta-Mining of Knowledge Discovery Workflows

2011, Studies in Computational Intelligence


Melanie Hilario, Phong Nguyen, Huyen Do, Adam Woznica, and Alexandros Kalousis
Artificial Intelligence Laboratory, University of Geneva

Abstract. This chapter describes a principled approach to meta-learning that has three distinctive features. First, whereas most previous work on meta-learning focused exclusively on the learning task, our approach applies meta-learning to the full knowledge discovery process and is thus more aptly referred to as meta-mining. Second, traditional meta-learning regards learning algorithms as black boxes and essentially correlates properties of their input (data) with the performance of their output (learned model). We propose to tear open the black box and analyse algorithms in terms of their core components, their underlying assumptions, the cost functions and optimization strategies they use, and the models and decision boundaries they generate. Third, to ground meta-mining in a declarative representation of the data mining (DM) process and its components, we built a DM ontology and knowledge base using the Web Ontology Language (OWL). The Data Mining Optimization Ontology (DMOP, pronounced deemope) provides a unified conceptual framework for analysing DM tasks, algorithms, models, datasets, workflows and performance metrics, as well as their relationships. The DM knowledge base uses concepts from DMOP to describe existing data mining algorithms and their implementations in major DM software packages. Meta-data collected from data mining experiments are also described in terms of concepts from the ontology and linked to algorithm and operator descriptions in the knowledge base; they are then stored in data mining experiment databases to serve as training and evaluation data for the meta-miner.
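The contrast between black-box and "opened" algorithm descriptions can be illustrated with a minimal sketch. It assumes a toy dictionary standing in for the DM knowledge base; the algorithm names and the property names (`cost_function`, `optimization`, `boundary`) are hypothetical illustrations, not actual DMOP concepts. A black-box meta-learner sees only an opaque algorithm identifier, whereas the white-box encoding exposes characteristics that different algorithms may share.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an ontology-backed knowledge base: each algorithm
# is described by characteristics instead of an opaque identifier.
# Property names and values are illustrative only, not DMOP concepts.
ALGORITHM_KB = {
    "LinearSVM": {"cost_function": "hinge", "optimization": "convex_qp", "boundary": "linear"},
    "LogisticRegression": {"cost_function": "log_loss", "optimization": "convex_gradient", "boundary": "linear"},
    "DecisionTree": {"cost_function": "impurity", "optimization": "greedy", "boundary": "axis_parallel"},
}


@dataclass
class Experiment:
    dataset_features: dict[str, float]  # dataset meta-features, e.g. number of instances
    algorithm: str
    accuracy: float


def black_box_features(exp: Experiment) -> dict[str, float]:
    """Dataset meta-features plus a one-hot indicator for the opaque algorithm ID."""
    feats = dict(exp.dataset_features)  # copy so the experiment record is not mutated
    feats[f"algo={exp.algorithm}"] = 1.0
    return feats


def white_box_features(exp: Experiment) -> dict[str, float]:
    """Dataset meta-features plus one indicator per described algorithm characteristic."""
    feats = dict(exp.dataset_features)
    for prop, value in ALGORITHM_KB[exp.algorithm].items():
        feats[f"{prop}={value}"] = 1.0
    return feats


def shared_keys(a: dict[str, float], b: dict[str, float]) -> set[str]:
    """Feature names two encoded experiments have in common."""
    return set(a) & set(b)


if __name__ == "__main__":
    ds = {"n_instances": 500.0, "n_features": 20.0}
    svm = Experiment(ds, "LinearSVM", 0.81)
    lr = Experiment(ds, "LogisticRegression", 0.79)
    print(sorted(shared_keys(black_box_features(svm), black_box_features(lr))))
    # → ['n_features', 'n_instances']
    print(sorted(shared_keys(white_box_features(svm), white_box_features(lr))))
    # → ['boundary=linear', 'n_features', 'n_instances']
```

Under the black-box encoding the two linear learners share no algorithm feature at all; once their characteristics are exposed they share `boundary=linear`, which is the kind of regularity a meta-learner could generalize over, including to algorithms absent from its training meta-data.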
These three features together lay the groundwork for what we call deep or semantic meta-mining, i.e., DM process or workflow mining that is driven simultaneously by meta-data and by the collective expertise of data miners embodied in the data mining ontology and knowledge base. In Section 1, we review the state of the art in the fields of meta-learning and data mining ontologies; at the same time, we motivate the need for ontology-based meta-mining and distinguish our approach from related work in these two areas. Section 2 gives a detailed description of DMOP, while Section 3 introduces a novel method for ontology-based discovery of generalized patterns from data mining workflows. Section 4 reports on proof-of-concept experiments conducted to gauge the efficacy of DMOP-based workflow mining, and Section 5 concludes.

N. Jankowski et al. (Eds.): Meta-Learning in Computational Intelligence, SCI 358, pp. 273–315. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 State of the Art and Motivation

The work described in this chapter draws together two research streams that have so far remained independent: meta-learning and data mining ontology construction. This section reviews the state of the art in both areas and points out the novelty of our approach with respect to each.

1.1 From Meta-learning to Meta-mining

Meta-learning is learning to learn: in computer science, it is the application of machine learning techniques to meta-data describing past learning experience in order to modify some aspect of the learning process and improve the performance of the resulting model [29,3,13,78]. Meta-learning thus defined applies specifically to learning, which is only one (albeit the central) step in the data mining (or knowledge discovery) process¹. The quality of mined knowledge depends just as much on other steps such as data cleaning, data selection, feature extraction and selection, model pruning, and model aggregation.
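As a concrete illustration of meta-learning in this sense, the sketch below implements one common and simple scheme: recommending a learning algorithm for a new dataset from the accuracies that algorithms achieved on the most similar past datasets (k-nearest neighbours over dataset meta-features). This scheme is an illustrative assumption, not the method proposed in this chapter; the meta-feature vectors, algorithm names and accuracies are hypothetical.

```python
import math
from collections import defaultdict

# One past learning experience: (dataset meta-feature vector, accuracy per algorithm).
PastRecord = tuple[list[float], dict[str, float]]

# Hypothetical meta-data describing past learning experience.
HISTORY: list[PastRecord] = [
    ([0.10, 0.20], {"NaiveBayes": 0.70, "DecisionTree": 0.80}),
    ([0.15, 0.25], {"NaiveBayes": 0.72, "DecisionTree": 0.78}),
    ([0.90, 0.80], {"NaiveBayes": 0.90, "DecisionTree": 0.60}),
]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(new_meta: list[float], history: list[PastRecord], k: int = 2) -> str:
    """Return the algorithm with the best mean accuracy on the k past datasets
    whose meta-features are closest to those of the new dataset."""
    neighbours = sorted(history, key=lambda rec: euclidean(new_meta, rec[0]))[:k]
    if not neighbours:
        raise ValueError("no past experience to learn from")
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for _, scores in neighbours:
        for algo, acc in scores.items():
            totals[algo] += acc
            counts[algo] += 1
    return max(totals, key=lambda algo: totals[algo] / counts[algo])


if __name__ == "__main__":
    print(recommend([0.12, 0.22], HISTORY, k=2))  # → DecisionTree
    print(recommend([0.88, 0.82], HISTORY, k=1))  # → NaiveBayes
```

In practice meta-features would be normalized before distances are computed, since raw quantities such as the number of instances would otherwise dominate. Note that the scheme only ever looks at the learning step, which is precisely the limitation the following paragraphs address.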
We still lack an understanding of how the different components of the data mining process interact; there are no clear guidelines except for high-level process models such as CRISP-DM [18]. Process-related issues, such as the composition of data mining operations and the need for a methodology of data mining, are among the ten data mining challenges discussed in [80]. In response to this challenge, a number of systems have been designed to provide user support throughout the different phases of the KD process (Serban et al., 2010). Most of them rely on a planning approach and produce workflows that are valid but not necessarily optimal with respect to a given cost function such as predictive accuracy. This is the case with the planner-based intelligent discovery assistant (IDA) implemented in the e-LICO project². To allow the planner to select the most promising workflows from an often huge set of candidates, an ontology-based meta-learner mines records of past data mining experiments to extract models and patterns that suggest which DM algorithms should be used together to achieve the best results for a given problem, data set and cost function. The e-LICO IDA therefore self-improves as a result of meta-mining, loosely defined as KD-process-oriented meta-learning.

Meta-mining extends the meta-learning approach to the full knowledge discovery process: just as meta-learning aims to optimize the results of learning, meta-mining optimizes the results of data mining by taking into account the interdependencies and interactions between the different process operations, in particular between learning and the various pre- and post-processing steps. In this sense, meta-mining subsumes meta-learning and must address all the open issues of meta-learning.

¹ We follow current usage in treating data mining and knowledge discovery as synonyms, using the terms learning or modelling to refer to what Fayyad et al. [25] called the data mining phase of the knowledge discovery process.
² http://www.e-lico.org
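The difference between learner-centred meta-learning and process-oriented meta-mining can be sketched under simplifying assumptions. Here a workflow is a hypothetical tuple of operator names, and each candidate workflow (for instance, one proposed by a planner) is scored by similarity-weighted averaging of the accuracies of past workflows, using Jaccard similarity over their operators and adjacent operator pairs. The pair features stand in for interactions between pre-processing and learning; the operator names and accuracies are invented for illustration, and this is not the ontology-based pattern-mining method developed in this chapter.

```python
Workflow = tuple[str, ...]

# Hypothetical records of past experiments: (workflow, accuracy).
HISTORY: list[tuple[Workflow, float]] = [
    (("Discretize", "NaiveBayes"), 0.85),
    (("Normalize", "NaiveBayes"), 0.70),
    (("Normalize", "SVM"), 0.88),
    (("Discretize", "SVM"), 0.65),
]


def workflow_features(workflow: Workflow) -> set[str]:
    """Operators plus adjacent operator pairs, so that interactions such as
    'discretization followed by a given learner' become explicit features."""
    ops = set(workflow)
    pairs = {f"{a}->{b}" for a, b in zip(workflow, workflow[1:])}
    return ops | pairs


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(candidate: Workflow, history: list[tuple[Workflow, float]]) -> float:
    """Similarity-weighted mean accuracy of past workflows (0.0 if none is similar)."""
    feats = workflow_features(candidate)
    weighted = [(jaccard(feats, workflow_features(wf)), acc) for wf, acc in history]
    total = sum(w for w, _ in weighted)
    if total == 0.0:
        return 0.0
    return sum(w * acc for w, acc in weighted) / total


def rank(candidates: list[Workflow], history: list[tuple[Workflow, float]]) -> list[Workflow]:
    """Order valid candidate workflows by predicted accuracy, best first."""
    return sorted(candidates, key=lambda wf: predict(wf, history), reverse=True)


if __name__ == "__main__":
    candidates = [("Normalize", "NaiveBayes"), ("Discretize", "NaiveBayes")]
    print(rank(candidates, HISTORY))
    # → [('Discretize', 'NaiveBayes'), ('Normalize', 'NaiveBayes')]
```

Both candidates use the same learner, so a meta-learner that considered only the learning step could not tell them apart; it is the pre-processing operator and the operator-pair features that separate them.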