Recently, privacy issues have become important in data mining, especially when data is horizontally or vertically partitioned. For the vertically partitioned case, many data mining problems can be reduced to securely computing a scalar product; association rule mining over vertically partitioned data is one such problem. The efficiency of a secure scalar product can be measured by the communication overhead needed to ensure privacy. Several solutions have been proposed for privacy-preserving association rule mining over vertically partitioned data, but their main drawback is the excessive communication overhead they incur to ensure data privacy. In this paper we propose a new secure scalar product that aims to reduce this communication overhead.
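The idea of hiding individual vector components while still recovering their dot product can be sketched with additive secret sharing. The code below is purely illustrative of that principle, not the protocol proposed in the paper; the `share`/`secure_dot` names and the single-machine simulation of the two parties are assumptions for the sketch.

```python
import random

def share(value, modulus):
    """Split an integer into two additive shares mod `modulus`."""
    r = random.randrange(modulus)
    return r, (value - r) % modulus

def secure_dot(x, y, modulus=2**31 - 1):
    """Toy two-party scalar product via additive secret sharing.

    Each component of Alice's vector x is split into two shares, so a
    party holding only one share learns nothing about x_i; combining
    the two partial sums recovers x . y mod `modulus`.  Both partial
    sums are computed here in one process for illustration only.
    """
    partial_a = 0
    partial_b = 0
    for xi, yi in zip(x, y):
        a, b = share(xi, modulus)              # Alice keeps a, ships b
        partial_a = (partial_a + a * yi) % modulus
        partial_b = (partial_b + b * yi) % modulus
    return (partial_a + partial_b) % modulus

# 1*4 + 2*5 + 3*6 = 32
print(secure_dot([1, 2, 3], [4, 5, 6]))  # → 32
```

A real protocol must also hide the partial sums themselves (e.g. with extra blinding randomness); the point of the sketch is only that correctness survives the masking, since (a + b) * y_i = x_i * y_i mod m.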
Papers presented at the Renpar 2002 conference. Yaya SLIMANI and Denis TRYSTRAM, 2 June 2004. ... The set of operations is ordered, and changing the order of the operations changes the program, just as changing the order of the notes changes the melody. ...
Journal issue edited by TRYSTRAM Denis, SLIMANI Yahia and JEMNI Mohamed. Publication date: 03-2005. Format: journal issue. Language: French. 286 p. ... Resource management (Gestion des ressources) - O. Beaumont, V. Boudet, P.-F. Dutot, Y. Robert, D. Trystram. ...
Distributed query processing is fast becoming a reality. With new emerging applications such as grid applications, distributed data processing becomes a complex undertaking due to changes coming from both the underlying networks and the requirements of grid-enabled databases. Clearly, without considering the network characteristics and heterogeneity, the solution quality of distributed data processing may degrade. In this paper, we propose a generic cost-based query optimization approach that meets these requirements while taking network topology into consideration.
Proceedings of the Workshop on Parallel and Distributed Systems Testing, Analysis, and Debugging - PADTAD '11, 2011
Grids are now regarded as promising platforms for data- and computation-intensive applications like data mining. However, exploiting such large-scale computing resources necessitates the development of new distributed algorithms. The major challenge facing the developers of distributed data mining algorithms is how to adjust the load imbalance that occurs during execution. This load imbalance is due to the dynamic nature of data mining algorithms (i.e. we cannot predict the load before execution) and the heterogeneity of Grid computing systems. In this paper, we propose a dynamic load balancing strategy for distributed association rule mining algorithms in a Grid computing environment. We evaluate the performance of the proposed strategy on Grid'5000, a Grid infrastructure distributed over nine sites in France for research on large-scale parallel and distributed systems.
Distributed Information Retrieval (DIR) is today a thriving area of investigation, due to the importance of ongoing access to relevant information. To define a DIR process in a completely distributed system such as a peer-to-peer system, particular attention must be paid to the phase that combines results coming from autonomous peers, which we call rank aggregation. Classical approaches are not effective given the lack of global statistics, and they are also too generic to adapt to particular needs. These reasons led us to propose a result aggregation model that takes the user's needs into account (called the behavioral model).
2007 International Conference on Multimedia and Ubiquitous Engineering (MUE'07), 2007
ABSTRACT One of the principal motivations for using computing grids and data grids comes from applications that use large data sets, for example in high-energy physics or the life sciences. To improve the overall throughput of the software environments that run these applications on grids, data replicas are deposited on various selected sites. In the grid field, most data replication and job scheduling strategies have been tested by simulation, and several grid simulators have emerged; one of the most interesting for our study is the OptorSim tool. In this paper, we present an extension of the OptorSim simulator with a replica consistency management module for data grids. This extension corresponds to a hybrid consistency approach, inspired by both the pessimistic and the optimistic consistency approaches. The proposed approach has two aims: first, to reduce response times compared with the fully pessimistic approach; second, to give a better quality of service compared with the optimistic approach.
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2011
With the growing need to analyze large amounts of structured data such as chemical compounds, protein structures, and XML documents, to cite but a few, graph mining has become an attractive track and a real challenge in the data mining field. Among the various kinds of graph patterns, frequent subgraphs seem relevant for characterizing graph sets, discriminating between different groups of sets, and classifying and clustering graphs. Because of the NP-completeness of the subgraph isomorphism test as well as the huge search space, fragment miners are exponential in runtime and/or memory consumption. In this paper we study a new polynomial projection operator named AC-projection, based on a key technique of constraint programming, namely arc consistency (AC), intended to replace the exponential subgraph isomorphism test. We study the relevance of frequent AC-reduced graph patterns for classification and show that we can achieve an important performance gain with little or no loss in the quality of the discovered patterns.
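The paper's AC-projection operator builds on arc consistency; the classic generic algorithm for enforcing it on binary constraints is AC-3. The sketch below shows plain AC-3 on a toy constraint network (it is the underlying CP technique, not the paper's projection operator; the variable names and the dict-based encoding are assumptions).

```python
from collections import deque

def ac3(domains, constraints):
    """AC-3: prune values that have no support under a binary constraint.

    `domains` maps variable -> set of values; `constraints` maps an
    ordered pair (x, y) -> predicate pred(vx, vy).  Returns False if
    some domain is wiped out (the network is inconsistent).
    Runs in polynomial time, unlike subgraph isomorphism testing.
    """
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        pred = constraints[(x, y)]
        # Collect values of x with no supporting value in y's domain.
        removed = {vx for vx in domains[x]
                   if not any(pred(vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False
            # Domains shrank, so arcs pointing at x must be revisited.
            queue.extend(arc for arc in constraints if arc[1] == x)
    return True

# Toy network: x < y with x, y both in {1, 2, 3}
doms = {"x": {1, 2, 3}, "y": {1, 2, 3}}
cons = {("x", "y"): lambda vx, vy: vx < vy,
        ("y", "x"): lambda vy, vx: vy > vx}
ac3(doms, cons)
print(doms)  # x loses 3 (no y above it), y loses 1 (no x below it)
```

The pruning step is the essence of the idea: a candidate value survives only if it has at least one compatible partner, which is much cheaper to check than full isomorphism.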
2010 International Conference on Information Retrieval & Knowledge Management (CAMP), 2010
Many terminology extraction approaches make use of contextual information to acquire relations between terms. The quality and quantity of this information influence the accuracy of the terminology extractor. In this paper, we assume that the logical structure of documents constitutes a rich source of contextual information which can be used to infer semantic relations between terms and thus construct
We investigate the problem of optimizing distributed queries by using semijoins in order to minimize the amount of data communication between sites. The problem is reduced to that of finding an optimal semijoin sequence that locally fully reduces the relations referenced in a general query graph before processing the join operations.
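The communication saving behind semijoins can be shown in a few lines: instead of shipping a whole relation between sites, only the projection of one relation on the join attribute is shipped, and it is used to reduce the other relation before the real join. The relation names and dict-of-rows encoding below are illustrative assumptions, not from the paper.

```python
def semijoin(r, s, attr):
    """R semijoin S: keep the tuples of R whose `attr` value appears in S.

    In a distributed setting, only the projection of S on `attr`
    (the set `s_keys`) would cross the network, which is what lets a
    semijoin sequence reduce relations before the joins are processed.
    """
    s_keys = {t[attr] for t in s}          # the only data shipped to R's site
    return [t for t in r if t[attr] in s_keys]

orders = [{"cust": 1, "item": "a"},
          {"cust": 2, "item": "b"},
          {"cust": 3, "item": "c"}]
customers = [{"cust": 1}, {"cust": 3}]
reduced = semijoin(orders, customers, "cust")  # keeps orders of customers 1 and 3
```

A "fully reducing" sequence, as in the abstract, applies such semijoins until every referenced relation contains only tuples that will actually participate in the final join result.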
Abstract—In this paper, we propose an adaptation of the Patricia-Tree for sparse datasets to generate non-redundant association rules. Using this adaptation, we can generate frequent closed itemsets, which are more compact than the frequent itemsets used in the Apriori approach. This adaptation has been evaluated on a set of benchmark datasets. Keywords—Datamining, Frequent itemsets, Frequent closed
To avoid obtaining unmanageably large association rule sets, whose low precision often makes the perusal of knowledge ineffective, the extraction and exploitation of compact and informative generic bases of association rules is becoming a must. Moreover, such bases provide a powerful verification technique for detecting gene mis-annotation or bad clustering in the Unigene library. However, the extracted generic bases are still oversized and their exploitation is impractical. Thus, providing critical nuggets of extra-valued knowledge is a compellingly addressable issue. To tackle this drawback, we propose in this paper a novel approach, called EGEA (Evolutionary Gene Extraction Approach), which aims to considerably reduce the quantity of knowledge, extracted from a gene expression dataset, that is presented to an expert. First, we use a genetic algorithm to select the most predictive set of genes related to patient situations. Once the relevant attributes (genes) have been selected, they serve as input for the second stage of the approach, i.e., extracting generic association rules from this reduced gene set. The notable decrease in the cardinality of the generic association rules extracted from the selected gene set improves the quality of knowledge exploitation. Experiments carried out on a benchmark dataset pointed out that this set contains previously unknown prognosis-associated genes, which may serve as molecular targets for new therapeutic strategies to repress relapse of pediatric acute myeloid leukemia (AML).
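The first EGEA stage, selecting a predictive gene subset with a genetic algorithm, can be sketched generically. This is a minimal GA over 0/1 masks with tournament selection, one-point crossover, and bit-flip mutation; the operators, parameters, and toy fitness are illustrative assumptions, not the authors' exact algorithm.

```python
import random

def ga_select(fitness, n_genes, pop_size=20, gens=40, seed=1):
    """Minimal generational GA for gene (feature) subset selection.

    Individuals are 0/1 masks over the genes; higher fitness means a
    more predictive subset.  Returns the best mask in the final
    population.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_genes)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            for d in range(n_genes):              # bit-flip mutation
                if rng.random() < 1 / n_genes:
                    child[d] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy fitness: genes 0-2 are "predictive"; selecting others costs a little.
fit = lambda mask: sum(mask[:3]) - 0.25 * sum(mask[3:])
best = ga_select(fit, n_genes=10)
```

In the real pipeline the fitness would score how well the selected genes separate patient situations, and the surviving mask would feed the generic-rule extraction stage.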
The combinatorial nature of the feature selection problem has made the use of heuristic methods indispensable even for moderate dataset dimensions. Recently, several optimization paradigms have emerged as attractive alternatives to classic heuristic-based approaches. In this paper, we propose an adapted Particle Swarm Optimization algorithm for exploring the feature selection problem search space. In spite of the
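Adapting PSO to the 0/1 search space of feature selection is usually done by squashing velocities through a sigmoid and sampling bits, as in the classic binary PSO. The sketch below shows that generic scheme under a toy fitness; the parameter values and fitness are assumptions, and this is not the paper's specific adaptation.

```python
import math
import random

def binary_pso(fitness, n_feats, n_particles=10, iters=30, seed=0):
    """Minimal binary PSO for feature selection (illustrative only).

    Each particle is a 0/1 mask over the features; the sigmoid of the
    velocity gives the probability that a bit is set at each step.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(n_particles)]
    vel = [[0.0] * n_feats for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_feats):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] += (2 * r1 * (pbest[i][d] - pos[i][d])
                              + 2 * r2 * (gbest[d] - pos[i][d]))
                prob = 1 / (1 + math.exp(-vel[i][d]))   # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:                        # update personal best
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:                       # and global best
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy fitness: reward the three "informative" features, penalize the rest.
informative = {0, 1, 2}
fit = lambda m: (sum(1 for d, b in enumerate(m) if b and d in informative)
                 - 0.5 * sum(b for d, b in enumerate(m) if d not in informative))
best, score = binary_pso(fit, n_feats=8)
```

With a real classifier, the fitness would typically be cross-validated accuracy minus a penalty on the number of selected features.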
Feature subset selection is an important preprocessing and guiding step for classification. The combinatorial nature of the problem has made the use of evolutionary and heuristic methods indispensable for exploring high-dimensional problem search spaces. In this paper, a set of hybridization schemata of genetic algorithms with local search is investigated through a memetic framework. An empirical study compares
Papers by Y. Slimani