
Sana Ben Hamida

Sana Ben Hamida is an associate professor at Paris Nanterre University and an associate researcher at the computer science laboratory (LAMSADE) of Paris Dauphine University. Her main research topics are evolutionary algorithms, machine learning and related applications. Much of her work focuses on problems related to scaling evolutionary learning techniques for massive data. Sana Ben Hamida is also interested in the application of evolutionary algorithms to solve supervised and unsupervised learning problems in the fields of biology and biodiversity.

Papers by Sana Ben Hamida

Evolutionary Algorithms: Handling Constraints and Application

Appendix A: Test functions for constrained optimization. Appendix B: Notions of optics for the application of the laser system.

Evolutionary Algorithms

Encyclopedia of Computational Neuroscience

Evolutionary Algorithms

The IMA Volumes in Mathematics and its Applications, 1999

The EASY-GOING deconvolution (EGdeconv) program is extended to enable fast and automated fitting of multiple quantum magic angle spinning (MQMAS) spectra guided by evolutionary algorithms. We implemented an analytical crystallite excitation model for spectrum simulation. Currently these efficiencies are limited to two-pulse and z-filtered 3QMAS spectra of spin 3/2 and 5/2 nuclei, whereas for higher spin quantum numbers ideal excitation is assumed. The analytical expressions are explained in full to avoid ambiguity and to make them easier for others to use. The EGdeconv program can fit interaction parameter distributions. It currently includes a Gaussian distribution for the chemical shift and an (extended) Czjzek distribution for the quadrupolar interaction. We provide three case studies to illustrate EGdeconv's capabilities for fitting MQMAS spectra. The EGdeconv program is available as is on our website http://egdeconv.science.ru.nl for 64-bit Linux operating systems.

Generic GA-PPI-Net: Generic Evolutionary Algorithm to Detect Semantic and Topological Biological Communities

Proceedings of the 15th International Conference on Software Technologies, 2020

Community detection aims to identify topological structures and discover patterns in complex networks. It is an important problem in many fields. In this paper, we are interested in the detection of communities in biological networks. These networks represent protein-protein or gene-gene interactions, and a community in them corresponds to a set of proteins or genes that collaborate in the same cellular function. The goal is to identify such semantic and/or topological communities from gene annotation sources such as Gene Ontology. We propose a Genetic Algorithm (GA) based technique as a clustering approach to detect communities in biological networks. For this purpose, we introduce four specific components to the GA: a fitness function based on a similarity measure and the interaction value between proteins or genes, a solution representation for communities of dynamic size, a heuristic crossover to strengthen links within communities, and a specific mutation operator. Experimental results show the ability of our Genetic Algorithm to detect communities of genes that are semantically similar and/or interacting.

L'impact du choix des données d'apprentissage dans la génération des IDS par la programmation génétique

Le Centre pour la Communication Scientifique Directe - HAL - Diderot, Mar 1, 2008

Evolutionary Algorithms

CRC Press eBooks, Jun 23, 2014

A new adaptive sampling approach for Genetic Programming

2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS)

Genetic Programming (GP) suffers from excessive computation time, which is exacerbated by data-intensive problems. This issue has been addressed with different approaches such as sampling techniques or distributed implementations. In this paper, we focus on dynamic sampling algorithms, most of which give the GP learner a new sample at each generation. As a result, individuals do not have enough time to extract the hidden knowledge. We propose adaptive sampling, which is halfway between static and dynamic methods. It is a flexible approach applicable to any dynamic sampling. We implemented some variants based on controlling the re-sampling frequency and tested them on the KDD intrusion detection problem with GP. The experimental study demonstrates how adaptive sampling preserves the power of dynamic sampling with possible improvements in learning time and quality for some sampling algorithms. This work opens many new relevant extension paths.

Extending DEAP with Active Sampling for Evolutionary Supervised Learning

Proceedings of the 16th International Conference on Software Technologies, 2021

The complexity, variety and large size of databases make knowledge extraction a difficult task for supervised machine learning techniques. It is important to provide these techniques with additional tools to improve their efficiency when dealing with such data. A promising strategy is to reduce the size of the training sample seen by the learner and to change it regularly during the learning process. Such a strategy, known as active learning, is suitable for iterative learning algorithms such as Evolutionary Algorithms. This paper presents some sampling techniques for active learning and how they can be applied in a hierarchical way. Then, it details how these techniques could be implemented in DEAP, a Python framework for Evolutionary Algorithms. A comparative study demonstrates how active learning improves evolutionary learning on two databases, for detecting pulsars and occupancy in buildings.

Multi-objective Optimization

Evolutionary Algorithms, 2017

An investor composes a portfolio of stocks in order to obtain a high return on his or her investment with a small risk of incurring a loss; an oncologist prescribes radiotherapy to a cancer patient so as to destroy the tumor without causing damage to healthy organs; an airline manager constructs schedules that incur small salary costs and that ensure smooth operation even in the case of disruptions. All three decision makers (DMs) are in a similar situation: they need to make a decision trying to achieve several conflicting goals at the same time. The highest return investments are in general the riskiest ones, tumors can always be destroyed at the expense of irreversible damage to healthy organs, and the cheapest schedules to operate are ones that leave as little time as possible between flights, wreaking havoc on operations in the case of unexpected delays. Moreover, the investor, the oncologist, and the airline manager are all in a situation where the number of available options or alternatives is very large or even infinite. There are infinitely many ways to invest money and infinitely many possible radiotherapy treatments, but the number of feasible crew schedules is finite, albeit astronomical in practice. The alternatives are therefore described by constraints, rather than explicitly known: the sums invested in every stock must equal the total invested; the radiotherapy treatment must meet physical and clinical constraints; crew schedules must ensure that each flight has exactly one crew assigned to operate it. Mathematically, the alternatives are described by vectors in variable or decision space; the set of all vectors satisfying the constraints is called the feasible set in decision space. The consequences or attributes of the alternatives are described as vectors in objective or outcome space, where outcome (objective) vectors are a function of the decision (variable) vectors. The set of outcomes corresponding to feasible alternatives is called ...
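The vocabulary of this abstract can be made concrete with a small sketch. The example below is not taken from the chapter: it enumerates a toy portfolio decision space on a grid, keeps the vectors that satisfy the budget constraint (the feasible set in decision space), and maps each one to an outcome vector (expected return, risk). The asset figures, the grid step, the linear risk model, and all function names are hypothetical. The `dominates` helper is the standard Pareto dominance check, which the excerpt above does not reach.

```python
from itertools import product

# Hypothetical per-asset expected returns and risk scores (illustrative only).
RETURNS = (0.02, 0.06, 0.10)
RISKS = (0.01, 0.05, 0.12)
STEPS = 4  # each weight is a multiple of 1/STEPS


def feasible_set():
    """Decision vectors with non-negative weights that use the whole budget."""
    grid = range(STEPS + 1)
    return [
        tuple(k / STEPS for k in ks)
        for ks in product(grid, repeat=len(RETURNS))
        if sum(ks) == STEPS  # budget constraint, checked on integers
    ]


def outcome(x):
    """Map a decision vector to an outcome vector (expected return, risk)."""
    ret = sum(w * r for w, r in zip(x, RETURNS))
    risk = sum(w * s for w, s in zip(x, RISKS))  # linear risk is an assumption
    return (ret, risk)


def dominates(a, b):
    """True if outcome a is no worse than b on both goals and differs from b."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


if __name__ == "__main__":
    X = feasible_set()
    Y = {x: outcome(x) for x in X}
    efficient = [x for x in X if not any(dominates(Y[z], Y[x]) for z in X)]
    print(len(X), "feasible portfolios,", len(efficient), "non-dominated")
```

The feasible set is defined by a constraint rather than listed by hand, which mirrors the point made in the abstract; a finer grid only changes `STEPS`.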

GA-PPI-Net: A Genetic Algorithm for Community Detection in Protein-Protein Interaction Networks

Community detection has become an important research direction for data mining in complex networks. It aims to identify topological structures and discover patterns in complex networks, which is an important problem. In this paper, we are interested in the detection of communities in Protein-Protein or Gene-Gene Interaction (PPI) networks. These networks represent a set of proteins or genes that collaborate in the same cellular function. The goal is to identify such semantic and topological communities from gene annotation sources such as Gene Ontology. We propose a Genetic Algorithm (GA) based approach to detect communities of different sizes in PPI networks. For this purpose, we introduce three specific components to the GA: a fitness function based on a similarity measure and the interaction value between proteins or genes, a solution representation for communities of dynamic size, and a specific mutation operator. In the computational tests c...

Genetic Programming for Machine Learning

This paper presents a proof of concept. It shows that Genetic Programming (GP) can be used as a "universal" machine learning method that integrates several different algorithms, improving their accuracy. The system we propose, called Universal Genetic Programming (UGP), works by defining an initial population of programs that contains the models produced by several different machine learning algorithms. The use of elitism allows UGP to return the best initial model as a final solution in case it is not able to evolve a better one. The use of genetic operators driven by semantic awareness is likely to improve the initial models by combining and mutating them. On three complex real-life problems, we present experimental evidence that UGP is actually able to improve the models produced by all the studied machine learning algorithms in isolation.

Trends of Evolutionary Machine Learning to Address Big Data Mining

Lecture Notes in Business Information Processing, 2021

The complexity, variety and large size of databases make knowledge extraction a difficult task for supervised machine learning techniques. It is important to provide these techniques with additional tools to improve their efficiency when dealing with such data. A promising strategy is to reduce the size of the training sample seen by the learner and to change it regularly during the learning process. Such a strategy, known as active learning, is suitable for iterative learning algorithms such as Evolutionary Algorithms. This paper presents some sampling techniques for active learning and how they can be applied in a hierarchical way. Then, it details how these techniques could be implemented in DEAP, a Python framework for Evolutionary Algorithms. A comparative study demonstrates how active learning improves evolutionary learning on two databases, for detecting pulsars and occupancy in buildings.

Genetic Algorithm for Community Detection in Biological Networks

Procedia Computer Science, 2018

We are interested in the detection of communities in biological networks. We focus more precisely on gene interaction networks, which represent protein-protein or gene-gene interactions. A community in such networks corresponds to a set of proteins or genes that collaborate in the same cellular function. Our goal is to identify such communities from gene annotation sources such as Gene Ontology (GO). In this paper, we propose a Genetic Algorithm (GA) based approach to discover communities in a gene interaction network. A special solution coding and mutation operator are introduced. In addition, we propose a specific fitness function based on a similarity measure and the interaction value between genes. Experiments on real data extracted from the well-known Kyoto Encyclopedia of Genes and Genomes (KEGG) database show the ability of the proposed method to successfully detect existing or even new communities.

Scale Genetic Programming for large Data Sets: Case of Higgs Bosons Classification

Procedia Computer Science, 2018

Extracting knowledge and significant information from very large data sets is a main topic in Data Science and attracts the interest of researchers in the machine learning field. Several machine learning techniques, such as deep neural networks, have proven effective in dealing with massive data. Evolutionary algorithms are considered poorly suited to such problems because of their relatively high computational cost. This work is an attempt to show that, with some extensions, evolutionary algorithms could be an interesting solution for learning from very large data sets. We propose the use of Cartesian Genetic Programming (CGP) as a meta-heuristic approach to learn from the Higgs big data set. CGP is extended with an active sampling technique in order to help the algorithm deal with the mass of the provided data. The proposed method is able to take up the challenge of dealing with the complete benchmark data set of 11 million events and produces satisfactory preliminary results.

Hierarchical Data Topology Based Selection for Large Scale Learning

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 2016

The amount of available data for data mining and knowledge discovery continues to grow very fast in the era of Big Data. Genetic Programming (GP) algorithms, which are efficient machine learning techniques, face a new challenge: dealing with the mass of the provided data. Active Sampling, already used for Active Learning, might be a good solution to improve the training of Evolutionary Algorithms (EA) on very big data sets. This paper investigates the adaptation of Topology Based Selection (TBS) to massive learning datasets by means of Hierarchical Sampling. We propose to combine Random Subset Selection (RSS) with TBS to create the RSS-TBS method. Two variants are implemented and applied to the KDD intrusion detection problem. They are compared to the original RSS and TBS techniques. The experimental results show that the substantial computational cost generated by the original TBS when applied to large datasets can be reduced with Hierarchical Sampling.

Chapitre InTech

A logarithmic mutation operator to solve constrained optimization problems

Hybrid Method for Optimal Quantization of the Normal Distribution

Open Transactions on Information Processing, 2014

Quantization of a continuous-value signal into discrete form is a standard task in all analog/digital devices and is commonly used to solve numerical problems in finance. In this paper, we consider quantization of the Normal distribution. We suggest a hybrid technique based on evolutionary optimization and the Stochastic Gradient for obtaining an optimal L^p-quantizer of a multidimensional random variable. First, we present the classical gradient-based approach used up to now to find a near-optimal L^p-quantizer, which is frequently used to solve some high-dimensional problems arising in finance. Then, we give an algorithm that makes it possible to handle the problem in the evolutionary optimization framework. Furthermore, to improve the capacity of the algorithm to fine-tune the best solutions found, we propose a hybrid method combining the two techniques. The objective of the hybrid method is to allow a powerful exploration and exploitation of the problem search space. The effectiveness of the proposed method is demonstrated through numerical experiments.
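As an illustration of the gradient-based component only, the sketch below applies a stochastic-gradient (competitive learning) update to a one-dimensional standard Normal with p = 2. It is an assumption-laden toy and not the hybrid method of the paper: the function name, the initial codebook, the step size 1/count, and the sample budget are all choices made for this example.

```python
import random


def sgd_quantizer(n_levels=2, n_samples=20000, seed=0):
    """Stochastic-gradient search for an L^2 quantizer of N(0, 1) in one dimension."""
    rng = random.Random(seed)
    # Spread the initial codewords evenly over [-1, 1] (arbitrary choice).
    if n_levels == 1:
        codebook = [0.0]
    else:
        codebook = [-1.0 + 2.0 * i / (n_levels - 1) for i in range(n_levels)]
    counts = [0] * n_levels
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        # Competitive phase: find the codeword nearest to the sample.
        i = min(range(n_levels), key=lambda j: abs(codebook[j] - x))
        # Learning phase: move that codeword toward x with step size 1 / count,
        # i.e. a gradient step on the squared distortion |x - c_i|^2.
        counts[i] += 1
        codebook[i] += (x - codebook[i]) / counts[i]
    return sorted(codebook)


if __name__ == "__main__":
    print(sgd_quantizer())
```

For two levels, the optimal quadratic quantizer of the standard Normal is known in closed form, with codewords at plus and minus sqrt(2/pi), about 0.798, which gives a simple sanity check for the sketch.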

Adaptive sampling for active learning with genetic programming

Cognitive Systems Research

Active learning is a machine learning paradigm in which the learner decides which inputs to use for training. It was introduced to Genetic Programming (GP) essentially through dynamic data sampling, used to address known issues such as computational cost, over-fitting and imbalanced databases. Traditional dynamic sampling for GP gives the algorithm a new sample periodically, often at each generation, without considering the state of the evolution. As a result, individuals do not have enough time to extract the hidden knowledge. An alternative approach is to use some information about the learning state to adapt the periodicity of the training data change. In this work, we propose an adaptive sampling strategy for classification tasks based on the state of solved fitness cases throughout learning. It is a flexible approach that could be applied with any dynamic sampling. We implemented some sampling algorithms extended with dynamic and adaptive control of the re-sampling frequency. We tested them on the KDD intrusion detection and the Adult income prediction problems with GP. The experimental study demonstrates how sampling frequency control preserves the power of dynamic sampling with possible improvements in learning time and quality. We also demonstrate that adaptive sampling can be an alternative to multi-level sampling. This work opens many new relevant extension paths.
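The idea of tying the re-sampling frequency to the state of solved fitness cases can be sketched as follows. This is a hypothetical illustration, not the rule used in the paper: the threshold of 0.8, the maximum sample age, the uniform random sampler, and the `evaluate` callback that stands in for one GP generation are all assumptions.

```python
import random


def draw_sample(n_cases, sample_size, rng):
    """Random subset of fitness-case indices (one possible dynamic sampler)."""
    return rng.sample(range(n_cases), sample_size)


def should_resample(solved_flags, generations_on_sample, threshold=0.8, max_age=10):
    """Hypothetical adaptive rule: renew the sample once enough cases are solved.

    solved_flags holds one boolean per case in the current sample, True when
    the best individual classifies that case correctly.
    """
    solved_rate = sum(solved_flags) / len(solved_flags)
    return solved_rate >= threshold or generations_on_sample >= max_age


def run(n_generations, evaluate, n_cases=1000, sample_size=50, seed=0):
    """Evolution-loop skeleton; evaluate(sample, gen) stands in for a GP generation."""
    rng = random.Random(seed)
    sample = draw_sample(n_cases, sample_size, rng)
    age, renewals = 0, 0
    for gen in range(n_generations):
        solved_flags = evaluate(sample, gen)
        age += 1
        if should_resample(solved_flags, age):
            # The learner has extracted what it can, or the sample is stale.
            sample = draw_sample(n_cases, sample_size, rng)
            age, renewals = 0, renewals + 1
    return renewals
```

With a learner that solves every case immediately, the loop degenerates to per-generation dynamic sampling; with a learner that never solves enough cases, the sample is renewed only when it reaches `max_age`, which is closer to a static scheme.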

Liage des données par les systèmes de recommandation intelligents dans une démarche d'optimisation de la qualité des données

Le Centre pour la Communication Scientifique Directe - HAL - memSIC, Nov 27, 2021

One of the important phases in an approach to optimizing the data quality of a database is data linking. Data linking aims to detect descriptions that refer to the same real-world object (e.g. the same person, the same book) in order to clean them [Saïs and Thomopoulos, 2016]. This problem is frequently encountered in the field of indirect sales made through resellers (the setting of this study), who return customer data that is often redundant. The objective of this work is to provide an efficient tool for data linking based on recommender systems and artificial intelligence, with the goal of optimizing data quality. We propose a hybrid system that combines a content-based recommender system with a collaborative-filtering recommender system for optimal linking of customer data.

Evolutionary Algorithms: Handling Constraints and Application

A Les fonctions tests pour l'optimisation sous contraintes 201 B Des notions d'optique pour l'app... more A Les fonctions tests pour l'optimisation sous contraintes 201 B Des notions d'optique pour l'application du sytème laser 207 º¼º½ Ä ÐÙÑ Ö ÚÙ ÓÑÑ ÙÒ ÓÒ Ð ØÖÓÑ Ò Ø ÕÙ º º º º º º º º º º º º º º ¾¼ º¼º¾ Ä³ ÒØ Ò× Ø ÐÙÑ Ò Ù× º º º º º º º º º º º º º º º º º º º º º º º º º º º º º º º ¾¼

Evolutionary Algorithms

Encyclopedia of Computational Neuroscience

Evolutionary Algorithms

The IMA Volumes in Mathematics and its Applications, 1999

The EASY-GOING deconvolution (EGdeconv) program is extended to enable fast and automated fitting ... more The EASY-GOING deconvolution (EGdeconv) program is extended to enable fast and automated fitting of multiple quantum magic angle spinning (MQMAS) spectra guided by evolutionary algorithms. We implemented an analytical crystallite excitation model for spectrum simulation. Currently these efficiencies are limited to two-pulse and z-filtered 3QMAS spectra of spin 3/2 and 5/2 nuclei, whereas for higher spinquantum numbers ideal excitation is assumed. The analytical expressions are explained in full to avoid ambiguity and facilitate others to use them. The EGdeconv program can fit interaction parameter distributions. It currently includes a Gaussian distribution for the chemical shift and an (extended) Czjzek distribution for the quadrupolar interaction. We provide three case studies to illustrate EGdeconv's capabilities for fitting MQMAS spectra. The EGdeconv program is available as is on our website http:// egdeconv.science.ru.nl for 64-bit Linux operating systems.

Generic GA-PPI-Net: Generic Evolutionary Algorithm to Detect Semantic and Topological Biological Communities

Proceedings of the 15th International Conference on Software Technologies, 2020

Community detection aims to identify topological structures and discover patterns in complex netw... more Community detection aims to identify topological structures and discover patterns in complex networks. It presents an important problem of great significance in many fields. In this paper, we are interested in the detection of communities in biological networks. These networks represent protein-protein or gene-gene interactions which corresponds to a set of proteins or genes that collaborate at the same cellular function. The goal is to identify such semantic and/or topological communities from gene annotation sources such as Gene Ontology. We propose a Genetic Algorithm (GA) based technique as a clustering approach to detect communities from biological networks. For this purpose, we introduce four specific components to the GA: a fitness function based on a similarity measure and the interaction value between proteins or genes, a solution for representing a community with dynamic size, an heuristic crossover to strengthen links in the communities and a specific mutation operator. Experimental results show the ability of our Genetic Algorithm to detect communities of genes that are semantically similar or/and interacting.

L'impact du choix des données d'apprentissage dans la génération des IDS par la programmation génétique

Le Centre pour la Communication Scientifique Directe - HAL - Diderot, Mar 1, 2008

Evolutionary Algorithms

CRC Press eBooks, Jun 23, 2014

A new adaptive sampling approach for Genetic Programming

2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS)

Genetic Programming (GP) is afflicted by an excessive computation time that is more exacerbated w... more Genetic Programming (GP) is afflicted by an excessive computation time that is more exacerbated with data intensive problems. This issue has been addressed with different approaches such as sampling techniques or distributed implementations. In this paper, we focus on dynamic sampling algorithms that mostly give to GP learner a new sample each generation. In so doing, individuals do not have enough time to extract the hidden knowledge. We propose adaptive sampling which is halfway between static and dynamic methods. It is a flexible approach applicable to any dynamic sampling. We implemented some variants based on controlling re-sampling frequency that we experimented to solve KDD intrusion detection problem with GP. The experimental study demonstrates how it preserves the power of dynamic sampling with possible improvements in learning time and quality for some sampling algorithms. This work opens many new relevant extension paths.

Extending DEAP with Active Sampling for Evolutionary Supervised Learning

Proceedings of the 16th International Conference on Software Technologies, 2021

Complexity, variety and large sizes of data bases make the Knowledge extraction a difficult task ... more Complexity, variety and large sizes of data bases make the Knowledge extraction a difficult task for supervised machine learning techniques. It is important to provide these techniques additional tools to improve their efficiency when dealing with such data. A promising strategy is to reduce the size of the training sample seen by the learner and to change it regularly along the learning process. Such strategy known as active learning, is suitable for iterative learning algorithms such as Evolutionary Algorithms. This paper presents some sampling techniques for active learning and how they can be applied in a hierarchical way. Then, it details how these techniques could be implemented into DEAP, a Python framework for Evolutionary Algorithms. A comparative study demonstrates how active learning improve the evolutionary learning on two data bases for detecting pulsars and occupancy in buildings.

Multi-objective Optimization

Evolutionary Algorithms, 2017

An investor composes a portfolio of stocks in order to obtain a high return on his or her investm... more An investor composes a portfolio of stocks in order to obtain a high return on his or her investment with a small risk of incurring a loss; an oncologist prescribes radiotherapy to a cancer patient so as to destroy the tumor without causing damage to healthy organs; an airline manager constructs schedules that incur small salary costs and that ensure smooth operation even in the case of disruptions. All three decision makers (DMs) are in a similar situation-they need to make a decision trying to achieve several conflicting goals at the same time: The highest return investments are in general the riskiest ones, tumors can always be destroyed at the expense of irreversible damage to healthy organs, and the cheapest schedules to operate are ones that leave as little as possible time between flights, wreaking havoc to operations in the case of unexpected delays. Moreover, the investor, the oncologist, and the airline manager are all in a situation where the number of available options or alternatives is very large or even infinite. There are infinitely many ways to invest money and infinitely many possible radiotherapy treatments, but the number of feasible crew schedules is finite, albeit astronomical in practice. The alternatives are therefore described by constraints, rather than explicitly known: the sums invested in every stock must equal the total invested; the radiotherapy treatment must meet physical and clinical constraints; crew schedules must ensure that each flight has exactly one crew assigned to operate it. Mathematically, the alternatives are described by vectors in variable or decision space; the set of all vectors satisfying the constraints is called the feasible set in decision space. 
The consequences or attributes of the alternatives are described as vectors in objective or outcome space, where outcome (objective) vectors are a function of the decision (variable) vectors. The set of outcomes corresponding to feasible alternatives is called Articles

GA-PPI-Net: A Genetic Algorithm for Community Detection in Protein-Protein Interaction Networks

Community detection has become an important research direction for data mining in complex network... more Community detection has become an important research direction for data mining in complex networks. It aims to identify topological structures and discover patterns in complex networks, which presents an important problem of great significance. In this paper, we are interested in the detection of communities in the Protein-Protein or Gene-gene Interaction (PPI) networks. These networks represent a set of proteins or genes that collaborate at the same cellular function. The goal is to identify such semantic and topological communities from gene annotation sources such as Gene Ontology. We propose a Genetic Algorithm (GA) based approach to detect communities having different sizes from PPI networks. For this purpose, we introduce three specific components to the GA: a fitness function based on a similarity measure and the interaction value between proteins or genes, a solution for representing a community with dynamic size and a specific mutation operator. In the computational tests c...

Genetic Programming for Machine Learning

This paper presents a proof of concept. It shows that Genetic Programming (GP) can be used as a "... more This paper presents a proof of concept. It shows that Genetic Programming (GP) can be used as a "universal" machine learning method, that integrates several different algorithms, improving their accuracy. The system we propose, called Universal Genetic Programming (UGP) works by defining an initial population of programs, that contains the models produced by several different machine learning algorithms. The use of elitism allows UGP to return as a final solution the best initial model, in case it is not able to evolve a better one. The use of genetic operators driven by semantic awareness is likely to improve the initial models, by combining and mutating them. On three complex real-life problems, we present experimental evidence that UGP is actually able to improve the models produced by all the studied machine learning algorithms in isolation.

Trends of Evolutionary Machine Learning to Address Big Data Mining

Lecture Notes in Business Information Processing, 2021

Complexity, variety and large sizes of data bases make the Knowledge extraction a difficult task ... more Complexity, variety and large sizes of data bases make the Knowledge extraction a difficult task for supervised machine learning techniques. It is important to provide these techniques additional tools to improve their efficiency when dealing with such data. A promising strategy is to reduce the size of the training sample seen by the learner and to change it regularly along the learning process. Such strategy known as active learning, is suitable for iterative learning algorithms such as Evolutionary Algorithms. This paper presents some sampling techniques for active learning and how they can be applied in a hierarchical way. Then, it details how these techniques could be implemented into DEAP, a Python framework for Evolutionary Algorithms. A comparative study demonstrates how active learning improve the evolutionary learning on two data bases for detecting pulsars and occupancy in buildings.

Genetic Algorithm for Community Detection in Biological Networks

Procedia Computer Science, 2018

We are interested in the detection of communities in biological networks. We focus more precisely... more We are interested in the detection of communities in biological networks. We focus more precisely on gene interaction networks. They represent protein-protein or gene-gene interactions. A community in such networks corresponds to a set of proteins or genes that collaborate at the same cellular function. Our goal is to identify such network or community from gene annotation sources such as Gene Ontology (GO). In this paper, we propose a Genetic Algorithm (GA) based approach to discover communities in a gene interaction network. Special solution coding and mutation operator are introduced. Otherwise, we propose a specific fitness function based on similarity measure and interaction value between genes. Experiments on real data extracted from the well-known Kyoto Encyclopedia of Genes and Genomes (KEGG) database show the ability of the proposed method to successfully detect existing or even new communities.

Scale Genetic Programming for large Data Sets: Case of Higgs Bosons Classification

Procedia Computer Science, 2018

Extracting knowledge and significant information from very large data sets is a major topic in Data Science and attracts the interest of machine learning researchers. Several machine learning techniques, such as Deep Neural Networks, have proven effective on massive data. Evolutionary algorithms are considered poorly suited to such problems because of their relatively high computational cost. This work attempts to show that, with some extensions, evolutionary algorithms can be an interesting solution for learning from very large data sets. We propose using Cartesian Genetic Programming (CGP) as a meta-heuristic approach to learn from the Higgs big data set. CGP is extended with an active sampling technique to help the algorithm cope with the mass of the provided data. The proposed method is able to handle the complete benchmark data set of 11 million events and produces satisfactory preliminary results.

Hierarchical Data Topology Based Selection for Large Scale Learning

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 2016

The amount of data available for data mining and knowledge discovery continues to grow very fast in the era of Big Data. Genetic Programming (GP) algorithms, which are efficient machine learning techniques, face a new challenge: dealing with the mass of the provided data. Active Sampling, already used for Active Learning, might be a good way to improve the training of Evolutionary Algorithms (EA) on very big data sets. This paper investigates the adaptation of Topology Based Selection (TBS) to massive learning datasets by means of Hierarchical Sampling. We propose to combine Random Subset Selection (RSS) with TBS to create the RSS-TBS method. Two variants are implemented and applied to the KDD intrusion detection problem, and they are compared to the original RSS and TBS techniques. The experimental results show that the significant computational cost generated by the original TBS on large datasets can be reduced with Hierarchical Sampling.

InTech Chapter

A logarithmic mutation operator to solve constrained optimization problems

Hybrid Method for Optimal Quantization of the Normal Distribution

Open Transactions on Information Processing, 2014

Quantization of a continuous-valued signal into discrete form is a standard task in all analog/digital devices, and it is commonly used to solve numerical problems in finance. In this paper, we consider quantization of the Normal distribution. We suggest a hybrid technique based on evolutionary optimization and the stochastic gradient for obtaining an optimal L^p-quantizer of a multidimensional random variable. First, we present the classical gradient-based approach used so far to find a near-optimal L^p-quantizer, which is frequently used to solve high-dimensional problems arising in finance. Then, we give an algorithm that handles the problem in the evolutionary optimization framework. Furthermore, to improve the algorithm's capacity to fine-tune the best solutions found, we propose a hybrid method combining the two techniques. The objective of the hybrid method is to allow a powerful exploration and exploitation of the problem search space. The effectiveness of the proposed method is demonstrated through numerical experiments.
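To make the stochastic-gradient side of this abstract concrete, here is a minimal sketch of a competitive-learning update for a quadratic (p = 2) quantizer of the one-dimensional standard Normal distribution. It covers neither the evolutionary algorithm nor the hybrid method from the paper. The function name `clvq_normal`, the step-size schedule and the number of steps are assumptions chosen for illustration.

```python
import random

def clvq_normal(n_levels, n_steps=100_000, seed=0):
    """Stochastic-gradient search for a quadratic quantizer of N(0, 1) in 1-D."""
    rng = random.Random(seed)
    codebook = [rng.gauss(0.0, 1.0) for _ in range(n_levels)]
    for t in range(1, n_steps + 1):
        z = rng.gauss(0.0, 1.0)  # one fresh sample of the Normal variable
        # Competitive phase: find the codeword closest to the sample.
        i = min(range(n_levels), key=lambda k: abs(codebook[k] - z))
        # Learning phase: move the winner toward the sample. The decreasing
        # step 10 / (100 + t) is an assumed schedule, not taken from the paper.
        step = 10.0 / (100.0 + t)
        codebook[i] -= step * (codebook[i] - z)
    return sorted(codebook)

levels = clvq_normal(2)
```

For two levels, the optimal quadratic quantizer of N(0, 1) is known in closed form as ±sqrt(2/π), roughly ±0.798, so the two returned values should land near those points up to sampling noise.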

Adaptive sampling for active learning with genetic programming

Cognitive Systems Research

Active learning is a machine learning paradigm that allows the learner to decide which inputs to use for training. It was introduced to Genetic Programming (GP) essentially through dynamic data sampling, which is used to address known issues such as computational cost, over-fitting and imbalanced databases. Traditional dynamic sampling for GP gives the algorithm a new sample periodically, often every generation, without considering the state of the evolution. As a result, individuals do not have enough time to extract the hidden knowledge. An alternative approach is to use information about the learning state to adapt how often the training data changes. In this work, we propose an adaptive sampling strategy for classification tasks based on the state of solved fitness cases throughout learning. It is a flexible approach that can be applied with any dynamic sampling. We implemented several sampling algorithms extended with dynamic and adaptive control of the re-sampling frequency. We tested them with GP on the KDD intrusion detection and Adult income prediction problems. The experimental study demonstrates how controlling the sampling frequency preserves the power of dynamic sampling, with possible improvements in learning time and quality. We also demonstrate that adaptive sampling can be an alternative to multi-level sampling. This work opens many relevant paths for extension.
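One way to picture re-sampling driven by solved fitness cases is the sketch below. It is not the strategy from the paper. The class `AdaptiveResampler`, the `solved_threshold` rule and the `max_age` fallback are hypothetical choices that only illustrate tying the sampling frequency to the learning state.

```python
import random

class AdaptiveResampler:
    """Renews the training sample when enough of its cases are solved (assumed rule)."""

    def __init__(self, dataset, sample_size, solved_threshold=0.8, max_age=10, seed=0):
        self._dataset = dataset
        self._sample_size = sample_size
        self._rng = random.Random(seed)
        self.solved_threshold = solved_threshold  # hypothetical trigger level
        self.max_age = max_age                    # hypothetical upper bound on sample lifetime
        self.sample = self._rng.sample(dataset, sample_size)
        self.age = 0        # generations spent on the current sample
        self.renewals = 0   # number of times the sample was replaced

    def update(self, solved_flags):
        """Call once per generation; solved_flags[i] is True if case i of the
        current sample is solved by the best individual. Returns True when the
        sample was renewed."""
        self.age += 1
        solved_rate = sum(solved_flags) / len(solved_flags)
        if solved_rate >= self.solved_threshold or self.age >= self.max_age:
            self.sample = self._rng.sample(self._dataset, self._sample_size)
            self.age = 0
            self.renewals += 1
            return True
        return False
```

With this rule, a sample is kept as long as the population is still learning from it, and is replaced either once it is mostly solved or after `max_age` generations without enough progress.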

Data Linkage with Intelligent Recommender Systems in a Data Quality Optimization Approach

Le Centre pour la Communication Scientifique Directe - HAL - memSIC, Nov 27, 2021

One of the important phases in an approach to optimizing the quality of the data in a database is data linkage. Data linkage aims to detect descriptions that refer to the same real-world object (e.g. the same person, the same book) in order to clean them [Saïs and Thomopoulos, 2016]. This problem is frequently encountered in indirect sales made through resellers (the setting of this study), who return customer data that is often redundant. The objective of this work is to provide an effective data linkage tool based on recommender systems and artificial intelligence, with the aim of optimizing data quality. We propose a hybrid system combining a content-based recommender system with a collaborative-filtering recommender system for optimal linkage of customer data.