International Journal of Automated Identification Technology, 2010
Abstract: This paper focuses on the use of image-based techniques in biometric verification. A detailed review of the existing literature on texture descriptors is provided and several methods are compared on three well-known biometric problems: palm verification, knuckle verification and fingerprint verification. The texture descriptors evaluated in this study are based on the most commonly used measures, i.e., Gabor filter bank response, local binary patterns, histogram of gradients, and local phase quantization. Moreover, different ...
Abstract: In this paper we make an extensive study of different methods for building ensembles of classifiers. We examine variants of ensemble methods that are based on perturbing features. We illustrate the power of using these variants by applying them to a number of different problems. We find that the best performing ensemble is obtained by combining an approach based on random subspace with a cluster-based input decimated ensemble and the principal direction oracle. Compared with other state-of-the-art ...
Expert Systems With Applications an International Journal, Apr 1, 2009
Abstract: A novel method for building an ensemble of on-line signature verification systems based on one-class classifiers is presented. The ensemble is built by concatenating the classifiers obtained by the Random Subspace method on the "original features" with a set of classifiers, each trained by selecting a different set of "artificial features" for each different subset of users that belong to the validation set. The "artificial features" are extracted using an OverComplete global feature combination: starting from a set of global features, a set of artificial features is created by applying mathematical operators to a randomly extracted set of the original ones, and then a small subset is selected for verification by running sequential forward floating selection (SFFS). Finally, a set of one-class classifiers is used to classify each match between two signatures as genuine or impostor. The MCYT signature database is used as the dataset. Our results show that the proposed ensemble outperforms the ensembles based only on the original features. Using only 5 genuine signatures for each user, our best system obtains an equal error rate of 4.5 on the skilled forgeries and 1.4 on the random forgeries; when 20 genuine signatures are used to train the classifiers, equal error rates of 2.2 on the skilled forgeries and 0.5 on the random forgeries are obtained.
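The artificial-feature step in the abstract above can be illustrated with a minimal sketch. The abstract does not say which mathematical operators are used or how many original features each operator combines, so the operator set (`add`, `sub`, `mul`), the pairwise combination, and the function names below are assumptions made for illustration only. The SFFS selection step is not shown.

```python
import operator
import random

# Hypothetical operator set; the abstract does not name the operators used.
OPERATORS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def make_artificial_features(n_original, n_artificial, seed=0):
    """Draw random recipes (operator name, feature i, feature j)."""
    rng = random.Random(seed)
    recipes = []
    for _ in range(n_artificial):
        i, j = rng.sample(range(n_original), 2)  # two distinct original features
        recipes.append((rng.choice(sorted(OPERATORS)), i, j))
    return recipes

def apply_recipes(row, recipes):
    """Compute the artificial features for one vector of global features."""
    return [OPERATORS[name](row[i], row[j]) for name, i, j in recipes]
```

Each recipe is stored rather than applied immediately, so the same artificial features can be recomputed for every signature before a selection method such as SFFS picks a small subset.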
Advances in Experimental Medicine and Biology, 2010
The most common method of handling automated cell phenotype image classification is to determine a common set of optimal features and then apply standard machine-learning algorithms to classify them. In this chapter, we use advanced methods for determining a set of optimized features for training an ensemble using random subspace with a set of Levenberg-Marquardt neural networks. The process requires that we first run several experiments to determine the individual features that offer the most information. The best performing features are then concatenated and used in the ensemble classification. Applying this approach, we have obtained an average accuracy of 97.4% using the three best benchmarks for this problem: the 2D HeLa dataset and both the endogenous and the transfected LOCATE mouse protein subcellular localization databases.
Expert Systems With Applications an International Journal, May 1, 2009
It is important to develop a reliable system for predicting bacterial virulent proteins, both for finding novel drugs and vaccines and for understanding virulence mechanisms in pathogens.
It is well known in the literature that an ensemble of classifiers obtains good performance compared with a stand-alone method. Hence, it is very important to develop ensemble methods well suited for bioinformatics data. In this work, we propose to combine the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a Genetic Algorithm. The proposed method is applied to predicting DNA-binding proteins. As classifiers, the linear support vector machine and the radial basis function support vector machine are tested. As performance indicators, the accuracy and Matthews's correlation coefficient are reported. Matthews's correlation coefficient obtained by our ensemble method is approximately 0.97 when jackknife cross-validation is used. This result outperforms the performance obtained in the literature using the same dataset where the features are extracted directly from the amino-acid sequence.
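For readers unfamiliar with the second performance indicator, the following is a minimal sketch of the standard Matthews correlation coefficient for a binary problem, computed from confusion-matrix counts. The function name and the choice to return 0.0 when the denominator vanishes are assumptions of this sketch, not details taken from the paper.

```python
from math import sqrt

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from true/false positive and negative counts of a binary classifier."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The coefficient ranges from -1 (total disagreement) to +1 (perfect prediction), which is why it is often reported alongside accuracy on unbalanced datasets.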
Expert Systems With Applications an International Journal, Mar 1, 2009
In this paper, we have made an extensive study of artificial intelligence (AI) techniques, such as ensembles of classifiers and feature selection, for the identification of students with learning disabilities. The experimental results show that our best method, which combines both an ensemble of classifiers and feature selection, can correctly identify up to 50% of the students with learning disabilities (LD) with 100% confidence. Also when predicting samples in "junior high school" using a model built on the "elementary school" students, and when the "junior high school" samples are used to build the model, we predict the samples in the "elementary school" dataset. In particular, we propose variants of two recent Feature Transform-based ensemble methods (Rotation Forest and Input Decimated Ensemble). In the Rotation Forest, the feature set is randomly split into subsets and Principal Component Analysis (PCA) is used to transform the features that belong to a subset. The Input Decimated Ensemble first singles out a given class i and runs PCA on this data only. This transformation is applied to the whole dataset and a classifier D_i is trained using these transformed patterns. This choice limits the size of the ensemble to the number of classes. In this paper, we perform an empirical comparison varying the Feature Transform method used in the Rotation Forest technique and we propose a clustering method to overcome the drawback of the Input Decimated Ensemble.
This chapter focuses on the use of machine learning and statistical approaches to combine fingerprint matchers. The purposes of this chapter are:
In this paper our aim is to study how an ensemble of classifiers can improve the performance of a machine learning technique for cell phenotype image classification. We want to point out some of the advantages that an ensemble of classifiers offers with respect to a stand-alone method. Finally, the preliminary results on the 2D-HeLa dataset, obtained by the fusion between a random subspace of Levenberg-Marquardt neural networks and a variant of AdaBoost, are reported. It is interesting to note that the proposed system obtains an outstanding 97.5% Rank-1 accuracy and a >99% Rank-2 accuracy.
This paper focuses on the use of image-based techniques in biometric verification. A detailed review of the existing literature on texture descriptors is provided and several methods are compared on three well-known biometric problems: palm verification, knuckle verification and fingerprint verification. The texture descriptors evaluated in this study are based on the most commonly used measures, i.e., Gabor filter bank response, local binary patterns, histogram of gradients, and local phase quantization. Moreover, different distance measures are compared for obtaining the best performing system. The most common method for handling biometric data is to determine a common set of optimal features and then apply standard machine-learning algorithms and distance measures to classify them. In this paper we use advanced supervised selection methods for determining an optimized set of features for training an ensemble of classifiers and for reducing the dimensionality of the feature set by discarding the less discriminative features. The optimization process requires that we first run several experiments to determine which feature set offers the most information. The best performing feature set is then combined and used in the ensemble classification. Extensive experiments conducted over the three well-known biometric datasets show that it is possible to find a set of descriptors that works well for all three tasks. We are thus able to produce a set of optimal generalized features. The best tested method is local phase quantization.
In this work, we propose a multi-matcher method for on-line signature verification that combines bi-class classifiers and one-class classifiers. Global information is extracted with a feature-based representation and recognized by using an ensemble of classifiers. Moreover, we show that methods based on tokenised pseudo-random numbers and user-specific signature features are highly dependent upon a parameter, the hashing threshold; we demonstrate that using an ensemble of classifiers it is possible to solve this problem, leading to a considerable performance improvement.
An ensemble of 2D ear matchers is built by training each matcher using a set of Gabor filters and color spaces selected by a genetic algorithm (GA). First, using gray level images, we select the best Gabor filters by applying Sequential Forward Floating Selection. Second, using the RGB images, several color spaces are obtained using a GA. Finally, an ensemble of 1-nearest neighbor matchers uses the color spaces and filters for classification. The performance of the proposed approach is measured using the Notre-Dame EAR dataset. To create the color spaces, the dataset is divided into training and testing sets using ear samples from different individuals. System parameters are selected using samples of individuals that belong to the training set. The method is then tested on the testing set. In this way, we consider our protocol a reliable blind testing protocol. Our system obtains rank-1 of ~81% and rank-5 of ~92%.
In this paper, we investigate the performance of several systems based on ensembles of classifiers for bankruptcy prediction and credit scoring.
The problem addressed in this letter concerns multiclassifier generation by a random subspace method (RSM). In the RSM, the classifiers are constructed in random subspaces of the data feature space. In this letter, we propose an evolved feature weighting approach: in each subspace, the features are multiplied by a weight factor chosen to minimize the error rate on the training set. An efficient method based on particle swarm optimization (PSO) is proposed for finding a set of weights for each feature in each subspace. The performance improvement with respect to the state-of-the-art approaches is validated through experiments with several benchmark data sets.
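The random subspace idea described above can be sketched as follows. This is an illustration under stated assumptions, not the letter's implementation: the base classifier (nearest centroid), the majority-vote fusion, and all function names are choices made for this sketch. The per-feature weights default to 1.0, and the `weights` argument marks where weights found by an optimizer such as PSO would be plugged in; the PSO search itself is not shown.

```python
import random
from collections import Counter

def random_subspaces(n_features, n_subspaces, subspace_size, seed=0):
    """Draw one random subset of feature indices per ensemble member."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_subspaces)]

def fit_centroids(X, y, features, weights):
    """Class centroids in the weighted subspace (assumed base classifier)."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += weights[k] * row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_one(row, features, weights, centroids):
    """Assign the label of the closest centroid in the weighted subspace."""
    proj = [w * row[f] for f, w in zip(features, weights)]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(proj, centroids[c])))

def rsm_predict(X_train, y_train, row, subspaces, weights=None):
    """Majority vote over one classifier per subspace."""
    weights = weights or [[1.0] * len(s) for s in subspaces]  # no weighting by default
    votes = Counter()
    for feats, w in zip(subspaces, weights):
        centroids = fit_centroids(X_train, y_train, feats, w)
        votes[predict_one(row, feats, w, centroids)] += 1
    return votes.most_common(1)[0][0]
```

Because every ensemble member sees only its own feature subset, a weight vector can be tuned independently for each subspace, which is the degree of freedom the letter proposes to optimize.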
The most common method for handling human action classification is to determine a common set of optimal features and then apply a machine-learning algorithm to classify them. In this paper we explore combining sets of different features for training an ensemble using random subspace with a set of support vector machines. We propose two novel descriptors for this task domain: one based on Gabor filters and the other based on local binary patterns (LBPs). We then combine these two sets of features with the histogram of gradients. We obtain an accuracy of 97.8% using the 10-class Weizmann dataset and a 100% accuracy rate using the 9-class Weizmann dataset. These results are comparable with the state of the art. By combining sets of relatively simple descriptors it is possible to obtain results comparable to using more sophisticated approaches. Our simpler approach, however, offers the advantage of being less computationally expensive.
The aim of this work is to propose a method for detecting the social meanings that people perceive in facial morphology using local face recognition techniques. Developing a reliable method to model people's trait impressions of faces has theoretical value in psychology and human-computer interaction.
In this paper the problem of finding a face recognition system that works well both under variable illumination conditions and under strictly controlled acquisition conditions is considered.
We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of AdaBoost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset.
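Fusion by sum rule, mentioned above, simply adds the per-class scores produced by each classifier and picks the class with the largest total. The sketch below assumes that every classifier already outputs scores on a comparable scale (for example, probabilities); any normalization used in the paper is not described in the abstract and is not shown. The function name is hypothetical.

```python
def sum_rule(score_lists):
    """Fuse per-class scores from several classifiers by summing them.

    score_lists: one list of per-class scores per classifier,
    all of the same length and on a comparable scale (assumed).
    Returns the fused scores and the index of the winning class.
    """
    n_classes = len(score_lists[0])
    fused = [sum(scores[c] for scores in score_lists) for c in range(n_classes)]
    winner = max(range(n_classes), key=fused.__getitem__)
    return fused, winner
```

For instance, with scores `[0.6, 0.4]`, `[0.2, 0.8]` and `[0.45, 0.55]` from three classifiers, the second class wins even though the first classifier alone prefers the first class.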
This paper proposes a new method for selecting the most discriminant rotation invariant patterns in local binary patterns and local ternary patterns. Our experiments show that a selection based on variance performs better than the recently proposed method of using dominant local binary patterns (DLBP). Our method uses a random subspace of patterns with higher variance. Features are transformed using Neighborhood Preserving Embedding (NPE) and then used to train a support vector machine. Moreover, we extend DLBP with local ternary patterns (DLTP) and examine methods for building a supervised random subspace of classifiers where each bin of the histogram has a probability of belonging to a given subspace according to its occurrence frequencies. We compare several texture descriptors and show that the random subspace ensemble based on NPE features outperforms other recent state-of-the-art approaches. This conclusion is based on extensive experiments conducted in several domains using five benchmark databases.
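The variance-based selection described above can be illustrated with a short sketch that ranks histogram bins (patterns) by their variance across training images and keeps the top ones. Keeping a fixed number `k` of bins and using the population variance are assumptions of this sketch; the random subspace sampling and the NPE transform from the paper are not shown.

```python
from statistics import pvariance

def top_variance_bins(histograms, k):
    """Return the indices of the k histogram bins with the highest variance.

    histograms: one pattern histogram (list of bin counts) per training image.
    """
    n_bins = len(histograms[0])
    variances = [pvariance([h[b] for h in histograms]) for b in range(n_bins)]
    order = sorted(range(n_bins), key=lambda b: variances[b], reverse=True)
    return sorted(order[:k])
```

Bins whose counts barely change from image to image carry little discriminative information under this criterion, so they are the first to be discarded.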
Papers by Loris Nanni