David Masip

Papers by David Masip

Emotion recognition from mid-level features

Pattern Recognition Letters, Dec 1, 2015

Highlights:
• We automatically classify emotions from facial video sequences.
• We use the Action Units as intermediate features for emotion learning.
• We fuse structural and appearance information to classify emotions.
• We propose the use of the Histogram of Action Units for emotion classification.
• Subtle positive emotions can be automatically inferred with close to human accuracy.

Interpreting CNN Models for Apparent Personality Trait Regression

This paper addresses the problem of automatically inferring personality traits of people talking to a camera. As in many other computer vision problems, Convolutional Neural Network (CNN) models have shown impressive results. However, despite this success in terms of performance, it is unknown what internal representation emerges in the CNN. This paper presents an in-depth study of why CNN models perform surprisingly well on this complex problem. We use current techniques for CNN model interpretability, combined with face detection and Action Unit (AU) recognition systems, to perform our quantitative studies. Our results show that: (1) the face provides most of the discriminative information for personality trait inference, and (2) the internal CNN representations mainly analyze key face regions such as the eyes, nose, and mouth. Finally, we study the contribution of AUs to personality trait inference, showing the influence of certain AUs on the facial trait judgments.

Opinion Mining on Educational Resources at the Open University of Catalonia

In order to make improvements to teaching, it is vital to know what students think of the way they are taught. With that purpose in mind, exhaustively analyzing the forums associated with the subjects taught at the Universitat Oberta de Catalunya (UOC) would be extremely helpful, as the university's students often post comments on their learning experiences in them. Exploiting the content of such forums is not a simple undertaking. The volume of data involved is very large, and performing the task manually would require a great deal of effort from lecturers. As a first step towards solving this problem, we propose a tool to automatically analyze the posts in forums of communities of UOC students and teachers, with a view to systematically mining the opinions they contain. This article defines the architecture of such a tool and explains how lexical-semantic and language technology resources can be used to that end. For pilot testing purposes, the tool has been used to identify students' opinions on the UOC's Business Intelligence master's degree course during the last two years. The paper discusses the results of this test. The contribution of this paper is twofold. Firstly, it demonstrates the feasibility of using natural language parsing techniques to help teachers make decisions. Secondly, it introduces a simple tool that can be refined and adapted to a virtual environment for the purpose in question.

Explainable, automated urban interventions to improve pedestrian and vehicle safety

arXiv (Cornell University), Oct 22, 2021

At the moment, urban mobility research and governmental initiatives are mostly focused on motor-related issues, e.g. the problems of congestion and pollution. And yet, we cannot disregard the most vulnerable elements in the urban landscape: pedestrians, who are exposed to higher risks than other road users. Indeed, safe, accessible, and sustainable transport systems in cities are a core target of the UN's 2030 Agenda. Thus, there is an opportunity to apply advanced computational tools to the problem of traffic safety, especially with regard to pedestrians, who have often been overlooked in the past. This paper combines public data sources, large-scale street imagery and computer vision techniques to approach pedestrian and vehicle safety with an automated, relatively simple, and universally applicable data-processing scheme. The steps involved in this pipeline include the adaptation and training of a Residual Convolutional Neural Network to determine a hazard index for each given urban scene, as well as an interpretability analysis based on image segmentation and class activation mapping on those same images. Combined, the outcome of this computational approach is a fine-grained map of hazard levels across a city, and a heuristic to identify interventions that might simultaneously improve pedestrian and vehicle safety. The proposed frame

On the Use of External Face Features for Identity Verification

Journal of Multimedia, Jul 1, 2006

Learnheuristics: hybridizing metaheuristics with machine learning for optimization with dynamic inputs

Open Mathematics, Mar 19, 2017

This paper reviews the existing literature on the combination of metaheuristics with machine learning methods and then introduces the concept of learnheuristics, a novel type of hybrid algorithm. Learnheuristics can be used to solve combinatorial optimization problems with dynamic inputs (COPDIs). In these COPDIs, the problem inputs (elements located either in the objective function or in the constraint set) are not fixed in advance as usual. On the contrary, they might vary in a predictable (non-random) way as the solution is partially built according to some heuristic-based iterative process. For instance, a consumer's willingness to spend on a specific product might change as the availability of this product decreases and its price rises. Thus, these inputs might take different values depending on the current solution configuration. These variations in the inputs might require coordination between the learning mechanism and the metaheuristic algorithm: at each iteration, the learning method updates the inputs model used by the metaheuristic.
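The coordination described in this abstract can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: `predicted_gain` is a hypothetical stand-in for a learned input model (here a fixed decay rule), and the items, categories and `decay` value are invented for illustration. The point is only that the inputs are re-evaluated at every iteration against the current partial solution.

```python
def predicted_gain(base_gain, category, chosen_categories, decay=0.5):
    """Hypothetical input model: an item's gain shrinks each time its
    category already appears in the partial solution."""
    return base_gain * (decay ** chosen_categories.count(category))


def constructive_heuristic(items, budget):
    """Greedy construction in which inputs depend on the current solution.

    items: list of (name, category, base_gain) tuples (illustrative format).
    budget: maximum number of items to select.
    """
    solution, chosen_categories = [], []
    remaining = list(items)
    while remaining and len(solution) < budget:
        # Re-evaluate every candidate with the model before each decision.
        best = max(
            remaining,
            key=lambda it: predicted_gain(it[2], it[1], chosen_categories),
        )
        solution.append(best[0])
        chosen_categories.append(best[1])
        remaining.remove(best)
    return solution


items = [("a", "x", 10.0), ("b", "x", 9.0), ("c", "y", 6.0)]
print(constructive_heuristic(items, budget=2))
```

With static inputs a greedy rule would pick "a" and then "b"; once the gain of category "x" is updated after the first pick, "c" becomes the better second choice.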

Shared Feature Extraction for Nearest Neighbor Face Recognition

IEEE Transactions on Neural Networks, Apr 1, 2008

Feature extraction for nearest neighbor classification: Application to gender recognition

International Journal of Intelligent Systems, Mar 23, 2005

In this article, we perform an extended analysis of different face-processing techniques for gender recognition problems. Prior research shows that support vector machines (SVM) achieve the best classification results. We show that a nearest neighbor classification approach can match or improve on the SVM results, given an adequate selection of features of the input data. This selection is performed using a dimensionality reduction technique based on a modification of nonparametric discriminant analysis, designed to improve nearest neighbor classification. The choice of nearest neighbor is especially justified by the use of a large database. We also analyze a nonlinear algorithm, locally linear embedding, and its supervised version. Given that this technique is focused on preserving the local configuration of the neighborhood of each point, it should a priori be a good dimensionality reduction technique for extracting features for nearest neighbor classification. A complete comparative study with the most classical face-processing techniques is also performed.
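To make the role of the projection concrete, here is a small Python sketch of nearest neighbor classification after a linear dimensionality reduction. The projection matrices are hand-picked placeholders, not the modified nonparametric discriminant analysis from the article, and the function names and toy data are assumptions made for illustration.

```python
import math


def project(x, matrix):
    """Apply a linear projection: each row of `matrix` yields one output feature."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def nearest_neighbor(query, train_points, train_labels, matrix):
    """Return the label of the training point closest to `query` after projection."""
    q = project(query, matrix)
    best_label, best_dist = None, math.inf
    for point, label in zip(train_points, train_labels):
        d = math.dist(q, project(point, matrix))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy data: the first coordinate separates the classes, the second is noise.
train_points = [[0.0, 5.0], [0.2, -5.0], [1.0, 1.0], [0.9, -1.0]]
train_labels = ["a", "a", "b", "b"]
identity = [[1.0, 0.0], [0.0, 1.0]]   # no reduction
keep_first = [[1.0, 0.0]]             # hypothetical 1-D discriminant projection
query = [0.1, 0.0]

print(nearest_neighbor(query, train_points, train_labels, identity))
print(nearest_neighbor(query, train_points, train_labels, keep_first))
```

In this toy setup the noisy second coordinate dominates the distance in the original space, so the query is assigned to class "b"; after projecting onto the discriminative coordinate it is assigned to class "a".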

Face Recognition in the Machine Reveals Properties of Human Face Recognition

arXiv (Cornell University), Dec 1, 2006

Psychophysical studies suggest that face recognition takes place in a narrow band of low spatial frequencies ("critical band"). Here, we examined the recognition performance of an artificial face recognition system as a function of the size of the input images. Recognition performance was quantified with three discriminability measures: Fisher Linear Discriminant Analysis, Nonparametric Discriminant Analysis, and mutual information. All three measures revealed a maximum at the same image sizes. Since spatial frequency content is a function of image size, our data consistently predict the range of psychophysically found frequencies. Our results therefore support the notion that the critical band of spatial frequencies for face recognition in humans and machines follows from inherent properties of face images.

Speeding Up Neural Networks for Large Scale Classification using WTA Hashing

arXiv (Cornell University), Apr 28, 2015

In this paper we propose to use the Winner Takes All hashing technique to speed up forward propagation and backward propagation in fully connected layers in convolutional neural networks. The proposed technique significantly reduces the computational complexity, which in turn allows us to train layers with a large number of kernels without the associated time penalty. As a consequence, we are able to train a convolutional neural network on a very large number of output classes with only a small increase in the computational cost. To show the effectiveness of the technique, we train a new output layer on a pretrained network using both the regular multiplicative approach and our proposed hashing methodology. Our results show no drop in performance and demonstrate, with our implementation, a 7-fold speedup during training.

Multi-task, multi-label and multi-domain learning with residual convolutional networks for emotion recognition

arXiv (Cornell University), Feb 19, 2018

Automated emotion recognition in the wild from facial images remains a challenging problem. Although recent advances in Deep Learning have brought a significant breakthrough on this topic, strong changes in pose, orientation and point of view severely harm current approaches. In addition, the acquisition of labeled datasets is costly, and current state-of-the-art deep learning algorithms cannot model all the aforementioned difficulties. In this paper, we propose to apply a multi-task learning loss function to share a common feature representation with other related tasks. In particular, we show that emotion recognition benefits from jointly learning a model with a detector of facial Action Units (collective muscle movements). The proposed loss function addresses the problem of learning multiple tasks with heterogeneously labeled data, improving on previous multi-task approaches. We validate the proposal using two datasets acquired in uncontrolled environments, and an application to predict compound facial emotion expressions.
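One common way to train on heterogeneously labeled data is to let each sample contribute only the loss terms for which it has labels. The Python sketch below shows that generic masking idea; it is an assumption for illustration and not the loss function proposed in the paper. The dictionary keys and the `au_weight` parameter are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for a single-label task."""
    return -math.log(probs[label])


def binary_cross_entropy(probs, labels):
    """Mean binary cross-entropy over independent labels (one per Action Unit)."""
    terms = [
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(probs, labels)
    ]
    return -sum(terms) / len(terms)


def multitask_loss(sample, au_weight=1.0):
    """Sum the losses of the tasks that are labeled for this sample.

    A task whose label is None contributes nothing, so datasets annotated
    for only one task can still be mixed in the same training batch.
    """
    loss = 0.0
    if sample["emotion_label"] is not None:
        loss += cross_entropy(sample["emotion_probs"], sample["emotion_label"])
    if sample["au_labels"] is not None:
        loss += au_weight * binary_cross_entropy(sample["au_probs"], sample["au_labels"])
    return loss


# A sample annotated with an emotion class but without Action Unit labels.
emotion_only = {
    "emotion_probs": [0.25, 0.75],
    "emotion_label": 1,
    "au_probs": [0.9, 0.2],
    "au_labels": None,
}
print(multitask_loss(emotion_only))
```

In a real network the probabilities would come from two output heads sharing one feature extractor, and the masked sum would be averaged over a batch before backpropagation.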

Geometry-Based Ensembles: Toward a Structural Characterization of the Classification Boundary

IEEE Transactions on Pattern Analysis and Machine Intelligence, Jun 1, 2009

This paper introduces a novel binary discriminative learning technique based on the approximation of the nonlinear decision boundary by a piecewise linear smooth additive model. The decision border is geometrically defined by means of the characterizing boundary points, that is, points that belong to the optimal boundary under a certain notion of robustness. Based on these points, a set of locally robust linear classifiers is defined and assembled by means of a Tikhonov regularized optimization procedure in an additive model to create a final smooth decision rule. As a result, a very simple and robust classifier with a strong geometrical meaning and nonlinear behavior is obtained. The simplicity of the method allows its extension to cope with some of today's machine learning challenges, such as online learning, large-scale learning or parallelization, with linear computational complexity. We validate our approach on the UCI database, comparing with several state-of-the-art classification techniques. Finally, we apply our technique in online and large-scale scenarios and in six real-life computer vision and pattern recognition problems: gender recognition based on face images, intravascular ultrasound tissue classification, speed traffic sign detection, Chagas' disease myocardial damage severity detection, old musical scores clef classification, and action recognition using 3D accelerometer data from a wearable device. The results are promising and this paper opens a line of research that deserves further attention.

Feature Extraction in Face Recognition

Boosted Linear Projections for Discriminant Analysis

In this paper we present a new linear discriminant technique to project high-dimensional data into a low-dimensional subspace where the accuracy of the nearest neighbor classifier is maximized. Our algorithm combines a set of one-dimensional projections, using the AdaBoost algorithm, to form the final discriminant projection matrix. We also introduce the way to establish an order to rank

Winner takes all hashing for speeding up the training of neural networks in large class problems

Pattern Recognition Letters, Jul 1, 2017

This paper proposes speeding up convolutional neural networks using Winner Takes All (WTA) hashing. More specifically, the WTA hash is used to identify relevant units and only these are computed, effectively ignoring the rest of the units. We show that the proposed method reduces the computational cost of forward and backward propagation for large fully connected layers. This allows us to train classification layers with a large number of units without the associated time penalty. We present different experiments on a dataset with 21K classes to gauge the effectiveness of this proposal. We concretely show that only a small amount of computation is required to train a classification layer. Then we measure and showcase the ability of WTA to identify the required computation. Furthermore, we compare this approach to the baseline and demonstrate a 6-fold speedup during training without compromising performance.
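The idea of using a WTA hash to pick which units to compute can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation from the paper: the function names, the parameters `n_hashes`, `k` and `top`, and the match-counting selection rule are all hypothetical, and a practical system would look up precomputed weight codes in hash tables rather than scan every unit.

```python
import random


def make_permutations(dim, n_hashes, k, seed=0):
    """Each entry holds the first k indices of a random permutation of range(dim)."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), k) for _ in range(n_hashes)]


def wta_hash(vec, perms):
    """For each permutation, record which of its k positions holds the largest value."""
    return tuple(max(range(len(p)), key=lambda j: vec[p[j]]) for p in perms)


def select_units(x, weight_codes, perms, top):
    """Rank units by how many hash positions they share with the input; keep `top`."""
    code_x = wta_hash(x, perms)
    matches = [sum(a == b for a, b in zip(code_x, code_w)) for code_w in weight_codes]
    order = sorted(range(len(weight_codes)), key=lambda u: (-matches[u], u))
    return order[:top]


def sparse_forward(x, weights, weight_codes, perms, top):
    """Compute dot products only for the selected units; all others are skipped."""
    active = select_units(x, weight_codes, perms, top)
    return {u: sum(w * v for w, v in zip(weights[u], x)) for u in active}


perms = make_permutations(dim=8, n_hashes=16, k=4)
x = [0.3, -1.2, 2.5, 0.7, -0.4, 1.9, 0.1, -2.0]
weights = [[-v for v in x], list(x)]          # unit 1 is aligned with the input
weight_codes = [wta_hash(w, perms) for w in weights]
print(sparse_forward(x, weights, weight_codes, perms, top=1))
```

Because the hash depends only on the rank order of the values, it is unchanged by any strictly increasing rescaling of the vector, which is what makes it a cheap proxy for finding units likely to respond strongly.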

An Overview of Deep-Learning-Based Methods for Cardiovascular Risk Assessment with Retinal Images

Diagnostics

Cardiovascular diseases (CVDs) are one of the most prevalent causes of premature death. Early detection is crucial to prevent and address CVDs in a timely manner. Recent advances in oculomics show that retina fundus imaging (RFI) can carry relevant information for the early diagnosis of several systemic diseases. There is a large corpus of RFI systematically acquired for diagnosing eye-related diseases that could be used for CVD prevention. Nevertheless, public health systems cannot afford to dedicate expert physicians solely to this data, which creates the need for automated diagnosis tools that can raise alarms for patients at risk. Artificial Intelligence (AI) and, particularly, deep learning (DL) models have become a strong alternative to provide computerized pre-diagnosis for patient risk retrieval. This paper provides a novel review of the major achievements of recent state-of-the-art DL approaches to automated CVD diagnosis. This overview gathers commonly used datasets, pre-pro...

VICSOM: VIsual Clues from SOcial Media for psychological assessment

ArXiv, 2019

Sharing multimodal information (typically images, videos or text) on Social Network Sites (SNS) occupies a relevant part of our time. The particular way in which users expose themselves on SNS can provide useful information to infer human behaviors. This paper proposes to use multimodal data gathered from Instagram accounts to predict the perceived prototypical needs described in Glasser's choice theory. The contribution is two-fold: (i) we provide a large multimodal database from Instagram public profiles (more than 30,000 images and text captions) annotated by expert psychologists on each perceived behavior according to Glasser's theory, and (ii) we propose to automate the recognition of the (unconsciously) perceived needs by the users. In particular, we propose a baseline using three different feature sets: visual descriptors based on pixel images (SURF and Visual Bag of Words), a high-level descriptor based on the automated scene description using Convolutional Neural Networks...

Emotions Classification using Facial Action Units Recognition

Using ORB, BoW and SVM to identify and track tagged Norway lobster Nephrops norvegicus (L.)

Instrumentation viewpoint, 2016

Sustainable capture policies of many species strongly depend on the understanding of their social behaviour. Nevertheless, the analysis of emergent behaviour in marine species poses several challenges. Usually animals are captured and observed in tanks, and their behaviour is inferred from their dynamics and interactions. Therefore, researchers must deal with thousands of hours of video data. Without loss of generality, this paper proposes a computer vision approach to identify and track a specific species, the Norway lobster, Nephrops norvegicus. We propose an identification scheme where animals are marked using black and white tags with a geometric shape in the center (holed triangle, filled triangle, holed circle and filled circle). Using a massive labelled dataset, we extract local features based on the ORB descriptor. These features are then clustered, and we construct a Bag of Visual Words feature vector per animal. This approximation gives us invariance to rotation and ...
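The Bag of Visual Words step mentioned above can be sketched in plain Python. This is a minimal illustration, assuming binary descriptors (as ORB produces) stored as integers and a codebook that has already been obtained by clustering; the tiny 4-bit descriptors, the codebook values and the function names are invented for the example and are not taken from the paper.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word and return a
    normalized histogram of word counts (the Bag of Visual Words vector)."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda w: hamming(d, codebook[w]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy 4-bit descriptors; real ORB descriptors are 256 bits long.
codebook = [0b0000, 0b1111]
descriptors = [0b1111, 0b0000, 0b1110]
print(bow_histogram(descriptors, codebook))
```

The histogram only counts which visual words occur, so it does not depend on the order or image position of the local features. A classifier such as an SVM can then be trained on these fixed-length vectors.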

A Novel Method for Reconstructing CT Images in GATE/GEANT4 with Application in Medical Imaging: A Complexity Analysis Approach

Journal of Information Processing, 2020

For reconstructing CT images in the clinical setting, 'effective energy' is usually used instead of the total X-ray spectrum. This approximation causes an accuracy decline. We proposed to quantize the total X-ray spectrum into irregular intervals to preserve accuracy. A phantom consisting of skull, rib bone, and lung tissues was irradiated with a CT configuration in GATE/GEANT4. We applied the inverse Radon transform to the obtained sinogram to construct a Pixel-based Attenuation Matrix (PAM). The PAM was then used to weight the calculated Hounsfield unit (HU) value of each interval's representative energy. Finally, we multiplied the calculated HUs by the associated normalized photon flux of each interval. The performance of the proposed method was evaluated through complexity and visual analysis. Entropy measurements, Kolmogorov complexity, and morphological richness were calculated to evaluate the complexity. Quantitative visual criteria (i.e., PSNR, FSIM, SSIM, and MSE) were reported to show the effectiveness of the fuzzy C-means approach in the segmenting task.
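The flux weighting of per-interval Hounsfield units can be sketched as follows. The `hounsfield` function uses the standard definition of the scale; everything else is an assumption made for illustration, in particular that the flux-weighted terms are summed into a single value, which the abstract does not state. The attenuation values are made up and the PAM weighting step is not modeled.

```python
def hounsfield(mu, mu_water, mu_air=0.0):
    """Standard Hounsfield scale: water maps to 0 HU and air to -1000 HU."""
    return 1000.0 * (mu - mu_water) / (mu_water - mu_air)


def flux_weighted_hu(mu_by_interval, mu_water_by_interval, flux_by_interval):
    """Combine per-interval HU values using photon flux normalized to sum to 1.

    Summing the weighted terms is an assumption of this sketch.
    """
    total_flux = sum(flux_by_interval)
    return sum(
        (flux / total_flux) * hounsfield(mu, mu_w)
        for mu, mu_w, flux in zip(mu_by_interval, mu_water_by_interval, flux_by_interval)
    )


# Two hypothetical energy intervals with illustrative attenuation coefficients.
mu_tissue = [0.5, 0.25]
mu_water = [0.25, 0.25]
flux = [1.0, 3.0]
print(flux_weighted_hu(mu_tissue, mu_water, flux))
```

Attenuation coefficients depend on photon energy, which is why each interval needs its own water reference before the values are combined.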

Emotion recognition from mid-level features

Pattern Recognition Letters, Dec 1, 2015

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service... more This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. Highlights • We automatically classify emotions from facial video sequences • We use the Action Units as intermediate features for emotion learning • We fuse structural and appearance information to classify emotions • We propose the use of the Histogram of Action Units for emotion classification • Subtle positive emotions can be automatically inferred with close to human accuracy

Interpreting CNN Models for Apparent Personality Trait Regression

This paper addresses the problem of automatically inferring personality traits of people talking ... more This paper addresses the problem of automatically inferring personality traits of people talking to a camera. As in many other computer vision problems, Convolutional Neural Networks (CNN) models have shown impressive results. However, despite of the success in terms of performance, it is unknown what internal representation emerges in the CNN. This paper presents a deep study on understanding why CNN models are performing surprisingly well in this complex problem. We use current techniques on CNN model interpretability, combined with face detection and Action Unit (AUs) recognition systems, to perform our quantitative studies. Our results show that: (1) face provides most of the discriminative information for personality trait inference, and (2) the internal CNN representations mainly analyze key face regions such as eyes, nose, and mouth. Finally, we study the contribution of AUs for personality trait inference, showing the influence of certain AUs in the facial trait judgments.

Opinion Mining on Educational Resources at the Open University of Catalonia

In order to make improvements to teaching, it is vital to know what students think of the way the... more In order to make improvements to teaching, it is vital to know what students think of the way they are taught. With that purpose in mind, exhaustively analyzing the forums associated with the subjects taught at the Universitat Oberta de Cataluya (UOC) would be extremely helpful, as the university's students often post comments on their learning experiences in them. Exploiting the content of such forums is not a simple undertaking. The volume of data involved is very large, and performing the task manually would require a great deal of effort from lecturers. As a first step to solve this problem, we propose a tool to automatically analyze the posts in forums of communities of UOC students and teachers, with a view to systematically mining the opinions they contain. This article defines the architecture of such tool and explains how lexical-semantic and language technology resources can be used to that end. For pilot testing purposes, the tool has been used to identify students' opinions on the UOC's Business Intelligence master's degree course during the last two years. The paper discusses the results of such test. The contribution of this paper is twofold. Firstly, it demonstrates the feasibility of using natural language parsing techniques to help teachers to make decisions. Secondly, it introduces a simple tool that can be refined and adapted to a virtual environment for the purpose in question.

Explainable, automated urban interventions to improve pedestrian and vehicle safety

arXiv (Cornell University), Oct 22, 2021

At the moment, urban mobility research and governmental initiatives are mostly focused on motor-r... more At the moment, urban mobility research and governmental initiatives are mostly focused on motor-related issues, e.g. the problems of congestion and pollution. And yet, we can not disregard the most vulnerable elements in the urban landscape: pedestrians, exposed to higher risks than other road users. Indeed, safe, accessible, and sustainable transport systems in cities are a core target of the UN's 2030 Agenda. Thus, there is an opportunity to apply advanced computational tools to the problem of traffic safety, in regards especially to pedestrians, who have been often overlooked in the past. This paper combines public data sources, large-scale street imagery and computer vision techniques to approach pedestrian and vehicle safety with an automated, relatively simple, and universally-applicable data-processing scheme. The steps involved in this pipeline include the adaptation and training of a Residual Convolutional Neural Network to determine a hazard index for each given urban scene, as well as an interpretability analysis based on image segmentation and class activation mapping on those same images. Combined, the outcome of this computational approach is a fine-grained map of hazard levels across a city, and an heuristic to identify interventions that might simultaneously improve pedestrian and vehicle safety. The proposed frame

On the Use of External Face Features for Identity Verification

Journal of Multimedia, Jul 1, 2006

Learnheuristics: hybridizing metaheuristics with machine learning for optimization with dynamic inputs

Open Mathematics, Mar 19, 2017

This paper reviews the existing literature on the combination of metaheuristics with machine lear... more This paper reviews the existing literature on the combination of metaheuristics with machine learning methods and then introduces the concept of learnheuristics, a novel type of hybrid algorithms. Learnheuristics can be used to solve combinatorial optimization problems with dynamic inputs (COPDIs). In these COPDIs, the problem inputs (elements either located in the objective function or in the constraints set) are not fixed in advance as usual. On the contrary, they might vary in a predictable (non-random) way as the solution is partially built according to some heuristic-based iterative process. For instance, a consumer's willingness to spend on a specific product might change as the availability of this product decreases and its price rises. Thus, these inputs might take different values depending on the current solution configuration. These variations in the inputs might require from a coordination between the learning mechanism and the metaheuristic algorithm: at each iteration, the learning method updates the inputs model used by the metaheuristic.

Shared Feature Extraction for Nearest Neighbor Face Recognition

IEEE Transactions on Neural Networks, Apr 1, 2008

Feature extraction for nearest neighbor classification: Application to gender recognition

International Journal of Intelligent Systems, Mar 23, 2005

In this article, we perform an extended analysis of different face-processing techniques for gend... more In this article, we perform an extended analysis of different face-processing techniques for gender recognition problems. Prior research works show that support vector machines (SVM) achieve the best classification results. We will show that a nearest neighbor classification approach can reach a similar performance or improve the SVM results, given an adequate selection of features of the input data. This selection is performed using a dimensionality reduction technique based on a modification of nonparametric discriminant analysis, designed to improve the nearest neighbor classification. The choice of nearest neighbor is especially justified by the use of a large database. We also analyze a nonlinear algorithm, locally linear embedding, and its supervised version. Given that this technique is focused on preserving the local configuration of the neighborhood of each point, it should be a priori a good dimensionality reduction technique for extracting good features for nearest neighbor classification. A complete comparative study with the most classical face-processing techniques is also performed.

Face Recognition in the Machine Reveals Properties of Human Face Recognition

arXiv (Cornell University), Dec 1, 2006

Psychophysical studies suggest that face recognition takes place in a narrow band of low spatial ... more Psychophysical studies suggest that face recognition takes place in a narrow band of low spatial frequencies ("critical band"). Here, we examined the recognition performance of an artificial face recognition system as a function of the size of the input images. Recognition performance was quantified with three discriminability measures: Fisher Linear Discriminant Analysis, non Parametric Discriminant Analysis, and mutual information. All of the three measures revealed a maximum at the same image sizes. Since spatial frequency content is a function of image size, our data consistently predict the range of psychophysical found frequencies. Our results therefore support the notion that the critical band of spatial frequencies for face recognition in humans and machines follows from inherent properties of face images.

Speeding Up Neural Networks for Large Scale Classification using WTA Hashing

arXiv (Cornell University), Apr 28, 2015

In this paper we propose to use the Winner Takes All hashing technique to speed up forward propag... more In this paper we propose to use the Winner Takes All hashing technique to speed up forward propagation and backward propagation in fully connected layers in convolutional neural networks. The proposed technique reduces significantly the computational complexity, which in turn, allows us to train layers with a large number of kernels with out the associated time penalty. As a consequence we are able to train convolutional neural network on a very large number of output classes with only a small increase in the computational cost. To show the effectiveness of the technique we train a new output layer on a pretrained network using both the regular multiplicative approach and our proposed hashing methodology. Our results showed no drop in performance and demonstrate, with our implementation, a 7 fold speed up during the training.

Multi-task, multi-label and multi-domain learning with residual convolutional networks for emotion recognition

arXiv (Cornell University), Feb 19, 2018

Automated emotion recognition in the wild from facial images remains a challenging problem. Altho... more Automated emotion recognition in the wild from facial images remains a challenging problem. Although recent advances in Deep Learning have supposed a significant breakthrough in this topic, strong changes in pose, orientation and point of view severely harm current approaches. In addition, the acquisition of labeled datasets is costly, and current state-of-the-art deep learning algorithms cannot model all the aforementioned difficulties. In this paper, we propose to apply a multi-task learning loss function to share a common feature representation with other related tasks. Particularly we show that emotion recognition benefits from jointly learning a model with a detector of facial Action Units (collective muscle movements). The proposed loss function addresses the problem of learning multiple tasks with heterogeneously labeled data, improving previous multi-task approaches. We validate the proposal using two datasets acquired in non controlled environments, and an application to predict compound facial emotion expressions.

Geometry-Based Ensembles: Toward a Structural Characterization of the Classification Boundary

IEEE Transactions on Pattern Analysis and Machine Intelligence, Jun 1, 2009

This paper introduces a novel binary discriminative learning technique based on the approximation of the nonlinear decision boundary by a piecewise linear smooth additive model. The decision border is geometrically defined by means of the characterizing boundary points: points that belong to the optimal boundary under a certain notion of robustness. Based on these points, a set of locally robust linear classifiers is defined and assembled by means of a Tikhonov regularized optimization procedure in an additive model to create a final smooth decision rule. As a result, a very simple and robust classifier with a strong geometrical meaning and nonlinear behavior is obtained. The simplicity of the method allows its extension to cope with some of today's machine learning challenges, such as online learning, large-scale learning or parallelization, with linear computational complexity. We validate our approach on the UCI database, comparing with several state-of-the-art classification techniques. Finally, we apply our technique in online and large-scale scenarios and in six real-life computer vision and pattern recognition problems: gender recognition based on face images, intravascular ultrasound tissue classification, speed traffic sign detection, Chagas' disease myocardial damage severity detection, old musical scores clef classification, and action recognition using 3D accelerometer data from a wearable device. The results are promising and this paper opens a line of research that deserves further attention.

Feature Extraction in Face Recognition

Boosted Linear Projections for Discriminant Analysis

In this paper we explain a new linear discriminant technique to project high-dimensional data into a low-dimensional subspace where the accuracy of the nearest neighbor classifier is maximized. Our algorithm combines a set of one-dimensional projections, using the Adaboost algorithm, to form the final discriminant projection matrix. We also introduce the way to establish an order to rank

Winner takes all hashing for speeding up the training of neural networks in large class problems

Pattern Recognition Letters, Jul 1, 2017

This paper proposes speeding up convolutional neural networks using Winner Takes All (WTA) hashing. More specifically, WTA hash is used to identify relevant units and only these are computed, effectively ignoring the rest of the units. We show that the proposed method reduces the computational cost of forward and backward propagation for large fully connected layers. This allows us to train classification layers with a large number of units without the associated time penalty. We present different experiments on a dataset with 21K classes to gauge the effectiveness of this proposal. We concretely show that only a small amount of computation is required to train a classification layer. Then we measure and showcase the ability of WTA in identifying the required computation. Furthermore, we compare this approach to the baseline and demonstrate a 6-fold speedup during training without compromising on the performance.
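For readers unfamiliar with WTA hashing, the basic hash can be sketched in a few lines. This is a generic illustration of the hash itself, not the authors' implementation or their unit-selection scheme: each hash applies a fixed random permutation to the input, keeps the first k entries, and records the position of the largest one. The function names, the seed, and the values of `dim`, `num_hashes` and `k` are assumptions chosen for the example.

```python
import random

def make_permutations(dim, num_hashes, seed=0):
    # One fixed random permutation of the input indices per hash.
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        p = list(range(dim))
        rng.shuffle(p)
        perms.append(p)
    return perms

def wta_hash(vector, perms, k):
    # For each permutation, look at the first k permuted entries and
    # store the position (0..k-1) of the maximum within that window.
    codes = []
    for p in perms:
        window = [vector[i] for i in p[:k]]
        codes.append(max(range(k), key=window.__getitem__))
    return codes

perms = make_permutations(dim=8, num_hashes=4, seed=0)
x = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
y = [2 * v + 1 for v in x]  # increasing transform: same rank order as x
code_x = wta_hash(x, perms, k=3)
code_y = wta_hash(y, perms, k=3)
```

Because the code depends only on the rank order of the entries, `code_x` and `code_y` are identical here. Vectors with similar orderings tend to share codes, which is what makes the hash usable for quickly shortlisting units that are likely to respond strongly to an input.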

An Overview of Deep-Learning-Based Methods for Cardiovascular Risk Assessment with Retinal Images

Diagnostics

Cardiovascular diseases (CVDs) are one of the most prevalent causes of premature death. Early detection is crucial to prevent and address CVDs in a timely manner. Recent advances in oculomics show that retina fundus imaging (RFI) can carry relevant information for the early diagnosis of several systemic diseases. There is a large corpus of RFI systematically acquired for diagnosing eye-related diseases that could be used for CVD prevention. Nevertheless, public health systems cannot afford to dedicate expert physicians to deal only with this data, posing the need for automated diagnosis tools that can raise alarms for patients at risk. Artificial Intelligence (AI) and, particularly, deep learning (DL) models have become a strong alternative to provide computerized pre-diagnosis for patient risk retrieval. This paper provides a novel review of the major achievements of the recent state-of-the-art DL approaches to automated CVD diagnosis. This overview gathers commonly used datasets, pre-pro...

VICSOM: VIsual Clues from SOcial Media for psychological assessment

ArXiv, 2019

Sharing multimodal information (typically images, videos or text) in Social Network Sites (SNS) occupies a relevant part of our time. The particular way in which users expose themselves in SNS can provide useful information to infer human behaviors. This paper proposes to use multimodal data gathered from Instagram accounts to predict the perceived prototypical needs described in Glasser's choice theory. The contribution is two-fold: (i) we provide a large multimodal database from Instagram public profiles (more than 30,000 images and text captions) annotated by expert psychologists on each perceived behavior according to Glasser's theory, and (ii) we propose to automate the recognition of the (unconsciously) perceived needs by the users. In particular, we propose a baseline using three different feature sets: visual descriptors based on pixel images (SURF and Visual Bag of Words), a high-level descriptor based on the automated scene description using Convolutional Neural Networks...

Emotions Classification using Facial Action Units Recognition

Using ORB, BoW and SVM to identify and track tagged Norway lobster Nephrops norvegicus (L.)

Instrumentation viewpoint, 2016

Sustainable capture policies of many species strongly depend on the understanding of their social behaviour. Nevertheless, the analysis of emergent behaviour in marine species poses several challenges. Usually animals are captured and observed in tanks, and their behaviour is inferred from their dynamics and interactions. Therefore, researchers must deal with thousands of hours of video data. Without loss of generality, this paper proposes a computer vision approach to identify and track a specific species, the Norway lobster, Nephrops norvegicus. We propose an identification scheme where animals are marked using black and white tags with a geometric shape in the center (holed triangle, filled triangle, holed circle and filled circle). Using a massive labelled dataset, we extract local features based on the ORB descriptor. These features are a posteriori clustered, and we construct a Bag of Visual Words feature vector per animal. This approach gives us invariance to rotation and ...
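The Bag of Visual Words step mentioned above can be sketched as follows. This is a toy illustration under stated assumptions, not the pipeline from the paper: binary descriptors (ORB descriptors are bit strings) are represented here as small Python integers, the codebook is assumed to have been obtained beforehand by clustering, and each descriptor votes for its nearest codeword under Hamming distance. All names and values are hypothetical.

```python
def hamming(a, b):
    # Number of differing bits between two binary descriptors.
    return bin(a ^ b).count("1")

def bow_histogram(descriptors, codebook):
    # Assign each descriptor to its nearest codeword and count the votes,
    # then normalize so the histogram does not depend on the number of
    # descriptors found in the image region.
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda j: hamming(d, codebook[j]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]

# Toy 8-bit descriptors; real ORB descriptors are 256 bits long.
codebook = [0b00000000, 0b11110000, 0b11111111]
descriptors = [0b00000001, 0b11110001, 0b11100000, 0b11111110]
hist = bow_histogram(descriptors, codebook)
```

The resulting fixed-length histogram is the kind of feature vector that can then be passed to a classifier such as an SVM.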

A Novel Method for Reconstructing CT Images in GATE/GEANT4 with Application in Medical Imaging: A Complexity Analysis Approach

Journal of Information Processing, 2020

For reconstructing CT images in the clinical setting, 'effective energy' is usually used instead of the total X-ray spectrum. This approximation causes an accuracy decline. We proposed to quantize the total X-ray spectrum into irregular intervals to preserve accuracy. A phantom consisting of the skull, rib bone, and lung tissues was irradiated with CT configuration in GATE/GEANT4. We applied the inverse Radon transform to the obtained sinogram to construct a Pixel-based Attenuation Matrix (PAM). PAM was then used to weight the calculated Hounsfield unit scale (HU) of each interval's representative energy. Finally, we multiplied the calculated HUs by the associated normalized photon flux of each interval. The performance of the proposed method was evaluated through complexity and visual analysis. Entropy measurements, Kolmogorov complexity, and morphological richness were calculated to evaluate the complexity. Quantitative visual criteria (i.e., PSNR, FSIM, SSIM, and MSE) were reported to show the effectiveness of the fuzzy C-means approach in the segmenting task.