The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual information in the audio input. We find that the AED model is indeed more context sensitive, but that the gap can be closed by adding self-attention to the CTC model. Furthermore, the two models perform similarly when contextual information is constrained. Finally, in contrast to previous research, our results show that the CTC model is highly competitive on WSJ and LibriSpeech without the help of an external language model.
arXiv: Instrumentation and Methods for Astrophysics, 2015
Large-scale surveys make huge amounts of photometric data available. Because of the sheer number of objects, spectral data cannot be obtained for all of them. It is therefore important to devise techniques for reliably estimating physical properties of objects from photometric information alone. These estimates are needed to automatically identify interesting objects worth a follow-up investigation as well as to produce the data required for a statistical analysis of the space covered by a survey. We argue that machine learning techniques are suitable for computing these estimates accurately and efficiently. This study considers the task of estimating the specific star formation rate (sSFR) of galaxies. It is shown that a nearest neighbours algorithm can produce better sSFR estimates than traditional SED fitting. We show that we can obtain accurate estimates of the sSFR even at high redshifts using only broad-band photometry based on the u, g, r, i and z filters from the Sloan Digital Sky Survey ...
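The nearest-neighbours regression described above can be sketched in a few lines; note that the features, targets, and data below are hypothetical stand-ins (e.g. broad-band colours and a log sSFR), not the study's actual catalogue:

```python
import numpy as np

def knn_regress(X_train, y_train, X_query, k=5):
    """Predict each query target as the mean over the k nearest
    training points (Euclidean distance in feature space)."""
    # Pairwise squared distances between queries and training points
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbours
    return y_train[idx].mean(axis=1)      # average their target values

# Toy example: four features standing in for colours such as u-g, g-r, r-i, i-z
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, -0.3, 0.2, 0.1]) + 0.05 * rng.normal(size=200)
pred = knn_regress(X[:150], y[:150], X[150:], k=5)
print(pred.shape)  # (50,)
```

In practice one would tune k on a validation set and weight neighbours by distance; the uniform mean above is the simplest variant.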
Astronomy and astrophysics are entering a data-rich era. Large surveys have, quite literally, seen the light in the past decade, with more and larger telescopes to follow in the coming years. Data is now so abundant that making use of all the information is a difficult task. This thesis sets out from the assumption that there is more to gain from available data sets – new information from old data. Three contributions in this direction are considered. Firstly, a novel texture descriptor for parametrising galaxy morphology is presented. It uses the shape index and curvedness of local regions in images of galaxies and condenses information about the local structure to a single value. It is argued that this value can be interpreted as indicating regions of morphological interest, for example regions of newly formed stars, gas and dust, spiral arms, etc. The descriptor is shown to extract information about a galaxy's specific star formation rate from its images that the usual spectra...
Breast cancer risk assessment is becoming increasingly important in clinical practice. It has been suggested that features that characterize mammographic texture are more predictive for breast cancer than breast density. Yet, strong correlation between the two types of features is an issue in many studies. In this work we investigate a method to generate texture features and/or scores that are independent of breast density. The method is especially useful in settings where features are learned from the data itself. We evaluate our method on a case-control set comprising 394 cancers and 1182 healthy controls. We show that the learned density-independent texture features are significantly associated with breast cancer risk. As such, the method may aid in exploring breast characteristics that are predictive of breast cancer irrespective of breast density. Furthermore, it offers opportunities to enhance personalized breast cancer screening beyond breast density.
Behavior planning of a vehicle in real-world traffic is a difficult problem. Complex systems have to be built to accomplish the projection of tasks, environmental constraints, and purposes of the driver onto the dynamics of two controlled variables: steering angle and velocity. This paper comprises two parts. First, behavior planning for the task of intelligent cruise control is proposed. The controlled variables are determined by evaluating the dynamics of two one-dimensional neural fields. Information concerning the current situation and driver preferences is coupled additively into the fields. Second, the parameters of the dynamics for the steering angle are adjusted by a state-of-the-art evolution strategy in order to achieve a smooth, comfortable trajectory. The behavior of the vehicle is successfully controlled by the neural field dynamics in the testbed of a simulation environment.
The covariance matrix adaptation evolution strategy (CMA-ES) is arguably one of the most powerful real-valued derivative-free optimization algorithms, finding many applications in machine learning. The CMA-ES is a Monte Carlo method, sampling from a sequence of multivariate Gaussian distributions. Given the function values at the sampled points, updating and storing the covariance matrix dominates the time and space complexity in each iteration of the algorithm. We propose a numerically stable quadratic-time covariance matrix update scheme with minimal memory requirements based on maintaining triangular Cholesky factors. This requires a modification of the cumulative step-size adaptation (CSA) mechanism in the CMA-ES, in which we replace the inverse of the square root of the covariance matrix by the inverse of the triangular Cholesky factor. Because the triangular Cholesky factor changes smoothly with the matrix square root, this modification does not change the behavior of the CMA-ES ...
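The core primitive behind such a quadratic-time scheme is the rank-one update of a triangular Cholesky factor. A generic textbook `cholupdate` (a sketch, not the paper's exact CMA-ES update, which also handles weighted and rank-mu terms) looks like:

```python
import numpy as np

def chol_update(L, v):
    """Given lower-triangular L with C = L @ L.T, return the triangular
    factor of C + v @ v.T in O(n^2) time (rank-one Cholesky update)."""
    L, v = L.copy(), v.copy()
    n = v.size
    for k in range(n):
        r = np.hypot(L[k, k], v[k])            # new diagonal entry
        c, s = r / L[k, k], v[k] / L[k, k]     # Givens-like rotation
        L[k, k] = r
        if k + 1 < n:
            L[k+1:, k] = (L[k+1:, k] + s * v[k+1:]) / c
            v[k+1:] = c * v[k+1:] - s * L[k+1:, k]
    return L

# Check: updating the factor equals factoring the updated matrix
A = np.array([[2.0, 0.0], [1.0, 1.0]])
v = np.array([1.0, 1.0])
L = chol_update(A, v)
print(np.allclose(L @ L.T, A @ A.T + np.outer(v, v)))  # True
```

Maintaining the factor this way avoids the cubic-time eigendecomposition or matrix square root otherwise needed per iteration.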
Sleep disorders affect a large portion of the global population and are strong predictors of morbidity and all-cause mortality. Sleep staging segments a period of sleep into a sequence of phases providing the basis for most clinical decisions in sleep medicine. Manual sleep staging is difficult and time-consuming as experts must evaluate hours of polysomnography (PSG) recordings with electroencephalography (EEG) and electrooculography (EOG) data for each patient. Here, we present U-Sleep, a publicly available, ready-to-use deep-learning-based system for automated sleep staging (sleep.ai.ku.dk). U-Sleep is a fully convolutional neural network, which was trained and evaluated on PSG recordings from 15,660 participants of 16 clinical studies. It provides accurate segmentations across a wide range of patient cohorts and PSG protocols not considered when building the system. U-Sleep works for arbitrary combinations of typical EEG and EOG channels, and its special deep learning architecture ...
This report documents the talks and discussions at the Dagstuhl Seminar 15211 "Theory of Evolutionary Algorithms". This seminar, now in its 8th edition, is the main meeting point of the highly active theory of randomised search heuristics subcommunities in Australia, Asia, North America, and Europe. Topics intensively discussed include rigorous runtime analysis and computational complexity theory for randomised search heuristics, information geometry of randomised search, and synergies between the theory of evolutionary algorithms and theories of natural evolution.
2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017
Selecting an optimal subset of k out of d features for linear regression models given n training instances is often considered intractable for feature spaces with hundreds or thousands of dimensions. We propose an efficient massively parallel implementation for selecting such optimal feature subsets in a brute-force fashion for small k. By exploiting the enormous compute power provided by modern parallel devices such as graphics processing units, it can deal with thousands of input dimensions even using only standard commodity hardware. We evaluate the practical runtime using artificial datasets and sketch the applicability of our framework in the context of astronomy.
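A serial reference version of such a brute-force search can be sketched as follows (without the GPU parallelism that makes the approach scale, and on made-up data):

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, k):
    """Exhaustively search all k-subsets of the d columns of X and return
    the subset minimising the least-squares residual of a linear model."""
    best, best_rss = None, np.inf
    for subset in combinations(range(X.shape[1]), k):
        Xs = X[:, subset]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = ((y - Xs @ coef) ** 2).sum()    # residual sum of squares
        if rss < best_rss:
            best, best_rss = subset, rss
    return best, best_rss

# Synthetic data where only features 1 and 3 carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 2.0 * X[:, 1] - 1.0 * X[:, 3]
subset, rss = best_subset(X, y, k=2)
print(subset)  # (1, 3)
```

The number of candidate subsets is C(d, k), which grows polynomially in d for fixed small k; each candidate's least-squares fit is independent of the others, which is what makes the search embarrassingly parallel.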
The 2003 Congress on Evolutionary Computation, 2003. CEC '03.
We apply the CMA-ES, an evolution strategy which efficiently adapts the covariance matrix of the mutation distribution, to the optimization of the weights of neural networks for solving reinforcement learning problems. It turns out that the topology of the networks considerably influences the time to find a suitable control strategy. Still, our results with fixed network topologies are significantly better than those reported for the best evolutionary method so far, which adapts both the weights and the structure of the networks.
Estimation of optical flow is required in many computer vision applications. These applications often have to deal with strict time constraints. Therefore, flow algorithms with both high accuracy and computational efficiency are desirable. Accordingly, designing such a flow algorithm involves multi-objective optimization. In this work, we build on a popular algorithm developed for real-time applications. It is originally based on the Census transform and benefits from this encoding for table-based matching and tracking of interest points. We propose to use the more universal Haar wavelet features instead of the Census transform within the same framework. The resulting approach is more flexible; in particular, it allows for sub-pixel accuracy. For comparison with the original method and another baseline algorithm, we considered popular benchmark datasets as well as a long synthetic video sequence. We employed evolutionary multi-objective optimization to tune the algorithms, which allows us to compare the different approaches in a systematic and unbiased way. Our results show that the overall performance of our method is significantly higher than that of the reference implementation.
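The Census transform mentioned above encodes each pixel by comparing it against its neighbourhood, making matching robust to monotonic brightness changes. A minimal 3×3 version in NumPy (illustrative only, not the paper's implementation) looks like:

```python
import numpy as np

def census_3x3(img):
    """8-bit Census transform: each interior pixel becomes a bit string
    recording whether each of its 8 neighbours is darker than the centre."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        # Neighbour plane shifted by (dy, dx) relative to the centre
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neighbour < centre).astype(np.uint8) << bit
    return out

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]], dtype=np.uint8)
# The four neighbours above/left of the centre (5) are darker,
# setting bits 0-3: 0b00001111 = 15
print(census_3x3(img))  # [[15]]
```

Because the codes depend only on intensity orderings, they can be compared cheaply via Hamming distance, which is what enables table-based matching of interest points.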
Because of their convincing performance, there is a growing interest in using evolutionary algorithms for reinforcement learning. We propose learning of neural network policies by the covariance matrix adaptation evolution strategy (CMA-ES), a randomized variable-metric search algorithm for continuous optimization. We argue that this approach, which we refer to as CMA Neuroevolution Strategy (CMA-NeuroES), is ideally suited for reinforcement learning, in particular because it is based on ranking policies (and therefore robust against noise), efficiently detects correlations between parameters, and infers a search direction from scalar reinforcement signals. We evaluate the CMA-NeuroES on five different (Markovian and non-Markovian) variants of the common pole balancing problem. The results are compared to those described in a recent study covering several RL algorithms, and the CMA-NeuroES shows the overall best performance.
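The ranking-based search loop that makes such strategies robust to noisy reinforcement signals can be illustrated with a stripped-down (mu, lambda) evolution strategy; this is a sketch of the general principle on a toy objective, not the CMA-ES itself (no covariance adaptation, only a crude step-size decay):

```python
import numpy as np

def simple_es(f, x0, sigma=0.3, lam=20, mu=5, iters=100, seed=0):
    """Minimal (mu, lambda) evolution strategy: sample lam candidates
    around the mean, rank them by objective value, and recombine the
    mu best. Only the ranking of f-values is used, as in the CMA-ES."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    for _ in range(iters):
        pop = mean + sigma * rng.normal(size=(lam, mean.size))
        order = np.argsort([f(x) for x in pop])   # rank-based selection
        mean = pop[order[:mu]].mean(axis=0)       # recombine the best mu
        sigma *= 0.98                             # crude step-size decay
    return mean

# Toy objective standing in for a scalar reinforcement signal
sphere = lambda x: float((x ** 2).sum())
x = simple_es(sphere, x0=[2.0, -1.5, 1.0])
print(sphere(x) < 0.1)  # True
```

In the neuroevolution setting, the vector being optimized would hold the weights of a policy network and f would return a (negated) episode return; because only the ranking of candidates matters, a noisy or rescaled reward signal changes little.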
Papers by Christian Igel