Papers by Dimitri Kanevsky
Abstract Acoustic sensor networks can be used for localization of an acoustic-energy emitting source. While maximum-likelihood (ML) methods are widely used for estimating the pattern of motion, more advanced machine learning schemes should be employed for improving the accuracy of localization. In this paper, we develop a learning Bayesian tracking algorithm that is capable of reconstructing the target transition model using passive wireless acoustic sensors.
Abstract In this paper we present a novel compressed sensing (CS) algorithm for the recovery of a compressible, possibly time-varying signal from a sequence of noisy observations. The newly derived scheme is based on the acclaimed unscented Kalman filter (UKF), and is essentially self-reliant in the sense that no peripheral optimization or CS algorithm is required for identifying the underlying signal support.
Abstract In this paper, we explore the use of exemplar-based sparse representations (SRs) to map test features into the linear span of training examples. We show that the frame classification accuracy with these new features is 1.3% higher than that of a Gaussian Mixture Model (GMM), showing that SRs not only move test features closer to training, but also move the features closer to the correct class. Given these new SR features, we train a Hidden Markov Model (HMM) on them and perform recognition.
Abstract Sparse representations (SRs) are often used to characterize a test signal using few support training examples, and allow the number of supports to be adapted to the specific signal being categorized. Given the good performance of SRs compared to other classifiers for both image classification and phonetic classification, in this paper we extend the use of SRs to text classification, a domain in which this method has thus far not been explored.
Abstract The use of exemplar-based techniques for both speech classification and recognition tasks has become increasingly popular in recent years. However, the notion of why sparseness is important for exemplar-based speech processing has been relatively unexplored. In addition, little analysis has been done in speech processing on the appropriateness of different types of sparsity regularization constraints.
Abstract.—Let V be a plane smooth cubic curve over a finitely generated field k. The Mordell-Weil theorem for V states that there is a finite subset P ⊂ V(k) such that the whole V(k) can be obtained from P by drawing secants and tangents through pairs of previously constructed points and consecutively adding their new intersection points with V. Equivalently, the group of birational transformations of V generated by reflections with respect to k-points is finitely generated.
Abstract Sparse representation techniques, such as Support Vector Machines (SVMs), k-nearest neighbor (kNN) and Bayesian Compressive Sensing (BCS), can be used to characterize a test sample from a few support training samples in a dictionary set. In this paper, we introduce a semi-Gaussian constraint into the BCS formulation, which allows support parameters to be estimated using a closed-form iterative solution.
Abstract Sparse representation phone identification features (SPIF) is a recently developed technique to obtain an estimate of phone posterior probabilities conditioned on an acoustic feature vector. In this paper, we explore incorporating SPIF phone posterior probability estimates in a large vocabulary continuous speech recognition (LVCSR) task by including them as additional features of exponential densities that model the HMM state emission likelihoods.
ABSTRACT We describe a system which automatically transcribes broadcast news in less than 10 times real-time. We detail the architecture of this system, which was used by IBM in the 1999 HUB4 10xRT evaluation, and show that it is over 20 percent more accurate at the same speed than the system we used in the 1998 evaluation.
Abstract Telematics services in cars (like navigation, cellular telephone, internet access) are becoming increasingly popular, but they may distract drivers from their main driving tasks and negatively affect driving safety. This paper addresses some aspects of voice user interfaces in cars as a mechanism to increase driver safety. Voice control becomes more efficient in reducing driver distraction if drivers can speak commands in a natural manner rather than having to remember one or two variants supported by the system.
The discrimination technique for estimating parameters of Gaussian mixtures that is based on the Extended Baum-Welch transformations (EBW) has had significant impact on the speech recognition community. In this paper we introduce a general definition of a family of EBW transformations that can be associated with a weighted sum of updated and initial models.
In this paper, we consider a generalization of the state-of-the-art discriminative method for optimizing the conditional likelihood in Hidden Markov Models (HMMs), called the Extended Baum-Welch (EBW) algorithm, that has had significant impact on the speech recognition community. We propose a generalized form of EBW update rules that can be associated with a weighted sum of updated and initial models, and demonstrate that using novel update rules can significantly speed up parameter estimation for Gaussian mixtures.
Abstract Audio classification has applications in a variety of contexts, such as automatic sound analysis, supervised audio segmentation, and audio information search and retrieval. Extended Baum-Welch (EBW) transformations are most commonly used as a discriminative technique for estimating parameters of Gaussian mixtures, though recently they have been applied in unsupervised audio segmentation. In this paper, we extend the use of these transformations to derive an audio classification algorithm.
We demonstrate the generalizability of the Extended Baum-Welch (EBW) algorithm not only for HMM parameter estimation but for decoding as well. We show that there can exist a general function associated with the objective function under EBW that reduces to the well-known auxiliary function used in the Baum-Welch algorithm for maximum likelihood estimates.
Abstract The discrimination technique for estimating the parameters of Gaussian mixtures that is based on the extended Baum transformations (EB) has had significant impact on the speech recognition community. There appear to be no published proofs that definitively show that these transformations increase the value of an objective function with iteration (i.e., so-called "growth transformations").
Abstract Accessibility in the workplace and in academic settings has increased dramatically for users with disabilities, driven by greater awareness, legislative mandate, and technological improvements. Gaps, however, remain. For persons who are deaf and hard of hearing in particular, full participation requires complete access to audio materials, both for live settings and for prerecorded audio and visual information.
Abstract We describe extensions and improvements to IBM's system for automatic transcription of broadcast news. The speech recognizer uses a total of 160 hours of acoustic training data, 80 hours more than for the system described in Chen et al. (1998). In addition to improvements obtained in 1997, we made a number of changes and algorithmic enhancements.
ABSTRACT This paper describes IBM's large vocabulary continuous speech recognition (LVCSR) system used in the 1997 Hub4 English evaluation. It focuses on extensions and improvements to the system used in the 1996 evaluation. The recognizer uses an additional 35 hours of training data over the one used in the 1996 Hub4 evaluation [8].
Abstract We describe techniques for enhancing the accuracy, efficiency and features of a low-resource, medium-vocabulary, grammar-based speech recognition system.
Abstract We present a simple method for recovering sparse signals from a series of noisy observations. Our algorithm is a Kalman filter (KF) that utilizes a so-called pseudo-measurement technique for optimizing the convex minimization problem following from the theory of compressed sensing (CS).