Papers by Patrick Haffner

TRECVID, 2007
AT&T participated in two tasks at TRECVID 2007: shot boundary detection (SBD) and rushes summarization. The SBD system developed for TRECVID 2006 was enhanced for robustness and efficiency. New visual features are extracted for the cut, dissolve, and fast-dissolve detectors, and an SVM-based verification method is used to boost accuracy. Speed is improved by more streamlined processing with on-the-fly result fusion. We submitted 10 runs for the SBD evaluation task. The best result (TT05) was achieved with the following configuration: SVM-based verification; additional training data covering the 2004, 2005, and 2006 SBD data; no SVM boundary adjustment; and an SVM trained for high generalization capability (e.g., a smaller value of C). As a pilot task, rushes summarization aims to show the main objects and events in the raw material with the least redundancy while maximizing usability. We proposed a multimodal rushes summarization method that relies on both face and speech information. Evaluation results show that the new SBD system is highly effective and that the human-centric rushes summarization approach is concise and easy to understand.
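The abstract does not describe implementation details, so the following is only a minimal sketch of what an SVM-based verification stage for candidate shot boundaries might look like. The frame-difference features, the data, and the thresholds are hypothetical, not taken from the AT&T system; only the idea of a small C for higher generalization comes from the abstract.

```python
# Hypothetical sketch: verify candidate shot boundaries with an SVM.
# Features and training data are illustrative; the actual AT&T system differs.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each candidate boundary is described by a few frame-difference features,
# e.g. [histogram_diff, edge_change_ratio, motion_magnitude].
X_train = np.array([
    [0.80, 0.70, 0.10],   # true cut
    [0.75, 0.65, 0.20],   # true cut
    [0.10, 0.05, 0.60],   # false alarm caused by fast motion
    [0.15, 0.10, 0.55],   # false alarm
])
y_train = np.array([1, 1, 0, 0])  # 1 = real boundary, 0 = false alarm

# A smaller C favors a larger margin, i.e. higher generalization,
# as mentioned in the abstract's best-run configuration.
verifier = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=0.5))
verifier.fit(X_train, y_train)

candidate = np.array([[0.78, 0.66, 0.15]])
print("keep boundary" if verifier.predict(candidate)[0] == 1 else "reject")
```
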
We present a new image compression technique called "DjVu" that is specifically geared towards the compression of high-resolution, high-quality images of scanned documents in color. With DjVu, any screen connected to the Internet can access and display images of scanned pages while faithfully reproducing the fonts, colors, drawings, pictures, and paper texture. A typical magazine page in color at 300 dpi can be compressed down to between 40 and 60 KB, approximately 5 to 10 times better than JPEG for a similar level of subjective quality. B&W documents are typically 15 to 30 KB at 300 dpi, or 4 to 8 times better than CCITT-G4. A real-time, memory-efficient version of the decoder was implemented and is available as a plug-in for popular web browsers.

We present a new image compression technique called "DjVu". This technique is specifically designed for the compression of color documents scanned at high resolution. A DjVu file representing a typical color magazine page, scanned at 300 dots per inch (dpi), requires between 40 and 80 KB, which is 5 to 10 times better than a JPEG file offering similar legibility. The DjVu compressor starts by classifying each pixel of the scanned image as a foreground pixel (text, line drawings) or a background pixel (images, photos, paper texture) using a combination of Hidden Markov Models (HMM) and heuristics based on the Minimum Description Length (MDL) principle. This classification forms a bitonal image that is compressed with a technique exploiting the shape similarities among the characters composing the foreground. The foreground and background images are then compressed with a wavelet-based compression algorithm at reduced resolution. A masking algorithm minimizes the number of bits used to encode foreground or background pixels that are not visible in the final image. Encoding and decoding software is available for all common platforms. A browser plug-in allows DjVu images to be viewed very efficiently on the Web.

Proceedings of SPIE, Dec 27, 2000
Image-based digital documents are composed of multiple pages, each of which may be composed of multiple components such as text, pictures, background, and annotations. We describe the image structure and software architecture that allow the DjVu system to load and render the required components on demand while minimizing the bandwidth requirements and the memory requirements in the client. DjVu document files are merely a list of enriched URLs that point to individual files (or file elements) that contain image components. Image components include text images, background images, shape dictionaries shared by multiple pages, OCRed text, and several types of annotations. A multithreaded software architecture with smart caching allows individual components to be loaded, pre-decoded, and rendered on demand. Pages are pre-fetched or loaded on demand, allowing users to randomly access pages without downloading the entire document and without the help of a byte server. Components that are shared across pages (e.g., shape dictionaries or background layers) are loaded as required and cached. This greatly reduces the overall bandwidth requirements. Shared dictionaries allow a typical 40% file size reduction for scanned bitonal documents at 300 dpi. Compression ratios on scanned US patents at 300 dpi are 5.2 to 10.2 times higher than Group 4 with shared dictionaries and 3.6 to 8.5 times higher than Group 4 without shared dictionaries.
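As a rough illustration of the on-demand loading and caching the abstract describes, here is a minimal Python sketch of a component cache keyed by URL. The fetch function, component names, and cache policy are assumptions for illustration, not the DjVu implementation.

```python
# Minimal sketch of on-demand component loading with caching.
# fetch_bytes() and the component URLs are hypothetical placeholders.
from functools import lru_cache

def fetch_bytes(url: str) -> bytes:
    """Placeholder for fetching one document component over the network."""
    print(f"downloading {url}")
    return b"...component data..."

@lru_cache(maxsize=128)
def load_component(url: str) -> bytes:
    """Download and decode a component once; later requests hit the cache."""
    return fetch_bytes(url)

# A document is described here simply as lists of component URLs per page.
page_1 = ["http://example.com/doc/page1.fg", "http://example.com/doc/shared.dict"]
page_2 = ["http://example.com/doc/page2.fg", "http://example.com/doc/shared.dict"]

for url in page_1 + page_2:          # the shared dictionary is fetched only once
    load_component(url)
```
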

Kernel methods have in recent years found wide use in statistical learning techniques due to their good performance and their computational efficiency in high-dimensional feature spaces. However, text or speech data cannot always be represented by the fixed-length vectors that traditional kernels handle. We recently introduced a general kernel framework based on weighted transducers, rational kernels, to extend kernel methods to the analysis of variable-length sequences and weighted automata [5] and described their application to spoken-dialog applications. We presented a constructive algorithm for ensuring that rational kernels are positive definite symmetric, a property which guarantees the convergence of discriminant classification algorithms such as Support Vector Machines, and showed that many string kernels previously introduced in the computational biology literature are special instances of such positive definite symmetric rational kernels [4]. This paper reviews the essential results given in [5, 3, 4] and presents them in the form of a short tutorial.
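To make the link with string kernels concrete, the sketch below computes a bigram-count kernel, one of the string kernels that can be expressed as a rational kernel. It is a plain dictionary computation rather than a transducer-based one, and is meant only to show what such a kernel measures.

```python
# Minimal sketch: a bigram-count string kernel,
# K(x, y) = sum over bigrams b of count_x(b) * count_y(b).
# Such kernels are special cases of rational kernels; a real implementation
# would use weighted-transducer composition instead of explicit counting.
from collections import Counter

def bigram_counts(s: str) -> Counter:
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def bigram_kernel(x: str, y: str) -> int:
    cx, cy = bigram_counts(x), bigram_counts(y)
    return sum(cx[b] * cy[b] for b in cx if b in cy)

print(bigram_kernel("abab", "abba"))  # 3 = 2*1 for "ab" + 1*1 for "ba"
```
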

E-mail has become indispensable in today's networked society. However, the huge and ever-growing volume of spam has become a serious threat to this important communication medium. It not only affects e-mail recipients but also causes a significant overload on the mail servers that handle e-mail transmission. We perform an extensive analysis of IP addresses and IP aggregates given by network-aware clusters in order to investigate properties that can distinguish the bulk of the legitimate mail from spam. Our analysis indicates that the bulk of the legitimate mail comes from long-lived IP addresses. We also find that the bulk of the spam comes from network clusters that are relatively long-lived. Our analysis suggests that network-aware clusters may provide a good aggregation scheme for exploiting the history and structure of IP addresses. We then consider the implications of this analysis for prioritizing legitimate mail. We focus on the situation where a mail server is overloaded and the goal is to maximize the legitimate mail that it accepts. We demonstrate that the history and structure of IP addresses can reduce the adverse impact of mail server overload, increasing the number of legitimate e-mails accepted by a factor of 3.
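A minimal sketch of the prioritization idea is given below, assuming a per-cluster history of legitimate and spam deliveries and a simple acceptance rule used only when the server is overloaded. The cluster identifiers, threshold, and rule are illustrative assumptions, not the scheme evaluated in the paper.

```python
# Hypothetical sketch: prioritize incoming mail by sender-cluster history
# when the server is overloaded. Cluster IDs and the threshold are illustrative.
from collections import defaultdict

history = defaultdict(lambda: {"legit": 0, "spam": 0})   # per network-aware cluster

def record(cluster: str, is_legit: bool) -> None:
    history[cluster]["legit" if is_legit else "spam"] += 1

def accept_when_overloaded(cluster: str, min_ratio: float = 0.8) -> bool:
    """Under overload, accept only senders whose cluster has a good track record."""
    h = history[cluster]
    total = h["legit"] + h["spam"]
    if total == 0:
        return False          # unknown clusters are deferred under overload
    return h["legit"] / total >= min_ratio

record("AS7018/12.0.0.0/8", True)
record("AS7018/12.0.0.0/8", True)
record("botnet-cluster-42", False)
print(accept_when_overloaded("AS7018/12.0.0.0/8"))   # True
print(accept_when_overloaded("botnet-cluster-42"))   # False
```
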

Springer eBooks, 2003
Kernel methods are widely used in statistical learning techniques. We recently introduced a general kernel framework based on weighted transducers or rational relations, rational kernels, to extend kernel methods to the analysis of variable-length sequences or, more generally, weighted automata. These kernels are efficient to compute and have been successfully used in applications such as spoken-dialog classification. Not all rational kernels, however, are positive definite symmetric (PDS), a property sufficient to guarantee the convergence of discriminant classification algorithms such as Support Vector Machines. We present several theoretical results related to PDS rational kernels. We show in particular that under some conditions these kernels are closed under sum, product, or Kleene-closure, and we give a general method for constructing a PDS rational kernel from an arbitrary transducer defined on some non-idempotent semirings. We also show that some commonly used string kernels or similarity measures, such as the edit-distance, the convolution kernels of Haussler, and some string kernels used in the context of computational biology, are specific instances of rational kernels. Our results include the proof that the edit-distance over a non-trivial alphabet is not negative definite, which, to the best of our knowledge, was never stated or proved before.
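For reference, the general construction mentioned in the abstract can be summarized as follows: composing a weighted transducer with its inverse yields a PDS rational kernel. The notation below, over the usual (+, x) semiring, is a simplified paraphrase rather than a quotation of the paper's statement.

```latex
% PDS construction sketch: given a weighted transducer T, the kernel defined by
% the composed transducer T o T^{-1} is positive definite symmetric, since
K(x, y) \;=\; \llbracket T \circ T^{-1} \rrbracket(x, y)
        \;=\; \sum_{z} \llbracket T \rrbracket(x, z)\,\llbracket T \rrbracket(y, z),
% which is an inner product between the "feature vectors"
% z \mapsto \llbracket T \rrbracket(x, z) and z \mapsto \llbracket T \rrbracket(y, z).
```
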
Lecture Notes in Computer Science, 1999

With the rapid growth in smart phones and mobile data, effectively managing cellular data networks is important in meeting user performance expectations. However, the scale, complexity, and dynamics of a large 3G cellular network make it a challenging task to understand the diverse factors that affect its performance. In this paper we study the RNC (Radio Network Controller)-level performance in one of the largest cellular network carriers in the US. Using a large amount of data collected from various sources across the network and over time, we investigate the key factors that influence network performance in terms of round-trip times and loss rates (averaged over an hourly time scale). We start by performing a "first-order" property analysis to assess the correlation and impact of each factor on network performance. We then apply RuleFit, a powerful supervised machine learning tool that combines linear regression and decision trees, to develop models and analyze the relative importance of various factors in estimating and predicting network performance. Our analysis culminates with the detection and diagnosis of both "transient" and "persistent" performance anomalies, with a discussion of the complex interactions and differing effects of the various factors that may influence 3G UMTS (Universal Mobile Telecommunications System) network performance.
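RuleFit itself combines rules extracted from tree ensembles with a sparse linear model; as a loose stand-in for the kind of factor-importance analysis described above, the sketch below uses scikit-learn's gradient-boosted trees to rank a few hypothetical per-RNC factors by importance. The factor names and data are invented for illustration and are not the paper's dataset or its RuleFit model.

```python
# Loose stand-in for the RuleFit-style analysis: rank candidate factors by
# their importance in predicting hourly RTT. Data and factor names are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500
factors = ["num_users", "traffic_volume", "signal_quality", "handover_rate"]
X = rng.normal(size=(n, len(factors)))
# Synthetic target: round-trip time driven mostly by load factors, plus noise.
rtt = 120 + 30 * X[:, 0] + 15 * X[:, 1] + rng.normal(scale=5, size=n)

model = GradientBoostingRegressor().fit(X, rtt)
for name, imp in sorted(zip(factors, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:16s} {imp:.2f}")
```
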
The MIT Press eBooks, 2002
Maximum margin classifiers such as Support Vector Machines (SVMs) depend critically upon the convex hulls of the training samples of each class, as they implicitly search for the minimum distance between the convex hulls. We propose Extrapolated Vector Machines (XVMs), which rely on extrapolations outside these convex hulls. XVMs improve SVM generalization very significantly on the MNIST [7] OCR data. They share similarities with the Fisher discriminant: they maximize the inter-class margin while minimizing the intra-class disparity.
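For context, the convex-hull view the abstract relies on is the standard nearest-point formulation of the hard-margin SVM, written below; the XVM modification, which replaces the hulls by extrapolated sets, is not reproduced here.

```latex
% Nearest-point view of the hard-margin SVM: the maximum-margin hyperplane
% bisects the shortest segment between the two classes' convex hulls.
\min_{\alpha,\,\beta}\ \Bigl\| \sum_{i \in +} \alpha_i x_i \;-\; \sum_{j \in -} \beta_j x_j \Bigr\|^2
\quad \text{s.t.}\quad \alpha_i \ge 0,\ \ \beta_j \ge 0,\ \ \sum_i \alpha_i = 1,\ \ \sum_j \beta_j = 1 .
```
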

TRECVID, 2006
TRECVID (TREC Video Retrieval Evaluation) is sponsored by NIST to encourage research in digital video indexing and retrieval. It was initiated in 2001 as a "video track" of TREC and became an independent evaluation in 2003. AT&T participated in three tasks in TRECVID 2006: shot boundary determination (SBD), search, and rushes exploitation. The proposed SBD algorithm contains a set of finite state machine (FSM) based detectors for pure cut, fast dissolve, fade in, fade out, dissolve, and wipe. A support vector machine (SVM) is applied to the cut and dissolve detectors to further boost SBD performance. AT&T collaborated with Columbia University on the search and rushes exploitation tasks. In this paper, we mainly focus on the SBD system and briefly describe our work on search and rushes exploitation. The AT&T SBD system is highly effective and its evaluation results are among the best.
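To illustrate what an FSM-based detector of this kind does, here is a toy finite-state-machine cut detector driven by a per-frame difference score. The states, thresholds, and the decision rule are illustrative assumptions only, not the detectors used in the AT&T system.

```python
# Toy sketch of a finite-state-machine cut detector driven by a per-frame
# difference score; states and thresholds are illustrative only.
def detect_cuts(frame_diffs, high=0.6, low=0.2):
    """Return frame indices where an isolated spike suggests a hard cut."""
    cuts, state, candidate = [], "NORMAL", None
    for i, d in enumerate(frame_diffs):
        if state == "NORMAL":
            if d > high:
                state, candidate = "CANDIDATE", i
        elif state == "CANDIDATE":
            # A real cut shows one sharp spike followed by low differences;
            # sustained high values suggest motion or a gradual transition.
            if d < low:
                cuts.append(candidate)
            state = "NORMAL"
    return cuts

print(detect_cuts([0.05, 0.04, 0.90, 0.06, 0.05, 0.50, 0.55, 0.45]))  # [2]
```
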

Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Nov 21, 2003
Classification is a key task in spoken-dialog systems. The response of a spoken-dialog system is often guided by the category assigned to the speaker's utterance. Unfortunately, classifiers based on the one-best transcription of the speech utterances are not satisfactory because of the high word error rate of conversational speech recognition systems. Since the correct transcription may not be the highest-ranking one but will often be represented in the word lattices output by the recognizer, the classification accuracy can be much higher if the full lattice is exploited both during training and classification. In this paper we present the first principled approach for classification based on full lattices. For this purpose, we use the Support Vector Machine (SVM) framework with kernels for lattices. The lattice kernel we define belongs to the general class of rational kernels. We give efficient algorithms for computing kernels for arbitrary lattices and report experiments using the algorithm in a difficult call-classification task with a large number of categories. Our experiments with a trigram lattice kernel show a substantial reduction in error rate at a fixed rejection level.
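One way to read an n-gram lattice kernel is as a dot product between expected n-gram counts under each lattice's path distribution. The sketch below makes that explicit on toy lattices given as explicit (path, probability) lists; this path enumeration is only for illustration, since real implementations compute the same quantity by weighted-transducer composition without enumerating paths.

```python
# Illustrative sketch of an n-gram lattice kernel: accumulate expected n-gram
# counts under each lattice's path distribution and take their dot product.
from collections import defaultdict

def expected_ngrams(paths, n=2):
    counts = defaultdict(float)
    for words, prob in paths:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += prob
    return counts

def lattice_kernel(lat_a, lat_b, n=2):
    ca, cb = expected_ngrams(lat_a, n), expected_ngrams(lat_b, n)
    return sum(v * cb[k] for k, v in ca.items() if k in cb)

lat1 = [(["i", "want", "billing"], 0.7), (["i", "want", "a", "bill"], 0.3)]
lat2 = [(["i", "want", "billing", "info"], 1.0)]
print(lattice_kernel(lat1, lat2))  # 1.7 = 1.0*1 ("i want") + 0.7*1 ("want billing")
```
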

North American Chapter of the Association for Computational Linguistics, Jun 5, 2010
Bloggers, professional reviewers, and consumers continuously create opinion-rich web reviews about products and services, with the result that textual reviews are now abundant on the web and often convey a useful overall rating (number of stars). However, an overall rating cannot express the multiple or conflicting opinions that might be contained in the text, or explicitly rate the different aspects of the evaluated entity. This work addresses the task of automatically predicting ratings for given aspects of a textual review by assigning a numerical score to each evaluated aspect in the review. We handle this task as both a regression and a classification modeling problem and explore several combinations of syntactic and semantic features. Our results suggest that classification techniques perform better than ranking modeling when handling evaluative text.
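A minimal sketch of the classification view of aspect rating prediction is shown below, using bag-of-n-gram text features and a linear classifier over star labels. The toy reviews, the "food" aspect, and the feature choices are assumptions for illustration, not the paper's corpus or feature set.

```python
# Illustrative sketch: predict an aspect rating from review text, treated as
# classification over star labels. Data and features are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "the food was excellent and fresh",
    "great dishes, wonderful flavors",
    "the food was cold and bland",
    "terrible meal, would not eat here again",
]
food_rating = [5, 5, 2, 1]          # star rating for the hypothetical "food" aspect

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(texts, food_rating)
print(model.predict(["the food was wonderful"]))
```
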

Speech Communication, Mar 1, 2006
Large margin classifiers, such as SVMs and AdaBoost, have achieved state-of-the-art performance for semantic classification problems that occur in spoken language understanding or textual data mining applications. However, these computationally expensive learning algorithms cannot always handle the very large number of examples, features, and classes present in the available training corpora. This paper provides an original and unified presentation of these algorithms within the framework of regularized and large margin linear classifiers, reviews some available optimization techniques, and offers practical solutions to scaling issues. Systematic experiments compare the algorithms according to a number of criteria: performance, robustness, computational and memory requirements, and ease of parallelization. Furthermore, they confirm that the 1-vs-other multiclass scheme is a simple, generic, and easy-to-implement baseline with excellent scaling properties. Finally, this paper identifies the limitations of the classifiers and of the multiclass schemes that are implemented.
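The 1-vs-other scheme trains one independent binary classifier per class, which is part of why it scales well. Below is a minimal sketch using linear large-margin classifiers on synthetic data; the dataset and hyperparameters are illustrative, not those of the paper's experiments.

```python
# Sketch of the one-vs-other (one-vs-rest) multiclass scheme with linear
# large-margin classifiers; the data is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# One independent binary problem per class ("this class" vs "all others");
# each binary classifier can be trained in parallel.
clf = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=5000)).fit(X, y)
print(clf.predict(X[:5]), y[:5])
```
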

Springer eBooks, 2008
One of the key tasks in the design of large-scale dialog systems is classification. It consists of assigning, out of a finite set, a specific category to each spoken utterance, based on the output of a speech recognizer. Classification in general is a standard machine learning problem, but the objects to classify in this particular case are word lattices, or weighted automata, and not the fixed-size vectors that learning algorithms were originally designed for. This chapter presents a general kernel-based learning framework for the design of classification algorithms for weighted automata. It introduces a family of kernels, rational kernels, that combined with support vector machines form powerful techniques for spoken-dialog classification and other classification tasks in text and speech processing. It describes efficient algorithms for their computation and reports the results of their use in several difficult spoken-dialog classification tasks based on deployed systems. Our results show that rational kernels are easy to design and implement and lead to substantial improvements in classification accuracy. The chapter also provides some theoretical results helpful for the design of rational kernels.

Neural Information Processing Systems, Dec 1, 1998
Signal processing and pattern recognition algorithms make extensive use of convolution. In many cases, computational accuracy is not as important as computational speed. In feature extraction, for instance, the features of interest in a signal are usually quite distorted. This form of noise justifies some level of quantization in order to achieve faster feature extraction. Our approach consists of approximating regions of the signal with low-degree polynomials and then differentiating the resulting signals in order to obtain impulse functions (or derivatives of impulse functions). With this representation, convolution becomes extremely simple and can be implemented quite effectively. The true convolution can be recovered by integrating the result of the convolution. This method yields a substantial speed-up in feature extraction and is applicable to convolutional neural networks.
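The sketch below shows a degree-zero instance of this idea under simplifying assumptions: one signal is quantized to piecewise-constant levels so that its discrete derivative is sparse, the sparse derivative is convolved with the other signal (a few shifted, scaled additions), and a cumulative sum recovers the convolution. The signals and the quantization step are illustrative, not the paper's experiments.

```python
# Sketch of the speed-up idea with degree-0 (piecewise-constant) approximation:
# quantize g, take its sparse discrete derivative, convolve that with f using a
# few shifted scaled copies, then integrate (cumulative sum) to recover f * g.
import numpy as np

def sparse_convolve(f, g, step=0.1):
    f = np.asarray(f, dtype=float)
    g_q = np.round(np.asarray(g, dtype=float) / step) * step     # quantize g
    dg = np.diff(np.concatenate(([0.0], g_q, [0.0])))             # sparse derivative
    out = np.zeros(len(f) + len(dg) - 1)
    for k in np.flatnonzero(dg):                                  # few impulses
        out[k:k + len(f)] += dg[k] * f                            # shifted copies of f
    return np.cumsum(out)[:len(f) + len(g_q) - 1]                 # integrate back

f = np.sin(np.linspace(0, 3, 50))
g = np.exp(-np.linspace(0, 2, 20))            # smooth kernel, quantizes well
approx = sparse_convolve(f, g)
exact = np.convolve(f, g)
print(np.max(np.abs(approx - exact)))         # quantization error vs. exact result
```
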