Professor H.A. David has kindly pointed out that an expression for the joint probability distribution function of order statistics from overlapping samples, similar to equation (3) in our paper, had been previously given in the reference cited below. This expression could also have been used in deriving the results obtained later in our paper.
In the early 1990s, the state of the art in commercial chromosome image acquisition was grayscale. Automated chromosome classification was based on the grayscale image and boundary information obtained during segmentation. Multispectral image acquisition was developed in 1990 and commercialized in the mid-1990s. One acquisition method, multiplex fluorescence in-situ hybridization (M-FISH), uses five color dyes. We previously introduced a segmentation algorithm for M-FISH images that minimizes the entropy of classified pixels within possible chromosomes. In this paper, we extend this entropy-minimization algorithm to work on raw image data, which removes the requirement for pixel classification. This method works by estimating entropy from raw image data rather than calculating entropy from classified pixels. A successful example image is given to illustrate the algorithm. Finally, it is determined that entropy estimation for minimum-entropy segmentation adds a heavy computational burden without contributing any significant increase in classification performance, and is thus not worth the effort.
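As an illustration of the earlier, classification-based criterion, the following minimal sketch (hypothetical function and variable names, not the published implementation) scores a candidate chromosome region by the Shannon entropy of its classified pixel labels; the segmentation that minimizes this entropy would be preferred.

```python
import numpy as np

def label_entropy(labels, mask):
    """Shannon entropy (bits) of classified-pixel labels inside a candidate region.

    labels : 2-D integer array of per-pixel class labels
    mask   : boolean array of the same shape selecting the candidate chromosome
    """
    region = labels[mask]
    if region.size == 0:
        return 0.0
    _, counts = np.unique(region, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Toy usage: a region dominated by one class has lower entropy than a mixed one.
labels = np.array([[1, 1, 2],
                   [1, 1, 3],
                   [1, 1, 1]])
mask_pure = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [1, 1, 1]], dtype=bool)
mask_mixed = ~mask_pure
print(label_entropy(labels, mask_pure), label_entropy(labels, mask_mixed))
```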
Traditional chromosome imaging has been limited to grayscale images, but recently a 5-fluorophore combinatorial labeling technique (M-FISH) was developed wherein each class of chromosomes binds with a different combination of fluorophores. This results in a multispectral image, where each class of chromosomes has distinct spectral components. In this paper, we develop new methods for automatic chromosome identification by exploiting the multispectral information in M-FISH chromosome images and by jointly performing chromosome segmentation and classification. We (1) develop a maximum-likelihood hypothesis test that uses multispectral information, together with conventional criteria, to select the best segmentation possibility; (2) use this likelihood function to combine chromosome segmentation and classification into a robust chromosome identification system; and (3) show that the proposed likelihood function can also be used as a reliable indicator of errors in segmentation, errors in classification, and chromosome anomalies, which can be indicators of radiation damage, cancer, and a wide variety of inherited diseases. We show that the proposed multispectral joint segmentation-classification method outperforms past grayscale segmentation methods when decomposing touching chromosomes. We also show that it outperforms past M-FISH classification techniques that do not use segmentation information.
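The following is a hedged sketch of a maximum-likelihood-style hypothesis test of the kind described above: each candidate segmentation is scored by the summed log-likelihood of its segments under Gaussian class models of the multispectral pixel vectors. The models, names, and toy data are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np
from scipy.stats import multivariate_normal

def hypothesis_log_likelihood(pixel_vectors, segment_masks, class_means, class_cov):
    """Score one segmentation hypothesis: for each proposed segment, pick the
    chromosome class whose Gaussian model best explains the segment's multispectral
    pixel vectors, and sum those best log-likelihoods. (Illustrative criterion only.)"""
    total = 0.0
    for mask in segment_masks:
        seg = pixel_vectors[mask]                       # N x C spectral vectors
        per_class = [multivariate_normal.logpdf(seg, mean=mu, cov=class_cov).sum()
                     for mu in class_means]
        total += max(per_class)
    return total

# Toy example with 2 spectral channels: "single blob" vs. "two touching chromosomes".
rng = np.random.default_rng(4)
class_means = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
cov = 0.05 * np.eye(2)
pix = np.vstack([rng.multivariate_normal(class_means[0], cov, 50),
                 rng.multivariate_normal(class_means[1], cov, 50)])
one_segment = [np.ones(100, bool)]
two_segments = [np.r_[np.ones(50, bool), np.zeros(50, bool)],
                np.r_[np.zeros(50, bool), np.ones(50, bool)]]
# The two-chromosome hypothesis should score higher on this mixed-pixel blob.
print(hypothesis_log_likelihood(pix, one_segment, class_means, cov) <
      hypothesis_log_likelihood(pix, two_segments, class_means, cov))
```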
This paper describes the application of an amplitude modulation-frequency modulation (AM-FM) image representation in segmenting electron micrographs of skeletal muscle for the recognition of: 1) normal sarcomere ultrastructural pattern and 2) abnormal regions that occur in sarcomeres in various myopathies. A total of 26 electron micrographs from different myopathies were used for this study. It is shown that the AM-FM image representation can identify normal repetitive structures and sarcomeres with a good degree of accuracy. This system can also detect abnormalities in sarcomeres which alter the normal regular pattern, as seen in muscle pathology, with a recognition accuracy of 75%-84% as compared to a human expert.
We propose a new anisotropic diffusion filter for denoising low signal-to-noise-ratio (SNR) molecular images. This filter, which incorporates a median filter into the diffusion steps, is called an anisotropic median-diffusion filter. This hybrid filter achieved much better noise suppression with minimal edge blurring compared to the original anisotropic diffusion filter when it was tested on an image created based on a molecular image model. The universal quality index, proposed in this paper to measure the effectiveness of denoising, suggests that the anisotropic median-diffusion filter can retain adherence to the original image intensities and contrasts better than other filters. In addition, the performance of the filter is less sensitive to the selection of the image gradient threshold during diffusion, thus making automatic image denoising easier than with the original anisotropic diffusion filter.
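A minimal sketch of the hybrid idea follows, assuming a basic Perona-Malik diffusion update with a small median filter applied on each iteration; the conductance function, step sizes, and median window are illustrative choices rather than the published parameterization.

```python
import numpy as np
from scipy.ndimage import median_filter

def median_diffusion(img, n_iter=10, kappa=0.1, lam=0.2, median_size=3):
    """Hybrid filter sketch: one Perona-Malik diffusion update followed by a
    small median filter on each iteration (illustrative parameter values)."""
    u = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)     # exponential edge-stopping function
    for _ in range(n_iter):
        # Finite differences toward the four neighbors (wrap-around boundaries)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u,  1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u,  1, axis=1) - u
        u = u + lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
        # Median step suppresses impulsive noise that diffusion alone tends to keep
        u = median_filter(u, size=median_size)
    return u

# Example: denoise a noisy synthetic "spot" image
rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[24:40, 24:40] = 1.0
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
denoised = median_diffusion(noisy)
```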
We have developed a novel, model-based active contour algorithm, termed "snakules", for the annotation of spicules on mammography. At each suspect spiculated mass location that has been identified by either a radiologist or a computer-aided detection (CADe) algorithm, we deploy snakules, which are converging open-ended active contours, also known as snakes. The set of convergent snakules has the ability to deform, grow, and adapt to the true spicules in the image, by an attractive process of curve evolution and motion that optimizes the local matching energy. Starting from a natural set of automatically detected candidate points, snakules are deployed in the region around a suspect spiculated mass location. Statistics of prior physical measurements of spiculated masses on mammography are used in the process of detecting the set of candidate points. Observer studies with experienced radiologists to evaluate the performance of snakules demonstrate the potential of the algorithm as an image analysis technique to improve the specificity of CADe algorithms and as a CADe prompting tool.
Multicolor fluorescence in situ hybridization (M-FISH) techniques provide color karyotyping that allows simultaneous analysis of numerical and structural abnormalities of whole human chromosomes. Chromosomes are stained combinatorially in M-FISH. By analyzing the intensity combinations of each pixel, all chromosome pixels in an image are classified.
Multicolor fluorescence in situ hybridization (M-FISH) techniques provide color karyotyping that allows simultaneous analysis of numerical and structural abnormalities of whole human chromosomes. Chromosomes are stained combinatorially in M-FISH. By analyzing the intensity combinations of each pixel, all chromosome pixels in an image are classified. Often, the intensity distributions of different images are found to be considerably different, and this difference becomes a source of pixel misclassifications. Improving pixel classification accuracy is the most important task in ensuring the success of the M-FISH technique. In this paper, we introduce a new feature normalization method for M-FISH images that reduces the difference in the feature distributions among different images using the expectation-maximization (EM) algorithm. We also introduce a new unsupervised, nonparametric classification method for M-FISH images. The classifier is as accurate as the maximum-likelihood classifier, whose accuracy also improved significantly after the EM normalization. We would expect that any classifier will likely produce improved classification accuracy following the EM normalization. Since the developed classification method does not require training data, it is highly convenient when ground truth does not exist. A significant improvement in pixel classification accuracy was achieved after the new feature normalization: the overall pixel classification accuracy improved by 20% after EM normalization.
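One hedged way to picture EM-based normalization is sketched below: a two-component Gaussian mixture is fit (by EM) to each fluorophore channel of each image, and the intensities are linearly remapped so the two component means land on common reference values. The two-component assumption and reference values are illustrative, not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def em_normalize_channel(channel, ref_lo=0.0, ref_hi=1.0, seed=0):
    """Fit a 2-component GMM (via EM) to one fluorophore channel and linearly map
    the background/signal component means onto fixed reference values. Illustrative."""
    x = channel.reshape(-1, 1).astype(float)
    gmm = GaussianMixture(n_components=2, random_state=seed).fit(x)
    lo, hi = np.sort(gmm.means_.ravel())
    scale = (ref_hi - ref_lo) / max(hi - lo, 1e-12)
    return (channel - lo) * scale + ref_lo

# Two "images" whose intensity distributions differ get mapped onto a common scale.
rng = np.random.default_rng(1)
img_a = np.concatenate([rng.normal(0.2, 0.05, 500), rng.normal(0.8, 0.05, 500)])
img_b = np.concatenate([rng.normal(0.4, 0.05, 500), rng.normal(1.6, 0.05, 500)])
norm_a, norm_b = em_normalize_channel(img_a), em_normalize_channel(img_b)
```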
The measurement of instantaneous or locally-occurring signal frequencies is focal to a wide variety of variably-dimensioned signal processing applications. The task is particularly well-motivated for analyzing globally nonstationary, locally coherent signals having a ...
IEEE Transactions on Information Technology in Biomedicine, 2000
Assessment of classifier performance is critical for fair comparison of methods, including considering alternative models or parameters during system design. The assessment must not only provide meaningful data on classifier efficacy, but it must do so in a concise and clear manner. For two-class classification problems, receiver operating characteristic analysis provides a clear and concise assessment methodology for reporting performance and comparing competing systems. However, many other important biomedical questions cannot be posed as "two-class" classification tasks, and more than two classes are often necessary. While several methods have been proposed for assessing the performance of classifiers for such multiclass problems, none has been widely accepted. The purpose of this paper is to critically review methods that have been proposed for assessing multiclass classifiers. A number of these methods provide a classifier performance index called the volume under surface (VUS). Empirical comparisons are carried out using four three-class case studies, in which three popular classification techniques are evaluated with these methods. Since the same classifier was assessed using multiple performance indexes, it is possible to gain insight into the relative strengths and weaknesses of the measures. We conclude that: 1) the method proposed by Scurfield provides the most detailed description of classifier performance and insight about the sources of error in a given classification task, and 2) the methods proposed by He and Nakas also have great practical utility as they provide both the VUS and an estimate of the variance of the VUS. These estimates can be used to statistically compare two classification algorithms.
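For concreteness, a minimal plug-in estimate of the three-class VUS is sketched below: the fraction of score triples, one drawn from each class, that appear in the correct order. This is only the basic definition; it is not any of the specific estimators (e.g., those of Scurfield, He, or Nakas) reviewed in the paper.

```python
import numpy as np
from itertools import product

def empirical_vus(s1, s2, s3):
    """Empirical volume under the ROC surface for three ordered classes: the fraction
    of score triples (one per class) that come out in the correct order s1 < s2 < s3.
    (Basic plug-in estimate; ties are ignored in this sketch.)"""
    correct = sum(1 for a, b, c in product(s1, s2, s3) if a < b < c)
    return correct / (len(s1) * len(s2) * len(s3))

# Toy decision scores for three classes; chance level for three classes is 1/6.
rng = np.random.default_rng(2)
scores = [rng.normal(mu, 1.0, 40) for mu in (0.0, 1.0, 2.0)]
print(empirical_vus(*scores))
```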
IEEE Transactions on Information Forensics and Security, 2000
We introduce a novel multimodal framework for face recognition based on local attributes calculated from range and portrait image pairs. Gabor coefficients are computed at automatically detected landmark locations and combined with powerful anthropometric features defined in the form of geodesic and Euclidean distances between pairs of fiducial points. We make the pragmatic assumption that the 2-D and 3-D data are acquired passively (e.g., via stereo ranging) with perfect registration between the portrait data and the range data. Statistical learning approaches are evaluated independently to reduce the dimensionality of the 2-D and 3-D Gabor coefficients and the anthropometric distances. Three parallel face recognizers that result from applying the best-performing statistical learning schemes are fused at the match score level to construct a unified multimodal (2-D+3-D) face recognition system with boosted performance. The proposed algorithm is evaluated on a large public database of range and portrait image pairs and is found to perform quite well.
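A hedged sketch of match score-level fusion of the three recognizers is given below: each recognizer's scores are z-score normalized and combined with a weighted sum. The normalization, weights, and toy scores are illustrative assumptions rather than the fusion rule actually tuned in the paper.

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Z-score normalize each recognizer's match scores, then combine them with a
    weighted sum (illustrative fusion rule; weights would be tuned on held-out data)."""
    score_lists = [np.asarray(s, float) for s in score_lists]
    if weights is None:
        weights = np.ones(len(score_lists)) / len(score_lists)
    fused = np.zeros_like(score_lists[0])
    for w, s in zip(weights, score_lists):
        fused += w * (s - s.mean()) / (s.std() + 1e-12)
    return fused

# Example: scores from 2-D Gabor, 3-D Gabor, and anthropometric matchers over 5 gallery faces
fused = fuse_scores([[0.9, 0.4, 0.2, 0.1, 0.3],
                     [0.7, 0.5, 0.1, 0.2, 0.2],
                     [0.8, 0.3, 0.3, 0.1, 0.4]])
best_match = int(np.argmax(fused))
```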
Newly developed hypertext transfer protocol (HTTP)-based video streaming technologies enable flexible rate adaptation under varying channel conditions. Accurately predicting the users' quality of experience (QoE) for rate-adaptive HTTP video streams is thus critical to achieving efficiency. An important aspect of understanding and modeling QoE is predicting the up-to-the-moment subjective quality of a video as it is played, which is difficult due to hysteresis effects and nonlinearities in human behavioral responses. This paper presents a Hammerstein-Wiener model for predicting the time-varying subjective quality (TVSQ) of rate-adaptive videos. To collect data for model parameterization and validation, a database of longer-duration videos with time-varying distortions was built and the TVSQs of the videos were measured in a large-scale subjective study. The proposed method is able to reliably predict the TVSQ of rate-adaptive videos. Since the Hammerstein-Wiener model has a very simple structure, the proposed method is suitable for online TVSQ prediction in HTTP-based streaming.
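The Hammerstein-Wiener structure itself is simple enough to sketch: a static input nonlinearity, a linear dynamic block (which can capture hysteresis-like memory), and a static output nonlinearity. The particular nonlinearities and filter coefficients below are placeholders, not the fitted model from the subjective study.

```python
import numpy as np
from scipy.signal import lfilter

def hammerstein_wiener(x, b, a, f_in=np.tanh, f_out=lambda v: 1 / (1 + np.exp(-v))):
    """Hammerstein-Wiener prediction sketch: static nonlinearity, then a linear
    IIR filter (the dynamic/memory part), then another static nonlinearity.
    f_in, f_out, b, a are placeholders for the fitted components."""
    u = f_in(np.asarray(x, float))     # input nonlinearity (e.g., quality saturation)
    v = lfilter(b, a, u)               # linear time-invariant dynamics (memory/hysteresis)
    return f_out(v)                    # output nonlinearity maps to the subjective scale

# Toy input: per-segment video quality that drops and then recovers over time
stvq = np.concatenate([np.full(30, 2.0), np.full(30, -1.0), np.full(40, 2.0)])
tvsq = hammerstein_wiener(stvq, b=[0.1], a=[1.0, -0.9])   # first-order smoothing
```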
We describe a new 3D saliency prediction model that accounts for diverse low-level luminance, chrominance, motion, and depth attributes of 3D videos, as well as high-level classifications of scenes by type. The model also accounts for perceptual factors, such as the nonuniform resolution of the human eye, stereoscopic limits imposed by Panum's fusional area, and the predicted degree of (dis)comfort felt when viewing the 3D video. The high-level analysis involves classification of each 3D video scene by type with regard to estimated camera motion and the motions of objects in the videos. Decisions regarding the relative saliency of objects or regions are supported by data obtained through a series of eye-tracking experiments. The algorithm developed from the model elements operates by finding and segmenting salient 3D space-time regions in a video, then calculating the saliency strength of each segment using measured attributes of motion, disparity, texture, and the predicted degree of visual discomfort experienced. The saliency energies of both segmented objects and frames are weighted using models of human foveation and Panum's fusional area, yielding a single predictor of 3D saliency.
In the above paper, Egger and Li presented a set of two-channel filterbanks, asymmetrical filterbanks (AFBs), for image coding applications. The basic properties of these filters are linear phase, perfect reconstruction, asymmetric lengths for the dual filters, and maximum regularity. In this correspondence, we point out that the proposed AFBs are not new, in the sense that the proposed construction is equivalent to the factorization of Lagrange halfband filters, which has been reported by other researchers. In addition, we correct an error in the formulation for constructing AFBs in their paper.
Natural scene statistics have played an increasingly important role in both our understanding of the function and evolution of the human vision system and in the development of modern image processing applications. Because range (egocentric distance) is arguably the most important thing a visual system must compute (from an evolutionary perspective), the joint statistics between image information (color and luminance) and range information are of particular interest. It seems obvious that where there is a depth discontinuity, there must be a higher probability of a brightness or color discontinuity too. This is true, but the more interesting case is in the other direction: because image information is much more easily computed than range information, the key conditional probabilities are those of finding a range discontinuity given an image discontinuity. Here, the intuition is much weaker; the plethora of shadows and textures in the natural environment implies that many image discontinuities must exist without corresponding changes in range. In this paper, we extend previous work in two ways: we use as our starting point a very high quality data set of coregistered color and range values collected specifically for this purpose, and we evaluate the statistics of perceptually relevant chromatic information in addition to luminance, range, and binocular disparity information. The most fundamental finding is that the probabilities of finding range changes do in fact depend in a useful and systematic way on color and luminance changes; larger range changes are associated with larger image changes. Second, we are able to parametrically model the prior marginal and conditional distributions of luminance, color, range, and (computed) binocular disparity. Finally, we provide a proof of principle that this information is useful by showing that our distribution models improve the performance of a Bayesian stereo algorithm on an independent set of input images. To summarize, we show that there is useful information about range in very low-level luminance and color information. To a system sensitive to this statistical information, it amounts to an additional (and only recently appreciated) depth cue, and one that is trivial to compute from the image data. We are confident that this information is robust, in that we went to great effort and expense to collect very high quality raw data. Finally, we demonstrate the practical utility of these findings by using them to improve the performance of a Bayesian stereo algorithm.
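A minimal sketch of the central conditional statistic is shown below: estimating the probability of a range discontinuity given the magnitude of a co-located luminance change, using binned pixel differences from a co-registered luminance/range pair. The thresholds, bin edges, and synthetic data are illustrative only.

```python
import numpy as np

def p_range_edge_given_lum_edge(luminance, range_map, lum_bins, range_thresh=0.5):
    """Estimate P(range discontinuity | luminance-change magnitude) from a co-registered
    luminance/range image pair by binning horizontal pixel differences. The threshold
    and bin edges are illustrative choices, not those used in the paper."""
    dlum = np.abs(np.diff(luminance, axis=1)).ravel()
    drng = np.abs(np.diff(range_map, axis=1)).ravel()
    is_range_edge = drng > range_thresh
    bin_idx = np.digitize(dlum, lum_bins)
    return np.array([is_range_edge[bin_idx == k].mean() if np.any(bin_idx == k) else np.nan
                     for k in range(len(lum_bins) + 1)])

# Toy co-registered pair in which luminance is loosely tied to range; the estimated
# conditional probabilities should rise across the luminance-change bins.
gen = np.random.default_rng(5)
range_img = np.cumsum(gen.normal(0.0, 0.2, (64, 64)), axis=1)
lum_img = 0.5 * range_img + gen.normal(0.0, 0.05, (64, 64))
print(p_range_edge_given_lum_edge(lum_img, range_img, lum_bins=[0.05, 0.15, 0.3], range_thresh=0.3))
```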
A crucial step in the assessment of an image compression method is the evaluation of the perceived quality of the compressed images. Typically, researchers ask observers to rate perceived image quality directly and use these rating measures, averaged across observers and images, to assess how image quality degrades with increasing compression.
Compressive sensing (CS) makes it possible to more naturally create compact representations of data with respect to a desired data rate. Through wavelet decomposition, smooth and piecewise-smooth signals can be represented as sparse and compressible coefficients. These coefficients can then be effectively compressed via CS. Since a wavelet transform divides image information into layered blockwise wavelet coefficients over spatial and frequency domains, visual improvement can be attained by an appropriate perceptually weighted CS scheme. We introduce such a method in this paper and compare it with conventional CS. The resulting visual CS model is shown to deliver improved visual reconstructions.
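A hedged, 1-D illustration of the idea is sketched below: compressive measurements of a sparse coefficient vector are inverted by iterative soft thresholding (ISTA), with a per-coefficient weight vector standing in for the perceptual weighting of wavelet subbands. The weights, step size, and threshold are assumptions for illustration, not the paper's scheme.

```python
import numpy as np

def ista_weighted(y, A, w, lam=0.05, step=None, n_iter=300):
    """Recover a sparse coefficient vector from y = A @ x using ISTA with
    per-coefficient weights w (a stand-in for perceptual weighting)."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L, L = squared spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x + step * A.T @ (y - A @ x)            # gradient step on the data term
        t = lam * step * w                          # weighted soft-threshold levels
        x = np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    return x

# Toy example: recover an 8-sparse vector of length 128 from 64 random measurements.
rng = np.random.default_rng(3)
n, m, k = 128, 64, 8
x_true = np.zeros(n); x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
w = np.ones(n)                                      # uniform weights = conventional CS
x_hat = ista_weighted(A @ x_true, A, w)
```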
We introduce a new approach to image estimation based on a flexible constraint framework that encapsulates meaningful structural image assumptions. Piecewise image models (PIM's) and local image models (LIM's) are defined and utilized to estimate noise-corrupted images. PIM's and LIM's are defined by image sets obeying certain piecewise or local image properties, such as piecewise linearity, or local monotonicity. By optimizing local image characteristics imposed by the models, image estimates are produced with respect to the characteristic sets defined by the models. Thus, we propose a new general formulation for nonlinear set-theoretic image estimation. Detailed image estimation algorithms and examples are given using two PIM's: piecewise constant (PICO) and piecewise linear (PILI) models, and two LIM's: locally monotonic (LOMO) and locally convex/concave (LOCO) models. These models define properties that hold over local image neighborhoods, and the corresponding image estimates may be inexpensively computed by iterative optimization algorithms. Forcing the model constraints to hold at every image coordinate of the solution defines a nonlinear regression problem that is generally nonconvex and combinatorial. However, approximate solutions may be computed in reasonable time using the novel generalized deterministic annealing (GDA) optimization technique, which is particularly well suited for locally constrained problems of this type. Results are given for corrupted imagery with signal-to-noise ratio (SNR) as low as 2 dB, demonstrating high quality image estimation as measured by local feature integrity, and improvement in SNR.
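To make one of the constraints concrete, the sketch below checks the 1-D locally monotonic (LOMO) property: every window of a given length must be nondecreasing or nonincreasing. The constrained regression itself, solved in the paper by generalized deterministic annealing, is not reproduced here.

```python
import numpy as np

def is_locally_monotonic(signal, degree):
    """Return True if every window of `degree` consecutive samples is
    nondecreasing or nonincreasing (the LOMO-degree property, 1-D case)."""
    s = np.asarray(signal, float)
    for i in range(len(s) - degree + 1):
        d = np.diff(s[i:i + degree])
        if not (np.all(d >= 0) or np.all(d <= 0)):
            return False
    return True

print(is_locally_monotonic([0, 1, 2, 2, 1, 0], degree=3))   # True: ramps and plateaus
print(is_locally_monotonic([0, 2, 0, 2, 0, 2], degree=3))   # False: rapid oscillation
```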
We apply an AM-FM surface albedo model to analyze the projection of surface patterns viewed through a binocular camera system. This is used to support the use of modulation-based stereo matching, where local image phase is used to compute stereo disparities. Local image phase is an advantageous feature for image matching, since the problem of computing disparities reduces to identifying local phase shifts between the stereoscopic image data. Local phase shifts, however, are problematic at high frequencies due to phase wrapping when disparities exceed 6. We meld powerful multichannel Gabor image demodulation techniques for multiscale (coarse-to-fine) computation of local image phase with a disparity channel model for depth computation. The resulting framework unifies phase-based matching approaches with AM-FM surface/image models. We demonstrate the concepts in a stereo algorithm that generates a dense, accurate disparity map without the problems associated with phase wrapping.
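A hedged 1-D sketch of phase-based matching is given below: left and right scanlines are filtered with a complex Gabor kernel, and the local phase difference is converted to a shift by dividing by the channel frequency, which is valid only while the disparity stays within one half-cycle. The kernel parameters and toy signals are illustrative; the multichannel, coarse-to-fine machinery of the paper is omitted.

```python
import numpy as np

def gabor_response(signal, freq, sigma):
    """Convolve a 1-D signal with a complex Gabor kernel tuned to `freq` cycles/sample."""
    n = np.arange(-4 * sigma, 4 * sigma + 1)
    kernel = np.exp(-n**2 / (2 * sigma**2)) * np.exp(1j * 2 * np.pi * freq * n)
    return np.convolve(signal, kernel, mode="same")

def phase_disparity(left, right, freq=0.08, sigma=8.0):
    """Estimate per-sample disparity as the local Gabor phase difference divided by
    the nominal channel frequency (valid while the shift stays within one half-cycle)."""
    gl, gr = gabor_response(left, freq, sigma), gabor_response(right, freq, sigma)
    dphi = np.angle(gl * np.conj(gr))
    return dphi / (2 * np.pi * freq)

# Toy scanlines: the right view is the left view shifted by 3 samples.
x = np.arange(256)
left = np.sin(2 * np.pi * 0.08 * x) + 0.3 * np.sin(2 * np.pi * 0.03 * x)
right = np.roll(left, 3)
print(np.median(phase_disparity(left, right)[32:-32]))   # close to 3
```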