2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
In this paper, we explore open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch with its class label, even if the sketches of that class are missing in the training data. This is challenging due to the lack of training supervision and the large geometric distortion between the freehand sketch and photo domains. To synthesize the absent freehand sketches from photos, we propose a framework that jointly learns sketch-to-photo and photo-to-sketch generation. However, a generator trained on fake sketches might produce unsatisfying results when dealing with sketches of missing classes, due to the domain gap between synthesized sketches and real ones. To alleviate this issue, we further propose a simple yet effective open-domain sampling and optimization strategy to "fool" the generator into treating fake sketches as real ones. Our method takes advantage of the learned sketch-to-photo and photo-to-sketch mapping of in-domain data and generalizes it to the open-domain classes. We validate our method on the Scribble and SketchyCOCO datasets. Compared with recent competing methods, our approach shows impressive results in synthesizing realistic color and texture, and in maintaining the geometric composition for various categories of open-domain sketches.
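To make the open-domain sampling idea concrete, here is a minimal PyTorch sketch of one joint training step. The generator names (g_p2s, g_s2p), the class-conditioning interface, and the loss weights are illustrative assumptions, not the paper's exact design.

```python
# A minimal sketch of joint sketch<->photo training with open-domain sampling.
# g_p2s / g_s2p, the label conditioning, and the weights are assumptions.
import torch
import torch.nn.functional as F

def open_domain_step(g_p2s, g_s2p, photo, label, has_real_sketch, real_sketch=None):
    """One training step; for open-domain classes the synthesized sketch
    is detached and treated as if it were a real input sketch."""
    fake_sketch = g_p2s(photo, label)                  # photo -> sketch
    if has_real_sketch:
        # In-domain: supervise sketch synthesis against the real sketch.
        sketch_loss = F.l1_loss(fake_sketch, real_sketch)
        recon = g_s2p(real_sketch, label)              # sketch -> photo
    else:
        # Open-domain sampling: "fool" g_s2p into treating the fake
        # sketch as real by cutting the gradient back into g_p2s.
        sketch_loss = torch.zeros((), device=photo.device)
        recon = g_s2p(fake_sketch.detach(), label)
    photo_loss = F.l1_loss(recon, photo)               # reconstruction term
    return photo_loss + 0.5 * sketch_loss              # illustrative weighting
```

The key design choice is the detach: the photo generator sees synthesized sketches with no gradient path revealing their origin, which narrows the gap between training on fake sketches and testing on real ones.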
[Figure: overlaid LR inputs and the HR intermediate frame, comparing DAIN+Bicubic, DAIN+EDVR, and ours. * Equal contribution; † Equal advising.] Instead of reconstructing missing LR intermediate frames as VFI networks do, we first temporally interpolate the LR frame features of the missing LR video frames, capturing local temporal contexts, with the proposed feature temporal interpolation network. Then, we propose a deformable ConvLSTM to align and aggregate temporal information simultaneously to better leverage global temporal contexts. Finally, a deep reconstruction network is adopted to predict HR slow-motion video frames. Extensive experiments on benchmark datasets demonstrate that the proposed method not only achieves better quantitative and qualitative performance but is also more than three times faster than recent two-stage state-of-the-art methods, e.g., DAIN+EDVR and DAIN+RBPN.
Applications that interact with the real world, such as augmented reality or robot manipulation, require a good understanding of the location and pose of the surrounding objects. In this paper, we present a new approach to estimate the 6 Degrees of Freedom (DoF), or 6D, pose of objects from a single RGB image. Our approach can be paired with an object detection and segmentation method to estimate, refine, and track the pose of the objects by matching the input image with rendered images.
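The render-and-compare idea can be sketched as a small refinement loop. The render and update_net callables and the 4x4 pose parameterization below are assumed interfaces for illustration, not the paper's actual components.

```python
# A schematic render-and-compare pose refinement loop (assumed interfaces).
import numpy as np

def refine_pose(image_crop, pose, render, update_net, n_iters=5):
    """Iteratively refine a 6D pose (4x4 matrix) by matching the input
    crop against renders of the object at the current pose estimate."""
    for _ in range(n_iters):
        rendered = render(pose)                    # RGB render at current pose
        delta = update_net(image_crop, rendered)   # predicted 4x4 correction
        pose = delta @ pose                        # left-compose the update
    return pose
```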
bioRxiv (Cold Spring Harbor Laboratory), Jan 26, 2023
Abnormalities in biological cell nuclei morphology are correlated with cell cycle stages, disease states, and various external stimuli. Many deep learning approaches have been described for nuclei segmentation and the analysis of nuclear morphology. One problem with many deep learning methods is that they require large amounts of annotated nuclei data, which are generally expensive to obtain. In this paper, we propose a system to segment abnormally shaped nuclei with a limited amount of training data. We first generate synthetic ground truth for nuclei of specific shapes. We randomly sample these synthetic ground truth images into training sets to train several Mask R-CNNs. We design an ensemble strategy to combine or fuse the segmentation results from the Mask R-CNNs. We also design an oval-nuclei removal step based on StarDist to reduce false positives and improve the overall segmentation performance. Our experiments indicate that our method outperforms other methods in segmenting abnormally shaped nuclei.
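As a rough illustration of the ensemble step only, the sketch below fuses instance masks from several Mask R-CNN runs by IoU grouping and majority vote; the thresholds are assumptions, and the StarDist-based oval removal is not reproduced here.

```python
# Illustrative fusion of instance masks from multiple models (assumed thresholds).
import numpy as np

def mask_iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def fuse_masks(mask_sets, iou_thr=0.5, min_votes=2):
    """mask_sets: list (one per model) of lists of boolean HxW masks.
    Masks that overlap across models are grouped and kept if enough
    models agree, then averaged into a consensus mask."""
    candidates = [m for masks in mask_sets for m in masks]
    used = [False] * len(candidates)
    fused = []
    for i, m in enumerate(candidates):
        if used[i]:
            continue
        group = [m]
        for j in range(i + 1, len(candidates)):
            if not used[j] and mask_iou(m, candidates[j]) > iou_thr:
                group.append(candidates[j])
                used[j] = True
        if len(group) >= min_votes:                 # majority agreement
            fused.append(np.mean(group, axis=0) > 0.5)
    return fused
```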
IS&T International Symposium on Electronic Imaging Science and Technology, Jan 26, 2020
Evolving algorithms for 2D facial landmark detection empower people to recognize faces, analyze facial expressions, etc. However, existing methods still encounter the problem of unstable facial landmarks when applied to videos. Because previous research shows that the instability of facial landmarks is caused by the inconsistency of labeling quality among public datasets, we want to better understand the influence of annotation noise in them. In this paper, we make the following contributions: 1) we propose two metrics that quantitatively measure the stability of detected facial landmarks, 2) we model the annotation noise in an existing public dataset, and 3) we investigate the influence of different types of noise in training face alignment neural networks and propose corresponding solutions. Our results demonstrate improvements in both the accuracy and the stability of detected facial landmarks.
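As one plausible instantiation of a stability measure (not necessarily either of the paper's two metrics), the jitter of landmarks on a nearly static face can be summarized as the mean per-landmark standard deviation after removing global drift:

```python
# An illustrative landmark-stability measure, assuming a (T, N, 2) array
# of N 2D landmarks tracked over T frames of a nearly static face.
import numpy as np

def landmark_jitter(landmarks):
    # Subtract the per-frame centroid to remove global head translation,
    # then measure how much each landmark wanders across frames.
    centered = landmarks - landmarks.mean(axis=1, keepdims=True)
    return centered.std(axis=0).mean()  # mean positional std per landmark
```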
In this paper, we explore open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch with its class label, even if the sketches of that class are missing in the training data. This is challenging due to the lack of training supervision and the large geometric distortion between the freehand sketch and photo domains. To synthesize the absent freehand sketches from photos, we propose a framework that jointly learns sketch-to-photo and photo-to-sketch generation. However, a generator trained on fake sketches might produce unsatisfying results when dealing with sketches of missing classes, due to the domain gap between synthesized sketches and real ones. To alleviate this issue, we further propose a simple yet effective open-domain sampling and optimization strategy to "fool" the generator into treating fake sketches as real ones. Our method takes advantage of the learned sketch-to-photo and photo-to-sketch mapping of in-domain data and generalizes it to the open-domain classes. We validate our method on the Scribble and SketchyCOCO datasets. Compared with recent competing methods, our approach shows impressive results in synthesizing realistic color and texture, and in maintaining the geometric composition for various categories of open-domain sketches.
In this paper, we address space-time video super-resolution, which aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR), low-frame-rate (LFR) video sequence. A naive method is to decompose it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). Nevertheless, temporal interpolation and spatial upscaling are interrelated in this problem, and two-stage approaches cannot fully exploit this natural property. Besides, state-of-the-art VFI and VSR deep networks usually have a large frame reconstruction module in order to obtain high-quality, photo-realistic video frames, which makes two-stage approaches have large models and thus be relatively time-consuming. To overcome these issues, we present a one-stage space-time video super-resolution framework, which can directly reconstruct an HR slow-motion video sequence from an input LR and LFR video. Instead of reconstructing missing LR intermediate frames as VFI models do, we temporally interpolate the LR frame features of the missing LR frames, capturing local temporal contexts, with a feature temporal interpolation module. Extensive experiments on widely used benchmarks demonstrate that the proposed framework not only achieves better qualitative and quantitative performance on both clean and noisy LR frames but is also several times faster than recent state-of-the-art two-stage networks. The source code is released at https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020.
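A minimal PyTorch sketch of feature temporal interpolation follows: an intermediate frame's features are synthesized by blending the two neighboring frames' features with a learned, content-dependent weight. The channel count and gating design are illustrative assumptions, not the released model's architecture.

```python
# A toy feature temporal interpolation module (illustrative design).
import torch
import torch.nn as nn

class FeatureTemporalInterp(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Predict a per-pixel blending weight from both neighboring features.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_prev, feat_next):
        w = self.gate(torch.cat([feat_prev, feat_next], dim=1))
        return w * feat_prev + (1.0 - w) * feat_next  # interpolated feature

# Usage: interp = FeatureTemporalInterp(64); f_mid = interp(f0, f1)
```

Interpolating in feature space rather than pixel space is what lets a single reconstruction network serve both the interpolated and the original frames, avoiding the duplicated heavy modules of two-stage pipelines.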
Label-free cell imaging, where the cell is not "labeled" or modified by fluorescent chemicals, is an important research area in the field of biology. It avoids altering the cell's properties, which typically happens in the process of chemical labeling. However, without the contrast enhancement from the label, the analysis of label-free imaging is more challenging than label-based imaging. In addition, it provides few human-interpretable features, and thus needs machine learning approaches to help with the identification and tracking of specific cells. We are interested in label-free phase contrast imaging to track cells flowing in a cell sorting device, where images are acquired at 500 frames/s. Existing Multiple Object Tracking (MOT) methods face four major challenges when used for tracking cells in a microfluidic sorting device: (i) most of the cells have large displacements between frames without any overlap; (ii) it is difficult to distinguish between cells, as they are visually similar to each other; (iii) the velocities of cells vary with location in the device; and (iv) the appearance of cells may change as they move in and out of the focal plane of the imaging sensor that observes the isolation process. In this paper, we introduce a method for tracking cells in a predefined flow in the sorting device via phase contrast microscopy. Our proposed method is based on DeepSORT and YOLOv4 and exploits prior knowledge of a cell's velocity to assist tracking. We modify the Kalman filter in DeepSORT to accommodate a non-constant-velocity motion model and integrate a representative velocity field obtained from fluid dynamics into the Kalman filter. The experimental results show that our proposed method outperforms several MOT methods for tracking cells in the sorting device.
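The velocity-field modification can be sketched as a replacement for the constant-velocity predict step; the (H, W, 2) field layout, pixel-indexed lookup, and dt = 1/500 s are assumptions for illustration.

```python
# A location-dependent prediction step, replacing constant velocity.
import numpy as np

def predict_with_velocity_field(state, velocity_field, dt=1.0 / 500):
    """state: (x, y) cell position in pixels; velocity_field: (H, W, 2)
    array of per-pixel (vx, vy) in pixels/second from fluid dynamics;
    dt matches the 500 fps acquisition rate."""
    h, w, _ = velocity_field.shape
    x = int(np.clip(state[0], 0, w - 1))
    y = int(np.clip(state[1], 0, h - 1))
    vx, vy = velocity_field[y, x]        # look up local flow velocity
    return np.array([state[0] + vx * dt, state[1] + vy * dt])
```

Because the predicted displacement now depends on where the cell is in the device, large non-overlapping jumps between frames remain matchable by the data association step.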
Due to the limits of bandwidth and storage space, digital images are usually down-scaled and compressed when transmitted over networks, resulting in loss of details and jarring artifacts that can lower the performance of high-level visual tasks. In this paper, we aim to generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor by exploring joint compression artifacts reduction (CAR) and super-resolution (SR). We propose a context-aware joint CAR and SR neural network (CAJNN) that integrates both local and non-local features to solve CAR and SR in one stage; a deep reconstruction network then predicts high-quality, high-resolution images. Evaluation on CAR and SR benchmark datasets shows that our CAJNN model outperforms previous methods while requiring 26.2% less runtime. Based on this model, we explore addressing two critical challenges in high-level computer vision: optical character recognition of low-resolution text and extremely tiny face detection. We demonstrate that CAJNN can serve as an effective image preprocessing method, improving the accuracy of real-scene text recognition (from 85.30% to 85.75%) and the average precision of tiny face detection (from 0.317 to 0.611).
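Below is a compact sketch of one way to combine local (convolutional) and non-local (attention-style) features in a single stage, in the spirit of the context-aware design described above; the block is illustrative, not the published CAJNN architecture.

```python
# An illustrative local + non-local feature block (assumed dimensions).
import torch
import torch.nn as nn

class LocalNonLocalBlock(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.local = nn.Conv2d(c, c, 3, padding=1)   # local context
        self.theta = nn.Conv2d(c, c // 2, 1)         # query projection
        self.phi = nn.Conv2d(c, c // 2, 1)           # key projection
        self.g = nn.Conv2d(c, c, 1)                  # value projection

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)      # (b, hw, c/2)
        k = self.phi(x).flatten(2)                        # (b, c/2, hw)
        attn = torch.softmax(q @ k / (c // 2) ** 0.5, dim=-1)
        v = self.g(x).flatten(2).transpose(1, 2)          # (b, hw, c)
        nonlocal_feat = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return self.local(x) + nonlocal_feat              # fuse both contexts
```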
The Cell Tracking Challenge is an ongoing benchmarking initiative that has become a reference in cell segmentation and tracking algorithm development. Here, we present a significant number of improvements introduced in the challenge since our 2017 report. These include the creation of a new segmentation-only benchmark, the enrichment of the dataset repository with new datasets that increase its diversity and complexity, and the creation of a silver-standard reference corpus based on the most competitive results, which will be of particular interest for data-hungry deep-learning-based strategies. Furthermore, we present the up-to-date cell segmentation and tracking leaderboards, an in-depth analysis of the relationship between the performance of the state-of-the-art methods and the properties of the datasets and annotations, and two novel, insightful studies about the generalizability and the reusability of top-performing methods. These studies provide critical practical conclusions ...
Recent progress in deep learning methods has shown that key steps in object detection and recognition, including feature extraction, region proposals, and classification, can be done using Convolutional Neural Networks (CNNs) with high accuracy. However, the use of CNNs for object detection and recognition has significant technical challenges that still need to be addressed. One of the most daunting problems is the very large number of training images required for each class/label. One way to address this problem is through the use of data augmentation methods, where linear and nonlinear transforms are applied to the training data to create "new" training images. Typical transformations include spatial flipping, warping, and other deformations. An important principle of data augmentation is that the deformations applied to the labeled training images must not change the semantic meaning of the classes/labels. In this paper, we investigate several approaches to data augmentation. First, several data augmentation techniques are used to increase the size of the training dataset. Then, a Faster R-CNN is trained with the augmented dataset to detect and recognize objects. Our work is focused on two different scenarios: detecting objects in the wild (i.e., commercial logos) and detecting objects captured using a camera mounted on a computer system (i.e., toy animals).
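A small example of a label-preserving augmentation for detection data, assuming boxes in (x_min, y_min, x_max, y_max) pixel format: flipping the image must flip the boxes with it, or the labels change meaning.

```python
# Horizontal flip with the matching bounding-box update (assumed box format).
import numpy as np

def hflip_with_boxes(image, boxes):
    """image: (H, W, 3) array; boxes: (N, 4) array of pixel coordinates."""
    _, w, _ = image.shape
    flipped = image[:, ::-1].copy()
    x_min, y_min, x_max, y_max = boxes.T
    # New x_min is the mirror of the old x_max, and vice versa.
    new_boxes = np.stack([w - x_max, y_min, w - x_min, y_max], axis=1)
    return flipped, new_boxes
```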
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
This paper looks at semi-supervised learning (SSL) for image-based text recognition. One of the most popular SSL approaches is pseudo-labeling (PL). PL approaches assign labels to unlabeled data before retraining the model with a combination of labeled and pseudo-labeled data. However, PL methods are severely degraded by noise and are prone to over-fitting to noisy labels, due to the inclusion of erroneous high-confidence pseudo-labels generated from poorly calibrated models, rendering threshold-based selection ineffective. Moreover, the combinatorial complexity of the hypothesis space and the error accumulation due to multiple incorrect autoregressive steps make pseudo-labeling challenging for sequence models. To this end, we propose a pseudo-label generation and uncertainty-based data selection framework for semi-supervised text recognition. We first use beam-search inference to yield highly probable hypotheses for assigning pseudo-labels to the unlabeled examples. Then we adopt an ensemble of models, sampled by applying dropout, to obtain a robust estimate of the uncertainty associated with the prediction, considering both the character-level and word-level predictive distributions to select good-quality pseudo-labels. Extensive experiments on several benchmark handwriting and scene-text datasets show that our method outperforms the baseline approaches and the previous state-of-the-art semi-supervised text recognition methods.
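A sketch of dropout-ensemble pseudo-label selection follows, assuming a model callable that returns one decoded string per forward pass; exact-match agreement is used here as a crude uncertainty proxy in place of the paper's character- and word-level distributions.

```python
# MC-dropout pseudo-label selection (assumed model interface).
import torch

def select_pseudo_label(model, image, n_samples=8, min_agreement=0.75):
    model.train()  # keep dropout active at inference time (MC dropout)
    with torch.no_grad():
        hypotheses = [model(image) for _ in range(n_samples)]
    # The most frequent decoding, and how often the ensemble agreed on it.
    best = max(set(hypotheses), key=hypotheses.count)
    agreement = hypotheses.count(best) / n_samples
    return best if agreement >= min_agreement else None  # reject if uncertain
```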
2020 IEEE International Conference on Image Processing (ICIP)
Graininess noise is a common artifact in inkjet printing. While current inkjet printing technologies attempt to control graininess in single-color images, the results are often less than optimal for multi-color images, due to fluidic interactions between inks of different colors. This paper describes a color decomposition methodology that can be used to study ink flow patterns in multi-color inkjet-printed images at a microscopic scale. The technique decomposes multi-color images into several independent color components, and the ink patterns in these components are analyzed to relate them to visually perceptible graininess noise.
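One simple way to realize such a pixel-level color decomposition is to cluster pixel colors with k-means and emit one binary component map per cluster; this stand-in may differ from the paper's actual methodology.

```python
# An illustrative color decomposition via k-means clustering of pixel colors.
import numpy as np
from sklearn.cluster import KMeans

def decompose_colors(image, n_inks=3):
    """image: (H, W, 3) RGB scan; returns n_inks boolean component maps,
    one per dominant ink color found in the image."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64) / 255.0
    labels = KMeans(n_clusters=n_inks, n_init=10).fit_predict(pixels)
    return [(labels == k).reshape(h, w) for k in range(n_inks)]
```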
A promising direction for recovering the lost information in low-resolution headshot images is utilizing a set of high-resolution exemplars from the same identity. Complementary images in the reference set can improve the generated headshot quality across many different views and poses. However, it is challenging to make the best use of multiple exemplars: the quality and alignment of each exemplar cannot be guaranteed, and using low-quality or mismatched images as references will impair the output. To overcome these issues, we propose an efficient Headshot Image Super-Resolution with Multiple Exemplars (HIME) network. Compared with previous methods, our network can effectively handle misalignment between the input and the reference without requiring facial priors, and it learns the aggregated reference set representation in an end-to-end manner. Furthermore, to reconstruct more detailed facial features, we propose a correlation loss that provides a rich representation of the local texture in a controllable spatial range. Experimental results demonstrate that the proposed framework not only has a significantly lower computation cost than recent exemplar-guided methods but also achieves better qualitative and quantitative performance.
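A correlation-style texture loss could plausibly be realized as a negative local normalized cross-correlation computed over a controllable window via average pooling; the window size and normalization below are assumptions, not the paper's definition.

```python
# An illustrative local normalized cross-correlation loss.
import torch
import torch.nn.functional as F

def local_correlation_loss(x, y, win=7):
    """x, y: (B, C, H, W) tensors; higher local patch correlation
    between x and y lowers the loss. `win` controls the spatial range."""
    pad = win // 2
    mu_x = F.avg_pool2d(x, win, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, win, stride=1, padding=pad)
    # Local covariance and variances via pooled second moments.
    xy = F.avg_pool2d(x * y, win, stride=1, padding=pad) - mu_x * mu_y
    var_x = F.avg_pool2d(x * x, win, stride=1, padding=pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, stride=1, padding=pad) - mu_y ** 2
    ncc = xy / (var_x.clamp(min=1e-6).sqrt() * var_y.clamp(min=1e-6).sqrt())
    return 1.0 - ncc.mean()
```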
Printer identification based on a printed document can provide forensic information to protect copyright and verify authenticity. In this work, a stochastic dot interaction model that takes into consideration scanner characteristics and print-scan channel noise is developed to predict the impact of embedding extrinsic signatures using laser intensity modulation. With this model, the reflectance of the printout can be effectively estimated without extensive measurements. In addition, we propose an optimization framework to select the modulation parameters that maximize the embedding capacity and detection reliability. Preliminary analysis results, such as the achievable capacity and correct detection rate, are discussed.
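As a toy illustration of model-based reflectance estimation, the sketch below perturbs fractional dot coverage stochastically and applies Murray-Davies-style mixing; all constants are illustrative, not the paper's calibrated model.

```python
# A toy Monte Carlo reflectance estimate from a stochastic coverage model.
import numpy as np

def estimate_reflectance(coverage, r_paper=0.9, r_ink=0.05,
                         noise_std=0.02, n_trials=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Perturb nominal dot coverage to mimic dot interaction and channel noise.
    a = np.clip(coverage + rng.normal(0, noise_std, n_trials), 0, 1)
    r = r_paper * (1 - a) + r_ink * a    # Murray-Davies mixing
    return r.mean(), r.std()             # expected reflectance and spread

# Usage: mean_r, std_r = estimate_reflectance(0.3)
```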