Papers by Daniel Lopresti
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
The Lehigh Steel Collection (LSC) is an extremely large, heterogeneous set of documents dating from the 1960s through the 1990s. It was retrieved by Lehigh University after it acquired research facilities from Bethlehem Steel, a now-bankrupt company that was once the second-largest steel producer and the largest shipbuilder in the United States. The documents describe the research and development activities that were conducted on site, and comprise a very wide range of technical documentation, handwritten notes and memos, annotated printed documents, etc. This paper addresses only one part of the collection: the approximately 4000 engineering drawings and blueprints that were retrieved. The challenge lies essentially in the fact that these documents come in different sizes and shapes, in widely varying states of conservation and degradation, and, more importantly, in bulk and without ground truth. Making them available to the research community through d...
Pattern Recognition and Artificial Intelligence, 2020
As more and more office documents are captured, stored, and shared in digital format, and as image editing software becomes increasingly powerful, there is growing concern about document authenticity. To help prevent illicit activities, this paper presents a new method for detecting altered text in document images. The proposed method explores the relationship between the positive and negative DCT coefficients to extract the distortions caused by tampering: the images reconstructed from the positive and from the negative coefficients are fused, which results in Positive-Negative DCT coefficients Fusion (PNDF). To take advantage of spatial information, we propose to fuse the R, G, and B color channels of the input image, which results in RGBF (RGB Fusion). Next, the same fusion operation is used to fuse PNDF and RGBF, which yields a single fused image for the original input. We compute a histogram of the fused image to obtain a feature vector, which is then fed to a deep neural network for classifying altered text images. The proposed method is tested on our own dataset and on the standard datasets from the ICPR 2018 Fraud Contest, Altered Handwriting (AH), and faked IMEI number images. The results show that the proposed method is effective and outperforms existing methods irrespective of image type.
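The positive/negative coefficient split described in this abstract can be illustrated with a minimal sketch. It assumes a small square grayscale block, an orthonormal DCT-II, and a fusion rule (the mean of absolute values) that is purely hypothetical, since the abstract does not state how the fusion is computed. The function names `dct2`, `idct2`, `pndf`, and `histogram` are illustrative and are not taken from the paper.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(block):
    """2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT of a square block: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def pndf(block):
    """Reconstruct from positive and negative coefficients separately, then fuse."""
    coeffs = dct2(block)
    pos = [[v if v > 0 else 0.0 for v in row] for row in coeffs]
    neg = [[v if v < 0 else 0.0 for v in row] for row in coeffs]
    rec_pos, rec_neg = idct2(pos), idct2(neg)
    # ASSUMPTION: fusion as the mean of absolute values; the paper's rule may differ.
    fused = [[(abs(p) + abs(q)) / 2 for p, q in zip(rp, rn)]
             for rp, rn in zip(rec_pos, rec_neg)]
    return rec_pos, rec_neg, fused

def histogram(image, bins=8, max_value=256.0):
    """Fixed-width histogram of pixel values, used here as a toy feature vector."""
    counts = [0] * bins
    for row in image:
        for v in row:
            counts[min(int(v / max_value * bins), bins - 1)] += 1
    return counts

block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
rec_pos, rec_neg, fused = pndf(block)
features = histogram(fused)
```

Because the DCT is linear, the two reconstructions sum back to the original block, so any asymmetry between them is what a fusion step of this kind would expose. In the abstract's pipeline the resulting feature vector is passed to a deep neural network, which is omitted here.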
Second International Conference on Document Image Analysis for Libraries (DIAL'06)
We explore connections between digital libraries and interactive document image analysis. Digital libraries can provide useful data and metadata for research in automated document image analysis, and allow unbiased testing of DIA algorithms. With these goals in mind, we suggest criteria for constructing and evaluating interactive DIA tools. Discussion: We consider some involuted relationships between digital libraries and document image and content analysis. Exploiting these relationships may accelerate all-around progress. Although we have worked on pieces of these puzzles for many years, we are acutely aware of the need for input from researchers with different backgrounds to channel further research. The questions we propose to explore are the following.
Document Recognition V, Apr 1, 1998
Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93)
Certifiable Optical Character Recognition. Daniel P. Lopresti and Jonathan S. Sandberg, Matsushita Information Technology Laboratory, Two Research Way, Princeton, NJ 08540 USA. Abstract: In this paper we describe ...
Proceedings of the Fourth International Conference on Document Analysis and Recognition
In this paper, we examine the problem of locating and extracting text from images on the World Wide Web. We describe a text detection algorithm which is based on color clustering and connected component analysis. The algorithm first quantizes the color space of the ...
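Since the abstract is truncated, the following is only a generic sketch of the two named ingredients, color quantization followed by connected component analysis, and not the authors' algorithm. It assumes an image given as rows of `(r, g, b)` tuples, uniform quantization into a hypothetical `levels` bins per channel, and 4-connectivity.

```python
from collections import deque

def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // levels
    return tuple(c // step for c in pixel)

def connected_components(image, levels=4):
    """Group 4-connected pixels that share the same quantized color."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue  # pixel already belongs to a component
            color = quantize(image[y][x], levels)
            label = len(components)
            labels[y][x] = label
            queue = deque([(y, x)])
            pixels = []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and quantize(image[ny][nx], levels) == color):
                        labels[ny][nx] = label
                        queue.append((ny, nx))
            components.append({"color": color, "pixels": pixels})
    return components

W = (250, 250, 250)   # light background
K = (10, 10, 10)      # dark stroke
K2 = (20, 5, 12)      # slightly different dark shade, same bin after quantization
image = [
    [W, K,  W, K, W],
    [W, K2, W, K, W],
    [W, W,  W, K, W],
]
components = connected_components(image)
dark = sorted(len(c["pixels"]) for c in components if c["color"] == (0, 0, 0))
```

In a text detector of this general kind, the resulting components would then be filtered by size, shape, or alignment to keep character-like regions; that filtering step is not specified in the visible part of the abstract and is left out here.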
2011 International Conference on Document Analysis and Recognition, 2011
This contest aims to provide a metric that indicates the influence of individual document analysis stages on overall end-to-end applications. Contestants are provided with a full, working pipeline which operates on a page image to extract useful information. The pipeline is built with clearly identified analysis stages (e.g. binarization, skew detection, layout analysis, OCR ...) that have a formalized input and output. Contestants are invited to contribute their own algorithms as an alternative to one or more of the initially provided stages. The evaluation measures the overall impact of the contributed algorithm on the final (end-of-pipeline) output.
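The evaluation idea in this abstract, swapping one stage and measuring the effect at the end of the pipeline, can be sketched with a toy example. Everything here is hypothetical: the two-stage pipeline, the stage functions, the tiny "page", and the accuracy measure are stand-ins chosen for illustration and do not reflect the contest's actual software or metric.

```python
def binarize_fixed(page, threshold=50):
    """Baseline stage: pixels darker than a fixed threshold count as ink."""
    return [[1 if v < threshold else 0 for v in row] for row in page]

def binarize_mean(page):
    """Contributed stage: threshold at the mean gray level of the page."""
    values = [v for row in page for v in row]
    threshold = sum(values) / len(values)
    return [[1 if v < threshold else 0 for v in row] for row in page]

def recognize(bitmap):
    """Toy 'OCR': a row reads as '1' if at least half of its pixels are ink."""
    return "".join("1" if 2 * sum(row) >= len(row) else "0" for row in bitmap)

def run_pipeline(stages, page):
    """Feed the output of each named stage into the next one."""
    data = page
    for _name, stage in stages:
        data = stage(data)
    return data

def replace_stage(stages, name, new_stage):
    """Return a copy of the pipeline with one named stage swapped out."""
    return [(n, new_stage if n == name else s) for n, s in stages]

def accuracy(output, truth):
    """Fraction of output symbols that match the ground truth."""
    return sum(a == b for a, b in zip(output, truth)) / len(truth)

page = [[40, 60, 70, 200], [210, 220, 90, 230]]  # two rows of gray levels
truth = "10"

baseline = [("binarization", binarize_fixed), ("ocr", recognize)]
contributed = replace_stage(baseline, "binarization", binarize_mean)

baseline_score = accuracy(run_pipeline(baseline, page), truth)
contributed_score = accuracy(run_pipeline(contributed, page), truth)
```

The point of the design is that the contributed stage is never scored in isolation: only the difference between the two end-of-pipeline scores is reported, which is what lets stages with a formalized input and output be compared on equal terms.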
SPIE Proceedings, 2005
The notion of assigning every piece of paper that passes through a printer a unique ID encoded either on the surface or in the substrate of the page, regardless of its intended use or perceived importance, could prove to be a breakthrough of magnitude comparable to the now ubiquitous concept of referencing a webpage through its Uniform Resource Locator (URL). We see many opportunities for using chipless ID in the world of everyday documents, but also many challenges. In this paper, we begin to explore the ways this new technology can be used to enable advanced document management functions, along with its implications for the ways in which people use documents.
Pattern Analysis & Applications, 2001
IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 2007
Biometric security is a topic of rapidly growing importance in the areas of user authentication and cryptographic key generation. In this paper, we describe our steps toward developing evaluation methodologies for behavioral biometrics that take into account threat models which have been largely ignored. We argue that the pervasive assumption that forgers are minimally motivated (or, even worse, naïve) is too optimistic and even dangerous. Taking handwriting as a case in point, we show through a series of experiments that some users are significantly better forgers than others, that such forgers can be trained in a relatively straightforward fashion to pose an even greater threat, that certain users are easy targets for forgers, and that most humans are relatively poor judges of handwriting authenticity and hence their unaided instincts cannot be trusted. Additionally, to overcome current labor-intensive hurdles in performing more accurate assessments of system security, we present a generative attack model based on concatenative synthesis that can provide a rapid indication of the security afforded by the system. We show that our generative attacks match or exceed the effectiveness of forgeries rendered by the skilled humans we have encountered.
Computer, 1987
P-NAC: A Systolic Array for Comparing Nucleic Acid Sequences. Daniel P. Lopresti, Dept. of Computer Science, Brown University, Providence, RI 02912. The Princeton Nucleic Acid Comparator (P-NAC ...
Proceedings of the Fifth Annual Symposium on …, 1996
Cornell University - arXiv, Dec 16, 2020
Overview: Infectious diseases cause more than 13 million deaths a year, worldwide. Globalization, urbanization, climate change, and ecological pressures have significantly increased the risk of a global pandemic. The ongoing COVID-19 pandemic, the first since the H1N1 outbreak more than a decade ago and the worst since the 1918 influenza pandemic, illustrates these matters vividly. More than 47M confirmed infections and 1M deaths have been reported worldwide as of November 4, 2020, and the global markets have lost trillions of dollars. The pandemic will continue to have significant disruptive impacts upon the United States and the world for years; its secondary and tertiary impacts might be felt for more than a decade. An effective strategy to reduce the national and global burden of pandemics must: (i) detect timing and location of occurrence, taking into account the many interdependent driving factors; (ii) anticipate public reaction to an outbreak, including panic behaviors that obstruct responders and spread contagion; and (iii) develop actionable policies that enable targeted and effective responses. These three aims will require advances in a number of areas, including:
• The development of models that are not just scientifically effective, but that support understanding on the part of the public, as well as actionable insights for policy makers.
• Identification and preparation of computational and data resources (data, computational power, expertise) that will allow us to respond quickly and predict effectively in a crisis situation.
• Real-time collection and updating of data, models, and model assumptions in rapidly changing environments.
These are not purely technological problems.
Effective preparation for and response to future pandemics will require integration of solutions that span the full sociotechnical spectrum of challenges that are posed by these devastating events. This will require systemic, national-level support and a coordinated effort by the computing research community, in tandem with a broad coalition of experts from the social and political sciences, economics and the humanities. Such a framework will allow us to develop an understanding across scales, from cells and RNA to epidemic spread through communities and across countries. Only with such a comprehensive understanding will we be prepared to more effectively manage the next pandemic.
18th International Conference on Pattern Recognition (ICPR'06)
Symbolic Indirect Correlation (SIC) is a nonparametric method that offers significant advantages for recognition of ordered unsegmented signals. A previously introduced formulation of SIC based on subgraph-isomorphism requires very large reference sets in the presence of noise. In this paper, we seek to address this issue by formulating SIC classification as a maximum likelihood problem. We present experimental evidence that demonstrates that this new approach is more robust for the problem of online handwriting recognition using noisy input.
Detecting potential issues in naturally captured images of water is a challenging task due to visual similarities between clean and polluted water, as well as challenges posed by image acquisition with different camera angles and placements. This paper presents novel deep invariant texture features along with a deep network for detecting clean and polluted water images. The proposed method first divides an input image into H, S and V components to extract finer details. For each of the color components, the proposed approach generates two directional coherence images based on eigenvalue analysis and gradient distribution, which results in enhanced images. Then the proposed method extracts scale-invariant gradient orientations based on Gaussian first-order derivative filters at different standard deviations to study the texture of each smoothed image. To strengthen the above features, we explore the combination of Gabor-wavelet-binary pattern for extracting texture of the input water image. The...