2009 International Conference on Field Programmable Logic and Applications, 2009
In many embedded systems for video surveillance, distinctive features are used for the detection of objects. In this contribution, a real-time FPGA implementation of a feature detector, namely the SUSAN algorithm, is described. As the original SUSAN algorithm performs poorly on non-synthetic images, a significant quality improvement of this algorithm is presented. The hardware accelerator outperforms a comparable software version running on an Intel Core2Duo E8400 core at 3.00 GHz and delivers almost the same execution time as an implementation of the Harris corner detector running on an Nvidia GeForce 8800 GTX GPU.
We present a novel user interface concept for indoor navigation which uses directional arrows and panorama images of decision points, such as turns, along the route. The interface supports the mental model of landmark-based navigation, can be used on- and offline, and is highly tolerant to localization inaccuracy. We evaluated the system in a real-world user study where decision points proved to be as efficient for navigation as continuous route instructions and panorama updates. We gained valuable insights on the role of feedback and of the frequency of decision points in relation to user confidence and satisfaction. Based on our experiences, we summarize lessons learned that inspire and guide the further design of UIs for pedestrian navigation systems in indoor environments.
Vision-based approaches for mobile indoor localization do not rely on the infrastructure and are therefore scalable and cheap. The particular requirements for a navigation user interface for a vision-based system, however, have not been investigated so far. Such mobile interfaces should adapt to localization accuracy, which strongly relies on distinctive reference images, and other factors, such as the phone's
Vision-based approaches are a promising method for indoor navigation, but prototyping and evaluating them poses several challenges. These include the effort of realizing the localization component, difficulties in simulating real-world behavior and the interaction between vision-based localization and the user interface. In this paper, we report on initial findings from the development of a tool to support this process. We
International Conference on Indoor Positioning and Indoor Navigation, 2012
We present a visual odometry system for indoor navigation with a focus on long-term robustness and consistency. As our work is targeting mobile phones, we employ monocular SLAM to jointly estimate a local map and the device's trajectory. We specifically address the problem of estimating the scale factor of both the map and the trajectory. State-of-the-art solutions approach this problem
2012 19th IEEE International Conference on Image Processing, 2012
Recent advances in the field of content-based image retrieval (CBIR) have made it possible to quickly search large image databases using photographs or video sequences as a query. With appropriately tagged images of places, this technique can be applied to the problem of visual location recognition. While this task has attracted considerable interest in the community, most existing approaches focus on outdoor environments only. This is mainly because generating an indoor dataset is elaborate and complex. In order to allow researchers to advance their approaches towards the challenging field of CBIR-based indoor localization and to facilitate an objective comparison of different algorithms, we provide an extensive, high-resolution indoor dataset. The free-to-use dataset includes realistic query sequences with ground truth as well as point cloud data, enabling a localization system to perform 6-DOF pose estimation.
2014 IEEE International Conference on Image Processing (ICIP), 2014
Recent progress in the field of content-based image retrieval has enabled camera-based indoor positioning. The matching of smartphone recordings with a database of geo-referenced images allows for meter-accurate, infrastructure-free localization. In mobile scenarios, however, three major constraints have to be considered: limited computational resources of mobile devices, limited network capacity and the need for scalability in large buildings. To address these issues, we modify the state-of-the-art Vector of Locally Aggregated Descriptors (VLAD) image signature to work with recently emerging binary feature descriptors. We show that this results in a substantial reduction in the overall computational complexity, which enables the matching of image signatures directly on the mobile device. The specific properties of this signature form the basis of our proposed scalable streaming approach that preemptively loads image signatures of reference images in the vicinity of the user onto the mobile device to mitigate the effect of network latency. In order to provide efficient streaming, we compress the signatures by exploiting the similarities of spatially neighboring reference images. In combination, the contributions of this paper lead to an indoor localization system which allows instantaneous camera-based indoor positioning with very low requirements on the available network connection.
IEEE Transactions on Circuits and Systems for Video Technology, 2015
This paper presents a novel rate control framework for H.264/Advanced Video Coding-based video coding that improves the preservation of gradient-based features such as the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF) compared with the default rate control algorithm in the JM reference software. First, a criterion (matching score) for feature preservation on the basis of the bag-of-features concept is proposed. Then, the matching scores are collected as a function of the quantization parameters and analyzed for different feature types. With this analysis, macroblocks are categorized into different groups before encoding. Our rate control algorithm assigns different quantization parameters to each group according to the importance of the group for feature extraction. The experimental results show that our rate control algorithm achieves the desired target bit rate, and more features are preserved compared with videos encoded using the default rate control. The proposed approach not only improves feature preservation, but also leads to a noticeable performance improvement in a real image retrieval system. The rate control framework proposed in this paper is fully standard-compatible.
Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing - UbiComp '14 Adjunct, 2014
We propose a graph-based, low-complexity sensor fusion approach for ubiquitous pedestrian indoor positioning using mobile devices. We employ our fusion technique to combine relative motion information based on step detection with WiFi signal strength measurements. The method is based on the well-known particle filter methodology. In contrast to previous work, we provide a probabilistic model for location estimation that is formulated directly on a fully discretized, graph-based representation of the indoor environment. We generate this graph by adaptive quantization of the indoor space, removing irrelevant degrees of freedom from the estimation problem. We evaluate the proposed method in two realistic indoor environments using real data collected from smartphones. In total, our dataset spans about 20 kilometers in distance walked and includes 13 users and four different mobile device types. Our results demonstrate that the filter requires an order of magnitude fewer particles than state-of-the-art approaches while maintaining an accuracy of a few meters. The proposed low-complexity solution not only enables indoor positioning on less powerful mobile devices, but also saves much-needed resources for location-based end-user applications which run on top of a localization service.
2011 IEEE International Symposium on Multimedia, 2011
Distinctive visual cues are of central importance for image retrieval applications, in particular in the context of visual location recognition. While in indoor environments typically only few distinctive features can be found, outdoors dynamic objects and clutter significantly impair the retrieval performance. We present an approach which exploits text, a major source of information for humans during orientation and navigation, without the need for error-prone optical character recognition. To this end, characters are detected and described using robust feature descriptors like SURF. By quantizing them into several hundred visual words, we consider the distinctive appearance of the characters rather than reducing the set of possible features to an alphabet. Writings in images are transformed to strings of visual words termed visual phrases, which provide significantly improved distinctiveness when compared to individual features. An approximate string matching is performed using N-grams, which can be efficiently combined with an inverted file structure to cope with large datasets. An experimental evaluation on three different datasets shows significant improvement of the retrieval performance while reducing the size of the database by two orders of magnitude compared to the state of the art. Its low computational complexity makes the approach particularly suited for mobile image retrieval applications.
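As a rough illustration of the N-gram matching with an inverted file mentioned in this abstract, the following minimal Python sketch indexes strings of visual-word IDs and ranks images by the number of shared N-grams. It is not the authors' implementation: the class name `NGramIndex`, the choice of bigrams, and the plain vote-count scoring are assumptions made for this example only.

```python
from collections import defaultdict


def ngrams(words, n=2):
    """Return all contiguous n-grams of a visual-word sequence as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


class NGramIndex:
    """Hypothetical inverted file mapping each n-gram to the images containing it."""

    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> set of image ids

    def add(self, image_id, phrase):
        """Index one visual phrase (a list of visual-word ids) for an image."""
        for gram in ngrams(phrase, self.n):
            self.postings[gram].add(image_id)

    def query(self, phrase):
        """Rank images by the number of distinct query n-grams they share."""
        votes = defaultdict(int)
        for gram in set(ngrams(phrase, self.n)):
            for image_id in self.postings.get(gram, ()):
                votes[image_id] += 1
        # Highest vote count first; ties broken by image id for determinism.
        return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


index = NGramIndex(n=2)
index.add("a", [3, 17, 42, 8])
index.add("b", [5, 17, 42, 9])
index.add("c", [1, 2, 3])
ranking = index.query([17, 42, 8, 99])
```

Because a query only touches the posting lists of its own N-grams, a single mismatched visual word costs a few votes instead of breaking the match, which is the sense in which the matching is approximate.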
2013 IEEE International Conference on Image Processing, 2013
State-of-the-art visual odometry algorithms achieve remarkable efficiency and accuracy. Under realistic conditions, however, tracking failures are inevitable, and a recovery strategy is required to continue tracking. In this paper, we propose a relocalization system that enables real-time 6D pose recovery for wide baselines. Our approach specifically targets resource-constrained hardware such as mobile phones. By exploiting the properties of low-complexity binary feature descriptors, nearest-neighbor search is performed efficiently using Locality Sensitive Hashing. Our method does not require time-consuming offline training of hash tables and it can be applied to any visual odometry system. We provide a thorough evaluation of effectiveness, robustness and runtime on an indoor test sequence with available ground truth poses. We investigate the system parameterization and compare the relocalization performance for the three binary descriptors BRIEF, unscaled BRIEF and ORB. In contrast to previous work on mobile visual odometry, we are able to quickly recover from tracking failures within maps with thousands of 3D feature points.
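To make the idea of training-free Locality Sensitive Hashing over binary descriptors concrete, here is a minimal Python sketch using bit sampling, a standard LSH family for Hamming distance. It is an illustration under assumptions, not the system described in the abstract: the class name `BitSamplingLSH`, the table count, the key length, and the representation of descriptors as Python integers are all hypothetical choices.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Hamming distance between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


class BitSamplingLSH:
    """Hypothetical bit-sampling LSH; hash functions are random bit subsets."""

    def __init__(self, n_bits=256, n_tables=4, key_bits=12, seed=0):
        rng = random.Random(seed)
        # Each table hashes a descriptor by reading a fixed random subset of bits,
        # so no offline training on data is needed.
        self.bit_sets = [rng.sample(range(n_bits), key_bits) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.descriptors = []

    def _key(self, desc, bits):
        return tuple((desc >> b) & 1 for b in bits)

    def add(self, desc):
        """Insert a descriptor into every table and return its index."""
        idx = len(self.descriptors)
        self.descriptors.append(desc)
        for bits, table in zip(self.bit_sets, self.tables):
            table[self._key(desc, bits)].append(idx)
        return idx

    def nearest(self, query):
        """Return (index, distance) of the best candidate, or None if no bucket matches."""
        candidates = set()
        for bits, table in zip(self.bit_sets, self.tables):
            candidates.update(table.get(self._key(query, bits), ()))
        if not candidates:
            return None
        best = min(candidates, key=lambda i: (hamming(self.descriptors[i], query), i))
        return best, hamming(self.descriptors[best], query)


rng = random.Random(1)
lsh = BitSamplingLSH()
descs = [rng.getrandbits(256) for _ in range(50)]
for d in descs:
    lsh.add(d)
match = lsh.nearest(descs[7])
```

Only descriptors that collide with the query in at least one table are compared exhaustively, so the search is approximate: a true nearest neighbor that differs in the sampled bits of every table is missed, and more tables or shorter keys trade speed for recall.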
2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2013
Retrieving the location of a mobile device by matching a query image to a database of geo-tagged imagery is one popular application of content-based image retrieval (CBIR). Standard CBIR-based approaches exploit appearance features of the environment for the matching process. Many locations, however, are characterized by distinct structural (geometric) features. We investigate whether a standard appearance-based CBIR pipeline can be adapted to perform location retrieval using a range image-based representation of the environment. The contributions are three-fold. First, we design a rigorous experimental setup using an extensive and challenging indoor dataset. Second, we compare the state-of-the-art feature algorithm specifically designed for range images, the Normal Aligned Radial Feature (NARF) [1], against some of the most established appearance-based features. Third, we combine the high keypoint detection rate of NARF with the robustness of the Speeded-Up Robust Feature for range image-based location recognition. This detector-descriptor combination, which we coin NURF, leads to a 15% improvement in absolute location recognition performance compared to simple NARF in our experimental setup.
Vision-based approaches for mobile indoor localization do not rely on the infrastructure and are therefore scalable and cheap. The particular requirements for a navigation user interface for a vision-based system, however, have not been investigated so far.
We present and evaluate a novel user interface for indoor navigation, incorporating two modes. In augmented reality (AR) mode, navigation instructions are shown as an overlay over the live camera image and the phone is held as depicted in Picture a). In virtual reality (VR) mode, a correctly oriented 360° panorama image is shown when holding the phone as in Picture b). The interface particularly addresses the vision-based localization method by including special UI elements that support the acquisition of "good" query images. Screenshot c) shows a prototype incorporating the presented VR user interface.
Proceedings of the 20th ACM international conference on Multimedia - MM '12, 2012
Determining the pose of a mobile device based on visual information is a promising approach to solve the indoor localization problem. We present an approach that transforms localized images along a mapping trajectory into virtual viewpoints that cover a set of densely sampled camera positions and orientations in a confined environment. The viewpoints are represented by their respective bag-of-features vectors and image retrieval techniques are applied to determine the most likely pose of query images at very low computational complexity. As virtual image locations and orientations are decoupled from actual image locations, the system is able to work with sparse reference imagery and copes well with perspective distortion. Experiments confirm that pose retrieval performance is significantly improved.
Papers by Robert Huitl