HAL (Le Centre pour la Communication Scientifique Directe), 2010
The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes our participation in the TRECVID 2010 semantic indexing and instance search tasks. For the semantic indexing task, we evaluated a number of different descriptors and tried different fusion strategies, in particular hierarchical fusion. The best IRIM run has a Mean Inferred Average Precision of 0.0442, which is above the task median performance. We found that fusing the classification scores from different classifier types improves performance and that audio descriptors can help even though their individual performance is quite low. For the instance search task, we used only one of the example images in our queries. Our rank is near the middle of the list of participants. The experiment showed that HSV features outperform both the concatenation of HSV and edge histograms and the wavelet feature.
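The abstract above reports that fusing classification scores from different classifier types helps, even when one source (audio) is weak on its own. As a rough illustration only, here is a minimal sketch of weighted late fusion. It is not the hierarchical fusion evaluated in the paper; the function names, the min-max normalization, the weights, and the toy scores are all assumptions made for this example.

```python
def min_max_normalize(scores):
    """Rescale one classifier's scores to [0, 1] so that sources are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def late_fusion(score_lists, weights=None):
    """Weighted average of normalized scores, one fused score per shot."""
    if weights is None:
        weights = [1.0] * len(score_lists)
    total_weight = sum(weights)
    normalized = [min_max_normalize(scores) for scores in score_lists]
    n_shots = len(normalized[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalized)) / total_weight
        for i in range(n_shots)
    ]


# Toy scores for three shots (hypothetical values, on different scales).
visual = [0.9, 0.2, 0.4]
audio = [10.0, 30.0, 20.0]
fused = late_fusion([visual, audio], weights=[3.0, 1.0])

# Shots ranked from most to least likely to contain the concept.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

In this toy case the lightly weighted audio scores do not change which shot ranks first, but they do move the third shot above the second, which is the kind of effect a weak complementary descriptor can have.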
HAL (Le Centre pour la Communication Scientifique Directe), Nov 16, 2009
The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes our participation in the TRECVID 2009 High Level Features (HLF) detection task. We evaluated a large number of different descriptors (on TRECVID 2008 data) and tried different fusion strategies, in particular hierarchical fusion and genetic fusion. The best IRIM run has a Mean Inferred Average Precision of 0.1220, which is significantly above the median performance of the TRECVID 2009 HLF detection task. We found that fusing the classification scores from different classifier types improves performance and that audio descriptors can help even though their individual performance is quite low.
HAL (Le Centre pour la Communication Scientifique Directe), Oct 27, 2008
In this paper, we present the first participation of a consortium of French laboratories, IRIM, in the TRECVID 2008 BBC Rushes Summarization task. Our approach relies on video skimming. We propose two methods to reduce redundancy, as rushes include several takes of scenes. We also take into account low- and mid-level semantic features in an ad hoc fusion method in order to retain only significant content.
With the digital revolution of the past decade, the number of digital photos available to everyone is growing faster than the processing capacity of computers. Current search tools were designed to handle small volumes of data. Their complexity generally does not allow searches in large corpora with computation times acceptable to users. In this thesis, we propose solutions for scaling up content-based image search engines. First, we considered automatic search engines that handle images indexed as global histograms. These systems are scaled up by introducing a new index structure suited to this context, which allows us to perform approximate but more efficient nearest-neighbor searches. Second,...
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Phase sensitivity to axial strain of microstructured optical silica fibers
Since the digital revolution, the volume of images to be processed has grown exponentially. Interactive search systems have to cope with these huge databases to remain effective. As the complexity of on-line learning methods is at least linear in the size of the database, scalability is the major problem for these methods. Fast retrieval systems, with index structures for fast navigation, have hence become something of a Holy Grail. In this article, we propose a strategy to overcome this scalability limitation. Our technique exploits ultra-fast retrieval methods such as Locality Sensitive Hashing to speed up an active learning system. Experiments on a database of 180K images are reported. The results show that our method is 45 times faster than state-of-the-art approaches for similar accuracy.
This paper presents our approach to selecting relevant sequences from raw videos in order to generate summaries for the TRECVID 2008 BBC Rushes task. Our system is composed of two major steps. First, the system detects "semantic" shot boundaries and keeps only non-redundant shots; then, it estimates the average motion of each shot, as a criterion for the amount of information, to better distribute the duration of the summary among the remaining shots. The first step is based on fast near-duplicate retrieval using Locality Sensitive Hashing (LSH), which provides results in a few seconds (not counting the decoding and encoding processes). The TRECVID evaluation shows very promising results: we ranked 17th out of 43 runs on the redundancy measure (RE) and 18th on object and event inclusion (IN). These balanced results (most of the best teams on the first criterion are among the last on the second) show that our method offers a fairly good trade-off between false negatives (IN) and false positives (RE).
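The second step described above shares the summary duration among the remaining shots according to their average motion. The abstract does not give the exact rule, so the sketch below assumes a simple proportional split; the function name, the fallback for motionless footage, and the toy values are hypothetical.

```python
def allocate_durations(avg_motion, summary_seconds):
    """Split the summary time among shots in proportion to their average motion.

    Assumption for this sketch: a shot with twice the motion gets twice the time.
    """
    total_motion = sum(avg_motion)
    if total_motion == 0:
        # No motion anywhere: fall back to an equal share per shot.
        return [summary_seconds / len(avg_motion)] * len(avg_motion)
    return [summary_seconds * m / total_motion for m in avg_motion]


# Three non-redundant shots with hypothetical average-motion values,
# and a 20-second summary budget.
durations = allocate_durations([2.0, 6.0, 2.0], 20.0)
```

With these toy values the middle shot, which carries 60% of the total motion, receives 12 of the 20 seconds.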
Active learning is a machine learning technique that has attracted a lot of research interest in content-based image retrieval (CBIR) in recent years. To be effective, an active learning system must be fast and efficient, using as few relevance feedback iterations as possible. Scalability is the major problem for such an on-line learning method, since its complexity on a database of size n is at best O(n log n). In this article, we propose a strategy to overcome this limitation. Our technique exploits ultra-fast retrieval methods such as Locality Sensitive Hashing (LSH), recently applied to unsupervised image retrieval. Combined with active selection, our method achieves very fast active learning on very large databases. Experiments on the VOC2006 database are reported; results are obtained four times faster while preserving accuracy.
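To make the idea of combining LSH with active selection more concrete, here is a minimal sketch of one possible arrangement: a hash table restricts the search to the query's bucket, and the images whose classifier score is closest to the decision boundary are proposed for labeling. The abstract does not describe how the paper combines the two, so everything here is an assumption: the standard Euclidean LSH hash floor((a·x + b) / w), the single-table layout, the toy classifier `score`, and all function names.

```python
import math
import random


def make_hash(dim, width, rng):
    """One Euclidean LSH function: floor((a . x + b) / width), with Gaussian a."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)

    return h


def bucket_key(x, hashes):
    """Concatenate several hash values into one bucket identifier."""
    return tuple(h(x) for h in hashes)


def build_table(points, hashes):
    """Map each bucket key to the indices of the points that fall into it."""
    table = {}
    for idx, p in enumerate(points):
        table.setdefault(bucket_key(p, hashes), []).append(idx)
    return table


def select_for_labeling(query, points, table, hashes, score, k):
    """Look only at the query's bucket, then pick the k most uncertain points."""
    candidates = table.get(bucket_key(query, hashes), [])
    return sorted(candidates, key=lambda i: abs(score(points[i])))[:k]


rng = random.Random(0)
points = [[0.5, 0.5]] + [[rng.random(), rng.random()] for _ in range(200)]
hashes = [make_hash(dim=2, width=0.5, rng=rng) for _ in range(2)]
table = build_table(points, hashes)


def score(x):
    """Hypothetical classifier output: the decision boundary is at x[0] = 0.5."""
    return x[0] - 0.5


selected = select_for_labeling(points[0], points, table, hashes, score, k=3)
```

The selection step sorts only the points in one bucket instead of the whole database, which is where the speed-up over a linear scan would come from in this arrangement.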
The growth of personal image collections has boosted the creation of many applications, many of which depend on the existence of fast schemes to match similar image descriptors. In this paper we present multicurves, a new indexing method for multimedia descriptors, able to handle high dimensionalities (100 dimensions and over) and large databases (millions of descriptors). The technique allows a fast implementation of approximate kNN search, and deals easily with data updating (insertions and deletions). The index is based on the simultaneous use of several moderate-dimensional space-filling curves. The combined effect of having more than one curve and reducing the dimensionality of each individual curve makes it possible to overcome undesirable boundary effects. In empirical evaluations, the method compares favorably with state-of-the-art methods, especially when the constraints of secondary storage are considered.
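The index described above maps subsets of the descriptor's dimensions onto several space-filling curves and searches near the query's position on each curve. The sketch below illustrates that general idea under strong simplifications. It uses a Z-order (Morton) curve because it is short to write, which is a choice made here and not a statement about the curve used in the paper. The class `MultiCurveIndex`, its parameters `dims_per_curve`, `bits`, and `probe`, and the restriction to non-negative integer coordinates and identifiers are all hypothetical.

```python
import bisect


def morton_key(coords, bits):
    """Interleave the bits of integer coordinates into one Z-order key."""
    key = 0
    for bit in range(bits - 1, -1, -1):
        for c in coords:
            key = (key << 1) | ((c >> bit) & 1)
    return key


class MultiCurveIndex:
    """Several sorted lists, one per curve; each curve covers a few dimensions."""

    def __init__(self, dim, dims_per_curve, bits=8):
        self.groups = [
            list(range(start, min(start + dims_per_curve, dim)))
            for start in range(0, dim, dims_per_curve)
        ]
        self.bits = bits
        self.curves = [[] for _ in self.groups]  # sorted (key, id) pairs
        self.data = {}

    def _key(self, vec, group):
        return morton_key([vec[d] for d in group], self.bits)

    def insert(self, ident, vec):
        """Insertion is a sorted insert into each curve."""
        self.data[ident] = vec
        for curve, group in zip(self.curves, self.groups):
            bisect.insort(curve, (self._key(vec, group), ident))

    def query(self, vec, k, probe=4):
        """Collect entries near the query on every curve, then rank them exactly."""
        candidates = set()
        for curve, group in zip(self.curves, self.groups):
            pos = bisect.bisect_left(curve, (self._key(vec, group), -1))
            for _, ident in curve[max(0, pos - probe):pos + probe]:
                candidates.add(ident)

        def squared_distance(ident):
            return sum((a - b) ** 2 for a, b in zip(self.data[ident], vec))

        return sorted(candidates, key=squared_distance)[:k]


index = MultiCurveIndex(dim=4, dims_per_curve=2, bits=8)
vectors = {
    0: [10, 20, 30, 40],
    1: [200, 210, 220, 230],
    2: [12, 22, 28, 41],
    3: [100, 5, 250, 90],
}
for ident, vec in vectors.items():
    index.insert(ident, vec)

neighbors = index.query([11, 21, 29, 40], k=2)
```

A point that is unlucky on one curve (close in space but far apart in curve order) can still be found through another curve, which is the intuition behind using several curves at once.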
Although “Bag-of-Features” image models have shown very good potential for object matching and image retrieval, such a complex data representation requires computationally expensive similarity measure evaluation. In this paper, we propose a framework unifying dictionary-based and kernel-based similarity functions that highlights the trade-off between powerful data representation and efficient similarity computation. On the basis of this formalism, we propose a new kernel-based similarity approach for Bag-of-Features descriptions. We introduce a method for fast similarity search in large image databases. The experiments show that our approach is very competitive with state-of-the-art methods for similarity retrieval tasks.
In this paper, we present the first results obtained within the EROS-3D project, which aims to handle a collection of 3D models of artworks, i.e., to visualize, classify, and compare them. Several 3D descriptors are used in association with our active learning search engine RETIN. We describe the 3D features as well as our new system for the classification and retrieval of objects, which we call RETIN-3D.
In content-based image retrieval, the success of any distance-based indexing scheme depends critically on the quality of the chosen distance metric. In this paper, we propose a kernel-based similarity approach that works on sets of vectors representing images. We introduce a method for fast approximate similarity search in large image databases with our kernel-based similarity metric. We evaluate our algorithm on an image retrieval task and show it to be accurate and faster than linear scanning.
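For readers unfamiliar with kernels on sets of vectors, the following sketch shows one well-known construction: averaging a Gaussian kernel over all pairs of local descriptors from two images, together with the linear scan that the abstract uses as its baseline. The paper's own kernel is not specified in the abstract, so this particular kernel, the `sigma` parameter, and the function names are assumptions for illustration.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Similarity between two local descriptors."""
    squared = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared / (2.0 * sigma ** 2))


def set_kernel(bag_a, bag_b, sigma=1.0):
    """Average pairwise kernel between two sets of descriptors (one set per image)."""
    total = sum(gaussian_kernel(u, v, sigma) for u in bag_a for v in bag_b)
    return total / (len(bag_a) * len(bag_b))


def linear_scan(query_bag, database, sigma=1.0):
    """Baseline search: evaluate the set kernel against every image in the database."""
    return max(
        range(len(database)),
        key=lambda i: set_kernel(query_bag, database[i], sigma),
    )


# Two toy images, each represented by two 2-D descriptors (hypothetical values).
database = [
    [[5.0, 5.0], [6.0, 6.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
best = linear_scan([[0.0, 0.0], [1.0, 1.0]], database)
```

Each comparison costs one kernel evaluation per pair of descriptors, and the scan repeats that for every image, which is why an approximate search structure becomes attractive on large databases.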
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012
In the past ten years, powerful new algorithms based on efficient data structures have been proposed to solve the problem of Nearest Neighbors search (or Approximate Nearest Neighbors search). Although the Euclidean Locality Sensitive Hashing algorithm, which provides approximate nearest neighbors in a Euclidean space with sub-linear complexity, is probably the most popular, the Euclidean metric does not always give results as accurate and relevant as similarity measures such as the Earth Mover's Distance and the χ² distance. In this paper, we present a new LSH scheme adapted to the χ² distance for approximate nearest neighbors search in high-dimensional spaces. We define the specific hashing functions, prove their locality sensitivity, and compare our method experimentally with the Euclidean Locality Sensitive Hashing algorithm in the context of image retrieval on real image databases. The results demonstrate the relevance of this new LSH scheme, which provides either far better accuracy than the Euclidean scheme at equivalent speed, or equivalent accuracy with a large gain in processing speed.
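The motivation above is that the Euclidean metric and the χ² distance can disagree about which histogram is closest to a query. The sketch below shows such a disagreement on a toy example. It does not reproduce the paper's χ² hashing functions. The χ² form used here, the sum of (x_i − y_i)² / (x_i + y_i), is one common convention (some authors add a factor of 1/2, which does not change rankings), and the histograms are made-up values.

```python
import math


def chi2_distance(x, y):
    """Chi-square distance between two non-negative histograms.

    Bins where both histograms are zero contribute nothing.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


query = [0.10, 0.90]
database = [
    [0.00, 1.00],  # first bin empties out completely
    [0.22, 0.78],  # first bin roughly doubles
]

nearest_l2 = min(
    range(len(database)), key=lambda i: euclidean_distance(query, database[i])
)
nearest_chi2 = min(
    range(len(database)), key=lambda i: chi2_distance(query, database[i])
)
```

The Euclidean distance prefers the first histogram, while the χ² distance prefers the second, because χ² penalizes a change in a nearly empty bin more heavily than the same absolute change in a full one.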
Papers by David Gorisse