Nearest Keyword Set Search in Multidimensional Datasets
Nearest Keyword Set Search in Multidimensional Datasets
Nearest Keyword Set Search in Multidimensional Datasets
ABSTRACT
Keyword-based search in text-rich multi-dimensional datasets facilitates many novel
applications and tools. In this paper, we consider objects that are tagged with keywords and are
embedded in a vector space. For these datasets, we study queries that ask for the tightest groups
of points satisfying a given set of keywords. We propose a novel method called ProMiSH
(Projection and Multi Scale Hashing) that uses random projection and hash-based index
structures, and achieves high scalability and speedup. We present an exact and an approximate
version of the algorithm. Our empirical studies, both on real and synthetic datasets, show that
ProMiSH has a speedup of more than four orders over state-of-the-art tree-based techniques. Our
scalability tests on datasets of sizes up to 10 million and dimensions up to 100 for queries having
up to 9 keywords show that ProMiSH scales linearly with the dataset size, the dataset dimension,
the query size, and the result
EXISTING SYSTEM:
In this paper, we study nearest keyword set (referred to as NKS) queries on text-rich
multi-dimensional datasets. An NKS query is a set of user-provided keywords, and the result of
the query may include k sets of data points each of which contains all the query keywords and
forms one of the top-k tightest clusters in the multi-dimensional space. NKS query over a set of
two-dimensional data points. In tree-based indexes suggest possible solutions to NKS queries on
multidimensional datasets, the performance of these algorithms deteriorates sharply with the
increase of size or dimensionality in datasets.
DISADVANTAGES:
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar,
Vellore 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218
Website: www.shakastech.com, Email - id: [email protected],
[email protected]
NKS queries are useful for graph pattern search, where labeled graphs are embedded in a
EXISTING ALGORITHM
PROPOSED SYSTEM:
In this paper, we propose ProMiSH (short for Projection and Multi-Scale Hashing) to
enable fast processing for NKS queries. In particular, we develop an exact ProMiSH (referred to
as ProMiSH-E) that always retrieves the optimal top-k results, and an approximate ProMiSH
(referred to as ProMiSH-A) that is more efficient in terms of time and space, and is able to obtain
near-optimal results in practice. ProMiSH-E uses a set of hash tables and inverted indexes to
perform a localized search. Based on this index, we developed ProMiSH-E that finds an optimal
subset of points and ProMiSH-A which searches near-optimal results with better efficiency.
ProMiSH is faster than state-of-the-art tree-based techniques, with multiple orders of magnitude
performance improvement.
ADVANTAGES:
PROPOSED ALGORITHM
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar,
Vellore 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218
Website: www.shakastech.com, Email - id: [email protected],
[email protected]
STP Algorithm
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar,
Vellore 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218
Website: www.shakastech.com, Email - id: [email protected],
[email protected]