An Efficient and Secure Multilevel Keyword Ranked Search For Encrypted Data

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 8

Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.

003

An Efficient and Secure Multilevel Keyword Ranked


Search for Encrypted Data
Prasanth K. Baby1, Nikhil Samuel2
1
Christ College of Engineering, irinjalakuda, Thrissur
2
Christ College of Engineering, irinjalakuda, Thrissur

(E-mail: [email protected],[email protected])

In standard data searching service is basically on plain text


Abstract— with the growing digital communication and keyword search, the solution of downloading all the data and
networks, the data owners are motivated to outsource their decrypting locally impractical. This is due to the huge amount
complex data to the global storage space. Greater flexibility and of band width required. For avoiding the local storage
economic saving are the advantages of this global storage space. management, the data are storing to the storage servers. Easily
Before outsourcing the sensitive data, it has to be encrypted in searched and utilized, explores security and effective search
order to enforce the data privacy. In the encrypted data, search service for encrypted data is of paramount importance. For the
service is important to get the necessary data. The stored data is useful data retrieval in the large number of documents, request
relatively large so it requires multiple keywords in the search the server to perform result relevance ranking, rather than
query and it will return document in the order of their relevance returning undifferentiated results. This ranked search helps to
to these keywords searched. Related works on searchable avoid the unnecessary network traffic. For the privacy
encryption focus on single keyword search or Boolean keyword protection, such ranking does not leak any keyword related
search and rarely sort the result and for the multi-keyword information.
search coordinate matching, i.e., as many matches as possible, to
effectively capture the relevance of outsourced documents to the To improving accuracy in search result and enhance the
query keywords and inner product similarity to evaluate such user search experience, ranking system has to support multiple
similarity measure. In Multi-keyword Ranked Search under the keyword search. As a user routine practice, provide multiple
coordinate matching, the ranking helps for the efficient retrieval. keywords as a search user interest to retrieve data, the each
The multilevel keyword ranked search is implemented by using keyword in search request helps to restrict the search result.
the cache to reduce the search time. For this coordinate matching is used. This coordinate
matching is used widely in the plaintext information retrieval
Keywords— information retrieval; keyword search; multilevel;
(IR). Coordinate matching means as many matches as
ranked search; searchable encryption.
possible.
I. INTRODUCTION The proposed method describes the technique for reducing
With the evolution of different technologies the amount of the searching time by using multilevel keyword ranked search.
data handled by users also increased. Hackers and malicious This method will help the user for consuming less band width,
programs all pose a threat to your computer and the reduce the searching time over large encrypted document, and
information it contains. Advanced data storage service for user also help the user for searching multi-keyword (set of
is data outsourcing, store complex data to global storage space keywords). For restricted the search result, each keyword in
provider. The outsourced data’s are managed on remote the search query helps. Multilevel keyword ranked search uses
servers are maintained by the trusted third party outsourcing coordinate matching, i.e. as many matches as possible. It is an
vendors. In today’s distributed nature of data management, efficient similarity measure among multi-keyword semantics
gives assurances to detect and correct faulty behavior. For the to refine the result relevance, and has been used in plain text
relevance of outsourced data, data owners place their sensitive information retrieval (IR). The search query consists of
data into specialized storage area. Data owners outsource their keyword and the top k retrieval (top k rank). Here the search
data without assurances of confidentiality and security. result is on the rank ordered search. There are using several
Achieving confidentiality by encrypting the data, the major searching technique for searching on encrypted data. For
challenge is that how to enable search and retrieval over such performing better search reduced time, here cache is
encrypted data. implemented in multilevel keyword ranked search. This cache
helps for the better search reduction time. In the cache there
store the document and there weight.

15
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003

The rest of the paper is organized into four sections. set, and this technique together with blinding indexes with
Section II deals with the related works. Methodology is random tokens, ensures that there indexes are semantic
discussed in section III. Section IV involves the analysis of the security against adaptive chosen keyword attack (ind-cka)
work and the conclusion of the work is given in section V. secure.
Kmara et al [8] has developed a similar per-file index
I. Related Works scheme. In this method the files are stored in encrypted form,
later the user U wants to retrieve files containing (indexed by)
This session includes the related works that are done in the some keyword search, is not possible. There is no any straight
field of Searchable encryption are Single keyword Searchable forward way to do keyword search unless U leak the
Encryption and Boolean Keyword Searchable Encryption. decryption key. Here U has to create keyword index associates
Single keyword searchable encryption schemes usually build each keyword with its associated files. All keyword searches
an encrypted searchable index such that its content is by U are based on this index; hence their scheme does not
concealed to the server unless it is given appropriate trapdoors offer full pattern-matching generality with the real text. In
generated via secret keys [8]. In boolean keyword searchable practice, this should be sufficient for most users. It is worth
encryption, conjunctive and disjunctive search are used. All noting that in this system U can have complete control over
these searchable encryption are not dealing with the searching what words are keywords and which keywords are associated
time, considering with the basic introduction to cache is much with which files, a power that can be useful for many
better for those problems.as well, for math, etc. applications. let U use pseudo-random bits to mask a
dictionary-based keyword index for each file and send it to S
A. Single Keyword Searchable Encryption in such a way that later U can use the short seeds to help S
Song et al [4] introduced the notation of searchable recover selective parts of the index, while maintaining the
encryption, based on symmetric key setting, where each word remaining parts pseudo-random.
in the file is encrypted independently under a special two
layered encryption construction. Thus, a searching overhead is B. Boolean Keyword Searchable Encryption
linear to the whole file collection length. In this method the In the Boolean keyword search conjunctive and disjunctive
searching is done by, an index contains a list of keywords. search are used. Conjunctive keyword search returns “ all-or-
With each keyword a list of pointers to documents where the nothing . Which means that returns those documents in which
key word appears. These keywords are words of interest that all the keywords specified by the search query appear. In
user want to search later. One possible advantage for this disjunctive keyword search that returns undifferentiated
scheme is that the request could be embedded in other results. This means it returns every document that contains a
retrievals so that Bob might have uncertainty about the subset of the specific keyword, even only one key word of
correlation of the search request and the retrieval request for interest. The paper [10] gives a basic idea for privacy
cipher text. The disadvantage is that user has to spend an extra preserving Multi-keyword ranked search over encrypted data
round-trip time to retrieve the documents. A general (MSRE). This is based on secure inner product computation,
disadvantage for index search is that whenever user changes and give two significantly improved MRSE schemes to
the documents, its index must be updated. achieve various strict privacy requirements.
The limitations of [4] was the work load for each search 1) Conjunctive & Disjunctive Keyword Search: Golle et
request proportional to the number of files in the collection. al [11] proposed conjunctive keyword search over encrypted
To overcome this limitation E.J Goh [5] developed a Bloom
data. Consider a user that stores encrypted documents on an
filter based per-file index, reducing the work load for each
search request proportional to the number of files in the untrusted server. Let p be the total number of documents, and
collection. A secure index is a data structure that permits a assume there are m keyword fields associated with each
queries with a trapdoor for a word X to test in O(1) time only document. For an example consider, documents were emails.
if the index contains X; The index disclose no information They define the following 4 keyword fields: “From”, “To”,
about its contents without valid trapdoors, and trapdoors can “Date” and “Subject”. For the simplicity, they make the
only be generated with a secret key. Secure indexes [5] are a following assumptions:
natural extension of the problem of constructing data  Assume that the same keyword never arise in two
structures with privacy ensures such as those provided by different keyword fields. The easiest way to satisfy this
oblivious and history independent data structures. They use requirement is to prepend keywords with the name of
Bloom filter [1] as a per document index to track words in the field they belong to. Thus for example, the keyword
each document. In their scheme, a word is represented in an “From:James” belongs to the “From” field and cannot
index by a code word obtained by applying pseudo-random be confused with the keyword “To:James” that belongs
functions twice once with the word as input and once with a
to the “To” field.
unique document identifier as input. This non-standard use of
pseudo-random functions ensures that the codewords  Assume that every keyword field is defined for every
representing a word X are different for each document in the document, requirement can be easily satisfied. In our

16
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003

email example, they may assign the key word “ order of the documents but not the relevance score of the
Subject:NULL ” in the “Subject” field to the emails documents) and understand an “as-strong-as possible” ranked
that have no subject. searchable symmetric encryption..
The documents are identified with the vector of m In information retrieval (IR) [14], a ranking function is
keywords which characterize them. For (i = 1... n), they used to evaluating relevance scores of matching files to a
denote the ith document by D i = (Wi;1 ...... Wi;m), where Wi;j is given search request. The most commonly used statistical
the keyword of document Di in the jth keyword field. The measurement for calculating relevance score in the
body of the ith data document can be encrypted with a information retrieval community uses the TF×IDF rule, where
standard symmetric key cipher and stored on the server next to term frequency (TF) is the number of times a given term or
the vector of keywords Di. For ease of presentation they may keyword appears within a file (to measure the significance of
ignore the body of the document and concern themselves only the term within the particular file), and inverse document
with the encryption of the keyword vector D i. When frequency (IDF) is obtained by dividing the number of files in
discussing a capability that enables the server to verify that a the whole collection of document by the number of files
document contains a specific keyword in field j, they denote containing the term (to measure the overall importance of the
the keyword by Wj. In this conjunctive keyword search term within the collection of documents). Among several
scheme there, it returns true if the expression ((W i;j1 = Wj1 ) ˄ hundred variations of the TF×IDF weighting scheme, no
(Wi;j2 = Wj2 ) ˄ .....˄ (W i;j1 = Wj1 )) holds and false otherwise. single combination of them out performs any of the others
In predicate encryption [7] scheme are support both universally [13]. Thus, without loss of generality, they choose
conjunctive and disjunctive search. an example formula that is commonly used and widely seen in
the literature (see [6]) for the relevance score calculation is as
C. Keyword Ranked Search follows:
In keyword ranked search Curtmola et al [12] discussed
about vector space model working and similarity based
ranking multi-keyword text search scheme. Vector space
model is well known technique which provides TF-IDF
weight rule through which we obtain accurate ranking result.
In this they gives the revised Searchable Symmetric
Encryption Scheme for, SSE-1 is efficient, it was only proven
secure against non-adaptive adversaries. A second Searchable
Symmetric Encryption SSE-2, which accomplishes semantic
security against adaptive adversaries, and Multi-user
Searchable Symmetric Encryption MSSE.
Searchable encryption permits data owner to outsource his
data in an encrypted manner while maintaining the selectively
searching capability over the encrypted data. Generally, In [2] focus on single keyword search, this case, the IDF
searchable encryption can be accomplished in its full factor is always constant with regard to the given searched
functionality using an oblivious RAMs [3]. Although hiding keyword. Thus, search results can be accurately ranked based
everything during the search from a malicious server only on the term frequency and file length information
(including access pattern), by the utilization of oblivious RAM contained within the single file.
usually brings the cost of logarithmic number of interactions
between the user and the server for each search request. Thus, D. Multi-keyword Ranked Search
in order to accomplish more efficient solutions, almost all the In Multi-keyword ranked search [10], allows multiple
relative works on searchable encryption writing resort to the keyword in the search request and return documents in the
weakened security guarantee, i.e., revealing the access pattern order of their relevance to these keywords, in the ranking
and search pattern but nothing else. Here, access pattern principle uses coordinate matching. This means the presence
alludes to the result of the search query output, i.e., which files of keyword in the document or the query is shown as ‘1’ or
have been retrieved. The search pattern incorporates the else ‘0’ in the data vector or the query vector. In the search
equality pattern among the two search requests (whether two query there are more factors which make impact on the search
searches were performed for the same keyword), any usability. For example, when one keyword appears in most
information derived thereafter from this statement. documents in the data set, the significance of this keyword in
Curtmola et al [12] shows that following the exactly same the query is less than other keywords which appears in less
security certification of existing SSE scheme, it would be very documents. Similarly, if one document contains a query
inefficient to accomplish ranked keyword search, which keyword in numerous locations, the user may prefer this to the
motivates us to further weaken the security guarantee of other document which contains the query keyword in only one
existing SSE appropriately (only leak the relative relevance location. To capture these information in the search process,

17
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003

we use the TF×IDF weighting rule within the vector space enable searching capabilities over E for the effective data
model to calculate such similarity measure. Here in the utilization, the user (data owner) need to build an index I from
building index is a onetime process that means the index the documents F. The index building comes before
construction is done before outsourcing the document or data, outsourcing the documents F. After building the index from
i.e. if any modification is necessary then the index the data document, the user (data owner) outsources encrypted
construction has to done again for all document. This is document collection E and the index I to the storage server,
because that TF-IDF rule is related to whole document. In this for searching the document collection. In the proposed system
similarity measure introduces high computation cost during it is assumed that another basic system exists for trust server
the index construction and trapdoor generation; it captures which consists of user access control and search control.
more related information on the content of documents and
query that returns better results for user’s interest.

E. Server Cache
Server cache is techniques used for caching objects for
reduce the server computation. This will help for the user to
access the data easily. By the help of caching improve access
speed and reduce the work load on the server. In [9] server
caching provides better performance for the auctions and also
suggests that cache at the application server can save a critical
number of accesses to a backend database and thus reduce the
server-side latency. In general, work of web caching can be
classified into browser caching, client-side proxy caching,
network caching (as in Content Delivery Networks), and
server side caching. Any time request is performed, a cache hit
occurs when keyword present, otherwise a cache miss occurs Fig. 1: Architecture of multilevel keyword ranked search for encrypted data.
and the information about the request has to be retrieved from
the storage. In traditional distributed file system, the server
maintains a cache of blocks that have been accessed by the
clients. The server cache is lower in the storage hierarchy than After receiving the user search request from the user (data
the client caches and therefore has a lower hit rate. Studies search user), the server is responsible to search the index I and
show that the server cache is still powerful in reducing server return the corresponding set of encrypted documents (TF-IDF
disk traffic and improving the performance of the file system. and document-id). For improving the document retrieval
accuracy, search result should be ranked according to some
In [15], the idea of cache memories is identical to virtual ranking criteria. For reducing the communication cost, user
memory in that some active portion of a low-speed memory is (data search user) sends an optional number k (retrieve top
stored in duplicate in a higher-speed cache memory. When a rank up to k) along with search query and the storage server
memory request is produced, the request is first presented to sends back the Top k documents that are most relevant to the
the cache memory, and if the cache cannot replies, the request search query.
is then presented to main memory. For READ operations that
cause a cache miss, the item is retrieved from main memory B. Index Construction
and copied into the cache.
In the proposed method the user U (data owner) has a
collection of data documents. In Fig.2 shows the index
II. Procedure construction of the proposed method. Before encrypting the
document the user need to build an index, by using the index
In this session the multilevel keyword ranked search for the user can search on the encrypted document. Each of these
encrypted data with caching feature is described. The main indexes (user interest keywords) is referred to the documents
contents of this session are System model, Index construction E. Before building up the indexes these documents need to go
and Search construction. through several pre-processing stages. In information retrieval
the pre-processing methods like Tokenization, Stemming
A. System model process, Stop words removal are used and the rest is
The proposed system consist of 3 different entities, data considered as the keyword. In normal cases these keywords
user, trust server and storage server, as illustrated in figure 1. are taken as indexes. The pre-processing is done using Natural
Here the users are of two type i.e. owner of the data document Language Processing (NLP) tool-kit. For stemming process
and the user who access or search these documents. The user there are several stemming algorithms each of these have their
(data owner) has a collection of data documents F to be own drawbacks and works on the basis of certain criteria or
outsourced to the storage server in the encrypted form E. For algorithms. For avoiding these draw backs, here in the

18
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003

stemming process dictionary based stemming is used.


Dictionary stemmers work quite differently from algorithmic
The TF is calculated by the following expression:
stemmers. Instead of applying a standard set of rules to each
word, they simply look up the word in the dictionary.
Theoretically, they could produce much better results than an
algorithmic stemmer. A dictionary stemmer should be able to
do the following: (1)
 Return the correct root word for irregular forms such as
feet and mice.
Inverse Document Frequency (IDF)is calculated using the
 Recognize the distinction between words that are expression:
similar but have different word senses, for example,
organ and organization.
The above features are the advantage of the dictionary
based stemming. (2)
After the stemming process stop words removal is done.
Stop words are natural language words which have very little
meaning, such as “and”, “the”, “a”, “an” and similar words. Then the TF-IDF value is calculated for these indexes by
Stop words are filtered out before or after processing of using the following expression:
natural language data (text). Though stop words usually refer
to the most common words in a language, there is no single
universal list of stop words used by all natural language
processing tools. Some tools specifically avoid removing these (3)
stop words to support phrase search.
After these processes have been completed the remaining
The above expressions are used as the basic rules applied
words are taken as keywords. In normal cases these keywords
for how important the word is for a document. The TF-IDF is
are used as index for documents. For each documents there
the weighting factor which is commonly used in information
may be different keywords associated with the document. For
retrieval. A hashing is applied to encrypt these indexes; the
selecting the index some criteria is used. The criteria are that
hashed indexes and the corresponding TF-IDF are stored to
the word which has a count more than a threshold value are
the storage server. The data document are encrypted by AES
selecting as the index for each document. After the index is
encryption, and outsourced.
generated, the term frequency (TF) for that word is calculated.
C. Search Construction
The keywords are the user interest words used for
searching the documents. The generated keywords are the
indexes constructed for the encrypted document. These
keywords help the user while searching the encrypted
document. Then the user has to enter the keyword in the
search area with top k retrieval (rank order less than top k).
When the user enters the keyword, pre-processing is done for
each keyword and hashing is also done for these keyword.
Then these keywords are send to the server, and if match
found with the stored index, the corresponding TF-IDF weight
is taken. If match occurs for more documents then each
document is taken in the order of their TF-IDF weight. The
top k documents are retrieved; these retrieved documents are
relevant to the search query keyword. If a user searches a
particular keyword repeatedly over a particular threshold, then
that keyword relevance document id and TF-IDF value are
cached, so on the next search for that keyword the document
retrieval would be from the cache. By using the cache the
searching could be done easily with reduced search time.
In the single keyword search, the user enters the keyword
M and top k. The M will go through pre-processing steps and

19

Fig. 2: Flow Chart for User build index and storage


Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003

hashing is done for encrypting the keyword. If the user


searches a particular keyword repeatedly over a threshold
value(in this case uses 3 as threshold), then that keyword
relevance document-id and TF-IDF weight are cached, the
user search keyword count is less than the threshold value then
the keyword count is get incremented by 1. Consider, user
search occurred when the keyword count less than threshold This (TF×IDF)(D1) is the weight for the document D1 in
value 3, then the M relevance document-id and TF-IDF weight case of multi-keywords, similarly for the remaining
are retrieved from the storage, in this case there occur cache documents. By using this calculated TF-IDF weights the
miss. The searched keyword count is greater than the ranking is done for the documents. The k value is used for
threshold value 3, and then the M relevance document-id and retrieving the top k no of documents which are relevant to the
TF-IDF weight are retrieved from the cache, the resultant keywords. The result is obtained for the keywords M 1 and M2.
documents would be ranked according to the TF-IDF weight. If the user needs to search for multi-keywords (M 1, M2,.....Mn),
In single keyword search the TF-IDF weight of the document there are n keywords for searching, then a keyword search in
is the TF-IDF of M to that document. the document D1 for M1, M2,.....Mn is done, there after similar
As the user tends to outsource a large volume of encrypted for keyword search is also done for the remaining documents.
document, search by multi-keyword is necessary which would The resultant TF-IDF of document D 1 is the sum of TF-IDF of
help to retrieve relevant documents with user interest. In case each keyword M1, M2 ...Mn for D1. In general, there are n
of a multi-keyword, the user enters multi-keywords (M1, M2) keywords and h documents of the form:-
in the search area, then pre-processing is done for M 1 & M2
keyword. After that each Keyword is hashed (i.e. first hashing
is perform for M1 and then for M2), then it is of the form
H(M1) and H(M2), and send to the server. In the storage server
the keyword H(M1) and H(M2) is checked with the stored hash (5)
keywords, if match occurred then the corresponding document
TF-IDF is retrieved. Finally, the checking is done as: The equation (5) can be summarized as follows:-
 First check H(M1) in document D1, if match found then
the TF-IDF for that word in D1 is taken.
 Then check H(M2) in document D1, if again match
found then the TF-IDF for that word in D1 is taken. (6)
 Finally, the TF-IDF obtained for different keywords of In figure 3 shows how the proposed system work with
single document D1 is added up and the result obtained multilevel search and reduce the search time, multilevel
is the TF-IDF of that document, as given below: keyword ranked search would help the users to cache the
results. Here the cache is applied for each keyword. Each
keyword is associated with the TF-IDF weight and document-
id of the corresponding document. While performing the
search operation again with the same keywords, the TF-IDF
(4) weight and document-id of the corresponding keyword would
be retrieved from the cache, so that search time complexity
could be reduced. The proposed method would be an efficient
method to overcome the problem of search time.

III. Results And Discussion


This session deals with the results and analysis of the
proposed method. The analysis focuses on the reduced search
time for document retrieval using search keywords. The
proposed method with cache is compared with the method
without cache. Based on the analysis, the results are presented
below.

TABLE I: Effect of Search Time with Cache Vs


without Cache
Query size Search Time Search Time
(No of (Milliseconds) (Milliseconds)
keyword without cache with cache
Fig. 3: Flow Chart for User Search searched)
7 57 2
8 69 2
20
9 73 2
10 79 2
11 84 2
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003

The table II shows p + q keywords in the query search, in


this p =7 and q =1.In case without cache, all the 8 keyword’s
TF-IDF weight and document-id are to be searched from the
storage server with search time 70 milliseconds but by using
cache, 7 keyword’s TF-IDF weight and document-id is taken
The dataset consist of 67 files and each containing from the cache and 1 keyword’s TF-IDF weight and
different no of keywords. The TF-IDF calculation is a onetime document-id is taken from the storage server reducing the
process, if modification is done to any of the files then search time to 11 milliseconds. Now from the analysis it is
keyword construction and recalculation of the TF-IDF Value clear that the search time in the proposed method is lower than
is needed. The table I shows the search time with the effect of the search time in the method without cache.
cache and without effect of cache. The caching is done for the
keywords related with the document id and their TF-IDF Similarly for the remaining p + q keywords, p =7 is made
weights. The analysis is done with the query size (No of constant and q is given values 2, 3, 4 in this analysis and the
keywords searched) of 7, 8, 9, 10, 11 and their related search search time without cache is shown as 84, 116, 129
time with cache and without cache is obtained. Consider that, millisecond and with cache is shown as 22, 25, 37 millisecond
when search occurs with query size 7 (i.e. no of keywords respectively. From these values it is clearly seen that with
searched) the corresponding results are, with cache is 2 cache the search time can be reduced to a great extent.
millisecond and without cache is 57 millisecond. From this it
is seen that the search time could be reduced by using the
multilevel cache method. The retrieved result is the relevant IV. Conclusion
documents in the ranked order; ranking is done with the In the proposed method of multilevel keyword ranked
related keyword’s TFIDF weights. search, the problem of increased search time have been
Similarly, for the remaining query size 8, 9, 10, 11 (i.e. No reduced. The coordinate matching i.e. as many matches as
of keywords searched) their related search time with cache is 2 possible to effectively capture the relevance of outsourced
millisecond and without cache is 69, 73, 79, 84 respectively as documents to the query keywords and use inner product
shown in the table I. From the table it is clear that the search similarity to quantitatively evaluate such similar measures was
time without cache is higher than search time with cache. The used. Along with the coordinate matching the ranking
proposed method with cache gives better search time as mechanism by TF-IDF weight rule was used for the relevance
compared to that method without cache. The table II shows, of the data documents and also the cache was included in the
the comparison between p + q keywords search time with proposed method. The TF-IDF weights and the document ids
cache and without cache. In this p and q are no of keywords of the keywords that are frequently used are stored in the
with cache and without cache respectively, i.e. p keywords cache. Then after the keywords TF-IFD weight and document
document id and their TF-IDF weight is in the cache, and q id is taken directly from the cache which reduces the search
keywords TF-IDF weight and document id are taken from the time rather than from retrieving the information from the
storage. In this analysis, the value of p is made constant i.e. 7 storage. The analysis was done based on the performance with
keywords and the value q is varied from 1 to 4. and without cache resulting in reduced search time in case of
the method with cache.
As future work, enhancements using a clustering based
Effect of Search Time with
TABLE II: search method, to reduce the search time and by adding
multiple levels at multiple locations, to gain more efficiency
p keywords Cache and q could be added along with the proposed method.
keywords not Cache Vs without
Cache for p+q keyword Search. References
[1] Steven M. and William R. Cheswick, “Privacy-Enhanced
Query No of No of Search Search Searches Using Encrypted Bloom Filters”, Cryptology ePrint
size (No keywords keywords Time Time Archive, Report 2004/022, http://eprint.iacr.org, 2004.
of cached (p) not (Millisecon (Millise
[2] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure Ranked
keywor cached ds) without conds) Keyword Search over Encrypted Cloud Data””, Proc. IEEE 30th
d (q) cache with Intl Conf. Distributed Computing Systems, 2010,
searche cache
[3] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling Secure and
d)
Efficient Ranked Keyword Search over Outsourced Cloud
8 7 1 70 11 Data”, IEEE Trans. Parallel and Distributed Systems, vol- 23(8),
Aug 2012, pp. 1467–1479.
9 7 2 84 22
10 7 3 116 25
11 7 4 129 37

21
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003

[4] D. Song, D. Wagner, and A. Perrig, “Practical Techniques for [10] N. Cao, C.Wang, M. Li, K. Ren, and W. Lou, “Privacy-
Searches on Encrypted Data”, Proc. IEEE Symp. Security and Preserving Multi-Keyword Ranked Search over Encrypted
Privacy, 2000, Cloud Data”, IEEE TRANSACTIONS ON PARALLEL AND
[5] E.J.Goh, “Secure Indexes”, IEEE Transactions on Information DISTRIBUTED SYSTEMS,Vol-25, NO.1,JANUARY,2014.
Theory, Cryptology ePrint Archive, [11] P. Golle, J. Staddon, and B.Waters,“Secure Conjunctive
http://eprint.iacr.org/2003/216. 2003. Keyword Search over Encrypted Data”, Proc. Applied
[6] I.H. Witten, A. Moffat, and T.C. Bell. , “Managing Gigabytes: Cryptography and Network Security, 2004,pp:-31-45,
Compressing and Indexing Documents and Images”, Morgan [12] R. Curtmola, J.A. Garay, S. Kamara, and R. Ostrovsky,
Kaufmann, May, 1999, “Searchable Symmetric Encryption: Improved Definitions and
[7] J. Katz, A. Sahai, and B. Waters, “Predicate Encryption Efficient Constructions, Proc. 13th ACM Conf. Computer and
Supporting Disjunctions, Polynomial Equations, and Inner Comm. Security, 2006,
Products”,Proc. 27th Ann. Intl Conf. Theory and Applications of [13] J. Zobel and A. Moffat, “Exploring the Similarity Space”,
Cryptographic Techniques (EUROCRYPT), 2008. SIGIR Forum, 32(1), 1998, pp:-18-34.
[8] S. Kamara and K. Lauter, “Cryptographic Cloud Storage”, Proc. [14] Singhal (2001). Modern information retrieval: A brief
14th Intl Conf. Financial Cryptography and Data Security, Jan, overview,. IEEE Data Eng, 24, 35–43.
2010, [15] Hongqing Liu, Stacy Weng, and Wei Sun, “Introduction of
[9] Daniel A. Menasc and Vasudeva Akula,“Improving the Cache Memory”, 2001 https://www.cs.umd.edu/class/fall-
Performance of Online Auctions Through Server-side Activity 2001/cmsc411/proj01/cache/cache.pdf
Based Caching”, http://cs.gmu.edu/ menasce/papers/menasce-
akula-wwwj.pdf

22

You might also like