An Efficient and Secure Multilevel Keyword Ranked Search For Encrypted Data
An Efficient and Secure Multilevel Keyword Ranked Search For Encrypted Data
An Efficient and Secure Multilevel Keyword Ranked Search For Encrypted Data
003
(E-mail: [email protected],[email protected])
15
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003
The rest of the paper is organized into four sections. set, and this technique together with blinding indexes with
Section II deals with the related works. Methodology is random tokens, ensures that there indexes are semantic
discussed in section III. Section IV involves the analysis of the security against adaptive chosen keyword attack (ind-cka)
work and the conclusion of the work is given in section V. secure.
Kmara et al [8] has developed a similar per-file index
I. Related Works scheme. In this method the files are stored in encrypted form,
later the user U wants to retrieve files containing (indexed by)
This session includes the related works that are done in the some keyword search, is not possible. There is no any straight
field of Searchable encryption are Single keyword Searchable forward way to do keyword search unless U leak the
Encryption and Boolean Keyword Searchable Encryption. decryption key. Here U has to create keyword index associates
Single keyword searchable encryption schemes usually build each keyword with its associated files. All keyword searches
an encrypted searchable index such that its content is by U are based on this index; hence their scheme does not
concealed to the server unless it is given appropriate trapdoors offer full pattern-matching generality with the real text. In
generated via secret keys [8]. In boolean keyword searchable practice, this should be sufficient for most users. It is worth
encryption, conjunctive and disjunctive search are used. All noting that in this system U can have complete control over
these searchable encryption are not dealing with the searching what words are keywords and which keywords are associated
time, considering with the basic introduction to cache is much with which files, a power that can be useful for many
better for those problems.as well, for math, etc. applications. let U use pseudo-random bits to mask a
dictionary-based keyword index for each file and send it to S
A. Single Keyword Searchable Encryption in such a way that later U can use the short seeds to help S
Song et al [4] introduced the notation of searchable recover selective parts of the index, while maintaining the
encryption, based on symmetric key setting, where each word remaining parts pseudo-random.
in the file is encrypted independently under a special two
layered encryption construction. Thus, a searching overhead is B. Boolean Keyword Searchable Encryption
linear to the whole file collection length. In this method the In the Boolean keyword search conjunctive and disjunctive
searching is done by, an index contains a list of keywords. search are used. Conjunctive keyword search returns “ all-or-
With each keyword a list of pointers to documents where the nothing . Which means that returns those documents in which
key word appears. These keywords are words of interest that all the keywords specified by the search query appear. In
user want to search later. One possible advantage for this disjunctive keyword search that returns undifferentiated
scheme is that the request could be embedded in other results. This means it returns every document that contains a
retrievals so that Bob might have uncertainty about the subset of the specific keyword, even only one key word of
correlation of the search request and the retrieval request for interest. The paper [10] gives a basic idea for privacy
cipher text. The disadvantage is that user has to spend an extra preserving Multi-keyword ranked search over encrypted data
round-trip time to retrieve the documents. A general (MSRE). This is based on secure inner product computation,
disadvantage for index search is that whenever user changes and give two significantly improved MRSE schemes to
the documents, its index must be updated. achieve various strict privacy requirements.
The limitations of [4] was the work load for each search 1) Conjunctive & Disjunctive Keyword Search: Golle et
request proportional to the number of files in the collection. al [11] proposed conjunctive keyword search over encrypted
To overcome this limitation E.J Goh [5] developed a Bloom
data. Consider a user that stores encrypted documents on an
filter based per-file index, reducing the work load for each
search request proportional to the number of files in the untrusted server. Let p be the total number of documents, and
collection. A secure index is a data structure that permits a assume there are m keyword fields associated with each
queries with a trapdoor for a word X to test in O(1) time only document. For an example consider, documents were emails.
if the index contains X; The index disclose no information They define the following 4 keyword fields: “From”, “To”,
about its contents without valid trapdoors, and trapdoors can “Date” and “Subject”. For the simplicity, they make the
only be generated with a secret key. Secure indexes [5] are a following assumptions:
natural extension of the problem of constructing data Assume that the same keyword never arise in two
structures with privacy ensures such as those provided by different keyword fields. The easiest way to satisfy this
oblivious and history independent data structures. They use requirement is to prepend keywords with the name of
Bloom filter [1] as a per document index to track words in the field they belong to. Thus for example, the keyword
each document. In their scheme, a word is represented in an “From:James” belongs to the “From” field and cannot
index by a code word obtained by applying pseudo-random be confused with the keyword “To:James” that belongs
functions twice once with the word as input and once with a
to the “To” field.
unique document identifier as input. This non-standard use of
pseudo-random functions ensures that the codewords Assume that every keyword field is defined for every
representing a word X are different for each document in the document, requirement can be easily satisfied. In our
16
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003
email example, they may assign the key word “ order of the documents but not the relevance score of the
Subject:NULL ” in the “Subject” field to the emails documents) and understand an “as-strong-as possible” ranked
that have no subject. searchable symmetric encryption..
The documents are identified with the vector of m In information retrieval (IR) [14], a ranking function is
keywords which characterize them. For (i = 1... n), they used to evaluating relevance scores of matching files to a
denote the ith document by D i = (Wi;1 ...... Wi;m), where Wi;j is given search request. The most commonly used statistical
the keyword of document Di in the jth keyword field. The measurement for calculating relevance score in the
body of the ith data document can be encrypted with a information retrieval community uses the TF×IDF rule, where
standard symmetric key cipher and stored on the server next to term frequency (TF) is the number of times a given term or
the vector of keywords Di. For ease of presentation they may keyword appears within a file (to measure the significance of
ignore the body of the document and concern themselves only the term within the particular file), and inverse document
with the encryption of the keyword vector D i. When frequency (IDF) is obtained by dividing the number of files in
discussing a capability that enables the server to verify that a the whole collection of document by the number of files
document contains a specific keyword in field j, they denote containing the term (to measure the overall importance of the
the keyword by Wj. In this conjunctive keyword search term within the collection of documents). Among several
scheme there, it returns true if the expression ((W i;j1 = Wj1 ) ˄ hundred variations of the TF×IDF weighting scheme, no
(Wi;j2 = Wj2 ) ˄ .....˄ (W i;j1 = Wj1 )) holds and false otherwise. single combination of them out performs any of the others
In predicate encryption [7] scheme are support both universally [13]. Thus, without loss of generality, they choose
conjunctive and disjunctive search. an example formula that is commonly used and widely seen in
the literature (see [6]) for the relevance score calculation is as
C. Keyword Ranked Search follows:
In keyword ranked search Curtmola et al [12] discussed
about vector space model working and similarity based
ranking multi-keyword text search scheme. Vector space
model is well known technique which provides TF-IDF
weight rule through which we obtain accurate ranking result.
In this they gives the revised Searchable Symmetric
Encryption Scheme for, SSE-1 is efficient, it was only proven
secure against non-adaptive adversaries. A second Searchable
Symmetric Encryption SSE-2, which accomplishes semantic
security against adaptive adversaries, and Multi-user
Searchable Symmetric Encryption MSSE.
Searchable encryption permits data owner to outsource his
data in an encrypted manner while maintaining the selectively
searching capability over the encrypted data. Generally, In [2] focus on single keyword search, this case, the IDF
searchable encryption can be accomplished in its full factor is always constant with regard to the given searched
functionality using an oblivious RAMs [3]. Although hiding keyword. Thus, search results can be accurately ranked based
everything during the search from a malicious server only on the term frequency and file length information
(including access pattern), by the utilization of oblivious RAM contained within the single file.
usually brings the cost of logarithmic number of interactions
between the user and the server for each search request. Thus, D. Multi-keyword Ranked Search
in order to accomplish more efficient solutions, almost all the In Multi-keyword ranked search [10], allows multiple
relative works on searchable encryption writing resort to the keyword in the search request and return documents in the
weakened security guarantee, i.e., revealing the access pattern order of their relevance to these keywords, in the ranking
and search pattern but nothing else. Here, access pattern principle uses coordinate matching. This means the presence
alludes to the result of the search query output, i.e., which files of keyword in the document or the query is shown as ‘1’ or
have been retrieved. The search pattern incorporates the else ‘0’ in the data vector or the query vector. In the search
equality pattern among the two search requests (whether two query there are more factors which make impact on the search
searches were performed for the same keyword), any usability. For example, when one keyword appears in most
information derived thereafter from this statement. documents in the data set, the significance of this keyword in
Curtmola et al [12] shows that following the exactly same the query is less than other keywords which appears in less
security certification of existing SSE scheme, it would be very documents. Similarly, if one document contains a query
inefficient to accomplish ranked keyword search, which keyword in numerous locations, the user may prefer this to the
motivates us to further weaken the security guarantee of other document which contains the query keyword in only one
existing SSE appropriately (only leak the relative relevance location. To capture these information in the search process,
17
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003
we use the TF×IDF weighting rule within the vector space enable searching capabilities over E for the effective data
model to calculate such similarity measure. Here in the utilization, the user (data owner) need to build an index I from
building index is a onetime process that means the index the documents F. The index building comes before
construction is done before outsourcing the document or data, outsourcing the documents F. After building the index from
i.e. if any modification is necessary then the index the data document, the user (data owner) outsources encrypted
construction has to done again for all document. This is document collection E and the index I to the storage server,
because that TF-IDF rule is related to whole document. In this for searching the document collection. In the proposed system
similarity measure introduces high computation cost during it is assumed that another basic system exists for trust server
the index construction and trapdoor generation; it captures which consists of user access control and search control.
more related information on the content of documents and
query that returns better results for user’s interest.
E. Server Cache
Server cache is techniques used for caching objects for
reduce the server computation. This will help for the user to
access the data easily. By the help of caching improve access
speed and reduce the work load on the server. In [9] server
caching provides better performance for the auctions and also
suggests that cache at the application server can save a critical
number of accesses to a backend database and thus reduce the
server-side latency. In general, work of web caching can be
classified into browser caching, client-side proxy caching,
network caching (as in Content Delivery Networks), and
server side caching. Any time request is performed, a cache hit
occurs when keyword present, otherwise a cache miss occurs Fig. 1: Architecture of multilevel keyword ranked search for encrypted data.
and the information about the request has to be retrieved from
the storage. In traditional distributed file system, the server
maintains a cache of blocks that have been accessed by the
clients. The server cache is lower in the storage hierarchy than After receiving the user search request from the user (data
the client caches and therefore has a lower hit rate. Studies search user), the server is responsible to search the index I and
show that the server cache is still powerful in reducing server return the corresponding set of encrypted documents (TF-IDF
disk traffic and improving the performance of the file system. and document-id). For improving the document retrieval
accuracy, search result should be ranked according to some
In [15], the idea of cache memories is identical to virtual ranking criteria. For reducing the communication cost, user
memory in that some active portion of a low-speed memory is (data search user) sends an optional number k (retrieve top
stored in duplicate in a higher-speed cache memory. When a rank up to k) along with search query and the storage server
memory request is produced, the request is first presented to sends back the Top k documents that are most relevant to the
the cache memory, and if the cache cannot replies, the request search query.
is then presented to main memory. For READ operations that
cause a cache miss, the item is retrieved from main memory B. Index Construction
and copied into the cache.
In the proposed method the user U (data owner) has a
collection of data documents. In Fig.2 shows the index
II. Procedure construction of the proposed method. Before encrypting the
document the user need to build an index, by using the index
In this session the multilevel keyword ranked search for the user can search on the encrypted document. Each of these
encrypted data with caching feature is described. The main indexes (user interest keywords) is referred to the documents
contents of this session are System model, Index construction E. Before building up the indexes these documents need to go
and Search construction. through several pre-processing stages. In information retrieval
the pre-processing methods like Tokenization, Stemming
A. System model process, Stop words removal are used and the rest is
The proposed system consist of 3 different entities, data considered as the keyword. In normal cases these keywords
user, trust server and storage server, as illustrated in figure 1. are taken as indexes. The pre-processing is done using Natural
Here the users are of two type i.e. owner of the data document Language Processing (NLP) tool-kit. For stemming process
and the user who access or search these documents. The user there are several stemming algorithms each of these have their
(data owner) has a collection of data documents F to be own drawbacks and works on the basis of certain criteria or
outsourced to the storage server in the encrypted form E. For algorithms. For avoiding these draw backs, here in the
18
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003
19
21
Website: ijetms.in Issue:4, Volume No.4, July-2020 DOI: 10.46647/ijetms.2020.v04i04.003
[4] D. Song, D. Wagner, and A. Perrig, “Practical Techniques for [10] N. Cao, C.Wang, M. Li, K. Ren, and W. Lou, “Privacy-
Searches on Encrypted Data”, Proc. IEEE Symp. Security and Preserving Multi-Keyword Ranked Search over Encrypted
Privacy, 2000, Cloud Data”, IEEE TRANSACTIONS ON PARALLEL AND
[5] E.J.Goh, “Secure Indexes”, IEEE Transactions on Information DISTRIBUTED SYSTEMS,Vol-25, NO.1,JANUARY,2014.
Theory, Cryptology ePrint Archive, [11] P. Golle, J. Staddon, and B.Waters,“Secure Conjunctive
http://eprint.iacr.org/2003/216. 2003. Keyword Search over Encrypted Data”, Proc. Applied
[6] I.H. Witten, A. Moffat, and T.C. Bell. , “Managing Gigabytes: Cryptography and Network Security, 2004,pp:-31-45,
Compressing and Indexing Documents and Images”, Morgan [12] R. Curtmola, J.A. Garay, S. Kamara, and R. Ostrovsky,
Kaufmann, May, 1999, “Searchable Symmetric Encryption: Improved Definitions and
[7] J. Katz, A. Sahai, and B. Waters, “Predicate Encryption Efficient Constructions, Proc. 13th ACM Conf. Computer and
Supporting Disjunctions, Polynomial Equations, and Inner Comm. Security, 2006,
Products”,Proc. 27th Ann. Intl Conf. Theory and Applications of [13] J. Zobel and A. Moffat, “Exploring the Similarity Space”,
Cryptographic Techniques (EUROCRYPT), 2008. SIGIR Forum, 32(1), 1998, pp:-18-34.
[8] S. Kamara and K. Lauter, “Cryptographic Cloud Storage”, Proc. [14] Singhal (2001). Modern information retrieval: A brief
14th Intl Conf. Financial Cryptography and Data Security, Jan, overview,. IEEE Data Eng, 24, 35–43.
2010, [15] Hongqing Liu, Stacy Weng, and Wei Sun, “Introduction of
[9] Daniel A. Menasc and Vasudeva Akula,“Improving the Cache Memory”, 2001 https://www.cs.umd.edu/class/fall-
Performance of Online Auctions Through Server-side Activity 2001/cmsc411/proj01/cache/cache.pdf
Based Caching”, http://cs.gmu.edu/ menasce/papers/menasce-
akula-wwwj.pdf
22