Building An Encrypted and Searchable Audit Log: (Balfanz, Gdurfee
Building An Encrypted and Searchable Audit Log: (Balfanz, Gdurfee
Building An Encrypted and Searchable Audit Log: (Balfanz, Gdurfee
1
Princeton University 2 Palo Alto Research Center
Computer Science Department 3333 Coyote Hill Road
Princeton, NJ 08544 Palo Alto, CA 94304
[email protected] {balfanz, gdurfee,
smetters}@parc.com
Abstract to be audited even when that system has been under ac-
tive attack by malicious insiders or outsiders [7, 9]. Cor-
rectly designed secure audit logging mechanisms can de-
Audit logs are an important part of any secure system, tect unauthorized past activity, even when the person per-
and they need to be carefully designed in order to give a forming that action goes to great lengths to cover their
faithful representation of past system activity. This is espe- tracks. The existence of such logs can be used to en-
cially true in the presence of adversaries who might want force correct user behavior, by holding users accountable
to tamper with the audit logs. While it is important that au- for their actions as recorded in the audit log. Such logs can
ditors can inspect audit logs to assess past system activity, be used in a wide variety of systems, from a control system
the content of an audit log may contain sensitive informa- that logs the commands a user issues, to a database system
tion, and should therefore be protected from unauthorized that logs the queries a user makes.
parties. Typically, when an organization wishes to inspect past
Protecting the contents of audit logs from unauthorized activity it will search the audit log for relevant informa-
parties (i.e., encrypting it), while making it efficiently tion. For example, if a certain user was suspected of be-
searchable by authorized auditors poses a problem. We de- having improperly the organization might search for all ac-
scribe an approach for constructing searchable encrypted tions performed by that particular user. If the organization
audit logs which can be combined with any number of exist- wishes to see all actions of a certain type, it might search
ing approaches for creating tamper-resistant logs. In par- for all log entries that match a given keyword. For an audit
ticular, we implemented an audit log for database queries log to be useful in practice, it is critical that it be efficiently
that uses hash chains for integrity protection and identity- searchable for keywords of interest.
based encryption with extracted keywords to enable search- At the same time, the contents of an audit log can be con-
ing on the encrypted log. Our technique for keyword search sidered to be sensitive information. For instance, knowing
on encrypted data has wide application beyond searchable what actions are made by a certain user could violate that
audit logs. individual’s privacy. If the log contains information about
not only what query was made, but what results were re-
turned, access to the audit log would imply effective ac-
1. Introduction cess to the database, circumventing database access con-
trols. The organization that owns the system being logged
System logs provide an invaluable view into the current might consider the information the log holds to be valuable
and past state of almost any type of complex system. Most and not wish to share it with others, while for robustness’
server software in existence today includes some logging sake, the organization may want to store backup copies of
mechanisms. the audit log information at sites it may not completely con-
Secure versions of such logs, designed to defend against trol. In general, this means that the contents of the audit log
malicious tampering, allow the current state of the system must be encrypted. However, this makes it extremely diffi-
cult to search.
∗ The majority of this work was completed while the author was a sum-
Using traditional techniques, searching the log would re-
mer intern at PARC.
quire decrypting every record. This approach has several The rest of the paper is organized as follows. In Sec-
disadvantages. First, it requires decrypting all of the log tion 2 we describe related work. Sections 3 and 4 introduce
data, regardless of what information one is looking for; this secure audit logs in general, and our system in particular.
opens opportunities for unintended access to log records Section 5.1 presents a symmetric key based scheme, while
other than the ones relevant to the current investigation. Section 5.2 presents an public-key scheme based on IBE. In
Second, it requires the entity with the decryption key to Section 6 we present our implementation of a proxy server
interactively process all the log data, which can be quite that creates a searchable audit log of database queries and
large. In many applications, one would like to entrust the discuss its performance. Finally, we conclude in Section 7.
ability to decrypt audit logs to an entity or system with high
levels of trust and assurance; requiring that system to also 2. Related Work
be able to process large quantities of log data in an on-line
fashion limits one’s choice of trusted parties. It would be S EARCHING ON E NCRYPTED DATA . Song et al. [11]
preferable to be able to selectively delegate the ability to study the problem of searching on encrypted data in a
search the log to parties with the means to process the data. symmetric-key setting. In a symmetric key based scheme,
The key challenge to building a successful, secure audit the keys that are used to create the encrypted entries also
logging system is to simultaneously protect the integrity of allow search and decryption of the audit log. Thus, servers
the audit log, control access to contents, and maintain its that construct audit log entries possess keys capable of de-
usefulness by making it searchable. crypting log entries. We discuss the shortcomings of such
In this paper, we present a design for an encrypted audit an approach in Section 5.1 and contrast it with a public-key
log that allows a designated trusted party, the audit escrow based scheme. Our public-key based scheme easily allows
agent, to construct keyword search capabilities, which al- audit escrow agents (and only those agents) to create capa-
low (less trusted) investigators in possession of such capa- bilities to search the audit log for certain keywords.
bilities to search for and decrypt entries matching a given Goh examines how Bloom Filters can be used to make
keyword. The escrow agent can distribute a capability to searching on encrypted data more efficient [4]. Like Song
an investigator if he deems it appropriate. Since we expect et al., Goh presents a scheme in the symmetric key setting.
keyword search capabilities to be distributed rather infre- He presents a scheme where a encrypted data consists of
quently, the escrow agent can be made to be very secure the encryption of a document and a Bloom Filter attached
from attack. that is used for keyword searching.
We developed a public key based cryptographic scheme Boneh et al. [2] have also recently examined the problem
that allows keyword searching on encrypted data by adapt- of searching on publicly encrypted data. They indepen-
ing Boneh and Franklin’s [3] Identity-Based Encryption dently devised a scheme based on the Identity-Based En-
(IBE) scheme. (We note that the cryptographic scheme we cryption scheme of Boneh and Franklin [3]. Their scheme
use is similar to a scheme that was independently discov- is similar to our underlying cryptographic scheme in its
ered by Boneh et al. [2]; see Section 2 for details.) In an construction and security properties. The contribution of
IBE scheme, public keys can be arbitrary strings – e.g., their work is different, however: they provide a detailed
“[email protected]”. Private keys are derived from public theoretical analysis which includes a precise definition of
keys through use of a system-wide master secret, known what they call searchable public-key encryption, along with
by a trusted authority. In our design, search keywords are three constructions that are provably secure in their model
used as IBE public keys, and the master secret is held by under suitable cryptographic assumptions. Our work, on
an authority trusted to issue keyword search capabilities for the other hand, introduces our independently developed
a given audit log, in our case, the audit escrow agent de- construction and focuses on the pragmatic security con-
scribed above. cerns regarding integrating it in a system for creating se-
In our design, the server generating audit log entries en- cure audit logs.
crypts entries with the public keys corresponding to the
keywords that are derived from those entries. The escrow AUDIT L OGS . Schneier and Kelsey [7, 8, 9] describe a
agent, which holds the IBE master secret, can construct a secure audit logging scheme capable of detecting any at-
search capability for a given keyword as the private key cor- tempt to delete or alter past audit log entries, even on a
responding to the given keyword. Furthermore, additional host that has been compromised (assuming the entries were
security properties of Boneh and Franklin’s scheme imply made before the compromise). Such tampering can be de-
that an adversary cannot tell which public key was used to tected even if the compromised host has not been able to
create a ciphertext when given the ciphertext. Thus, when offload any state information to another host; an operation
an encrypted audit log entry is created, even its search key- referred to as “checkpointing” in the discussion below. To
words are hidden. accomplish this, a system opening a new audit log first es-
tablishes a shared secret A0 with a trusted third party. After
each audit record is generated, the current shared secret, present and have not been altered. Audit logs can either be
Ai , is evolved – it is completely replaced by a new shared publicly verifiable – verifiable by anyone holding appro-
secret, Ai+1 , computed as the cryptographic digest of the priately authenticated public information, e.g., the logging
previous shared secret, Ai . Each audit record is encrypted system’s public key, or an authenticated hash of all exist-
under a key Ki which is derived from the current value of ing audit entries. Or, they may require a trusted verifier –
Ai , and then the encrypted record is protected using a Mes- they can only be verified by a designated party holding one
sage Authentication Code (MAC) keyed with Ai . Records or more secrets, e.g., a MAC key. The choice of approach
are linked using a hash chain [6]. is application-dependent. Publicly verifiable audit log sys-
Because the secrets used to encrypt and authenticate each tems, e.g., systems that simply digitally sign each log entry
log record are completely replaced on the logging host af- they generate, allow easy storage of audit logs on untrusted
ter the record is generated, an attacker compromising that systems, and the increased trust resulting from the ability
host does not have the necessary information to go back of any interested party to verify the log. On the other hand,
and replace, delete, or modify existing log records stored trusted verifier systems, such as the Schneier and Kelsey
on that host. Any attempt to do so can be detected by the scheme described in [7, 8, 9] allow for a greater degree of
trusted third party, who retains A0 ,and can check that there forward security in an audit log system, making it possible
is a valid record authenticated with each MAC key Ai . This to detect attempts to delete audit log entries made any time
constitutes a form of forward security for the audit log. before a system is compromised, without requiring any in-
The use of symmetric MACs to authenticate log records formation about those entries to be communicated to the
means that only the trusted third party (or someone to outside world.
whom it has delegated a record authentication key, Ai ) can To verify an audit log, it must contain two types of in-
verify the audit log. As each record is encrypted with a formation. First, each entry must contain enough informa-
different key, the trusted third party can delegate the ability tion to verify its authenticity when considered on its own.
to decrypt particular audit records to designated individu- If some entries are altered or deleted, the ability to indi-
als, by giving them the keys used to encrypt those records. vidually verify the remaining entries (or blocks of entries)
However, it does not allow any form of search on the en- makes it possible to recover some useful information from
crypted audit data. the damaged log. Second, the individual entries must also
be linked together in a way that makes it possible to de-
3. Characteristics of a Secure Audit Log termine whether any entries are missing. Serial numbers
allow one to check whether all entries are present, but turn
We can identify three important properties a secure audit the problem of tampering with the log into one of attacking
log: those designed to prevent and detect tampering, and each entry individually. Hash chaining [6, 7, 8, 9], where
those designed to control data and search access. each entry contains a cryptographic digest of the previous
TAMPER R ESISTANCE . A secure audit log must be tamper entry, is a better solution, as it tightly links all entries in
resistant – it must guarantee that no one other than the cre- the chain.1 It also allows a very simple form of public ver-
ator of the log can create valid entries, and that once entries ifiability, where the hash of the most recent audit entry is
have been created, they cannot be altered. checkpointed, i.e., published via a trusted third party (e.g.,
One cannot prevent an attacker who has compromised the New York Times).
the system creating the log from altering what that system DATA ACCESS C ONTROL AND S EARCHABILITY. Given
will put in future log entries [7]. One also cannot prevent that the data in an audit log may be sensitive, it must be en-
him from deleting any log entries that have not already been crypted. However, one would like to be able to allow legit-
copied to another system. The goal of a secure audit log in imate search access to a subset of all audit log entries (e.g.,
such cases is to make sure that he cannot alter existing log all entries matching the keyword “Smith”). We present a
entries, and that any attempts to delete such existing entries new criteria for the construction of useful secure audit logs,
will be detected. Ideally, one would like to detect attempts namely that they allow the secure delegation of search ca-
to delete or alter any entries created up to the time a host pabilities.
is compromised [7, 8]. For for some applications it may Delegation of capabilities is important so that an inves-
be enough to have the logging host “checkpoint” its state tigator can search and view entries of a narrow scope. For
periodically – to copy its log data, or some function (e.g., a example, if Alice Smith wanted to investigate all entries
signature) of its log data to another host, and simply be able related to her the audit escrow agent might give her the
to assure that no entries up till the most recent checkpoint capability to search for all entries matching the keyword
have been deleted or altered.
1 In order to provide individual entry verifiability in a hash-chained
V ERIFIABILITY. A secure audit log must also be verifiable audit log, each log entry must explicitly contain the hash of the previous
– it must be possible to check that all entries in the log are entry to allow some recovery if that entry is missing.
“Smith”, but not give her anything more. The alternative These are the keywords that can be used to search for that
of having the master secret holder perform the searches is record in the future. Next, it encrypts the entry using the
undesirable since it unnecessarily exposes a highly trusted key Ki , producing the keyword search information cwn in
component of the system. the process. In Sections 5.1 and 5.2 we present two con-
For such delegation to be considered secure, it must be crete instantiations of this procedure. Finally, the server
impossible for an adversary to learn the content of entries constructs the verification data Vi . In our implementation,
in the audit log that he should not have access to (up to the we periodically “checkpoint” the audit log by publishing
security provided by the underlying encryption function, the most recent verification value, Vi , to one or more other
which might not, for instance, disguise characteristics such servers, producing a publicly verifiable audit log.
as the length of the audit log entry.) We allow our adversary
K EYWORD E XTRACTION . In our implementation, queries
to be an insider in the sense that he may be both a user of
are made in SQL. See Figure 1 for an explanation of how
the system, and may have had some legitimate search ca- we extract keywords from a query. Our set of keywords not
pabilities explicitly given to him by the audit escrow agent. only contains keywords from the query, but also metadata
We would like to ensure that, assuming he does not com-
such as the user who made the query and the time when the
promise the escrow agent itself, he is unable to view the
query was issued. Note that we prefix keywords with suit-
contents of any audit log entry, or even to learn which key-
able labels so that we can distinguish the case where user
words match an entry beyond those set of keywords and
“Alice Smith” is making a query from the case where some-
entries for which he has legitimate access. one makes a query mentioning the name “Alice Smith”.
user:
user: Alice
Alice Smith
Smith
keyword:
keyword: cars
cars
keyword:
keyword: make
make
keyword:
keyword: ford
ford
time:
time: 2003/08/26
2003/08/26 23:34:24
23:34:24
Figure 1. Extracting keywords for an audit record: audit records cannot only be searched by key-
words contained in the query logged, but also by meta-data such as user name and time.
H keyed with the secret S. In practice, HMAC-SHA1 can adversary is unable to link queries that had similar key-
be used in place of H. Let E be a symmetric encryption words, since r is an uniformly independent random value.
function; we denote by EK the function E keyed with K.
Suppose the server is to encrypt the log entry, m, along Search and Decryption: Recall there are t audit log
with keywords w1 , w2 , . . ., wn . Let flag be a constant bit- servers in our scheme, with the jth server holding a se-
string of length `. The server executes the following steps: cret S j . Suppose an investigator wishes to obtain a search
capability for the keyword w. The audit escrow agent (if he
1. The server chooses a random symmetric encryption
approves) constructs the search capability as
key, K, to be used only for this entry.
2. The server computes the encryption EK (m). dw := hHS1 (w), . . . , HSt (w)i.
3. The server chooses a random bit string r of some fixed j
We denote dw := HS j (w) as the search capability compo-
length. The random string r is uniformly indepen- nent corresponding to the jth server.
dently drawn for each entry. Once given the capability, the investigator visits each au-
4. For i from 1 to n the server computes dit log server. At the jth server, the investigator executes
the following:
ai := HS (wi ), bi := Hai (r), ci := bi ⊕ (flag|K).
1. The investigator computes p := Hd j (r), where r is the
w
In other words, for each keyword wi the PRF is first random string stored with the query.
keyed with S and is given input wi . The result ai is 2. For each ci in the entry, the investigator computes
then used to key the PRF which is then called with p ⊕ ci . If the first ` bits of the result matches flag, then
input r to give bi . The result bi is then XORed with the party extracts K as the remainder of the result; oth-
the concatenation of flag and the symmetric key K to erwise, the computation is disregarded. If none of the
give the output ci . results begin with flag, then the query is not a keyword
5. The server writes hEK (m), r, c1 , c2 , . . ., cn i as the en- match, and the investigator moves to the next query.
crypted entry to the audit log. 3. If one of the results did match, the investigator uses
the computed K to decrypt EK (m) to obtain m, the
Informally, an adversary that does not know S is unable
original audit log entry.
to compute ai = HS (wi ), and thus, bi , as long as the keyed
PRF H is secure. The adversary is thus unable to learn K, Suppose when encrypting an entry, the jth server uses the
and therefore cannot decrypt the entry. Additionally, the keyword wi to create ci . If wi = w, we have Hd j (r) ⊕ ci =
w
“user: Alice Smith”
1 capability
for search master
secret
capability
for search
2 audit
audit
record
record
audit
audit
record
record …
audit
audit
record
record
Figure 2. Searching the log: First, the investigator has to obtain a search capability for the keyword
in question. Then, the she can search the audit log for that keyword.
(flag|K); otherwise, this XOR will look random. There- Nevertheless, even with an efficient key update mecha-
fore, the beginning bits of the result can be tested against nism, this scheme has serious drawbacks. The most sig-
flag to determine if there is a match. (There is a 2−l chance nificant is that in order for the servers to be updated, the
of a false positive from this check. However, even in the audit escrow agent must have a “live” connection to a net-
event of a false positive from this check the decryption at- work shared with the servers. This makes the audit escrow
tempt will fail with very high probability. Since the length agent more vulnerable to attacks. Another concern is that
of ` does not actually affect the security of the scheme, the after v key updates for each of t servers, the size of a key-
length of the bitstring flag can have a length significantly word search capability is proportional to vt (which can be-
less than that of an encryption key. We note that a CRC or come quite large). Finally, if the adversary compromises
other simple checksum could also be used.) In the case of a the server, he may be able to learn a secret that would allow
match, the remainder of the result is the symmetric key K, him to act as the server and receive key updates from the
which can be used to decrypt the query. audit escrow agent. In this event, the audit log entries from
We note that the use of a pseudorandom function H to de- a compromised server will continue to be vulnerable, even
rive the search keyword capability implies that capabilities after the compromise was detected and the server repaired.
for different keywords appear to be independently random. These security issues indicate that it is best to put as little
In other words, an investigator receiving a capability dw for secret information into a server as possible, motivating the
a keyword w learns no new information about the capability asymmetric scheme outlined in the next section.
corresponding to any other keyword w0 .
5.2. Asymmetric Scheme
Discussion: The primary problem with the symmetric
The shortcomings of the symmetric key based scheme
method occurs in the case where an adversary is able to
suggest that an asymmetric key based scheme is necessary.
compromise a server’s secrets. If the adversary learns S j ,
We now present an asymmetric key based scheme for cre-
he can be able to create a search capability for any keyword
ating encrypted and searchable log entries. Our scheme is
that he wishes that can be used to search and decrypt on the
based on the Identity-Based Encryption scheme of Boneh
jth server.
and Franklin [3]. We first provide the reader with a brief
This problem is partially alleviated if we allow the server
review of IBE, then describe our scheme and discuss its
keys to be updated or evolved over time. If a particular
attributes.
secret S j was stolen from a server that used a key-update
scheme then the adversary will be able to use S j to search
all entries that were created since the last update, however, Identity-Based Encryption: In this section, we provide
he would not be able to read past log entries. a brief review of Identity-Based Encryption and some nec-
essary mathematical details.2 The Identity-Based Encryp- Setup: To set up our scheme, we first set up an instance
tion scheme we use is based on Tate pairings over super- of the above Identity-Based Encryption scheme. In our sys-
singular elliptic curves. tem, the audit escrow agent is given the IBE master secret
In an Identity-Based Encryption scheme, any arbitrary s, and all servers that contribute to the audit log are given
string can comprise a public key. If Alice wishes to send a the system parameters P.
message to Bob, she simply uses a string uniquely iden-
tifying Bob – say “[email protected]” – as the encryption Encryption: Suppose the server is to encrypt the log en-
key to encrypt her message. A system-wide master secret try m, along with keywords w1 , w2 , . . . , wn . The server per-
is used by a trusted escrow agent to generate the private forms the following steps:
key corresponding to a public key. Bob authenticates to
1. The server chooses a random symmetric encryption
the trusted third party (in the same way he might authen-
key, K, to be used only for this entry.
ticate to a CA) to obtain the private key corresponding to
“[email protected]”, which he then may use for decryption. 2. The server encrypts the log entry using K, to get
EK (m).
IBE S ETUP. To set up the system, one first selects large
3. For each keyword wi , the server computes the
primes p and q, two groups G1 and G2 of order3 q, and an
Identity-Based Encryption ci of the string (flag|K) us-
arbitrary generator P0 ∈ G1 . One also picks an admissible
ing wi as the public key and P as the public parame-
bilinear map e : G1 × G1 → G2 and two cryptographic hash
ters.
functions H1 : {0, 1}∗ → G1 and H2 : G2 → {0, 1}n . The
master secret is a random value s ∈ Zq , known only to the 4. The server writes EK (m), c1 , c2 , . . . , cn as the entry to
trusted escrow agent. The system parameters are the audit log.
Since a new key K is generated for each log entry (and is
P = (p, q, G1 , G2 , e, P0 , P1 ), where P1 = sP0 , thrown away by the server immediately after the log entry
is generated), the only way to recover a log entry is to de-
and are known by all parties. crypt one of the ci ’s and obtain K. It follows directly from
IBE K EY G ENERATION . To issue the private key corre- the security of Identity-Based Encryption [3] that the only
sponding to the public key w, the escrow agent uses the way to recover m is to know the private key correspond-
master secret s to compute dw := sH1 (w) ∈ G1 . ing to one of the keywords wi . The particular Identity-
Based Encryption scheme we have chosen also satisfies a
IBE E NCRYPTION . To encrypt the plaintext m ∈ {0, 1}n
stronger security property, namely key-privacy [1], which
using a string w as the public key, one (1) computes Qw =
implies that an adversary can obtain no information about
H1 (w) ∈ G1 , (2) computes gw = e(Qw , P1 ), (3) picks a ran-
what public key wi was used to produce any ciphertext ci .
dom r ∈ Zq , and (4) computes
(We refer the reader to [2] for a detailed proof of the key-
c = hrP0 , m ⊕ H2 (grw )i. privacy property for IBE.) This implies that the presence
of the ci in the log entry reveals no information about what
keywords are present in the log entry, and an attacker can-
IBE D ECRYPTION . To decrypt a ciphertext c = hU,V i us- not correlate entries in the audit log based on their keyword
ing dw as the private key, one computes tags.
m = V ⊕ H2 (e(dw ,U)).
Search and Decryption: Suppose an investigator wishes
Since e is a bilinear map4 ,
it follows that decryption opera- to obtain a search capability for the keyword w. The audit
tion is the inverse of the encryption operation. We refer the escrow agent (if he approves) constructs the capability dw
reader to [2, 3] for the details regarding the security of this as the Identity-Based Encryption private key corresponding
scheme. to the string w. For each audit log entry, the investigator
executes the following:
2 There are actually two IBE schemes described by Boneh and
Franklin: the simpler is semantically secure, the other satisfies chosen 1. For each ci the investigator attempts to IBE-decrypt
ciphertext security. To keep the presentation clear, we base our discussion ci using the private key dw . If the prefix of the result
on the semantically secure scheme. It is straightforward to generalize our matches flag then the investigator extracts K as the
work to the scheme satisfying chosen ciphertext security, and we recom-
remainder of the result. If none of the results begin
mend using their more secure scheme in any implementation of the secure
audit log system described here. with flag then the log entry does not match and the
3 To be precise, for the scheme we use, G is an order-q subgroup of
1 investigator moves to the next log entry.
elliptic curve group over a supersingular elliptic curve, while G2 is an
order-q subgroup F p2 . 2. If one of the results did match, the capability holder
4 That is, e(aP, bQ) = e(P, Q)ab for all P, Q ∈ G and all a, b ∈ Z . may compute K to decrypt EK (m) to obtain m.
1 q
Notice that the investigator holding some capability dw the servers may collect queries into “blocks” to be sent to
for keyword w will not be able to gain a capability dw0 to the audit log all at once.
search for another keyword w0 . Again, this follows directly Suppose a server collects log entries m1 , . . . , mt to be
from the security of the Identity-Based Encryption scheme: sent to the audit log, sharing in total the set of keywords
the capabilities correspond to different private keys in an w1 , . . . , wu . The server creates an audit log block and index
Identity-Based Encryption scheme, which cannot be de- as follows.
rived from each other, even if large numbers private keys
1. The server chooses random symmetric encryption
are known.
keys, K1 , . . ., Kt , for one-time use.
2. The server encrypts each log entry mi using Ki , to get
Discussion: This asymmetric scheme corrects many of
EKi (mi ).
the drawbacks of the symmetric scheme. Since each server
only stores public parameters, there are no secret keys for 3. For each distinct keyword w j , the server finds the
an attacker to steal. Compromising a server does not allow indices {i j,1 , . . ., i j,`( j) } for which w j is a keyword
the attacker to search or decrypt any entries in the audit log where `( j) is the number of entries for which w j is a
that have already been generated and stored. keyword. (That is, w j is a keyword in qi exactly when
A drawback of this scheme is the performance overhead i ∈ {i j,1 , . . . , i j,`( j) }.)
of using Identity-Based Encryption; however, optimiza- 4. The server computes the Identity-Based Encryption c j
tions (discussed in the next section) are available for speed- of the string
ing up our use of IBE.
We note that this scheme is also easy to modify to al- (flag|i j,1 |Ki j,1 | · · · |i j,`( j) |Ki j,`( j) )
low separating the ability to find records matching a given using w as the public key and P as the public parame-
keyword from the ability to decrypt those records. To do ters.
this, we omit the record key K in the IBE encryptions per-
5. The server writes EK1 (m1 ), . . . , EKt (mt ), c1 , . . . , cu as
formed that generate the tags ci (leaving only the flag), and
the block and index to the audit log.
add an encryption of K encrypted under another public key
belonging to the escrow agent. This introduces an extra As the length of the IBE-encrypted strings grow we may
“round trip” to the escrow agent to decrypt those records use hybrid encryption for efficiency: for a long string M we
for which a match is discovered. compute the IBE encryption a one-time symmetric key K0 ,
then perform block encryption of M using the symmetric
5.3. Optimizations for the Asymmetric Scheme key K0 . (This was not necessary in the non-indexed case,
as the strings encrypted were very short.)
The operations in the asymmetric scheme are signifi- Indexing introduces a significant performance advan-
cantly more expensive than those of the symmetric scheme.
tage for searching/decryption when keywords are repeated
The main bottlenecks are the computations of the pairing
among several audit log entries within a block. When a
and modular exponentiations for each keyword w. How-
keyword w is present in k entries in a log block, only one
ever, if the same keywords are used frequently then inter- pairing operation and one modular exponentiation are re-
mediate results can be reused. We discuss three such opti- quired to find and decrypt the k audit log entries.
mizations in this section.
Using indexing also results in a big performance win for
PAIRING R EUSE . Our first observation is that the com- audit log generation. For a keyword w appearing k times
putation of gw only needs to be performed once per key- in a block, again only one pairing operation and one modu-
word. Subsequent Identity-Based Encryptions using w as lar exponentiation are required to generate the index entry
the public key can reuse gw if it has already been computed relevant to w.
for some other log entry. Encryption then simply becomes We note that this method may open up a slight vulner-
a matter of picking a random r and following steps (3) and ability: an attacker may obtain partial information about
(4) of encryption (see explanation of Identity-Based En- the frequency of keywords present in a single block by ob-
cryption above). This speeds up encryption: over a set serving the lengths of the IBE encrypted strings within the
of log entries in which a keyword repeated k times, only index. This can be thwarted by adjusting the block size to
one pairing operation and k modular exponentiations are be small enough to limit the amount of statistical knowl-
required. edge obtained (which, in the limit of t = 1, reduces to the
I NDEXING . Further savings are possible by creating an in- security of the non-indexed solution.)
dex of keywords at periodic intervals in the log, instead of R ANDOMNESS R EUSE . Lastly, we consider an optimiza-
storing IBE encryptions with each log entry. If the system tion for the decryption process. We perform an indepen-
design allows buffering of entries sent to the audit log, then dent IBE encryption to creating the ci corresponding to the
optimization method encryption search/decryption
pairings exponentiations pairings
none t ·v t ·v t ·v
pairing reuse (PR) u t ·v t ·v
indexing u u u
randomness reuse (RR) t ·v t ·v t
PR + RR u t ·v t
all three u u 1
Table 1. Number of compute-intensive operations needed to process a block of t log entries, includ-
ing in total u distinct keywords, with an average of v keywords per log entry.
keywords wi for given log entry. However, it is possible to ings as described in Section 5.3. The cache is implemented
reuse an intermediate result of the IBE encryption process: as a simple hash table which associates the pairing result
we may save the value r chosen in step (3) of the encryp- gw with the keyword w. Every time a keyword w that has
tion that produces c1 to use in calculation of c2 , . . ., cn . As not been seen before is used, the newly computed pairing
long as the wi are distinct keywords, this reuse of the ran- gw is stored in the hash table. Another optimization we
domness produces results indistinguishable from the origi- implemented is the reuse of randomness described in Sec-
nal method. This speeds up decryption, as only one pairing tion 5.3.
is needed for each distinct r chosen. This implies that in- We also implemented the hash chain method of check-
stead of n pairings required to test if any of the ci match a pointing described in Section 4. The audit log server com-
given keyword, only 1 pairing is required. putes the updated value of the hash chain for every audit
O PTIMIZATION S UMMARY Table 1 summarizes the num- log entry it constructs. The current hash value can be read
ber of compute-intensive operations to process (encrypt or at any point in time. A party which reads this value can use
search/decrypt) a block of audit log entries. Table 2 sum- it later to check the integrity of the audit log for all entries
written before the hash checkpoint.
marizes the storage requirements of a block of audit log
Finally, we implemented the tool the investigator uses to
entries.
search the audit log when given a capability. The tool re-
trieves all records from the audit log and searches them one
6. Implementation record at a time. As mentioned above, our implementation
We implemented a database audit log system that creates uses the same randomness for each encryption within an
asymmetrically encrypted and searchable entries. The log- entry. Therefore, searching an entry only requires one IBE
ger is implemented as a MySQL proxy server. The user pairing. In future work, we plan to implement the indexing
signs onto the proxy and makes SQL queries. The proxy algorithm described in Section 5.3.
server, upon receiving a query, logs the query in addition to Our performance measurements were taken on a Pentium
passing it to the MySQL database server. IV processor machine running RedHat Linux 9.0 with 2GB
The proxy was developed on a Linux platform and is of memory. The speed of the processor is 2.8GHz.
multi-threaded so that multiple users can be served simul- We measured the added cost of encryption for each
taneously and that the logging component runs in parallel searchable keyword that is part of the query. If a keyword
with the rest of the system. The audit log server attaches w does not have a corresponding cache entry gw , then the
the date and time to the audit log entry. The log entries are server must hash w into the group G1 , execute a pairing
written to another MySQL database server that is dedicated to compute gw , and also compute a modular exponentia-
to storing audit log entries. tion. The cost of these operations totals 180ms. However,
We used the Stanford Identity-Based Encryption library if there is a cache entry for gw , the server only needs to
[12] for the basic IBE operations5 and the Cryptlib library execute an exponentiation and the cost is 5ms.
[5] as the implementation of the the symmetric encryp- Clearly, the use of the cache is important for efficient op-
tion of the query itself. We parameterize IBE with values eration of the system. We expect that in most applications
p = 1024 and q = 160. We use a 128-bit AES key for the most extracted keywords will have a corresponding pub-
symmetric encryption. lic key in the cache (provided the system has been running
The server software has a cache that is used to reuse pair- for a sufficient amount of time). We also do not anticipate
memory limitations to effect most caches. A 100MB cache
5 We note that an implementation of IBE that is approximately twice
can hold approximately 800, 000 public keys. In compari-
as fast has recently become available as part of the miracl package [10].
optimization method storage requirement (in bits)
none t · (M + v · log2 p + v · nH2 )
indexing t · M + u · (log2 p + nH2 ) + t · v · log2 t
randomness reuse (RR) t · (M + log2 p + v · nH2 )
indexing + RR t · M + log2 p + u · nH2 + t · v · log2 t
Table 2. Storage requirements of a block of t log entries, including in total u distinct keywords, with
an average of v keywords per log entry. M is the average bit length of a log entry, p is the prime used
for IBE operations, and nH2 is the output bit length of the hash function H2 used for IBE operations.
Pairing reuse has no effect on storage requirements.
son the number of entries in the second edition of the Ox- [2] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano.
ford English Dictionary is approximately 300, 000. Searchable public key encryption. Submitted for publica-
The tool which searches the encrypted audit log must tion. See http://eprint.iacr.org/2003/195/.
compute a pairing per entry. This operation takes 81ms. [3] D. Boneh and M. Franklin. Identity-based encryption from
the Weil pairing. In Proc. CRYPTO 01, pages 213–229.
Springer-Verlag, 2001. LNCS 2139.
7. Conclusion [4] E.-J. Goh. Building secure indexes for searching efficiently
Designing a secure audit log is not a trivial task. Apart on encrypted compressed data. Submitted for publication.
from guaranteeing properties such as tamper resistance and See http://eprint.iacr.org/2003/216/.
[5] P. Gutmann. cryptlib. http://www.cs.auckland.
verifiability, the contents of the audit log may itself be con-
ac.nz/˜pgut001/cryptlib/.
sidered sensitive, and need to be protected from unautho- [6] S. Haber and W. Stornetta. How to time-stamp a digital doc-
rized access. ument. In A. Menezes and S. A. Vanstone, editors, Proc.
A natural approach to such protection is to encrypt the CRYPTO 90, pages 437–455. Springer-Verlag, 1991. Lec-
audit log, which needs to be done in such a way that the log ture Notes in Computer Science No. 537.
still remains effectively searchable. We presented a scheme [7] B. Schneier and J. Kelsey. Cryptographic support for se-
in which we use identity-based encryption to protect sym- cure logs on untrusted machines. In Proceedings of the
metric keys that are used to encrypt audit log entries. Priv- 7th USENIX Security Symposium, pages 53–62. USENIX
ileged audit escrow agents can create search capabilities Press, 1998.
[8] B. Schneier and J. Kelsey. Minimizing bandwidth for re-
that allow their bearer to search the audit log for records
mote access to cryptographically protected audit logs. In
matching certain keywords. Web Proceedings of the 2nd International Workshop on
We implemented our scheme as a secure audit log for Recent Advances in Intrusion Detection. USENIX Press,
MySQL database queries. It turns out that the identity- 1999.
based encryption scheme we use introduces considerable [9] B. Schneier and J. Kelsey. Secure audit logs to support com-
overhead (although small enough to be negligible in an in- puter forensics. ACM Transactions on Information and Sys-
teractive system), but it buys us security and convenience tem Security (TISSEC), 2(2):159–176, 1999.
over symmetric key based schemes. [10] Shamus Software Ltd. MIRACL: Multiprecision Inte-
Our current implementation relies on checkpointing to ger and Rational Arithmetic C/C++ Library. http://
secure the integrity and verifiability of the audit log. While indigo.ie/˜mscott/.
[11] D. X. Song, D. Wagner, and A. Perrig. Practical techniques
the focus of our work so far has been to investigate the
for searches on encrypted data. In IEEE Symposium on Se-
searchability of the audit log, we plan to implement more curity and Privacy, pages 44–55, 2000.
advanced integrity protection mechanisms to improve the [12] Stanford Applied Cryptography Group. IBE secure e-mail.
overall security of the system. http://crypto.stanford.edu/ibe.
8. Acknowledgments
This work was sponsored by DARPA grant F30602-03-
C-0037.
References
[1] M. Bellare, A. Boldyreva, A. Desai, and D. Pointcheval.
Key-privacy in public-key encryption. Lecture Notes in
Computer Science, 2248, 2001.