PoPETs 2019 0056
Kirill Nikitin*, Ludovic Barman*, Wouter Lueks, Matthew Underwood, Jean-Pierre Hubaux, and Bryan Ford
and is padded to minimize information leakage via its length while minimizing space overhead. Unlike traditional formats, a PURB does not leak the encryption schemes used, who or how many recipients can decrypt it, or what application or software version created it. While simple in concept, because PURBs by definition contain no cleartext structure or markers, encoding and decoding them efficiently presents practical challenges.

This paper's first key contribution is Multi-Suite PURB or MsPURB, a cryptographically agile PURB encoding scheme that supports any number of recipients, who can use either shared passwords or public-private key pairs utilizing multiple cryptographic suites. The main technical challenge is providing efficient decryption to recipients without leaving any cleartext markers. If efficiency were of no concern, the sender could simply discard all metadata and expect the recipient to parse and trial-decrypt the payload using every possible format version, structure, and cipher suite. Real-world adoption, however, requires both decryption efficiency and cryptographic agility. MsPURB combines a variable-length header containing encrypted metadata with a symmetrically-encrypted payload. The header's structure enables efficient decoding by legitimate recipients via a small number of trial decryptions. MsPURB facilitates the seamless addition and removal of supported cipher suites, while leaking no information to third parties without a decryption key. We construct our scheme starting from the standard construction of the Integrated Encryption Scheme (IES) [2] and use the ideas of multi-recipient public-key encryption [7, 34] as a part of the multi-recipient development.

To reduce information leakage from data lengths, this paper's second main contribution is Padmé, a padding scheme that groups encrypted PURBs into indistinguishability sets whose visible lengths are representable as limited-precision floating-point numbers. Like obvious alternatives such as padding to the next power of two, Padmé reduces maximum information leakage to O(log log M) bits, where M is the maximum length of encrypted blob a user or application produces. Unlike those alternatives, however, Padmé greatly reduces the constant-factor overhead, enlarging files by at most +12%, and less as file size increases.

In our evaluation, creating a MsPURB ciphertext takes 235 ms for 100 recipients on consumer-grade hardware using 10 different cipher suites, and takes only 8 ms for the common single-recipient single-suite scenario. Our implementation is in pure Go without assembly optimizations that might speed up public-key operations. Because the MsPURB design limits the number of costly public-key operations, however, decoding performance is comparable to PGP, and is almost independent of the number of recipients (up to 10,000).

Analysis of real-world data sets shows that many objects are trivially identifiable by their unique sizes without padding, or even after padding to a fixed block size (e.g., that of a block cipher or a Tor cell). We show that Padmé can significantly reduce the number of objects uniquely identifiable by their sizes: from 83% to 3% for 56k Ubuntu packages, from 87% to 3% for 191k YouTube videos, from 45% to 8% for 848k hard-drive user files, and from 68% to 6% for 2.8k websites from the Alexa top 1M list. This much stronger leakage protection incurs an average space overhead of only 3%.

In summary, our main contributions are as follows:
– We introduce MsPURB, a novel encrypted data format that reveals no metadata information to observers without decryption keys, while efficiently supporting multiple recipients and cipher suites.
– We introduce Padmé, a padding scheme that asymptotically minimizes information leakage from data lengths while also limiting size overheads.
– We implement these encoding and padding schemes, evaluating the former's performance against PGP and the latter's efficiency on real-world data.

2 Motivation and Background

We first offer example scenarios in which PURBs may be useful, and summarize the Integrated Encryption Scheme that we later use as a design starting point.

2.1 Motivation and Applications

Our goal is to define a generic method applicable to most of the common data-encryption scenarios, such that the techniques are flexible to the application type, to the cryptographic algorithms used, and to the number of participants involved. We also seek to enhance plausible deniability, such that a user can deny that a PURB was created by a given application or that the user owns the key to decrypt it. We envision several immediate applications that could benefit from using PURBs.

E-mail Protection. E-mail systems traditionally use PGP or S/MIME for encryption. Their packet formats [14], however, expose the format version, encryption methods, number and public-key identities of the recipients, and public-key algorithms used.
Reducing Metadata Leakage from Encrypted Files and Communication with PURBs 8
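The Padmé rule summarized in the abstract admits a compact sketch. The normative definition appears in §4; the Python version below is an illustration only, and the function name is chosen here:

```python
def padme(length: int) -> int:
    """Round `length` up to the next Padmé-allowed size.

    Allowed sizes keep only about log2(log2(L)) significant mantissa
    bits, which bounds length leakage to O(log log M) bits while
    keeping the relative overhead small.
    """
    if length < 2:
        return length
    e = length.bit_length() - 1   # floor(log2 L): the exponent
    s = e.bit_length()            # bits needed to represent e
    z = e - s                     # low-order bits forced to zero
    mask = (1 << z) - 1
    return (length + mask) & ~mask
```

For example, a 9-byte object pads to 10 bytes, while exact powers of two are left unpadded; the worst-case relative overhead stays below the +12% bound quoted above.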
In addition, the payload is padded only to the block size of a symmetric-key algorithm used, which does not provide "size privacy", as we show in §5.3. Using PURBs for encrypted e-mail could minimize this metadata leakage. Furthermore, as e-mail traffic is normally sparse, the moderate overhead PURBs incur can easily be accommodated.

Initiation of Cryptographic Protocols. In most cryptographic protocols, initial cipher-suite negotiation, handshaking, and key exchange are normally performed unencrypted. In TLS 1.2 [20], an eavesdropper who monitors a connection from the start can learn many details such as the cryptographic schemes used. The unencrypted Server Name Indication (SNI) enables an eavesdropper to determine which specific web site a client is connected to among the sites hosted by the same server. The eavesdropper can also fingerprint the client [46] or distinguish censorship-circumvention tools that try to mimic TLS traffic [23, 29]. TLS 1.3 [45] takes a few protective measures: e.g., less unencrypted metadata during the handshake, and an experimental extension for encrypted SNI [45, 47]. These measures are only partial, however, and leave other metadata, such as the protocol version number, cipher suites, and public keys, still visible. PURBs could facilitate fully-encrypted handshaking from the start, provided a client already knows at least one public key and cipher suite the server supports. Clients might cache this information from prior connections, or obtain it out-of-band while finding the server, e.g., via DNS-based authentication [28].

Encrypted Disk Volumes. VeraCrypt [30] uses a block cipher to turn a disk partition into an encrypted volume where the partition's free space is filled with random bits. For plausible deniability and coercion protection, VeraCrypt supports so-called hidden volumes: an encrypted volume whose content and metadata are indistinguishable from the free space of a primary encrypted volume hosting the hidden volume. This protection is limited, however, because a primary volume can host only a single hidden volume. A potential coercer might therefore assume by default that the coercee has a hidden volume, and interpret a claim of non-possession of the decryption keys as a refusal to provide them. PURBs might enhance coercion protection by enabling an encrypted volume to contain any number of hidden volumes, facilitating a stronger "N + 1" defense. Even if a coercee reveals up to N "decoy" volumes, the coercer cannot know whether there are any more.

2.2 Integrated Encryption Scheme

The Integrated Encryption Scheme (IES) [2] is a hybrid encryption scheme that enables the encryption of arbitrary message strings (unlike ElGamal, which requires the message to be a group element), and offers flexibility in underlying primitives. To send an encrypted message, a sender first generates an ephemeral Diffie-Hellman key pair and uses the public key of the recipient to derive a shared secret. The choice of the Diffie-Hellman group is flexible, e.g., multiplicative groups of integers or elliptic curves. The sender then relies on a cryptographic hash function to derive the shared keys used to encrypt the message with a symmetric-key cipher and to compute a MAC using the encrypt-then-MAC approach. The resulting ciphertext is structured as shown in Figure 1.

pk_s | enc(M) | σ_mac

Fig. 1. Ciphertext output of the Integrated Encryption Scheme, where pk_s is an ephemeral public key of the sender, and σ_mac and enc(M) are generated using the DH-derived keys.

3 Hiding Encryption Metadata

This section addresses the challenges of encoding and decoding Padded Uniform Random Blobs or PURBs in a flexible, efficient, and cryptographically agile way. We first cover notation, system and threat models, followed by a sequence of strawman approaches that address different challenges on the path towards the full MsPURB scheme. We start with a scheme where ciphertexts are encrypted with a shared secret and addressed to a single recipient. We then improve it to support public-key operations with a single cipher suite, and finally to multiple recipients and multiple cipher suites.

3.1 Preliminaries

Let λ be a standard security parameter. We use $ to indicate randomness, ←$ to denote random sampling, ‖ to denote string concatenation, and |value| to denote the bit-length of "value". We write PPT as an abbreviation for probabilistic polynomial-time. Let Π = (E, D) be an ind$-cca2-secure authenticated-encryption (AE) scheme [8] where E_K(m) and D_K(c) are encryption and decryption algorithms, respectively, given a message m, a ciphertext c, and a key K.
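As an aside, the IES flow of §2.2 (Figure 1) can be sketched end to end. Everything below is a deliberately weak toy stand-in (a 64-bit modular group, a hash-counter stream cipher, names invented here), not the construction the paper instantiates:

```python
import hashlib
import hmac
import secrets

# Toy IES sketch mirroring Figure 1: pk_s | enc(M) | sigma_mac.
# Real IES would use, e.g., Curve25519 and a proper cipher; this
# 64-bit group is for illustration only and offers NO security.
P = 2**64 - 59   # a prime modulus, far too small for real use
G = 2            # toy generator

def keygen():
    y = secrets.randbelow(P - 2) + 1
    return y, pow(G, y, P)                        # (private, public)

def _keystream(key: bytes, n: int) -> bytes:
    out = bytearray()
    ctr = 0
    while len(out) < n:                           # toy stream cipher
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def _derive(shared_point: int):
    shared = hashlib.sha256(shared_point.to_bytes(8, "big")).digest()
    return shared[:16], shared[16:]               # (enc key, MAC key)

def ies_encrypt(Y: int, msg: bytes) -> bytes:
    x = secrets.randbelow(P - 2) + 1              # ephemeral DH key pair
    X = pow(G, x, P)
    k_enc, k_mac = _derive(pow(Y, x, P))          # hash the shared secret
    ct = bytes(a ^ b for a, b in zip(msg, _keystream(k_enc, len(msg))))
    pks = X.to_bytes(8, "big")
    tag = hmac.new(k_mac, pks + ct, hashlib.sha256).digest()  # encrypt-then-MAC
    return pks + ct + tag

def ies_decrypt(y: int, blob: bytes) -> bytes:
    pks, ct, tag = blob[:8], blob[8:-32], blob[-32:]
    k_enc, k_mac = _derive(pow(int.from_bytes(pks, "big"), y, P))
    if not hmac.compare_digest(tag, hmac.new(k_mac, pks + ct, hashlib.sha256).digest()):
        raise ValueError("MAC verification failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(k_enc, len(ct))))
```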
Let MAC = (M, V) be strongly unforgeable Message Authentication Code (MAC) generation and verification algorithms. An authentication tag generated by MAC must be indistinguishable from a random bit string.

Let G be a cyclic finite group of prime order p generated by the group element g where the gap-CDH problem is hard to solve (e.g., an elliptic curve or a multiplicative group of integers modulo a large prime). Let Hide: G(1^λ) → {0,1}^λ be a mapping that encodes a group element of G to a binary string that is indistinguishable from a uniform random bit string (e.g., Elligator [10], Elligator Squared [3, 51]). Let Unhide: {0,1}^λ → G(1^λ) be the counterpart to Hide that decodes a binary string into a group element of G. Let H: G → {0,1}^{2λ} and Ĥ: {0,1}* → {0,1}^{2λ} be two distinct cryptographic hash functions. Let PBKDF: {salt, password} → {0,1}^{2λ} be a secure password-based key-derivation function [11, 33, 41], a "slow" hash function that converts a salt and a password into a bit string that can be used as a key for symmetric encryption.

Let data be an application-level unit of data (e.g., a file or network message). A sender wants to send an encrypted version of data to one or more recipients. We consider two main approaches for secure data exchanges:
(1) Via pre-shared secrets, where the sender shares with the recipients long-term one-to-one passphrases Ŝ_1, ..., Ŝ_r that the participants can use in a password-hashing scheme to derive ephemeral secrets S_1, ..., S_r.
(2) Via public-key cryptography, where sender and recipients derive ephemeral secrets Z_i = H(X^{y_i}) = H(Y_i^x) using a hash function H. Here (x, X = g^x) denotes the sender's one-time (private, public) key pair and (y_i, Y_i = g^{y_i}) is the key pair of recipient i ∈ 1, ..., r.
In both scenarios, the sender uses the ephemeral secrets S_1, ..., S_r or Z_1, ..., Z_r to encrypt (parts of) the PURB header using an authenticated encryption (AE) scheme.

We refer to a tuple S = ⟨G, p, g, Hide(·), Π, H, Ĥ⟩ used in the PURB generation as a cipher suite. This can be considered similar to the notion of a cipher suite in TLS [20]. Replacing any component of a suite (e.g., the group) results in a different cipher suite.

3.1.2 Threat Model and Security Goals

We will consider two different types of computationally bounded adversaries:
1. An outsider adversary who does not hold a private key or a password valid for decryption;
2. An insider adversary who is a "curious" and active legitimate recipient with a valid decryption key.
Both adversaries are adaptive. Naturally, the latter adversary has more power, e.g., she can recover the plaintext payload. Hence, we consider different security goals given the adversary type:
1. We seek ind$-cca2 security against the outsider adversary, i.e., the encoded content and all metadata must be indistinguishable from random bits under an adaptive chosen-ciphertext attack;
2. We seek recipient privacy [4] against the insider adversary under a chosen-plaintext attack, i.e., a recipient must not be able to determine the identities of the ciphertext's other recipients.
Recipient privacy is a generalization of the key-indistinguishability notion [6], where an adversary is unable to determine whether a given public key has been used for a given encryption.

We wish to achieve two system goals beyond security:
– PURBs must provide cryptographic agility. They should accommodate either one or multiple recipients, allow encryption for each recipient using a shared password or a public key, and support different cipher suites. Adding new cipher suites must be seamless and must not affect or break backward compatibility with other cipher suites.
– PURBs' encoding and decoding must be "reasonably" efficient. In particular, the number of expensive public-key operations should be minimized, and padding must not impose excessive space overhead.

3.2 Encryption to a Single Passphrase

We begin with a simple strawman PURB encoding format allowing a sender to encrypt data using a single long-term passphrase Ŝ shared with a single recipient (e.g., out-of-band via a secure channel). The sender and recipient use an agreed-upon cipher suite defining the scheme's components. The sender first generates a fresh symmetric session key K and computes the PURB payload as E_K(data). The sender then generates a random salt and derives the ephemeral secret S = PBKDF(salt, Ŝ).
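This salt-and-passphrase derivation can be made concrete with PBKDF2-HMAC-SHA256 as a stand-in for the abstract PBKDF; the paper only requires some deliberately slow, salted KDF [11, 33, 41], and the iteration count below is illustrative:

```python
import hashlib
import secrets

def derive_entry_point_secret(passphrase: bytes, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256; real deployments would tune the iteration
    # count much higher or use a memory-hard function instead.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 10_000)

salt = secrets.token_bytes(16)   # stored in the clear at the PURB's start
S = derive_entry_point_secret(b"long-term passphrase", salt)
```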
The sender next creates an entry point (EP) containing the session key K, the position of the payload, and potentially other metadata. The sender then encrypts the EP using S. Finally, the sender concatenates the three segments to form the PURB as shown in Figure 2.

salt | E_S(K ‖ meta) | E_K(data)
       (entry point)   (payload)

Fig. 2. A PURB addressed to a single recipient and encrypted with a passphrase-derived ephemeral secret S.

3.3 Single Public Key, Single Suite

We often prefer to use public-key cryptography, instead of pre-shared secrets, to establish secure communication or encrypt data at rest. Typically the sender or initiator indicates in the file's cleartext metadata which public key this file is encrypted for (e.g., in PGP), or else parties exchange public-key certificates in cleartext during communication setup (e.g., in TLS). Both approaches generally leak the receiver's identity. We address this use case with a second strawman PURB encoding format that builds on the last by enabling the decryption of an entry point EP using a private key.

To expand our scheme to the public-key scenario, we adopt the idea of a hybrid asymmetric-symmetric scheme from the IES (see §2.2). Let (y, Y) denote the recipient's key pair. The sender generates an ephemeral key pair (x, X), computes the ephemeral secret Z = H(Y^x), then proceeds as before, except that it encrypts K and associated metadata with Z instead of S. The sender replaces the salt in the PURB with her encoded ephemeral public key Hide(X), where Hide(·) maps a group element to a uniform random bit string. The resulting PURB structure is shown in Figure 3.

Hide(X)      | E_Z(K ‖ meta) | E_K(data)
(encoded pk)   (entry point)   (payload)

Fig. 3. A PURB addressed to a single recipient that uses a public key Y, where X is the public key of the sender and Z = H(Y^x) is the ephemeral secret.

3.4 Multiple Public Keys, Single Suite

We often wish to encrypt a message to several recipients, e.g., in multicast communication or mobile group chat. We hence add support for encrypting one message under multiple public keys that are of the same suite.

As the first step, we adopt the idea of multi-recipient public-key encryption [7, 34], where the sender generates a single key pair and uses it to derive an ephemeral secret with each of the intended recipients. The sender creates one entry point per recipient. These entry points contain the same session key and metadata but are encrypted with different ephemeral secrets.

As a PURB's purpose is to prevent metadata leakage, including the number of recipients, a PURB cannot reveal how many entry points exist in the header. Yet a legitimate recipient needs to have a way to enumerate possible candidates for her entry point. Hence, the primary challenge is to find a space-efficient layout of entry points—with no cleartext markers—such that the recipients are able to find their segments efficiently.

Linear Table. The most space-efficient approach is to place entry points sequentially. In fact, OpenPGP suggests a similar approach for achieving better privacy [14, Section 5.1]. However, in this case, decryption is inefficient: the recipients have to attempt sequentially to decrypt each potential entry point, before finding their own or reaching the end of the PURB.

Fixed Hash Tables. A more computationally efficient approach is to use a hash table of a fixed size. The sender creates a hash table and places each encrypted entry point there, identifying the corresponding position by hashing an ephemeral secret. Once all the entry points are placed, the remaining slots are filled with random bit strings; hence a third party is unable to deduce the number of recipients. The upper bound, corresponding to the size of the hash table, is public information. This approach, however, yields significant space overhead: in the common case of a single recipient, all the unpopulated slots are filled with random bits but still transmitted. This approach also has the downside of imposing an artificial limit on the number of recipients.

Expanding Hash Tables. We therefore include not one but a sequence of hash tables whose sizes are consecutive powers of two. Immediately following the encoded public key, the sender encodes a hash table of length one, followed (if needed) by a hash table of length two, one of length four, etc., until all the entry points are placed. Unpopulated slots are filled with random bits.
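The expanding-hash-table layout just described can be sketched as follows; the slot size, types, and helper name are illustrative, not the paper's implementation:

```python
import secrets

SLOT = 40  # illustrative fixed entry-point size in bytes

def layout_entry_points(entries):
    """Place (position_key, entry_point_ciphertext) pairs into hash
    tables of sizes 1, 2, 4, ...; an entry that collides in HT(j)
    falls through to HT(j+1), growing the sequence as needed."""
    tables = []
    for pos_key, ct in entries:
        j = 0
        while True:
            if j == len(tables):
                tables.append([None] * (1 << j))   # append HT(j)
            idx = pos_key % (1 << j)               # slot index in HT(j)
            if tables[j][idx] is None:
                tables[j][idx] = ct
                break
            j += 1                                 # collision: next table
    # fill unpopulated slots with random bits so the header stays uniform
    return [[slot if slot is not None else secrets.token_bytes(SLOT)
             for slot in table] for table in tables]
```

With a single recipient this produces exactly one table of size 1, so the header stays compact.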
To decrypt a PURB, a recipient decodes the public key X, derives the ephemeral secret, computes the hash index in the first table (which is always zero), and tries to decrypt the corresponding entry point. On failure, the recipient moves to the second hash table, seeks the correct position and tries again, and so on.

Definitions. We now formalize this scheme. Let r be the number of recipients and (y_1, Y_1), ..., (y_r, Y_r) be their corresponding key pairs. The sender generates a fresh key pair (x, X) and computes one ephemeral secret k_i = H(Y_i^x) per recipient. The sender uses a second hash function Ĥ to derive independent encryption keys as Z_i = Ĥ("key" ‖ k_i) and position keys as P_i = Ĥ("pos" ‖ k_i). Then the sender encrypts the data and creates r entry points E_{Z_1}(K, meta), ..., E_{Z_r}(K, meta). The position of an entry in a hash table j is (P_i mod 2^j). The sender iteratively tries to place an entry point in HT0 (hash table 0), then in HT1, and so on, until placement succeeds (i.e., no collision occurs). If placement fails in the last existing hash table HTj, the sender appends another hash table HT(j+1) of size 2^{j+1} and places the entry point there. An example of a PURB encrypted for five recipients is illustrated in Figure 4.

encoded pk | HT0        | HT1        | HT2        | payload
Hide(X)    | E_{Z_1}(K) | E_{Z_3}(K) | E_{Z_4}(K) | E_K(data)
           |            | E_{Z_2}(K) | random     |
           |            |            | E_{Z_5}(K) |
           |            |            | random     |

Fig. 4. A PURB with hash tables of increasing sizes (HT0, HT1, HT2). Five and two slots of the hash tables are filled with entry points and random bit strings, respectively. The metadata "meta" in the entry points is omitted from the figure. Hash-table entries are put one after another in the byte representation of a PURB.

To decode, a recipient reads the public key; derives the ephemeral secret k_i, the encryption key Z_i and the position key P_i; and iteratively tries matching positions in hash tables until the decryption of the entry point succeeds. Although the recipient does not initially know the number of hash tables in a PURB, the recipient needs to do only a single expensive public-key operation, and the rest are inexpensive symmetric-key decryption trials. In the worst case of a small message encrypted to many recipients, or a non-recipient searching for a nonexistent entry point, the total number of trial decryptions required is logarithmic in the PURB's size. In the common case of a single recipient, only a single hash table of size 1 exists, and the header is compact. With r recipients, the worst-case compactness is having r hash tables (if each insertion leads to a collision), which happens with exponentially decreasing probability. The expected number of trial decryptions is log2 r.

3.5 Multiple Public Keys and Suites

In the real world, not all data recipients' keys might use the same cipher suite. For example, users might prefer different key lengths or might use public-key algorithms in different groups. Further, we must be able to introduce new cipher suites gradually, often requiring larger and differently-structured keys and ciphertexts, while preserving interoperability and compatibility with old cipher suites. We therefore build on the above strawman schemes to produce Multi-Suite PURB or MsPURB, which offers cryptographic agility by supporting the encryption of data for multiple different cipher suites.

When a PURB is multi-suite encrypted, the recipients need a way to learn whether a given suite has been used and where the encoded public key of this suite is located in the PURB. There are two obvious approaches to enabling recipients to locate encoded public keys for multiple cipher suites: to pack the public keys linearly at the beginning of a PURB, or to define a fixed byte position for each cipher suite. Both approaches incur undesirable overhead. In the former case, the recipients have to check all possible byte ranges, performing an expensive public-key operation for each. The latter approach results in significant space overhead and lack of agility, as unused fixed positions must be filled with random bits, and adding new cipher suites requires either assigning progressively larger fixed positions or compatibility-breaking position changes to existing suites.

Set of Standard Positions. To address this challenge, we introduce a set of standard byte positions per suite. These sets are public and standardized for all PURBs. The set refers to positions where the suite's public key could be in the PURB. For instance, let us consider a suite PURB_X25519_AES128GCM_SHA256. We can define—arbitrarily for now—the set of positions as {0, 64, 128, 1024}. As the length of the encoded public key is fully defined by the suite (32 bytes here, as Curve25519 is used), the recipients will iteratively try to decode a public key at [0:32), then [64:96), etc. If the sender wants to encode a PURB for two suites A and B, she needs to find one position in each set such that the public keys do not overlap.
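This position selection can be viewed as a small search problem; the brute-force sketch below (a hypothetical helper, with illustrative types) returns one non-overlapping start per suite:

```python
from itertools import product

def choose_positions(suites):
    """suites: list of (allowed_start_positions, encoded_key_length).
    Return one start position per suite such that no two public-key
    byte ranges overlap, or None if these position sets cannot coexist."""
    for starts in product(*(allowed for allowed, _ in suites)):
        ranges = [(start, start + length)
                  for start, (_, length) in zip(starts, suites)]
        disjoint = all(a_end <= b_start or b_end <= a_start
                       for i, (a_start, a_end) in enumerate(ranges)
                       for b_start, b_end in ranges[i + 1:])
        if disjoint:
            return list(starts)
    return None
```

Exhaustive search is fine here, since each suite defines only a handful of candidate positions.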
For instance, if setA = {0, 128, 256} and setB = {0, 32, 64, 128}, and the public keys' lengths are 64 and 32, respectively, one possible choice would be to put the public key for suite A in [0:64), and the public key for suite B in [64:96). All suites typically have position 0 in their set, so that in the common case of a PURB encoded for only one suite, the encoded public key is at the beginning of the PURB for maximum space efficiency. Figure 5 illustrates an example encoding. With well-designed sets, in which each new cipher suite is assigned at least one position not overlapping with those assigned to prior suites, the sender can encode a PURB for any subset of the suites. We address efficiency hereunder, and provide a concrete example with real suites in Appendix B.

encoded pk_A | HT0 | HT1        | HT2        | payload
Hide(X_A)    | rnd | Hide(X_B)  | E_{Z_2}(K) | E_K(data)
             |     | E_{Z_1}(K) | random     |
             |     |            | E_{Z_3}(K) |
             |     |            | random     |

Fig. 5. Example of a PURB encoded for three public keys in two suites (suite A and B). The sender generates one ephemeral key pair per suite (X_A and X_B). In this example, X_A is placed at the first allowed position, and X_B moves to the second allowed position (since the first position is taken by suite A). Those positions are public and fixed for each suite. HT0 cannot be used for storing an entry point, as X_A partially occupies it; HT0 is considered "full" and the entry point is placed in subsequent hash tables, here HT1.

The payload follows the last encoded public key or hash table, and its start position is recorded in the meta in each entry point.

Decoding Efficiency. We have not yet achieved our decoding efficiency goal, however: the recipient must perform several expensive public-key operations for each cipher suite, one for each potential position, until the correct position is found. We reduce this overhead to a single public-key operation per suite by removing the recipient's need to know in which of the suite positions the public key was actually placed. To accomplish this, a sender XORs the bytes at all the suite positions and places the result into one of them. The sender first constructs the whole PURB as before; then she substitutes the bytes of the already-written encoded public key with the XOR of the bytes at all the defined suite positions (if they do not exceed the PURB length), which could even correspond to encrypted payload. To decode a PURB, a recipient starts by reading and XORing the values at all the positions defined for a suite. This results in an encoded public key, if that suite was used in this PURB.

Encryption Flexibility. Although multiple cipher suites can be used in a PURB, so far these suites must agree on one payload encryption scheme, as a payload appears only once. To lift this constraint, we decouple encryption schemes for entry points and payloads. An entry-point encryption scheme is a part of a cipher suite, whereas a payload encryption scheme is indicated separately in the metadata "meta" in each entry point.

MAC algorithm in the meta in the entry points, along with the payload encryption scheme that now does not need to be authenticated. The sender places the tag at the very end of the PURB, which covers the entire PURB including encoded public keys, entry-point hash tables, payload ciphertext, and any padding required.

Because the final authentication tag covers the entire PURB, the sender must calculate it after all other PURB content is finalized, including the XOR-encoding of all the suites' public-key positions. Filling in the tag would present a problem, however, if the tag's position happened to overlap with one of the public-key positions of some cipher suite, because filling in the tag would corrupt the suite's XOR-encoded public key. To handle this situation, the sender is responsible for ensuring that the authentication tag does not fall into any of the possible public-key positions for the cipher suites in use.

To encode a PURB, a sender prepares entry points, lays out the header, encrypts the payload, adds padding (see §4), and computes the PURB's total length. If any of the byte positions of the authentication tag to be appended overlap with public-key positions, the sender increases the padding to the next bracket, until the public-key positions and the tag are disjoint. The sender proceeds with XOR-encoding all suites' public keys, and computing and appending the tag. Upon receipt of a PURB, a decoder computes the potential public keys, finds and decrypts her entry point, and learns the decryption scheme and the MAC algorithm with the size of its tag. She then verifies the PURB's integrity and decrypts the payload.

3.7 Complete Algorithms

We summarize the encoding scheme by giving detailed algorithms. We begin by defining helper HdrPURB algorithms that encode and decode a PURB header's data for a single cipher suite. We then use these algorithms in defining the final MsPURB encoding scheme.

Recall the notion of a cipher suite S = ⟨G, p, g, Hide(·), Π, H, Ĥ⟩, where G is a cyclic group of order p generated by g; Hide is a mapping G → {0,1}^λ; Π = (E, D) is an authenticated-encryption scheme; and H: G → {0,1}^{2λ}, Ĥ: {0,1}* → {0,1}^{2λ} are two distinct cryptographic hash functions. Let sk and pk be a private key and a public key, respectively, for ⟨G, p, g⟩ defined in a cipher suite. We then define the full HdrPURB and MsPURB algorithms as follows:

Algorithms HdrPURB.
HdrPURB.Encap(R, S) → (τ, k_1, ..., k_r): Given a set of public keys R = {pk_1 = Y_1, ..., pk_r = Y_r} of a suite S:
(1) Pick a fresh x ∈ Z_p and compute X = g^x, where p, g are defined in S.
(2) Derive k_1 = H(Y_1^x), ..., k_r = H(Y_r^x).
(3) Map X to a uniform string τ_X = Hide(X).
(4) Output an encoded public key τ = τ_X and k_1, ..., k_r.
HdrPURB.Decap(sk(S), τ) → k: Given a private key sk = y of a suite S and an encoded public key τ:
(1) Retrieve X = Unhide(τ).
(2) Compute and output k = H(X^y).

Algorithms MsPURB.
MsPURB.Setup(1^λ) → S: Initialize a cipher suite S = ⟨G, p, g, Hide(·), Π, H, Ĥ⟩.
MsPURB.KeyGen(S) → (sk, pk): Given a suite S = ⟨G, p, g, ...⟩, pick x ∈ Z_p and compute X = g^x. Output (sk = x, pk = X).
MsPURB.Enc(R, m) → c: Given a set of public keys of an indicated suite R = {pk_1(S_1), ..., pk_r(S_r)} and a message m:
(1) Pick an appropriate symmetric-key encryption scheme (Enc, Dec) with key length λ_K, a MAC algorithm MAC = (M, V), and a hash function H′: {0,1}* → {0,1}^{λ_K} such that the key length λ_K matches the security level of the most conservative suite.
(2) Group R into R_1, ..., R_n, s.t. all public keys in a group R_i share the same suite S_i. Let r_i = |R_i|.
(3) For each R_i:
    (a) Run (τ_i, k_1, ..., k_{r_i}) = HdrPURB.Encap(R_i, S_i);
    (b) Compute entry-point keys keys_i = (Z_1 = Ĥ("key" ‖ k_1), ..., Z_{r_i} = Ĥ("key" ‖ k_{r_i})) and positions aux_i = (P_1 = Ĥ("pos" ‖ k_1), ..., P_{r_i} = Ĥ("pos" ‖ k_{r_i})).
(4) Pick K ←$ {0,1}^{λ_K}.
(5) Record (Enc, Dec), MAC and H′ in meta.
(6) Compute a payload key K_enc = H′("enc" ‖ K) and a MAC key K_mac = H′("mac" ‖ K).
(7) Obtain c_payload = Enc_{K_enc}(m).
(8) Run c′ ← Layout(τ_1, ..., τ_n, keys_1, ..., keys_n, aux_1, ..., aux_n, S_1, ..., S_n, K, meta, c_payload) (see Algorithm 2 on page 26).
(9) Derive an authentication tag σ = M_{K_mac}(c′) and output c = c′ ‖ σ.
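The XOR position-encoding that step (1) of MsPURB.Dec relies on can be sketched as follows; the helper names are invented here, and the writer applies the encode step only after the rest of the PURB is final:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def xor_encode_public_key(purb: bytearray, positions, key_len: int, chosen: int) -> None:
    """Overwrite the encoded key already written at `chosen` so that
    XORing the bytes at ALL in-range candidate positions yields it."""
    acc = bytes(purb[chosen:chosen + key_len])
    for p in positions:
        if p != chosen and p + key_len <= len(purb):
            acc = xor_bytes(acc, purb[p:p + key_len])
    purb[chosen:chosen + key_len] = acc

def xor_decode_public_key(purb: bytes, positions, key_len: int) -> bytes:
    """Recipient side: XOR the bytes at every in-range candidate position."""
    acc = bytes(key_len)
    for p in positions:
        if p + key_len <= len(purb):
            acc = xor_bytes(acc, purb[p:p + key_len])
    return acc
```

Positions that fall past the end of the PURB are skipped identically by both sides, mirroring the "if they do not exceed the PURB length" condition in §3.5.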
(1) Look up the possible positions of a public key defined by S and XOR the bytes at all the positions to obtain the encoded public key τ.
(2) Run k ← HdrPURB.Decap(sk, τ).
(3) Derive Z = Ĥ("key" ∥ k) and P = Ĥ("pos" ∥ k).
(4) Parse c as growing hash tables and, using the secret Z as the key, trial-decrypt the entries defined by P to obtain K ∥ meta. If no decryption is successful, return ⊥.
(5) Look up the hash function H0, a MAC = (M, V) algorithm and the length of the MAC output tag σ from meta. Parse c as ⟨c0 ∥ σ⟩. Derive Kmac = H0("mac" ∥ K) and run VKmac(c0, σ). On failure, return ⊥.
(6) Derive Kenc = H0("enc" ∥ K), read the start and the end of the payload from meta (it is written by Layout) to parse c0 as ⟨hdr ∥ cpayload ∥ padding⟩, and return DecKenc(cpayload), where Dec is the payload decryption algorithm specified in meta.

Theorem 1. If for each cipher suite S = ⟨G, p, g, Hide(·), Π, H, Ĥ⟩ used in a PURB we have that: the gap-CDH problem is hard relative to G, Hide maps group elements in G to uniform random strings, Π is ind$-cca2-secure, and H, Ĥ and H0 are modeled as a random oracle; and moreover that MAC is strongly unforgeable with its MACs being indistinguishable from random, and the scheme for payload encryption (Enc, Dec) is ind$-cpa-secure, then MsPURB is ind$-cca2-secure against an outsider adversary.

Proof. See Appendix D.2.

Theorem 1 also implies that an outsider adversary cannot break recipient privacy under an ind$-cca2 attack, as long as the two possible sets of recipients N0, N1 induce the same distribution on the length of a PURB.

Theorem 2. If for each cipher suite S = ⟨G, p, g, Hide(·), Π, H, Ĥ⟩ used in a PURB we have that: the gap-CDH problem is hard relative to G, Hide maps group elements in G to uniform random strings, Π is ind$-cca2-secure, H and Ĥ are modeled as a random oracle, and the order in which cipher suites are used for encoding is fixed; then MsPURB is recipient-private against an ind$-cpa insider adversary.

Proof. See Appendix D.3.

3.8 Practical Considerations

Cryptographic agility (i.e., changing the encryption scheme) for the payload is provided by the metadata embedded in the entry points. For the entry points themselves, we recall that the recipient uses trial decryption and iteratively tests suites from a known, public, ordered list. To add a new suite, it suffices to add it to this list. With this technique, a PURB does not need version numbers. There is, however, a trade-off between the number of supported suites and the maximum decryption time. It is important that a sender follow the fixed order of the cipher suites during encoding, because a varying order might result in a different header length given the same set of recipients and sender's ephemeral keys, which could be exploited by an insider adversary.

If a nonce-based authenticated-encryption scheme is used for entry points, the sender needs to include a distinct random nonce as part of each entry-point ciphertext (the nonce of each entry point must be unique per PURB). Some schemes, e.g., AES-GCM [9], have been shown to retain their security when the same nonce is reused with different keys. When such a scheme is used, there can be a single global nonce reused by every entry point. However, generalizing this global-nonce approach to arbitrary schemes requires further analysis.

Hardening Recipient Privacy. The given instantiation of MsPURB provides recipient privacy only under a chosen-plaintext attack. If information about decryption success is leaked, an insider adversary could learn the identities of other recipients of a PURB by altering the header, recomputing the MAC, and querying candidates. A possible approach to achieving ind$-cca2 recipient privacy is to sign a complete PURB using a strongly existentially unforgeable signature scheme and to store the verification key in each entry point, as similarly done in the broadcast-encryption scheme by Barth et al. [4]. This approach, however, requires adaptation to the multi-suite setting, and it would result in a significant increase of the header size and a decrease in efficiency. We leave this question for future work.

Limitations. The MsPURB scheme above is not secure against quantum computers, as it relies on discrete-logarithm hardness. It is theoretically possible to substitute the IES-based key encapsulation with a quantum-resistant variant to achieve quantum ind$-cca2 security. The requirements for substitution are ind$-cca2 security and compactness (it must be possible to securely reuse the sender's public key to derive shared secrets with
multiple recipients). Furthermore, as MsPURB is non-interactive, it does not offer forward secrecy.

Simply by looking at the sizes (of the header for a malicious insider, or the total size for a malicious outsider), an adversary can infer a bound on the total number of recipients. We partially address this with padding in §4. However, no reasonable padding scheme can perfectly hide this information. If this is a problem in practice, we suggest adding dummy recipients.

Protecting concrete implementations against timing attacks is a highly challenging task. The two following properties are required for basic hardening. First, implementations of PURBs should always attempt to decrypt all potential entry points using all the recipient's suites. Second, decryption errors of any source, as well as the inability to recover the payload, should be processed in constant time and always return ⊥.

4 Limiting Leakage via Length

The encoding scheme presented above in §3 produces blobs of data that are indistinguishable from random bit-strings of the same length, thus leaking no information to the adversary directly via their content. The length itself, however, might indirectly reveal information about the content. Such leakage is already used extensively in traffic-analysis attacks, e.g., website fingerprinting [21, 39, 56, 57], video identification [43, 44, 50], and VoIP traffic fingerprinting [15, 61]. Although solutions involving application- or network-level padding are numerous, they are typically designed for a specific problem domain, and the more basic problem of length-leaking ciphertexts remains. In any practical solution, some leakage is unavoidable. We show, however, that typical approaches, such as padding to the size of a block cipher, are fundamentally insufficient for efficiently hiding the plaintext length, especially for plaintexts that may vary in size by orders of magnitude.

We introduce Padmé, a novel padding scheme designed for, though not restricted to, encoding PURBs. Padmé reduces length leakage for a wide range of encrypted data types, ensuring asymptotically lower leakage of O(log log M), rather than the O(log M) of common stream- and block-cipher-encrypted data. Padmé's space overhead is moderate, always less than 12% and decreasing with file size. The intuition behind Padmé is to pad objects to lengths representable as limited-precision floating-point numbers. A Padmé length is constrained in particular to have no more significant bits (i.e., information) in its mantissa than in its exponent. This constraint limits information leakage to at most double that of conservatively padding to the next power of two, while reducing overhead through logarithmically-increasing precision for larger objects.

Many defenses already exist for specific scenarios, e.g., against website fingerprinting [21, 58]. Padmé does not attempt to compete with tailored solutions in their domains. Instead, Padmé aims for a substantial increase in application-independent length-leakage protection as a generic measure of security and privacy hygiene.

4.1 Design Criterion

We design Padmé again using intermediate strawman approaches for clarity. To compare these straightforward alternatives with our proposal, we define a game in which an adversary guesses the plaintext behind a padded encrypted blob. This game is inspired by related work such as defending against a perfect attacker [58].

Padding Game. Let P denote a collection of plaintext objects of maximum length M: e.g., data, documents, or application data units. An honest user chooses a plaintext p ∈ P, then pads and encodes it into a PURB c. The adversary knows almost everything: all possible plaintexts P, the PURB c, and the parameters used to generate it, such as the schemes and the number of recipients. The adversary lacks only the private inputs and decryption keys for c. The adversary's goal is to guess the plaintext p based on the observed PURB c of length |c|.

Design Goals. Our goal in designing the padding function is to manage both the space overhead from padding and the maximum information leaked to the adversary.

4.2 Definitions

Overhead. Let c be a padded ciphertext resulting from PURB-encoding a plaintext p. For simplicity, we focus here purely on the overhead incurred by padding, by assuming an unrealistic, "perfectly-efficient" PURB encoding that (unlike MsPURB) incurs no space overhead for encryption metadata. We define the additive overhead of |c| over |p| to be |c| − |p|, the number of extra bytes added by padding. The multiplicative overhead of padding is (|c| − |p|)/|p|, the relative fraction by which |c| expands |p|.

Leakage. Let P be a finite space of plaintexts of maximum length M. Let f : N → N be a padding function that yields the padded size |c| given a plaintext length
|p|, for p ∈ P. The image of f is the set R of padded lengths that f can produce from plaintexts p ∈ P. We quantify the leakage of a padding function f in terms of the number of elements in R. More precisely, we define the leakage as the number of bits (amount of information entropy) required to distinguish a unique element of R, which is ⌈log2 |R|⌉. Intuitively, a function that pads everything to a constant size larger than all plaintexts (e.g., f(p) = 1 TB) leaks no information to the adversary, because |R| = 1 (and observing |c| = 1 TB leaks no information about the plaintext), whereas more fine-grained padding functions leak more bits.

4.3 Strawman Padding Approaches

We first explore two strawman designs, based on different padding functions f. A padding function that offers any useful protection cannot be one-to-one, otherwise the adversary could trivially invert it and recover |p|. We also exclude randomized padding schemes, for simplicity and because in practice adversaries can typically cancel out and defeat random padding factors statistically over many observations. Therefore, only padding functions that group many plaintext lengths into fewer padded ciphertext lengths are of interest in our analysis.

Strawman 1: Fixed-Size Blocks. We first consider a padding function f(L) = b · ⌈L/b⌉, where b is a block size in bytes. This is how objects often get "padded" by default in practice, e.g., in block ciphers or Tor cells. In this case, the PURB's size is a multiple of b, the maximum additive overhead incurred is b − 1 bytes, and the leakage is ⌈log2(M/b)⌉ = O(log M) bits, where M is the maximum plaintext size.

In practice, when plaintext sizes differ by orders of magnitude, there is no good value of b that serves all plaintexts well. For instance, consider b = 1 MB. Padding small files and network messages would incur a large overhead: e.g., padding Tor's 512 B cells to 1 MB would incur overheads of 2000×. In contrast, padding a 700 MB movie with at most 1 MB of chaff would add only a little confusion for the adversary, as this movie may still be readily distinguishable from others by length. To reduce information leakage asymptotically over a vast range of cleartext sizes, therefore, padding must depend on the plaintext size.

Strawman 2: Padding to Powers of 2. The next step is to pad to varying-size blocks, which is the basis for our actual scheme. The intuition is that for small plaintexts, the blocks are small too, yielding modest overhead, whereas for larger files, blocks are larger and group more plaintext lengths together, improving leakage asymptotically. A simple approach is to pad plaintexts into buckets bi whose size varies as a power of some base, e.g., two, so bi = 2^i. The padding function is thus f(L) = 2^⌈log2 L⌉. We call this strawman NextP2.

Because NextP2 pads plaintexts of maximum length M into at most ⌈log2 M⌉ buckets, the image R of f contains only O(log M) elements. This represents only O(log log M) bits of entropy or information leakage, a major asymptotic improvement over fixed-size blocks. The maximum overhead is substantial, however, almost +100%: e.g., a 17 GB Blu-ray movie would be padded to 32 GB. Using powers of another base x > 2, we reduce leakage further at a cost of more overhead: e.g., padding to the nearest power of 3 incurs overhead of up to +200%, with less leakage but still O(log log M). We could reduce overhead by using a fractional base 1 < x < 2, but fractional exponents are cumbersome in practical padding functions, which we would prefer to be simple and to operate only on integers. Although this second strawman succeeds in achieving asymptotically lower leakage than padding to fixed-size blocks, it is less attractive in practice due to high overhead when x ≥ 2 and due to computational complexity when 1 < x < 2.

4.4 Padmé

We now describe our padding scheme Padmé, which limits information leakage about the length of the plaintext for a wide range of encrypted data sizes. Similarly to the previous strawman, Padmé also asymptotically leaks O(log log M) bits of information, but its overhead is much lower (at most 12% and decreasing with L).

Intuition. In NextP2, any permissible padded length L has the form L = 2^n. We can therefore represent L as a binary floating-point number with a (⌊log2 n⌋ + 1)-bit exponent and a mantissa of zero, i.e., no fractional bits. In Padmé, we similarly represent a permissible padded length as a binary floating-point number, but we allow a non-zero mantissa at most as long as the exponent (see Figure 6). This approach doubles the number of bits used to represent an allowed padded length, hence doubling the absolute leakage via length, but allows for more fine-grained buckets, reducing overhead. Padmé asymptotically leaks the same number of bits as NextP2, differing only by a constant factor of 2, but reduces the space overhead by almost 10× (from +100% to +12%). More importantly, the multiplicative expansion overhead decreases with L (see Figure 7).
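Both strawman padding functions are easy to state in code. The following Go sketch (illustrative only; the function names are ours, not the paper's) implements the fixed-block rule f(L) = b · ⌈L/b⌉ and NextP2's f(L) = 2^⌈log2 L⌉:

```go
package main

import (
	"fmt"
	"math/bits"
)

// fixedBlock pads L up to the next multiple of the block size b:
// f(L) = b * ceil(L/b), leaking O(log M) bits.
func fixedBlock(L, b uint64) uint64 {
	return b * ((L + b - 1) / b)
}

// nextP2 pads L up to the next power of two: f(L) = 2^ceil(log2 L),
// leaking only O(log log M) bits but costing up to +100% overhead.
func nextP2(L uint64) uint64 {
	if L <= 1 {
		return L
	}
	return 1 << uint(bits.Len64(L-1)) // smallest power of two >= L
}

func main() {
	fmt.Println(fixedBlock(600, 512)) // prints 1024 (Tor-style cell padding)
	fmt.Println(nextP2(9), nextP2(8)) // prints 16 8
}
```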
[Figure 6 consists of two bit-layout diagrams. Top: a (⌊log2 n⌋ + 1)-bit exponent with a 0-bit mantissa; in the strawman NextP2, an allowed length L = 2^n can be represented as a binary floating-point number with ⌊log2 n⌋ + 1 bits of exponent and no mantissa. Bottom: a (⌊log2 n⌋ + 1)-bit exponent with a (⌊log2 n⌋ + 1)-bit mantissa.]

Fig. 6. Padmé represents lengths as floating-point numbers, allowing the mantissa to be of at most ⌊log2 n⌋ + 1 bits.

[Figure 7: padding overhead [%] versus original size L [B], comparing Padmé with padding to the next power of 2.]

…exponent: it is "too precise" and therefore not a permitted padded length. The value 10 is permitted, however, so a 9 byte-long ciphertext is padded to 10 bytes.

To understand why Padmé requires the low E − S bits to be 0, notice that forcing all the last E bits to 0 is equivalent to padding to a power of two. In comparison, Padmé allows S extra bits to represent the padded size, with S defined as the bit length of the exponent.

Algorithm 1 specifies the Padmé function precisely.

Algorithm 1: Padmé
  Data: length of content L
  Result: length of padded content L′
  E ← ⌊log2 L⌋        // L's floating-point exponent
  S ← ⌊log2 E⌋ + 1    // bit length of E
  z ← E − S           // number of low bits to set to 0
  m ← (1 ≪ z) − 1     // mask of z 1's in LSB
  L′ ← (L + m) & ¬m   // round up using mask m to clear last z bits
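Under the stated definitions (E = ⌊log2 L⌋ is the exponent and S its bit length), Algorithm 1 translates directly to Go. This is an illustrative sketch, not the paper's implementation; the guard for lengths below 2 bytes, where the exponent is undefined, is our own assumption:

```go
package main

import (
	"fmt"
	"math/bits"
)

// padme returns the Padmé padded length L' for a content length L,
// following Algorithm 1: take the floating-point exponent E of L,
// allow S = floor(log2 E) + 1 mantissa bits, and zero the remaining
// z = E - S low bits by rounding up.
func padme(L uint64) uint64 {
	if L < 2 {
		return L // assumption: exponent undefined, leave tiny lengths as-is
	}
	E := uint64(bits.Len64(L)) - 1 // E = floor(log2 L)
	S := uint64(bits.Len64(E))     // S = floor(log2 E) + 1
	z := E - S                     // low bits that must be zero
	m := (uint64(1) << z) - 1      // mask of z ones in the LSBs
	return (L + m) &^ m            // round up, clearing the last z bits
}

func main() {
	fmt.Println(padme(9), padme(8), padme(1000)) // prints 10 8 1024
}
```

As a sanity check, padme(9) = 10 matches the example above, and padme(8) = 8, since powers of two are already permitted lengths.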
5.1 Implementation

We implemented a prototype of the PURB encoding and padding schemes in Go. The implementation follows the algorithms in §3.7, and it consists of 2 kLOC. Our implementation relies on the open-source Kyber library¹ for cryptographic operations. The code is designed to be easy to integrate with existing applications. The code is still proof-of-concept, however, and has not yet gone through rigorous analysis and hardening, in particular against timing attacks.

Reproducibility. All the datasets, the source code for PURBs and Padmé, as well as scripts for reproducing all experiments, are available in the main repository².

5.2 Performance of the PURB Encoding

The main question we answer in the evaluation of the encoding scheme is whether it has a reasonable cost, in terms of both time and space overhead, and whether it scales gracefully with an increasing number of recipients and/or cipher suites. First, we measure the average CPU time required to encode and decode a PURB. Then, we compare the decoding performance with the performance of the plain and anonymized OpenPGP schemes described below. Finally, we show how the compactness of the header changes with multiple recipients and suites, as a percentage of useful bits in the header.

Anonymized PGP. In standard PGP, the identity (more precisely, the public-key ID) of the recipient is embedded in the header of the encrypted blob. This plaintext marker speeds up decryption, but it enables a third party to enumerate all data recipients. In the so-called anonymized or "hidden" version of PGP [14, Section 5.1], this key ID is substituted with zeros. In this case, the recipient sequentially tries the encrypted entries of the header with her keys. We use the hidden version as one of the baselines.

5.2.1 Methodology

We ran the encoding experiments on a consumer-grade laptop with a quad-core 2.2 GHz Intel Core i7 processor and 16 GB of RAM, using Go 1.12.5. To compare with an OpenPGP implementation, we use and modify Keybase's fork³ of the default Golang crypto library⁴, as the fork adds support for the ECDH scheme on Curve25519. We further modify Keybase's implementation to add support for the anonymized OpenPGP scheme.

All the encoding experiments use a PURB suite based on the Curve25519 elliptic-curve group, AES128-GCM for entry-point encryption, and SHA256 for hashing. We also apply the global-nonce optimization discussed in §3.8. For experiments needing more than one suite, we use copies of the above suite to ensure homogeneity across timing experiments. The payload size in each experiment is 1 KB. For each data point, we generate a new set of keys, one per recipient. We measure each data point 20 times, using fresh randomness each time, and depict the median value and the standard deviation.

5.2.2 Results

Encoding Performance. In this experiment, we first evaluate how the time required to encode a PURB changes with a growing number of recipients and cipher suites, and second, how the main computational components contribute to this duration. We divide the total encoding time into three components. The first is the authenticated encryption of entry points. The second is the generation and Elligator encoding of the sender's public keys, one per suite. A public key is derived by multiplying a base point with a freshly generated private key (scalar). If the resultant public key is not encodable, which happens in half of the cases, a new key is generated. Point multiplication dominates this component, constituting ≈ 90% of the total time. The third is the computation of shared secrets.

1 https://github.com/dedis/kyber
2 https://github.com/dedis/purb
3 https://github.com/keybase/go-crypto
4 https://github.com/golang/crypto
[Figure 8: log-scale plots of CPU time versus the number of recipients, for 1, 3, and 10 suites.]

(a) The CPU cost of encoding a PURB given the number of recipients and of cipher suites. EncHeader: encryption of entry points; KeyGen: generation and hiding of public keys; SharedSecrets: computation of shared secrets.

(b) The worst-case CPU cost of decoding for standard PGP, PGP with hidden recipients, PURBs without hash tables (flat), and standard PURBs.
…is noticeable after 100 recipients and will become more pronounced if point multiplication is optimized.

Header Compactness. Compared with placing the header elements linearly, our expanding hash-table design is less compact, but it enables more efficient decoding. Figure 8b shows an example of this trade-off: PGP hidden versus PURBs standard.

In Figure 9, we show the compactness, or the percentage of the PURB header that is filled with actual data, with respect to the number of recipients and cipher suites. Not surprisingly, an increasing number of recipients and/or suites increases the collisions and reduces compactness: 45% for 100 recipients and 1 suite, 36% for 100 recipients and 10 suites. In the most common case of having one recipient in one suite, however, the header is perfectly compact. Finally, there is a trade-off between compactness and efficient decryption. We can easily increase compactness by resolving entry-point hash-table collisions linearly, instead of directly moving to the next hash table. The downside is that the recipient has more entry points to try.

5.3 Performance of Padmé Padding

In evaluating a padding scheme, one important metric is the overhead incurred in terms of bits added to the plaintexts. By design, Padmé's overhead is bounded by 1/(2·log2 L). As discussed in §4.4, Padmé does not escape the typical overhead-to-leakage trade-off, hence Padmé's novelty does not lie in this trade-off. Rather, the novelty lies in the practical relation between L and the overhead. Padmé's overhead is moderate, at most +12% and much less for large PURBs.

A more interesting question is how effectively, given an arbitrary collection of plaintexts P, Padmé hides which plaintext is padded. Padmé was designed to work with an arbitrary collection of plaintexts P. It remains to be seen how Padmé performs when applied to a specific set of plaintexts P, i.e., with a distribution coming from the real world, and to establish how well it groups files into sets of identical length. In the next section, we experiment with four datasets made of various objects: a collection of Ubuntu packages, a set of YouTube videos, a set of user files, and a set of Alexa Top 1M websites.

5.3.1 Datasets and Methodology

Table 2. Datasets used in the evaluation of the anonymity provided by Padmé.

  Dataset                  # of objects
  Ubuntu packages          56,517
  YouTube videos           191,250
  File collections         3,027,460
  Alexa top 1M Websites    2,627

The Ubuntu dataset contains 56,517 unique packages, parsed from the official repository of a live Ubuntu 16.04 instance. As packages can be referenced in multiple repositories, we filtered the list by name and architecture. The reason for padding Ubuntu software updates is that the knowledge of updates enables a local eavesdropper to build a list of packages and their versions that are installed on a machine. If some of the packages are outdated and have known vulnerabilities, an adversary might use this as an attack vector. A percentage of software updates still occurs over unencrypted connections, which is an issue in itself; but even encrypted connections to software-update repositories expose which distribution and which kind of update is being done (security / restricted⁵ / multiverse⁶ / etc.). We hope that this unnecessary leakage will disappear in the near future.

The YouTube dataset contains 191,250 unique videos, obtained by iteratively querying the YouTube API. One semantic video is generally represented by 2–5 .webm files, which correspond to various video qualities. Hence, each object in the dataset is a unique (video, quality) pair. We use this dataset as if the videos were downloaded in bulk rather than streamed; that is, we pad the video as a single file. The argument for padding YouTube videos as whole files is that, as shown by related work [43, 44, 50], variable-bitrate encoding combined with streaming leaks which video is being watched. If YouTube wanted to protect the privacy of its users, it could re-encode everything to constant-bitrate encoding and still stream it, but then the total length of the stream would still leak information. Alternatively, it could adopt a model similar to that of the iTunes store, where videos have variable bit-rate but are bulk-downloaded; but again, the total downloaded length would leak information, requiring some padding. Hence, we explore how unique the YouTube videos are by length, with and without padding.

The files dataset was constituted by collecting the file sizes in the home directories ('~user/') of 10 co-workers and contains 3,027,460 personal and configuration files. These files were collected on ma-

5 Contains proprietary software and drivers.
6 Contains software restricted by copyright.
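The anonymity measurement used below (for each object, the number of objects that share its padded length) can be sketched in Go. This illustrates the methodology, not the paper's actual scripts; padme repeats Algorithm 1, with our guard for lengths below 2 bytes, so the sketch is self-contained:

```go
package main

import (
	"fmt"
	"math/bits"
)

// padme implements Algorithm 1 (lengths below 2 are left unchanged
// by assumption, since the exponent is undefined there).
func padme(L uint64) uint64 {
	if L < 2 {
		return L
	}
	E := uint64(bits.Len64(L)) - 1
	S := uint64(bits.Len64(E))
	m := (uint64(1) << (E - S)) - 1
	return (L + m) &^ m
}

// anonSetSizes maps each object to the number of objects sharing its
// padded length, i.e., its anonymity-set size under the padding
// function pad.
func anonSetSizes(sizes []uint64, pad func(uint64) uint64) []int {
	counts := make(map[uint64]int)
	for _, s := range sizes {
		counts[pad(s)]++
	}
	out := make([]int, len(sizes))
	for i, s := range sizes {
		out[i] = counts[pad(s)]
	}
	return out
}

func main() {
	sizes := []uint64{900, 1000, 1024, 5000, 5100}
	fmt.Println(anonSetSizes(sizes, padme))                             // prints [1 2 2 2 2]
	fmt.Println(anonSetSizes(sizes, func(L uint64) uint64 { return L })) // unpadded: [1 1 1 1 1]
}
```

With no padding, every distinct size is uniquely identifiable; under Padmé, 1000 and 1024 collapse to 1024, and 5000 and 5100 collapse to 5120.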
[Figure 11 shows one panel per dataset, including (b) 'files' and (d) 'Alexa', each plotting the percentile of objects against the anonymity-set size for unpadded objects, padding to Tor cells (512 B), Padmé, and padding to the next power of 2, for the alexa, youtube, files home, and ubuntu packages datasets.]

Fig. 11. Analysis of the anonymity provided by various padding approaches: NextP2, Padmé, padding with a constant block size, and no padding. We measure, for each object, how many other objects it becomes indistinguishable from after being padded, and plot the distribution. NextP2 provides better anonymity, at the cost of a drastically higher overhead (at most +100% instead of +12%). Overheads are shown in Table 3.

For all these datasets, despite containing very different objects, a large percentage of objects have a unique size: 87% in the case of YouTube videos (Figure 11a), …

6 Related Work

The closest related work PURBs build on is Broadcast Encryption [4, 13, 19, 22, 24], which formalizes the security notion behind a ciphertext for multiple recipients. In particular, the most relevant notion in (Private) Broadcast Encryption is Recipient Privacy [4], in which …

There are two fundamental differences with PURBs. First, PURBs focus on a single unit of data; we do not yet explore the question of the time distribution of multiple PURBs. Second, traffic-morphing systems, in most cases, try to mimic a specific transport and are sometimes designed to hide only the traffic of one given tool, whereas PURBs are universal and arguably adaptable to any underlying application. Moreover, it has been argued that most traffic-morphing tools do not achieve unobservability in real-world settings, due to discrepancies between their implementations and the systems that they try to imitate, because of the uncovered behavior of side protocols, error handling, responses to probing, etc. [23, 29, 54]. We believe that for a wide class of applications, using pseudo-random uniform blobs, either alone or in combination with other lower-level tools, is a potential solution in a different direction.

Traffic analysis aims at inferring the contents of encrypted communication by analyzing metadata. The most well-studied application of it is website fingerprinting [21, 39, 56, 57], but it has also been applied to video identification [43, 44, 50] and VoIP traffic [15, 61]. In
website fingerprinting over Tor, research has repeatedly us to improve this paper. We also thank Enis Ceyhun
showed that the total website size is the feature that Alp, Cristina Basescu, Kelong Cong, Philipp Jovanovic,
helps an adversary the most [16, 21, 38]. In particular, Apostolos Pyrgelis and Henry Corrigan-Gibbs for their
Dyer et al. [21] show the necessity of padding the whole helpful comments and suggestions, and Holly B. Cogliati
website, as opposed to individual packets, to prevent for text editing. This project was supported in part by
an adversary from identifying a website by its observed grant #2017-201 of the Strategic Focal Area “Personal-
total size. They also systematized the existing padding ized Health and Related Technologies (PHRT)” of the
approaches. Wang et al. [58] propose deterministic and ETH Domain and by grants from the AXA Research
randomized padding strategies tailored for padding Tor Fund, Handshake, and the Swiss Data Science Center.
traffic against a perfect attacker, which inspired our §4.
Finally, Sphinx [18] is an encrypted packet format
for mix networks with the goal of minimizing the infor-
mation revealed to the adversary. Sphinx shares similar-
References
ities with PURBs in its binary format (e.g., the pres- [1] Ring-road: Leaking sensitive data in security protocols.
ence of a group element followed by a ciphertext). Un- http://www.ringroadbug.com/.
like PURBs, however, it supports only one cipher suite, [2] Michel Abdalla, Mihir Bellare, and Phillip Rogaway. The Or-
and one direct recipient (but several nested ones, due to acle Diffie-Hellman Assumptions and an Analysis of DHIES.
the nature of mix networks). To the best of our knowl- In Cryptographers’ Track at the RSA Conference, pages
143–158, 2001.
edge, PURBs is the first solution that hides all meta-
[3] Diego F Aranha, Pierre-Alain Fouque, Chen Qian, Mehdi
data while providing cryptographic agility. Tibouchi, and Jean-Christophe Zapalowicz. Binary Elligator
Squared. In International Workshop on Selected Areas in
Cryptography, pages 20–37, 2014.
[4] Adam Barth, Dan Boneh, and Brent Waters. Privacy in
7 Conclusion Encrypted Content Distribution Using Private Broadcast
Encryption. In International Conference on Financial Cryp-
Conventional encrypted data formats leak information, tography and Data Security, pages 52–64, 2006.
via both unencrypted metadata and ciphertext length, [5] Tal Be’ery and Amichai Shulman. A Perfect CRIME? Only
TIME Will Tell. Black Hat Europe, 2013.
that may be used by attackers to infer sensitive informa-
[6] Mihir Bellare, Alexandra Boldyreva, Anand Desai, and David
tion via techniques such as traffic analysis and website Pointcheval. Key-Privacy in Public-Key Encryption. In
fingerprinting. We have argued that this metadata leak- Advances in Cryptology – ASIACRYPT 2001, pages 566–
age is not necessary, and as evidence have presented 582, 2001.
PURBs, a generic approach for designing encrypted [7] Mihir Bellare, Alexandra Boldyreva, Kaoru Kurosawa, and
Jessica Staddon. Multi-Recipient Encryption Schemes: Effi-
data formats that do not leak anything at all, except for
cient Constructions and Their Security. IEEE Transactions
the padded length of the ciphertexts, to anyone with- on Information Theory, 53(11):3927–3943, 2007.
out the decryption keys. We have shown that despite [8] Mihir Bellare and Chanathip Namprempre. Authenticated
having no cleartext header, PURBs can be efficiently Encryption: Relations among Notions and Analysis of the
encoded and decoded, and can simultaneously support Generic Composition Paradigm. Journal of Cryptology,
multiple public keys and cipher suites. Finally, we have 21(4):469–491, 2008.
[9] Mihir Bellare and Björn Tackmann. The Multi-user Security
introduced Padmé, a padding scheme that reduces the
of Authenticated Encryption: AES-GCM in TLS 1.3. In
length leakage of ciphertexts and has a modest overhead Annual International Cryptology Conference, pages 247–276,
decreasing with file size. Padmé performs significantly 2016.
better than classic padding schemes with fixed block size [10] Daniel J Bernstein, Mike Hamburg, Anna Krasnova, and
in terms of anonymity, and its overhead is asymptoti- Tanja Lange. Elligator: Elliptic-curve points indistinguish-
able from uniform random strings. In ACM Conference on
cally lower than using exponentially increasing padding.
Computer and Communications Security, CCS ’13, 2013.
[11] Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich. Ar-
gon2: New Generation of Memory-Hard Functions for Pass-
word Hashing and Other Applications. Technical report,
Acknowledgments 2015.
[12] Tor Blog. Tor at the heart: Bridges and pluggable trans-
We are thankful to our anonymous reviewers and our ports. https://blog.torproject.org/tor-heart-bridges-and-
meticulous proof shepherd Markulf Kohlweiss for their pluggable-transports, Dec 2016.
constructive and thorough feedback that has helped
Reducing Metadata Leakage from Encrypted Files and Communication with PURBs

[13] Dan Boneh, Craig Gentry, and Brent Waters. Collusion Resistant Broadcast Encryption with Short Ciphertexts and Private Keys. In Advances in Cryptology – CRYPTO, pages 258–275, 2005.
[14] J. Callas, L. Donnerhacke, H. Finney, D. Shaw, and R. Thayer. OpenPGP Message Format. RFC 4880, Nov 2007.
[15] Yu-Chun Chang, Kuan-Ta Chen, Chen-Chi Wu, and Chin-Laung Lei. Inferring speech activity from encrypted Skype traffic. In IEEE Global Telecommunications Conference, GLOBECOM, pages 1–5, 2008.
[16] Giovanni Cherubin, Jamie Hayes, and Marc Juarez. Website Fingerprinting Defenses at the Application Layer. In Privacy Enhancing Technologies Symposium, PETS '17, pages 2:186–2:203, 2017.
[17] George Danezis and Richard Clayton. Introducing Traffic Analysis. 2007.
[18] George Danezis and Ian Goldberg. Sphinx: A Compact and Provably Secure Mix Format. In IEEE Symposium on Security and Privacy, S&P '09, pages 269–282, 2009.
[19] Cécile Delerablée. Identity-Based Broadcast Encryption with Constant Size Ciphertexts and Private Keys. In International Conference on the Theory and Application of Cryptology and Information Security, pages 200–215, 2007.
[20] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246, Aug 2008.
[21] Kevin P. Dyer, Scott E. Coull, Thomas Ristenpart, and Thomas Shrimpton. Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail. In IEEE Symposium on Security and Privacy, S&P '12, pages 332–346, 2012.
[22] Nelly Fazio and Irippuge Milinda Perera. Outsider-Anonymous Broadcast Encryption with Sublinear Ciphertexts. In International Workshop on Public Key Cryptography, pages 225–242, 2012.
[23] Sergey Frolov and Eric Wustrow. The use of TLS in Censorship Circumvention. In Network and Distributed System Security (NDSS) Symposium, 2019.
[24] Craig Gentry and Brent Waters. Adaptive Security in Broadcast Encryption Systems (with Short Ciphertexts). In Antoine Joux, editor, Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 171–188, 2009.
[25] Yoel Gluck, Neal Harris, and Angelo Prado. BREACH: reviving the CRIME attack. Black Hat USA, 2013.
[26] B. Greschbach, G. Kreitz, and S. Buchegger. The devil is in the metadata – New privacy challenges in Decentralised Online Social Networks. In IEEE International Conference on Pervasive Computing and Communications Workshops, pages 333–339, March 2012.
[27] Dominik Herrmann, Rolf Wendolsky, and Hannes Federrath. Website Fingerprinting: Attacking Popular Privacy Enhancing Technologies with the Multinomial Naïve-Bayes Classifier. In ACM Workshop on Cloud Computing Security, CCSW '09, pages 31–42, 2009.
[28] P. Hoffman and J. Schlyter. The DNS-Based Authentication of Named Entities (DANE) Transport Layer Security (TLS) Protocol: TLSA. RFC 6698, August 2012.
[29] Amir Houmansadr, Chad Brubaker, and Vitaly Shmatikov. The parrot is dead: Observing unobservable network communications. In IEEE Symposium on Security and Privacy, S&P '13, pages 65–79, 2013.
[30] IDRIX. VeraCrypt. https://www.veracrypt.fr/en/Home.html.
[31] Jonathan Katz and Yehuda Lindell. Introduction to Modern Cryptography. CRC Press, 2014.
[32] John Kelsey. Compression and information leakage of plaintext. In International Workshop on Fast Software Encryption, Lecture Notes in Computer Science, pages 263–276, 2002.
[33] Hugo Krawczyk. Cryptographic extraction and key derivation: The HKDF scheme. In Annual Cryptology Conference, pages 631–648, 2010.
[34] Kaoru Kurosawa. Multi-recipient public-key encryption with shortened ciphertext. In International Workshop on Public Key Cryptography, pages 48–63, 2002.
[35] Stevens Le Blond, Chao Zhang, Arnaud Legout, Keith Ross, and Walid Dabbous. I Know Where You Are and What You Are Sharing: Exploiting P2P Communications to Invade Users' Privacy. In ACM SIGCOMM Conference on Internet Measurement Conference, IMC '11, 2011.
[36] Jonathan Mayer, Patrick Mutchler, and John C. Mitchell. Evaluating the privacy properties of telephone metadata. Proceedings of the National Academy of Sciences, 113(20):5536–5541, 2016.
[37] Hooman Mohajeri Moghaddam, Baiyu Li, Mohammad Derakhshani, and Ian Goldberg. SkypeMorph: Protocol Obfuscation for Tor Bridges. In ACM Conference on Computer and Communications Security, CCS '12, pages 97–108, 2012.
[38] Rebekah Overdorf, Marc Juarez, Gunes Acar, Rachel Greenstadt, and Claudia Diaz. How Unique is Your .onion?: An Analysis of the Fingerprintability of Tor Onion Services. In ACM Conference on Computer and Communications Security, CCS '17, pages 2021–2036, 2017.
[39] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website fingerprinting in onion routing based anonymization networks. In ACM Workshop on Privacy in the Electronic Society, pages 103–114, 2011.
[40] Jeffrey Pang, Ben Greenstein, Ramakrishna Gummadi, Srinivasan Seshan, and David Wetherall. 802.11 User Fingerprinting. In ACM International Conference on Mobile Computing and Networking, MobiCom '07, pages 99–110, 2007.
[41] Colin Percival. Stronger key derivation via sequential memory-hard functions. Self-published, pages 1–16, 2009.
[42] Damian Poddebniak, Christian Dresen, Jens Müller, Fabian Ising, Sebastian Schinzel, Simon Friedberger, Juraj Somorovsky, and Jörg Schwenk. Efail: Breaking S/MIME and OpenPGP Email Encryption using Exfiltration Channels. In USENIX Security Symposium, USENIX '18, 2018.
[43] Andrew Reed and Benjamin Klimkowski. Leaky streams: Identifying variable bitrate DASH videos streamed over encrypted 802.11n connections. In IEEE Consumer Communications & Networking Conference (CCNC), pages 1107–1112, 2016.
[44] Andrew Reed and Michael Kranch. Identifying HTTPS-protected Netflix videos in real-time. In ACM Conference on Data and Application Security and Privacy, pages 361–368, 2017.
[45] E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.3. RFC 8446, Aug 2018.
[46] Ivan Ristić. HTTP client fingerprinting using SSL handshake analysis. https://blog.ivanristic.com/2009/06/http-client-fingerprinting-using-ssl-handshake-analysis.html, Jun 2009.
[47] Tom Ritter and Daniel Kahn Gillmor. Protecting the TLS Handshake. IETF Interim, May 2014.
[48] Juliano Rizzo and Thai Duong. The CRIME attack. Ekoparty, 2012.
[49] Phillip Rogaway. Nonce-based symmetric encryption. In International Workshop on Fast Software Encryption, pages 348–358, 2004.
[50] Roei Schuster, Vitaly Shmatikov, and Eran Tromer. Beauty and the Burst: Remote Identification of Encrypted Video Streams. In USENIX Security Symposium, USENIX '17, pages 1357–1374, 2017.
[51] Mehdi Tibouchi. Elligator Squared: Uniform points on elliptic curves of prime order as uniform random strings. In International Conference on Financial Cryptography and Data Security, pages 139–156, 2014.
[52] Thiago Valverde. Bad life advice – replay attacks against HTTPS. http://blog.valverde.me/2015/12/07/bad-life-advice/, Dec 2015.
[53] Guido Vranken. HTTPS Bicycle Attack. https://guidovranken.com/2015/12/30/https-bicycle-attack/, Dec 2015.
[54] Liang Wang, Kevin P. Dyer, Aditya Akella, Thomas Ristenpart, and Thomas Shrimpton. Seeing through network-protocol obfuscation. In ACM Conference on Computer and Communications Security, CCS '15, 2015.
[55] Qiyan Wang, Xun Gong, Giang T. K. Nguyen, Amir Houmansadr, and Nikita Borisov. CensorSpoofer: Asymmetric communication using IP spoofing for censorship-resistant web browsing. In ACM Conference on Computer and Communications Security, pages 121–132, 2012.
[56] Tao Wang and Ian Goldberg. Improved Website Fingerprinting on Tor. In ACM Workshop on Privacy in the Electronic Society, pages 201–212, 2013.
[57] Tao Wang and Ian Goldberg. On realistically attacking Tor with website fingerprinting. In Privacy Enhancing Technologies Symposium, PETS '16, pages 4:21–4:36, 2016.
[58] Tao Wang and Ian Goldberg. Walkie-Talkie: An Efficient Defense Against Passive Website Fingerprinting Attacks. In USENIX Security Symposium, USENIX '17, pages 1375–1390, 2017.
[59] Zachary Weinberg, Jeffrey Wang, Vinod Yegneswaran, Linda Briesemeister, Steven Cheung, Frank Wang, and Dan Boneh. StegoTorus: A camouflage proxy for the Tor anonymity system. In ACM Conference on Computer and Communications Security, pages 109–120, 2012.
[60] Philipp Winter, Tobias Pulls, and Juergen Fuss. ScrambleSuit: A polymorphic network protocol to circumvent censorship. In ACM Workshop on Privacy in the Electronic Society, pages 213–224, 2013.
[61] Charles V. Wright, Lucas Ballard, Fabian Monrose, and Gerald M. Masson. Language identification of encrypted VoIP traffic: Alejandra y Roberto or Alice and Bob? In USENIX Security Symposium, USENIX '07, pages 43–54, 2007.
[62] Charles V. Wright, Scott E. Coull, and Fabian Monrose. Traffic Morphing: An Efficient Defense Against Statistical Traffic Analysis. In Network and Distributed Security Symposium, pages 237–250, 2009.
[63] Fan Zhang, Wenbo He, Xue Liu, and Patrick G. Bridges. Inferring Users' Online Activities Through Traffic Analysis. In ACM Conference on Wireless Network Security, WiSec '11, pages 59–70, 2011.
[64] Philip R. Zimmermann. The Official PGP User's Guide. MIT Press, Cambridge, MA, USA, 1995.

A Layout

Algorithm 2 presents the Layout algorithm a sender uses in step (8) of MsPURB.Enc. Layout arranges a PURB's components in a contiguous byte array.

Notation. We denote by a[i:j] ← b the operation of copying the bits of b to the positions a[i], a[i+1], ..., a[j−1]. When written like this, b always has the correct length of j − i bits, and we assume i < j. If, before an operation a[i:j] ← b, |a| < j, we first grow a to length j. We sometimes write a[i:] ← b instead of a[i : i+|b|] ← b. We use a "reservation array", which is an array with a method array.isFree(start, end) that returns True if and only if none of the bits array[start], array[start+1], ..., array[end−1] were previously assigned a value, and False otherwise.

B Positions for Public Keys

This section provides an example of possible sets of allowed public-key positions for the suites in the PURB encoding. We emphasize that finding an optimal set of positions was not the focus of this work. The intention is merely to show that such sets exist and to offer a concrete example (which is used for the compactness experiment, Figure 9).

Example. We use the required and recommended suites in the latest draft of TLS 1.3 [45] as an example of suites a PURB could theoretically support. The suites and groups are shown in Table 4. The PURB concept of a "suite" combines both "suite" and "group" in TLS. For instance, a PURB suite could be PURB_AES_128_GCM_SHA_256_SECP256R1. We show possible PURB suites in Table 6. For the sake of simplicity, we introduce aliases in the table, and will further refer to those suites as suites A–F. In Table 5, we show a possible assignment. For instance, if only suites A and C are used, the public key for A would be placed in [0, 64], while the value in [96, 160] is chosen so that the XOR of [0, 64] and [96, 160] equals the encoded key for C. Note that a sender must respect the suite
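The XOR trick for overlapping key positions can be sketched as follows (a toy Python sketch: the 64-byte key length, the positions {0} for suite A and {0, 96} for suite C from Table 5, and the function names are all illustrative):

```python
import os

KEY_LEN = 64  # illustrative length of an encoded public key, in bytes

def place_keys(key_a: bytes, key_c: bytes) -> bytearray:
    """Sender: put A's encoded key at [0, 64] and choose [96, 160] so
    that XORing C's two positions {0, 96} yields C's encoded key."""
    buf = bytearray(os.urandom(160))      # header region, random elsewhere
    buf[0:KEY_LEN] = key_a                # suite A, primary position 0
    buf[96:96 + KEY_LEN] = bytes(a ^ c for a, c in zip(key_a, key_c))
    return buf

def recover_key_c(buf: bytearray) -> bytes:
    """Receiver: XOR the values at all of suite C's allowed positions."""
    return bytes(x ^ y for x, y in zip(buf[0:KEY_LEN], buf[96:96 + KEY_LEN]))
```

Because encoded keys are indistinguishable from random bytes, both regions still look uniformly random to an observer without the keys.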
Algorithm 2: Layout

   // τi is an encoded public key of a suite Si
   // keysi = ⟨Z1, ..., Zr⟩ are entry-point keys
2  pubkey_pos = [];    // chosen primary position per suite
3  pubkey_fixed = [];  // all positions fixed so far
4  foreach τi in ⟨τ1, ..., τn⟩ do
       // decide suite's primary public-key position
5      for pos ∈ SuiteAllowedPositions(Si) do
6          if pubkey_fixed.isFree(pos.start, pos.end) then
7              pubkey_pos.append(⟨τi, pos⟩);
8              layout[pos.start:pos.end] ← τi;
9              break;
10         end
11     end
       // later suites cannot modify these positions
       // without disrupting this suite's XOR
12     for pos ∈ SuiteAllowedPositions(Si) do
13         pubkey_fixed[pos.start:pos.end] ← 'F';
14     end
15 end
   // reserve entry-point positions in hash tables
16 entrypoints = [];
17 foreach auxi in ⟨aux1, ..., auxn⟩ do
   …
   // fill empty space in the layout with random bits
36 foreach start, end < layout.end do
   …
   // fill entry-point reservations with ciphertexts
43 foreach keysi in ⟨keys1, ..., keysn⟩ do
44     while keysi not empty do
45         Z = keysi.pop();
46         ⟨start, end, Si⟩ ← entrypoints.pop();
           // encrypt an entry point
47         e ← EZ(K ∥ meta);
48         layout[start:end] ← e;
49     end
50 end
   // compute the padding and append it to layout
51 purb_len ← Padmé(|layout| + |cpayload| + mac_len);
52 mac_pos ← purb_len − mac_len;
53 while not pubkey_fixed.isFree(mac_pos, purb_len) do
       // the MAC must not overlap public-key positions:
       // if it does, we pad to the next Padmé size
54     purb_len ← Padmé(purb_len + 1);
55     mac_pos ← purb_len − mac_len;
56 end
57 padding_len ← mac_pos − meta.payload_end;
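The reservation-array primitive that Layout relies on can be sketched as below (class and method names are illustrative; it models only the isFree/grow-on-write semantics from the Notation paragraph, over bytes rather than bits):

```python
class ReservationArray:
    """Growable byte array that remembers which ranges were assigned."""

    def __init__(self):
        self.data = bytearray()
        self.reserved = []  # half-open (start, end) ranges already written

    def is_free(self, start: int, end: int) -> bool:
        """True iff no position in [start, end) was previously assigned."""
        return all(e <= start or s >= end for s, e in self.reserved)

    def write(self, start: int, end: int, value: bytes) -> None:
        """a[start:end] <- value; grows the array if it is too short."""
        assert start < end and end - start == len(value)
        if len(self.data) < end:
            self.data.extend(bytes(end - len(self.data)))  # grow to length end
        self.data[start:end] = value
        self.reserved.append((start, end))
```

Lines 5–8 of Algorithm 2 then amount to probing `is_free` over a suite's allowed positions and writing the encoded key at the first free one.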
order A–F during encoding. We provide a simple Python script to design such sets in the code repository.

Table 4. Suites and groups described in the latest draft of TLS 1.3.

Symmetric/Hash Algorithms
TLS_AES_128_GCM_SHA256          Required
TLS_AES_256_GCM_SHA384          Recommended
TLS_CHACHA20_POLY1305_SHA256    Recommended
TLS_AES_128_CCM_SHA256          Optional
TLS_AES_128_CCM_8_SHA256        Optional

Key Exchange Groups
secp256r1    Required
x25519       Recommended
secp384r1    Optional
secp521r1    Optional
x448         Optional
ffdhe2048    Optional
ffdhe3072    Optional
ffdhe4096    Optional
ffdhe6144    Optional
ffdhe8192    Optional

Table 5. Example of allowed positions per suite. Here, the algorithm simply finds any mapping such that all suites can coexist in a PURB. The receiver must XOR the values at all possible positions of a suite to obtain an encoded public key.

Suite    Possible positions
A        {0}
B        {0, 64}
C        {0, 96}
D        {0, 32, 64, 160}
E        {0, 64, 128, 192}
F        {0, 32, 64, 96, 128, 256}

C Default Schemes for Payload

In addition to PURB suites, a list of suitable candidates for a payload encryption scheme (Enc, Dec), a MAC algorithm MAC, and a hash function H0 must be determined and standardized. This list can be seamlessly updated over time, as an encoder makes the choice and records it in meta on a per-PURB basis. The chosen schemes are shared by all the suites included in the PURB; hence these schemes must match the security level of the suite with the highest bit-wise security. An example of suitable candidates, given the suites from Table 6, is (Enc, Dec) = AES256-CBC, MAC = HMAC-SHA384, and H0 = SHA3-384.

D Security Proofs

This section contains the proofs of the security properties provided by MsPURB.

D.1 Preliminaries

Before diving into proving the security of our scheme, we define what it means to be ind-cca2- and ind$-cca2-secure for the primitives that MsPURB builds upon.

Key-Encapsulation Mechanism (KEM). Following the definition from Katz & Lindell [31], we begin by defining a KEM as a tuple of PPT algorithms.

Syntax KEM.
KEM.Setup(1^λ) → S: Given a security parameter λ, initialize a cipher suite S.
KEM.KeyGen(S) → (sk, pk): Given a cipher suite S, generate a (private, public) key pair.
KEM.Encap(pk) → (c, k): Given a public key pk, output a ciphertext c and a key k.
KEM.Decap(sk, c) → k/⊥: Given a private key sk and a ciphertext c, output a key k or a special symbol ⊥ denoting failure.

Consider an ind-cca2 security game against an adaptive adversary A:

Game KEM.
The KEM ind-cca2 game for a security parameter λ is between a challenger and an adaptive adversary A. It proceeds along the following phases.
Init: The challenger and adversary take λ as input. The adversary outputs a cipher suite S it wants to attack. The challenger verifies that S is a valid cipher suite, i.e., that it is a valid output of KEM.Setup(1^λ). The challenger aborts, and sets b⋆ ←$ {0, 1}, if S is not valid.
Setup: The challenger runs (sk, pk) ← KEM.KeyGen(S) and gives pk to A.
Phase 1: A can make decapsulation queries qDecap(c), with ciphertexts c of its choice, to the challenger, who responds with KEM.Decap(sk, c).
Challenge: The challenger runs (c⋆, k0) ← KEM.Encap(pk) and generates k1 ←$ {0, 1}^{|k0|}. The challenger picks b ←$ {0, 1} and sends ⟨c⋆, kb⟩ to A.
Phase 2: A continues querying qDecap(c) with the restriction that c ≠ c⋆.
Guess: A outputs its guess b⋆ for b and wins if b⋆ = b.
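The four-algorithm KEM syntax above can be illustrated with a toy hashed Diffie-Hellman instantiation (purely illustrative: the group parameters below are insecure toy values chosen for readability, not the paper's IES suites):

```python
import hashlib
import secrets

P = (1 << 127) - 1   # a Mersenne prime; TOY parameter, not secure
G = 3                # toy generator

def keygen():
    """KEM.KeyGen: sample sk, derive pk = g^sk mod p."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def encap(pk):
    """KEM.Encap: fresh ephemeral x; ciphertext c = g^x, key k = H(pk^x)."""
    x = secrets.randbelow(P - 2) + 1
    c = pow(G, x, P)
    k = hashlib.sha256(pow(pk, x, P).to_bytes(16, "big")).digest()
    return c, k

def decap(sk, c):
    """KEM.Decap: recompute the same key as k = H(c^sk)."""
    return hashlib.sha256(pow(c, sk, P).to_bytes(16, "big")).digest()
```

Correctness here is just pk^x = (g^sk)^x = (g^x)^sk = c^sk; the ind-cca2 game above then asks whether ⟨c⋆, k⟩ with the real k is distinguishable from ⟨c⋆, random⟩.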
We define A's advantage in this game as:

    Adv^{cca2}_{KEM,A}(1^λ) = 2 · |Pr[b⋆ = b] − 1/2|.

…

MSBE.Dec(sk, c) → m/⊥: Given a private key sk and the ciphertext c, return a message m, or ⊥ if c does not decrypt correctly.

We define A's advantage in this game as:

    Adv^{cca2-out}_{msbe,A}(1^λ) = 2 · |Pr[b⋆ = b] − 1/2|.

We say that an MSBE scheme is ind$-cca2-secure if Adv^{cca2-out}_{msbe,A}(1^λ) is negligible in the security parameter.

Finally, we require that the MAC scheme is strongly unforgeable under an adaptive chosen-message attack and outputs tags that are indistinguishable from random. A MAC scheme is given by the algorithms MAC.KeyGen, M, and V, where MAC.KeyGen(1^λ) outputs a key Kmac. To compute a tag on the message m, run σ = M_{Kmac}(m). The verification algorithm V_{Kmac}(m, σ) outputs ⊤ if σ is a valid tag on the message m and ⊥ otherwise. We formalize the strong unforgeability and indistinguishability properties using the … We say that the tags of a MAC scheme are indistinguishable from random if Adv^{ind$}_{MAC,A}(1^λ) is negligible in the security parameter.

D.2 Proof of Theorem 1

We prove the ind$-cca2 security of MsPURB as an MSBE scheme. More precisely, we will show that there exist adversaries B1, ..., B5 such that

    Adv^{cca2-out}_{msbe,A}(1^λ) ≤ r · Adv^{cca2}_{KEM,B1}(1^λ) + r · Adv^{ind$-cca2}_{Π,B2}(1^λ) + Adv^{suf}_{MAC,B3}(1^λ) + Adv^{ind$}_{MAC,B4}(1^λ) + Adv^{ind$-cpa}_{(Enc,Dec),B5}(1^λ).

Proof. Let Wi be the event that A outputs b⋆ = 1 in game Gi. We aim to show that

    Adv^{cca2-out}_{msbe,A}(1^λ) = |Pr[b⋆ = 1 | b = 0] − Pr[b⋆ = 1 | b = 1]| = |Pr[W0] − Pr[W7]|

is negligible. To do so, we show that each of the steps in the sequence of games is negligible, i.e., that Pr[Wi] − Pr[Wi+1] is negligible. The result then follows from the …

…them directly (without calling HdrPURB.Decap) when asked to decrypt variants of the challenge ciphertext.

Game G3.
Let ei be the encrypted entry point under key Zi (derived from ki) for recipient i, computed in line 47 of Layout (step (8) of MsPURB.Enc). The game goes as in G2, but for the challenge ciphertext, the challenger saves the mapping of the challenge entry points …

Game G4.
As in G3, but the challenger replaces e⋆1, ..., e⋆r in the challenge ciphertext with random strings of the appropriate length. Note that, per the change in G3, the challenger will not try to decrypt these e⋆i, but will recover K⋆ and meta⋆ directly instead.

Game G5.
As in G4, but the challenger replies differently to the queries qDec(pki(Si), c) where c is not equal to the challenge ciphertext c⋆, but the encoded public key τ recovered in step (1) of MsPURB.Dec is such that Unhide(τ) = Xj⋆ and ei = e⋆i. In this case, the challenger replies with ⊥ directly, without running V_{Kmac}(·) (step (5) of MsPURB.Dec).

Game G6.
As in G5, but the challenger replaces the integrity tag in the challenge ciphertext in step (9) of MsPURB.Enc with a random string of the same length.

Game G7.
As in G6, but the challenger replaces the encrypted payload cpayload in the challenge ciphertext in step (7) of MsPURB.Enc with a random string of the same length.

Conclusion. As of G7, all ciphertexts in the PURB's header, the payload encryption, and the MAC have been replaced by random strings. The open slots in the hash tables are always filled with random bits. Finally, the encoded keys τ = Hide(X) are indistinguishable from random strings as well, since the keys X are random. Therefore, the PURB ciphertexts c are indeed indistinguishable from random strings, as in the MSBE game with b = 1.

G1 ↔ G2.
We show that the games G1 and G2 are indistinguishable using a hybrid argument on the number of recipients r. Consider the hybrid games Hi where the first i recipients use random keys k1, ..., ki as in G2, whereas the remaining r − i recipients use the real keys ki+1, ..., kr as in G1. Then G1 = H0 and G2 = Hr.

We prove that A cannot distinguish Hj−1 from Hj. Let Sj = ⟨G, p, g, Hide(·), Π, H, Ĥ⟩ be the suite corresponding to recipient j. Suppose A can distinguish Hj−1 from Hj; then we can build a distinguisher B against the ind$-cca2 security of the IES KEM for the suite S′j = ⟨G, p, g, H⟩. Recall that B receives, from its ind$-cca2-KEM challenger,
– a public key Y;
– a challenge ⟨X⋆, k⋆⟩, where, depending on the bit b ←$ {0, 1}, we have k⋆ = H(Y^{x⋆}) if b = 0, or k⋆ ←$ {0, 1}^{λH} if b = 1 (where λH is the bit-length of H);
– access to a Decap(·) oracle for all but X⋆.

At the start of the game, B sets pkj = Y, so that the public key of recipient j matches that of its IES-KEM challenger. Note that B does not know the corresponding private key yj. For all other recipients i, B sets (ski = yi, pki = Yi) = MsPURB.KeyGen(Si).

The distinguisher B uses its challenge (X⋆, k⋆) to construct the challenge ciphertext for A. In particular, when running HdrPURB.Encap for the suite Sj, it sets X = X⋆ in step (1) of HdrPURB.Encap. Moreover, for recipient j it uses kj = k⋆. For all other recipients i with corresponding suites Si, it proceeds as follows when computing ki in HdrPURB.Encap:
– If i < j, it sets ki ←$ {0, 1}^{λH} for the appropriate λH;
– If i > j and the suite Si for user i is the same as the suite Sj for user j, it sets ki = H((X⋆)^{yi}); and
– If i > j but Sj ≠ Si, it computes ki as per steps (1) and (2) of HdrPURB.Encap.

Thereafter, B continues running MsPURB.Enc as before. Whenever B receives a decryption query for a user pki, it proceeds as before. When it receives a decryption query for user pkj, it uses its IES-KEM Decap oracle in step (2) of HdrPURB.Decap. Note that B is not allowed to call Decap(·) on X⋆, but, as per the changes in G1, it directly uses k⋆ for user pkj if HdrPURB.Decap recovers X⋆ in step (1).

If b = 0 in B's IES-KEM challenge, then recipient j's key is kj = H(Y^{x⋆}), and hence B perfectly simulates Hj−1. If b = 1 in B's IES-KEM challenge, then j's key is kj ←$ {0, 1}^{λH}, and hence B perfectly simulates Hj. If A distinguishes Hj−1 from Hj, then B breaks the ind$-cca2-KEM security of IES. Hence, Hj−1 and Hj are indistinguishable. Repeating this argument r times shows …

…random key of the ind$-cca2 challenger. For other users i, it proceeds as follows:
– If i < j, it sets e⋆i to a random string of the appropriate length.
– If i > j, it computes e⋆i as per line 47 of Layout.

Thereafter, B answers decryption queries as before, except that whenever B derives the key kj for user j, it uses its decryption oracle D_Z(·). Note that, in particular, because of the changes in G3, B will not make D_Z(·) queries on the e⋆i from the challenge ciphertext c⋆.

If b = 0, B simulates Hj−1, and if b = 1, it simulates Hj. Therefore, if A distinguishes between Hj−1 and Hj, then B breaks the ind$-cca2 security of Π. To show that G3 is indistinguishable from G4, we repeat this argument r times. More precisely:

    Pr[W3] − Pr[W4] ≤ r · Adv^{ind$-cca2}_{Π,A}(1^λ),

as desired.

Game G2.
As in G1, but we change the random sampling of ki⋆ in HdrPURB.Encap for the challenge ciphertext to ki⋆ = H(Yj^x) = kj⋆, where Yj = pkj. The game now is the original recipient-privacy ind$-cpa game where b = 1.
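The game-hopping structure of these proofs rests on the triangle inequality; assuming the seven-game sequence above, the overall distinguishing advantage telescopes as:

```latex
\left|\Pr[W_0] - \Pr[W_7]\right|
  \;\le\; \sum_{i=0}^{6} \left|\Pr[W_i] - \Pr[W_{i+1}]\right|
```

so it suffices to bound each adjacent pair of games, as the per-game arguments above do.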