Popets 2019 0056

Proceedings on Privacy Enhancing Technologies ; 2019 (4):6–33
Kirill Nikitin*, Ludovic Barman*, Wouter Lueks, Matthew Underwood, Jean-Pierre Hubaux und
Bryan Ford
Reducing Metadata Leakage from Encrypted

Files and Communication with PURBs
Abstract: Most encrypted data formats leak metadata
via their plaintext headers, such as format version, en-
1 Introduction
cryption schemes used, number of recipients who can de-
Traditional encryption schemes and protocols aim to
crypt the data, and even the recipients’ identities. This
protect only their data payload, leaving related meta-
leakage can pose security and privacy risks to users, e.g.,
data exposed. Formats such as PGP [64] reveal in clear-
by revealing the full membership of a group of collabo-
text headers the public keys of the intended recipients,
rators from a single encrypted e-mail, or by enabling an
the algorithm used for encryption, and the actual length
eavesdropper to fingerprint the precise encryption soft-
of the payload. Secure-communication protocols simi-
ware version and configuration the sender used.
larly leak information during key and algorithm agree-
We propose that future encrypted data formats improve
ment. The TLS handshake [45], for example, leaks in
security and privacy hygiene by producing Padded Uni-
cleartext the protocol version, chosen cipher suite, and
form Random Blobs or PURBs: ciphertexts indistin-
the public keys of the parties. This metadata exposure
guishable from random bit strings to anyone without
is traditionally assumed not to be security-sensitive, but
a decryption key. A PURB’s content leaks nothing at
important for the recipient’s decryption efficiency.
all, even the application that created it, and is padded
Research has consistently shown, however, that at-
such that even its length leaks as little as possible.
tackers can exploit metadata to infer sensitive infor-
Encoding and decoding ciphertexts with no cleartext
mation about communication content. In particular, an
markers presents efficiency challenges, however. We
attacker may be able to fingerprint users [40, 52] and
present cryptographically agile encodings enabling le-
the applications they use use [63]. Using traffic anal-
gitimate recipients to decrypt a PURB efficiently, even
ysis [17], an attacker may be able to infer websites
when encrypted for any number of recipients’ public
a user visited [17, 21, 39, 56, 57] or videos a user
keys and/or passwords, and when these public keys
watched [43, 44, 50]. On VoIP, metadata can be used
are from different cryptographic suites. PURBs employ
to infer the geo-location [35], the spoken language [61],
Padmé, a novel padding scheme that limits informa-
or the voice activity of users [15]. Side-channel leaks
tion leakage via ciphertexts of maximum length M to a
from data compression [32] facilitate several attacks on
practical optimum of O(log log M ) bits, comparable to
SSL [5, 25, 48]. The lack of proper padding might en-
padding to a power of two, but with lower overhead of
able an active attacker to learn the length of the user’s
at most 12% and decreasing with larger payloads.
password from TLS [53] or QUIC [1] traffic. In social
Keywords: metadata, leakage, padding, traffic analysis networks, metadata can be used to draw conclusions
DOI 10.2478/popets-2019-0056
about users’ actions [26], whereas telephone metadata
Received 2019-02-28; revised 2019-06-15; accepted 2019-06-16. has been shown to be sufficient for user re-identification
and for determining home locations [36]. Furthermore,
by observing the format of packets, oppressive regimes
can infer which technology is used and use this infor-
mation for the purposes of incrimination or censorship.
*Corresponding Author: Kirill Nikitin: EPFL, E-mail: Most TCP packets that Tor sends, for example, are 586
[email protected] bytes due to its standard cell size [27].
*Corresponding Author: Ludovic Barman: EPFL, E- As a step towards countering these privacy threats,
mail: [email protected]
we propose that encrypted data formats should produce
Wouter Lueks: EPFL, E-mail: [email protected]
Matthew Underwood: unaffiliated Padded Uniform Random Blobs or PURBs: ciphertexts
Jean-Pierre Hubaux: EPFL, E-mail: jean- designed to protect all encryption metadata. A PURB
[email protected] encrypts application content and metadata into a sin-
Bryan Ford: EPFL, E-mail: [email protected] gle blob that is indistinguishable from a random string,
Reducing Metadata Leakage from Encrypted Files and Communication with PURBs 7
and is padded to minimize information leakage via its of costly public-key operations, however, decoding per-
length while minimizing space overhead. Unlike tradi- formance is comparable to PGP, and is almost indepen-
tional formats, a PURB does not leak the encryption dent of the number of recipients (up to 10,000).
schemes used, who or how many recipients can decrypt Analysis of real-world data sets show that many ob-
it, or what application or software version created it. jects are trivially identifiable by their unique sizes with-
While simple in concept, because PURBs by definition out padding, or even after padding to a fixed block size
contain no cleartext structure or markers, encoding and (e.g., that of a block cipher or a Tor cell). We show that
decoding them efficiently presents practical challenges. Padmé can significantly reduce the number of objects
This paper’s first key contribution is Multi-Suite uniquely identifiable by their sizes: from 83% to 3% for
PURB or MsPURB, a cryptographically agile PURB 56k Ubuntu packages, from 87% to 3% for 191k Youtube
encoding scheme that supports any number of recipi- videos, from 45% to 8% for 848k hard-drive user files,
ents, who can use either shared passwords or public- and from 68% to 6% for 2.8k websites from the Alexa
private key pairs utilizing multiple cryptographic suites. top 1M list. This much stronger leakage protection in-
The main technical challenge is providing efficient de- curs an average space overhead of only 3%.
cryption to recipients without leaving any cleartext In summary, our main contributions are as follows:
markers. If efficiency was of no concern, the sender could – We introduce MsPURB, a novel encrypted data for-
simply discard all metadata and expect the recipient to mat that reveals no metadata information to ob-
parse and trial-decrypt the payload using every possi- servers without decryption keys, while efficiently
ble format version, structure, and cipher suite. Real- supporting multiple recipients and cipher suites.
world adoption requires both decryption efficiency and – We introduce Padmé, a padding scheme that
cryptographic agility, however. MsPURB combines a asymptotically minimizes information leakage from
variable-length header containing encrypted metadata data lengths while also limiting size overheads.
with a symmetrically-encrypted payload. The header’s – We implement these encoding and padding schemes,
structure enables efficient decoding by legitimate recip- evaluating the former’s performance against PGP
ients via a small number of trial decryptions. MsPURB and the latter’s efficiency on real-world data.
facilitates the seamless addition and removal of sup-
ported cipher suites, while leaking no information to
third parties without a decryption key. We construct
our scheme starting with the standard construction of 2 Motivation and Background
the Integrated Encryption Scheme (IES) [2] and use the
ideas of multi-recipient public-key encryption [7, 34] as We first offer example scenarios in which PURBs may
a part of the multi-recipient development. be useful, and summarize the Integrated Encryption
To reduce information leakage from data lengths, Scheme that we later use as a design starting point.
this paper’s second main contribution is Padmé, a
padding scheme that groups encrypted PURBs into
indistinguishability sets whose visible lengths are rep- 2.1 Motivation and Applications
resentable as limited-precision floating-point numbers.
Like obvious alternatives such as padding to the next Our goal is to define a generic method applicable to
power of two, Padmé reduces maximum information most of the common data-encryption scenarios such
leakage to O(log log M ) bits, where M is the maximum that the techniques are flexible to the application type,
length of encrypted blob a user or application produces. to the cryptographic algorithms used, and to the num-
Padmé greatly reduces constant-factor overhead with ber of participants involved. We also seek to enhance
respect to obvious alternatives, however, enlarging files plausible deniability such that a user can deny that a
by at most +12%, and less as file size increases. PURB is created by a given application or that the user
In our evaluation, creating a MsPURB ciphertext owns the key to decrypt it. We envision several immedi-
takes 235 ms for 100 recipients on consumer-grade hard- ate applications that could benefit from using PURBs.
ware using 10 different cipher suites, and takes only E-mail Protection. E-mail systems traditionally use
8 ms for the common single-recipient single-suite sce- PGP or S/MIME for encryption. Their packet for-
nario. Our implementation is in pure Go without assem- mats [14], however, exposes format version, encryption
bly optimizations that might speed up public-key oper- methods, number and public-key identities of the recip-
ations. Because the MsPURB design limits the number ients, and public-key algorithms used. In addition, the
payload is padded only to the block size of a symmetric- 2.2 Integrated Encryption Scheme
key algorithm used, which does not provide “size pri-
vacy”, as we show in §5.3. Using PURBs for encrypted The Integrated Encryption Scheme (IES) [2] is a hybrid
e-mail could minimize this metadata leakage. Further- encryption scheme that enables the encryption of arbi-
more, as e-mail traffic is normally sparse, the moderate trary message strings (unlike ElGamal, which requires
overhead PURBs incur can easily be accommodated. the message to be a group element), and offers flexibility
in underlying primitives. To send an encrypted message,
Initiation of Cryptographic Protocols. In most
a sender first generates an ephemeral Diffie-Hellman key
cryptographic protocols, initial cipher suite negotiation,
pair and uses the public key of the recipient to derive a
handshaking, and key exchange are normally performed
shared secret. The choice of the Diffie-Hellman group is
unencrypted. In TLS 1.2 [20], an eavesdropper who mon-
flexible, e.g., multiplicative groups of integers or elliptic
itors a connection from the start can learn many details
curves. The sender then relies on a cryptographic hash
such as cryptographic schemes used. The unencrypted
function to derive the shared keys used to encrypt the
Server Name Indication (SNI) enables an eavesdropper
message with a symmetric-key cipher and to compute
to determine which specific web site a client is con-
a MAC using the encrypt-then-MAC approach. The re-
nected to among the sites hosted by the same server.
sulting ciphertext is structured as shown in Figure 1.
The eavesdropper can also fingerprint the client [46] or
distinguish censorship-circumvention tools that try to
mimic TLS traffic [23, 29]. TLS 1.3 [45] takes a few pro- pks enc(M ) σmac
tective measures: e.g., less unencrypted metadata dur-
ing the handshake, and an experimental extension for Fig. 1. Ciphertext output of the Integrated Encryption Scheme
where pks is an ephemeral public key of the sender, and σmac and
encrypted SNI [45, 47]. These measures are only par-
enc(M ) are generated using the DH-derived keys.
tial, however, and leave other metadata, such as proto-
col version number, cipher suites, and public keys, still
visible. PURBs could facilitate fully-encrypted hand-
shaking from the start, provided a client already knows
at least one public key and cipher suite the server sup- 3 Hiding Encryption Metadata
ports. Clients might cache this information from prior
connections, or obtain it out-of-band while finding the This section addresses the challenges of encoding and
server, e.g., via DNS-based authentication [28]. decoding Padded Uniform Random Blobs or PURBs in
a flexible, efficient, and cryptographically agile way. We
Encrypted Disk Volumes. VeraCrypt [30] uses a
first cover notation, system and threat models, followed
block cipher to turn a disk partition into an encrypted
by a sequence of strawman approaches that address dif-
volume where the partition’s free space is filled with
ferent challenges on the path towards the full MsPURB
random bits. For plausible deniability and coercion pro-
scheme. We start with a scheme where ciphertexts are
tection, VeraCrypt supports so-called hidden volumes:
encrypted with a shared secret and addressed to a sin-
an encrypted volume whose content and metadata is
gle recipient. We then improve it to support public-key
indistinguishable from the free space of a primary en-
operations with a single cipher suite, and finally to mul-
crypted volume hosting the hidden volume. This protec-
tiple recipients and multiple cipher suites.
tion is limited, however, because a primary volume can
host only a single hidden volume. A potential coercer
might therefore assume by default that the coercee has
3.1 Preliminaries
a hidden volume, and interpret a claim of non-possession
of the decryption keys as a refusal to provide them. Let λ be a standard security parameter. We use $ to
PURBs might enhance coercion protection by enabling $
indicate randomness, ← to denote random sampling,
an encrypted volume to contain any number of hidden
k to denote string concatenation and |value| to denote
volumes, facilitating a stronger “N + 1” defense. Even if
the bit-length of “value”. We write PPT as an abbrevi-
a coercee reveals up to N “decoy” volumes, the coercer
ation for probabilistic polynomial-time. Let Π = (E, D)
cannot know whether there are any more.
be an ind$-cca2-secure authenticated-encryption (AE)
scheme [8] where EK (m) and DK (c) are encryption and
decryption algorithms, respectively, given a message
m, a ciphertext c, and a key K. Let MAC = (M, V)
be strongly unforgeable Message Authentication Code 1. An outsider adversary who does not hold a private
(MAC) generation and verification algorithms. An au- key or a password valid for decryption;
thentication tag generated by MAC must be indistin- 2. An insider adversary who is a “curious” and active
guishable from a random bit string. legitimate recipient with a valid decryption key.
Let G be a cyclic finite group of prime order p Both adversaries are adaptive.
generated by the group element g where the gap-CDH Naturally, the latter adversary has more power, e.g., she
problem is hard to solve (e.g., an elliptic curve or a can recover the plaintext payload. Hence, we consider
multiplicative group of integers modulo a large prime). different security goals given the adversary type:
Let Hide : G(1λ ) → {0, 1}λ be a mapping that en- 1. We seek ind$-cca2 security against the outsider ad-
codes a group element of G to a binary string that versary, i.e., the encoded content and all metadata
is indistinguishable from a uniform random bit string must be indistinguishable from random bits under
(e.g., Elligator [10], Elligator Squared [3, 51]). Let an adaptive chosen-ciphertext attack;
Unhide: {0, 1}λ → G(1λ ) be the counterpart to Hide 2. We seek recipient privacy [4] against the insider ad-
that decodes a binary string into a group element of G. versary under a chosen-plaintext attack, i.e., a re-
Let H : G → {0, 1}2λ and Ĥ : {0, 1}∗ → {0, 1}2λ be cipient must not be able to determine the identities
two distinct cryptographic hash functions. Let PBKDF : of the ciphertext’s other recipients.
{salt, password} → {0, 1}2λ be a secure password-based Recipient privacy is a generalization of the key indistin-
key-derivation function [11, 33, 41], a “slow” hash func- guishability notion [6] where an adversary is unable to
tion that converts a salt and a password into a bit string determine whether a given public key has been used for
that can be used as a key for symmetric encryption. a given encryption.
3.1.1 System Model 3.1.3 System Goals
Let data be an application-level unit of data (e.g., a We wish to achieve two system goals beyond security:
file or network message). A sender wants to send an – PURBs must provide cryptographic agility. They
encrypted version of data to one or more recipients. We should accommodate either one or multiple recip-
consider two main approaches for secure data exchanges: ients, allow encryption for each recipient using a
(1) Via pre-shared secrets, where the sender shares shared password or a public key, and support dif-
with the recipients long-term one-to-one passphrases ferent cipher suites. Adding new cipher suites must
Ŝ1 , ..., Ŝr that the participants can use in a password- be seamless and must not affect or break backward
hashing scheme to derive ephemeral secrets S1 , ..., Sr . compatibility with other cipher suites.
(2) Via public-key cryptography, where sender and – PURBs’ encoding and decoding must be “reason-
recipients derive ephemeral secrets Zi = H(X yi ) = ably” efficient. In particular, the number of expen-
H(Yi x ) using a hash function H. Here (x, X = g x ) de- sive public-key operations should be minimized, and
notes the sender’s one-time (private, public) key pair padding must not impose excessive space overhead.
and (yi , Yi = g yi ) is the key pair of recipient i ∈ 1, ..., r.
In both scenarios, the sender uses ephemeral secrets
S1 , ..., Sr or Z1 , ..., Zr to encrypt (parts of) the PURB 3.2 Encryption to a Single Passphrase
header using an authenticated encryption (AE) scheme.
We refer to a tuple S = hG, p, g, Hide(·), Π, H, Ĥi We begin with a simple strawman PURB encoding for-
used in the PURB generation as a cipher suite. This mat allowing a sender to encrypt data using a single
can be considered similar to the notion of a cipher suite long-term passphrase Ŝ shared with a single recipient
in TLS [20]. Replacing any component of a suite (e.g., (e.g., out-of-band via a secure channel). The sender and
the group) results in a different cipher suite. recipient use an agreed-upon cipher suite defining the
scheme’s components. The sender first generates a fresh
symmetric session key K and computes the PURB pay-
3.1.2 Threat Model and Security Goals load as EK (data). The sender then generates a random
salt and derives the ephemeral secret S = PBKDF(salt, Ŝ).
We will consider two different types of computationally The sender next creates an entry point (EP ) containing
bounded adversaries: the session key K, the position of the payload and po-
tentially other metadata. The sender then encrypts the 3.4 Multiple Public Keys, Single Suite
EP using S. Finally, the sender concatenates the three
segments to form the PURB as shown in Figure 2. We often wish to encrypt a message to several recipients,
e.g., in multicast communication or mobile group chat.
We hence add support for encrypting one message under
salt ES (K k meta) EK (data)
multiple public keys that are of the same suite.
entry point payload As the first step, we adopt the idea of multi-recipient
public-key encryption [7, 34] where the sender gener-
Fig. 2. A PURB addressed to a single recipient and encrypted with
ates a single key pair and uses it to derive an ephemeral
a passphrase-derived ephemeral secret S.
secret with each of the intended recipients. The sender
creates one entry point per recipient. These entry points
contain the same session key and metadata but are en-
crypted with different ephemeral secrets.
3.3 Single Public Key, Single Suite As a PURB’s purpose is to prevent metadata leak-
age, including the number of recipients, a PURB cannot
We often prefer to use public-key cryptography, instead reveal how many entry points exist in the header. Yet
of pre-shared secrets, to establish secure communication a legitimate recipient needs to have a way to enumer-
or encrypt data at rest. Typically the sender or initiator ate possible candidates for her entry point. Hence, the
indicates in the file’s cleartext metadata which public primary challenge is to find a space-efficient layout of
key this file is encrypted for (e.g., in PGP), or else par- entry points—with no cleartext markers—such that the
ties exchange public-key certificates in cleartext during recipients are able to find their segments efficiently.
communication setup (e.g., in TLS). Both approaches Linear Table. The most space-efficient approach is to
generally leak the receiver’s identity. We address this place entry points sequentially. In fact, OpenPGP sug-
use case with a second strawman PURB encoding for- gests a similar approach for achieving better privacy [14,
mat that builds on the last by enabling the decryption Section 5.1]. However, in this case, decryption is inef-
of an entry point EP using a private key. ficient: the recipients have to attempt sequentially to
To expand our scheme to the public-key scenario, decrypt each potential entry point, before finding their
we adopt the idea of a hybrid asymmetric-symmetric own or reaching the end of the PURB.
scheme from the IES (see §2.2). Let (y, Y ) denote the
Fixed Hash Tables. A more computationally-efficient
recipient’s key pair. The sender generates an ephemeral
approach is to use a hash table of a fixed size. The sender
key pair (x, X), computes the ephemeral secret Z =
creates a hash table and places each encrypted entry
H(Y x ), then proceeds as before, except it encrypts K
point there, identifying the corresponding position by
and associated metadata with Z instead of S. The
hashing an ephemeral secret. Once all the entry points
sender replaces the salt in the PURB with her encoded
are placed, the remaining slots are filled with random
ephemeral public key Hide(X), where Hide(·) maps a
bit strings, hence a third-party is unable to deduce the
group element to a uniform random bit string. The re-
number of recipients. The upper bound, corresponding
sulting PURB structure is shown in Figure 3.
to the size of the hash table, is public information. This
approach, however, yields significant space overhead: in
Hide(X) EZ (K k meta) EK (data) the common case of a single recipient, all the unpopu-
encoded pk entry point payload lated slots are filled with random bits but still transmit-
ted. This approach also has the downside of imposing
Fig. 3. A PURB addressed to a single recipient that uses a public an artificial limit on the number of recipients.
key Y , where X is the public key of the sender and Z = H(Y x ) is
Expanding Hash Tables. We therefore include not
the ephemeral secret.
one but a sequence of hash tables whose sizes are consec-
utive powers of two. Immediately following the encoded
public key, the sender encodes a hash table of length
one, followed (if needed) by a hash table of length two,
one of length four, etc., until all the entry points are
placed. Unpopulated slots are filled with random bits.
To decrypt a PURB, a recipient decodes the public key
X, derives the ephemeral secret, computes the hash in- With r recipients, the worst-case compactness is hav-
dex in the first table (which is always zero), and tries ing r hash tables (if each insertion leads to a collision),
to decrypt the corresponding entry point. On failure, which happens with exponentially decreasing probabil-
the recipient moves to the second hash table, seeks the ity. The expected number of trial decryptions is log2 r.
correct position and tries again, and so on.
Definitions. We now formalize this scheme. Let r be
the number of recipients and (y1 , Y1 ), . . . , (yr , Yr ) be 3.5 Multiple Public Keys and Suites
their corresponding key pairs. The sender generates a
In the real world, not all data recipients’ keys might use
fresh key pair (x, X) and computes one ephemeral secret
the same cipher suite. For example, users might prefer
ki = H(Yi x ) per recipient. The sender uses a second hash
different key lengths or might use public-key algorithms
function Ĥ to derive independent encryption keys as
in different groups. Further, we must be able to intro-
Zi = Ĥ(“key” k ki ) and position keys as Pi = Ĥ(“pos” k
duce new cipher suites gradually, often requiring larger
ki ). Then the sender encrypts the data and creates r en-
and differently-structured keys and ciphertexts, while
try points EZ1 (K, meta), ..., EZr (K, meta). The position
preserving interoperability and compatibility with old
of an entry in a hash table j is (Pi mod 2j ). The sender
cipher suites. We therefore build on the above strawman
iteratively tries to place an entry point in HT0 (hash ta-
schemes to produce Multi-Suite PURB or MsPURB,
ble 0), then in HT1, and so on, until placement succeeds
which offers cryptographic agility by supporting the en-
(i.e., no collision occurs). If placement fails in the last
cryption of data for multiple different cipher suites.
existing hash table HTj, the sender appends another
When a PURB is multi-suite encrypted, the recipi-
hash table HT(j + 1) of size 2j+1 and places the entry
ents need a way to learn whether a given suite has been
point there. An example of a PURB encrypted for five
used and where the encoded public key of this suite is
recipients is illustrated in Figure 4.
located in the PURB. There are two obvious approaches
to enabling recipients to locate encoded public keys for
encoded pk HT0 HT1 HT2 payload multiple cipher suites: to pack the public keys linearly at
the beginning of a PURB, or to define a fixed byte posi-
Hide(X) EZ1 (K) EZ3 (K) EZ4 (K) EK (data)
tion for each cipher suite. Both approaches incur unde-
EZ2 (K) random sirable overhead. In the former case, the recipients have
EZ5 (K) to check all possible byte ranges, performing an expen-
sive public-key operation for each. The latter approach
random
results in significant space overhead and lack of agility,
Fig. 4. A PURB with hash tables of increasing sizes (HT0, HT1, as unused fixed positions must be filled with random
HT2). Five and two slots of the hash tables are filled with entry bits, and adding new cipher suites requires either assign-
points and random bit strings respectively. The metadata “meta” ing progressively larger fixed positions or compatibility-
in the entry points is omitted from the figure. Hash-table entries breaking position changes to existing suites.
are put one after another in the byte representation of a PURB.
Set of Standard Positions. To address this chal-
lenge, we introduce a set of standard byte positions
To decode, a recipient reads the public key; derives per suite. These sets are public and standardized for
the ephemeral secret ki , the encryption key Zi and the all PURBs. The set refers to positions where the suite’s
position key Pi ; and iteratively tries matching positions public key could be in the PURB. For instance, let us
in hash tables until the decryption of the entry point consider a suite PURB_X25519_AES128GCM_SHA256. We
succeeds. Although the recipient does not initially know can define—arbitrarily for now—the set of positions as
the number of hash tables in a PURB, the recipient {0, 64, 128, 1024}. As the length of the encoded pub-
needs to do only a single expensive public-key operation, lic key is fully defined by the suite (32 bytes here, as
and the rest are inexpensive symmetric-key decryption Curve25519 is used), the recipients will iteratively try
trials. In the worst case of a small message encrypted to decode a public key at [0:32), then [64:96), etc.
to many recipients, or a non-recipient searching for a If the sender wants to encode a PURB for two suites
nonexistent entry point, the total number of trial de- A and B, she needs to find one position in each set
cryptions required is logarithmic in the PURB’s size. such that the public keys do not overlap. For instance,
In the common case of a single recipient, only a sin- if setA = {0, 128, 256} and setB = {0, 32, 64, 128}, and
gle hash table of size 1 exists, and the header is compact. the public keys’ lengths are 64 and 32, respectively, one
possible choice would be to put the public key for suite the last encoded public key or hash table, and its start
A in [0:64), and the public key for suite B in [64:96). position is recorded in the meta in each entry point.
All suites typically have position 0 in their set, so that Decoding Efficiency. We have not yet achieved our
in the common case of a PURB encoded for only one decoding efficiency goal, however: the recipient must
suite, the encoded public key is at the beginning of the perform several expensive public-key operations for each
PURB for maximum space efficiency. Figure 5 illus- cipher suite, one for each potential position until the
trates an example encoding. With well-designed sets, correct position is found. We reduce this overhead to a
in which each new cipher suite is assigned at least one single public-key operation per suite by removing the
position not overlapping with those assigned to prior recipient’s need to know in which of the suite positions
suites, the sender can encode a PURB for any subset of the public key was actually placed. To accomplish this,
the suites. We address efficiency hereunder, and provide a sender XORs bytes at all the suite positions and places
a concrete example with real suites in Appendix B. the result into one of them. The sender first constructs
the whole PURB as before, then she substitutes the
encoded pkA HT0 HT1 HT2 payload bytes of the already-written encoded public key with
the XOR of bytes at all the defined suite positions (if
Hide(XA ) rnd Hide(XB ) EZ2 (K) EK (data)
they do not exceed the PURB length), which could even
EZ1 (K) random correspond to encrypted payload. To decode a PURB,
EZ3 (K) a recipient starts by reading and XORing the values at
all the positions defined for a suite. This results in an
random
encoded public key, if that suite was used in this PURB.
Fig. 5. Example of a PURB encoded for three public keys in two Encryption Flexibility. Although multiple cipher
suites (suite A and B). The sender generates one ephemeral key suites can be used in a PURB, so far these suites must
pair per suite (XA and XB ). In this example, XA is placed at the
agree on one payload encryption scheme, as a payload
first allowed position, and XB moves to the second allowed position
appears only once. To lift this constraint, we decou-
(since the first position is taken by suite A). Those positions are
public and fixed for each suite. HT0 cannot be used for storing an ple encryption schemes for entry points and payloads.
entry point, as XA partially occupies it; HT0 is considered “full” An entry-point encryption scheme is a part of a cipher
and the entry point is placed in subsequent hash tables - here HT1. suite, whereas a payload encryption scheme is indicated
separately in the metadata “meta” in each entry point.
Overlapping Layers. One challenge is that suites

might indicate different lengths for both their public 3.6 Non-malleability
keys and entry points. An encoder can easily accom-
modate this requirement by processing each suite used Our encoding scheme MsPURB so far ensures integrity
in a PURB as an independent logical layer. Concep- only of the payload and the entry point a decoder uses. If
tually, each layer is composed of the public key and the entry points of other recipients or random-byte fill-
the entry-point hash tables for the recipients that use ings are malformed, a decoder will not detect this. If an
a given suite, and all suites’ layers overlap. To place attacker obtains access to a decoding oracle, he can ran-
the layers, an encoder first initializes a byte layout for domly flip bits in an intercepted PURB, query the ora-
the PURB. Then, she reserves in the byte layout the po- cle on decoding validity, and learn the structure of the
sitions for the public keys of each suite used. Finally, PURB including the exact length of the payload. An ex-
she fills the hash tables of each suite with correspond- ample of exploiting malleability is the Efail attacks [42],
ing entry points. She identifies whether a given hash- which tamper with PGP- or S/MIME-encrypted e-mails
table slot can be filled by checking the byte layout; the to achieve exfiltration of the plaintext.
bytes might already be occupied by an entry point of To protect PURBs from undetected modification,
the same or a different suite or one of the public keys. we add integrity protection to MsPURB using a MAC
The hash tables for each suite start immediately after algorithm. A sender derives independent encryption
the suite public key’s first possible position. Thus, upon Kenc = Ĥ(“enc” k K) and MAC Kmac = Ĥ(“mac” k K)
reception of a PURB, a decoder knows exactly where to keys from the encapsulated key K, and uses Kmac to
start decryption trials. The payload is placed right after compute an authentication tag over a full PURB as
the final encoding step. The sender records the utilized
MAC algorithm in the meta in the entry points, along Algorithms HdrPURB.
with the payload encryption scheme that now does not HdrPURB.Encap(R, S) → (τ, k1 , . . . , kr ): Given a set of
need to be authenticated. The sender places the tag public keys R = {pk1 = Y1 , . . . , pkr = Yr } of a suite S:
at the very end of the PURB, which covers the entire (1) Pick a fresh x ∈ Zp and compute X = g x where
PURB including encoded public keys, entry point hash p, g are defined in S.
tables, payload ciphertext, and any padding required. (2) Derive k1 = H(Y1x ), . . . , kr = H(Yrx ).
Because the final authentication tag covers the en- (3) Map X to a uniform string τX = Hide(X).
tire PURB, the sender must calculate it after all other (4) Output an encoded public key τ = τX and
PURB content is finalized, including the XOR-encoding k1 , . . . , kr .
of all the suites’ public key positions. Filling in the tag HdrPURB.Decap(sk(S), τ ) → k: Given a private key
would present a problem, however, if the tag’s position sk = y of a suite S and an encoded public key τ :
happened to overlap with one of the public key positions (1) Retrieve X = Unhide(τ ).
of some cipher suite, because filling in the tag would cor- (2) Compute and output k = H(X y ).
rupt the suite’s XOR-encoded public key. To handle this
situation, the sender is responsible for ensuring that the Algorithms MsPURB.
authentication tag does not fall into any of the possible MsPURB.Setup(1λ ) → S: Initialize a cipher suite S =
public key positions for the cipher suites in use. hG, p, g, Hide(·), Π, H, Ĥi.
To encode a PURB, a sender prepares entry points, MsPURB.KeyGen(S) → (sk, pk): Given a suite S =
lays out the header, encrypts the payload, adds padding hG, p, g, . . .i, pick x ∈ Zp and compute X = g x . Out-
(see §4), and computes the PURB’s total length. If any put (sk = x, pk = X).
of the byte positions of the authentication tag to be ap- MsPURB.Enc(R, m) → c: Given a set of public keys of
pended overlap with public key positions, the sender in- an indicated suite R = {pk1 (S1 ), . . . , pkr (Sr )} and a
creases the padding to next bracket, until the public-key message m:
positions and the tag are disjoint. The sender proceeds (1) Pick an appropriate symmetric-key encryption
with XOR-encoding all suites’ public keys, and comput- scheme (Enc, Dec) with key length λK , a MAC
ing and appending the tag. Upon receipt of a PURB, algorithm MAC = (M, V), and a hash function
a decoder computes the potential public keys, finds and H0 : {0, 1}∗ → {0, 1}λK such that the key length
decrypts her entry point, learns the decryption scheme λK matches the security level of the most con-
and the MAC algorithm with the size of its tag. She then servative suite.
verifies the PURB’s integrity and decrypts the payload. (2) Group R into R1 , . . . , Rn , s.t. all public keys in a
group Ri share the same suite Si . Let ri = |Ri |.
(3) For each Ri :
3.7 Complete Algorithms (a) Run (τi , k1 , . . . , kri ) = HdrPURB.Encap(Ri , Si );
(b) Compute entry-point keys keysi = (Z1 =
We summarize the encoding scheme by giving detailed Ĥ(“key” k k1 ), . . . , Zri = Ĥ(“key” k kri ))
algorithms. We begin by defining helper HdrPURB al- and positions auxi = (P1 = Ĥ(“pos” k
gorithms that encode and decode a PURB header’s data k1 ), . . . , Pri = Ĥ(“pos” k kri )).
$
for a single cipher suite. We then use these algorithms (4) Pick K ← {0, 1}λK .
in defining the final MsPURB encoding scheme. (5) Record (Enc, Dec), MAC and H0 in meta.
Recall the notion of a cipher suite S = (6) Compute a payload key Kenc = H0 (“enc” k K)
hG, p, g, Hide(·), Π, H, Ĥi, where G is a cyclic group of and a MAC key Kmac = H0 (“mac” k K).
order p generated by g; Hide is a mapping: G → {0, 1}λ ; (7) Obtain cpayload = EncKenc (m).
Π = (E, D) is an authenticated-encryption scheme; and (8) Run c0 ← Layout(τ1 , . . . , τn , keys1 , . . . , keysn ,
H : G → {0, 1}2λ , Ĥ : {0, 1}∗ → {0, 1}2λ are two distinct aux1 , . . . , auxn , S1 , . . . , Sn , K, meta, cpayload ) (see
cryptographic hash functions. Let sk and pk be a private Algorithm 2 on page 26).
key and a public key, respectively, for hG, p, gi defined in (9) Derive an authentication tag σ = MKmac (c0 ) and
a cipher suite. We then define the full HdrPURB and output c = c0 k σ.
MsPURB algorithms as follows: MsPURB.Dec(sk(S), c) → m/⊥: Given a private key sk
of a suite S and a ciphertext c:
(1) Look up the possible positions of a public key 3.8 Practical Considerations
defined by S and XOR bytes at all the positions
to obtain the encoded public key τ . Cryptographic agility (i.e., changing the encryption
(2) Run k ← HdrPURB.Decap(sk, τ ). scheme) for the payload is provided by the metadata
(3) Derive Z = Ĥ(“key” k k) and P = Ĥ(“pos” k k). embedded in the entry points. For entry points them-
(4) Parse c as growing hash tables and, using the selves, we recall that the recipient uses trial-decryption
secret Z as the key, trial-decrypt the entries de- and iteratively tests suites from a known, public, or-
fined by P to obtain K k meta. If no decryption dered list. To add a new suite, it suffices to add it to
is successful, return ⊥. this list. With this technique, a PURB does not need
(5) Look up the hash function H0 , a MAC = (M, V) version numbers. There is, however, a trade-off between
algorithm and the length of MAC output tag σ the number of supported suites and the maximum de-
from meta. Parse c as hc0 k σi. Derive Kmac = cryption time. It is important that a sender follows the
H0 (“mac” k K) and run VKmac (c0 , σ). On failure, fixed order of the cipher suites during encoding because
return ⊥. a varying order might result in a different header length,
(6) Derive Kenc = H0 (“enc” k K), read the start and given the same set of recipients and sender’s ephemeral
the end of the payload from meta (it is written by keys, which could be used by an insider adversary.
Layout) to parse c0 as hhdr k cpayload k paddingi, If a nonce-based authenticated-encryption scheme is
and return DecKenc (cpayload ) where Dec is the used for entry points, a sender needs to include a distinct
payload decryption algorithm specified in meta. random nonce as a part of entry-point ciphertext (the
nonce of each entry point must be unique per PURB).
Theorem 1. If for each cipher suite S = hG, p, g, Some schemes, e.g., AES-GCM [9], have been shown to
Hide(·), Π, H, Ĥi used in a PURB we have that: the gap- retain their security when the same nonce is reused with
CDH problem is hard relative to G, Hide maps group different keys. When such a scheme is used, there can
elements in G to uniform random strings, Π is ind$- be a single global nonce to reuse by each entry point.
cca2-secure, and H, Ĥ and H0 are modeled as a ran- However, generalizing this approach of a global nonce
dom oracle; and moreover that MAC is strongly unforge- to any scheme requires further analysis.
able with its MACs being indistinguishable from ran-
Hardening Recipient Privacy. The given instantia-
dom, and the scheme for payload encryption (Enc, Dec)
tion of MsPURB provides recipient privacy only un-
is ind$-cpa-secure, then MsPURB is ind$-cca2-secure
der a chosen-plaintext attack. If information about de-
against an outsider adversary.
cryption success is leaked, an insider adversary could
learn identities of other recipients of a PURB by al-
Proof. See Appendix D.2.
tering the header, recomputing the MAC, and querying
Theorem 1 also implies that an outsider adversary can- candidates. A possible approach to achieving ind$-cca2
not break recipient privacy under an ind$-cca2 attack, recipient privacy is to sign a complete PURB using a
as long as the two possible sets of recipients N0 , N1 in- strongly existentially unforgeable signature scheme and
duce the same distribution on the length of a PURB. to store the verification key in each entry point, as simi-
larly done in the broadcast-encryption scheme by Barth
Theorem 2. If for each cipher suite S = hG, p, g, et al. [4]. This approach, however, requires adaptation to
Hide(·), Π, H, Ĥi, used in a PURB we have that: the gap- the multi-suite settings, and it will result in a significant
CDH problem is hard relative to G, Hide maps group increase of the header size and decrease in efficiency. We
elements in G to uniform random strings, Π is ind$- leave this question for future work.
cca2-secure, H and Ĥ are modeled as a random oracle,
Limitations. The MsPURB scheme above is not se-
and the order in which cipher suites are used for encod-
cure against quantum computers, as it relies on discrete
ing is fixed; then MsPURB is recipient-private against
logarithm hardness. It is theoretically possible to sub-
an ind$-cpa insider adversary.
stitute IES-based key encapsulation with a quantum-
Proof. See Appendix D.3. resistant variant to achieve quantum ind$-cca2 security.
The requirements for substitution are ind$-cca2 secu-
rity and compactness (it must be possible to securely
reuse sender’s public key to derive shared secrets with
multiple recipients). Furthermore, as MsPURB is non- bits (i.e., information) in its mantissa than in its ex-
interactive, they do not offer forward secrecy. ponent. This constraint limits information leakage to
Simply by looking at the sizes (of the header for at most double that of conservatively padding to the
a malicious insider, or the total size for a malicious next power of two, while reducing overhead through
outsider), an adversary can infer a bound on the to- logarithmically-increasing precision for larger objects.
tal number of recipients. We partially address this with Many defenses already exist for specific scenarios,
padding in §4. However, no reasonable padding scheme e.g., against website fingerprinting [21, 58]. Padmé does
can perfectly hide this information. If this is a problem not attempt to compete with tailored solutions in their
in practice, we suggest adding dummy recipients. domains. Instead, Padmé aims for a substantial increase
Protecting concrete implementations against timing in application-independent length leakage protection as
attacks is a highly challenging task. The two following a generic measure of security/privacy hygiene.
properties are required for basic hardening. First, the
implementations of PURBs should always attempt to
decrypt all potential entry points using all the recipi- 4.1 Design Criterion
ent’s suites. Second, decryption errors of any source as
well as inability to recover the payload should be pro- We design Padmé again using intermediate strawman
cessed in constant time and always return ⊥. approaches for clarity. To compare these straightforward
alternatives with our proposal, we define a game where
an adversary guesses the plaintext behind a padded en-
crypted blob. This game is inspired by related work such
4 Limiting Leakage via Length as defending against a perfect attacker [58].
The encoding scheme presented above in §3 produces Padding Game. Let P denote a collection of plaintext
blobs of data that are indistinguishable from random objects of maximum length M : e.g., data, documents, or
bit-strings of the same length, thus leaking no infor- application data units. An honest user chooses a plain-
mation to the adversary directly via their content. The text p ∈ P , then pads and encodes it into a PURB
length itself, however, might indirectly reveal informa- c. The adversary knows almost everything: all possible
tion about the content. Such leakage is already used ex- plaintexts P , the PURB c and the parameters used to
tensively in traffic-analysis attacks, e.g., website finger- generate it, such as schemes and number of recipients.
printing [21, 39, 56, 57], video identification [43, 44, 50], The adversary lacks only the private inputs and decryp-
and VoIP traffic fingerprinting [15, 61]. Although so- tion keys for c. The adversary’s goal is to guess the
lutions involving application- or network-level padding plaintext p based on the observed PURB c of length |c|.
are numerous, they are typically designed for a specific Design Goals. Our goal in designing the padding func-
problem domain, and the more basic problem of length- tion is to manage both space overhead from padding and
leaking ciphertexts remains. In any practical solution, maximum information leaked to the adversary.
some leakage is unavoidable. We show, however, that
typical approaches such as padding to the size of a block
cipher are fundamentally insufficient for efficiently hid- 4.2 Definitions
ing the plaintext length effectively, especially for plain-
texts that may vary in size by orders of magnitude. Overhead. Let c be a padded ciphertext resulting from
We introduce Padmé, a novel padding scheme de- PURB-encoding plaintext p. For simplicity we focus
signed for, though not restricted to, encoding PURBs. here purely on overhead incurred by padding, by assum-
Padmé reduces length leakage for a wide range of en- ing an unrealistic, “perfectly-efficient” PURB encoding
crypted data types, ensuring asymptotically lower leak- that (unlike MsPURB) incurs no space overhead for en-
age of O(log log M ), rather than O(log M ) for com- cryption metadata. We define the additive overhead of
mon stream- and block-cipher-encrypted data. Padmé’s |c| over |p| to be |c|−|p|, the number of extra bytes added
space overhead is moderate, always less than 12% and by padding. The multiplicative overhead of padding is
decreasing with file size. The intuition behind Padmé |c|−|p|
|p| , the relative fraction by which |c| expands |p|.
is to pad objects to lengths representable as limited-
Leakage. Let P be a finite space of plaintexts of max-
precision floating-point numbers. A Padmé length is
imum length M . Let f : N → N be a padding function
constrained in particular to have no more significant
that yields the padded size |c| given a plaintext length
|p|, for p ∈ P . The image of f is a set R of padded overhead, whereas for larger files, blocks are larger and
lengths that f can produce from plaintexts p ∈ P . group more plaintext lengths together, improving leak-
We quantify the leakage of padding function f in terms age asymptotically. A simple approach is to pad plain-
of the number of elements in R. More precisely, we define texts into buckets bi of size varying as a power of some
the leakage as the number of bits (amount of informa- base, e.g., two, so bi = 2i . The padding function is thus
tion entropy) required to distinguish a unique element of f (L) = 2dlog2 Le . We call this strawman NextP2.
R, which is dlog2 |R|e. Intuitively, a function that pads Because NextP2 pads plaintexts of maximum
everything to a constant size larger than all plaintexts length M into at most dlog2 M e buckets, the image R
(e.g., f (p) = 1 Tb) leaks no information to the adver- of f contains only O(log M ) elements. This represents
sary, because |R| = 1 (and observing |c| = 1 Tb leaks only O(log log M ) bits of entropy or information leakage,
no information about the plaintext), whereas more fine- a major asymptotic improvement over fixed-size blocks.
grained padding functions leak more bits. The maximum overhead is substantial, however, al-
most +100%: e.g., a 17 GB Blu-Ray movie would be
padded into 32 GB. Using powers of another base x > 2,
4.3 Strawman Padding Approaches we reduce leakage further at a cost of more overhead:
e.g., padding to the nearest power of 3 incurs overhead
We first explore two strawman designs, based on differ- up to +200%, with less leakage but still O(log log M ).
ent padding functions f . A padding function that offers We could reduce overhead by using a fractional base
any useful protection cannot be one-to-one, otherwise 1 < x < 2, but fractional exponents are cumbersome in
the adversary could trivially invert it and recover |p|. practical padding functions we would prefer to be sim-
We also exclude randomized padding schemes for simple and operate only on integers. Although this second
plicity, and because in practice adversaries can typically strawman succeeds in achieving asymptotically lower
cancel out and defeat random padding factors statisti- leakage than padding to fixed-size blocks, it is less at-
cally over many observations. Therefore, only padding tractive in practice due to high overhead when x ≥ 2
functions that group many plaintext lengths into fewer and due to computation complexity when 1 < x < 2.
padded ciphertexts are of interest in our analysis.
Strawman 1: Fixed-Size Blocks. We first consider
a padding function f (L) = b · dL/be, where b is a block 4.4 Padmé
size in bytes. This is how objects often get “padded” by
We now describe our padding scheme Padmé, which
default in practice, e.g., in block ciphers or Tor cells.
limits information leakage about the length of the plain-
In this case, the PURB’s size is a multiple of b, the
text for wide range of encrypted data sizes. Similarly
maximum additive overhead incurred is b − 1 bytes, and
to the previous strawman, Padmé also asymptotically
the leakage is dlog2 M/be = O(log M ), where M is the
leaks O(log log M ) bits of information, but its overhead
maximum plaintext size.
is much lower (at most 12% and decreasing with L).
In practice, when plaintext sizes differ by orders
of magnitude, there is no good value for b that serves Intuition. In NextP2, any permissible padded length
all plaintexts well. For instance, consider b = 1 MB. L has the form L = 2n . We can therefore represent L as
Padding small files and network messages would in- a binary floating-point number with a blog2 nc + 1-bit
cur a large overhead: e.g., padding Tor’s 512 B cells exponent and a mantissa of zero, i.e., no fractional bits.
to 1 MB would incur overheads of 2000×. In contrast, In Padmé, we similarly represent a permissible
padding a 700 MB movie with at most 1 MB of chaff padded length as a binary floating-point number, but
would add only a little confusion to the adversary, as we allow a non-zero mantissa at most as long as the ex-
this movie may still be readily distinguishable from oth- ponent (see Figure 6). This approach doubles the num-
ers by length. To reduce information leakage asymp- ber of bits used to represent an allowed padded length
totically over a vast range of cleartext sizes, therefore, – hence doubling absolute leakage via length – but al-
padding must depend on plaintext size. lows for more fine-grained buckets, reducing overhead.
Padmé asymptotically leaks the same number of bits as
Strawman 2: Padding to Powers of 2. The next
NextP2, differing only by a constant factor of 2, but
step is to pad to varying-size blocks, which is the basis
reduces space overhead by almost 10× (from +100% to
for our actual scheme. The intuition is that for small
+12%). More importantly, the multiplicative expansion
plaintexts, the blocks are small too, yielding modest
overhead decreases with L (see Figure 7).
blog2 nc + 1-bit exponent 0-bit mantissa exponent: it is “too precise” and therefore not a permit-
ted padded length. The value 10 is permitted, however,
In the strawman NextP2, the allowed length L = 2n can be rep-
so a 9 byte-long ciphertext is padded to 10 bytes.
resented as a binary floating-point number with a blog(n) + 1c bits
of exponent and no mantissa.
To understand why Padmé requires the low E − S
bits to be 0, notice that forcing all the last E bits to 0 is
equivalent to padding to a power of two. In comparison,
blog2 nc + 1-bit exponent blog2 nc + 1-bit mantissa
Padmé allows S extra bits to represent the padded size,
Fig. 6. Padmé represents lengths as floating-point numbers, allow- with S defined as the bit length of the exponent.
ing the mantissa to be of at most blog2 nc + 1 bits. Algorithm 1 specifies the Padmé function precisely.
Algorithm 1: Padmé
100 PadMé Data: length of content L
Next power of 2
80 Result: length of padded content L0
padding overhead [%]
E ← blog2 Lc // L’s floating-point exponent

60
S ← blog2 Ec + 1 // # of bits to represent E
40 z ←E−S // # of low bits to set to 0
20
m ← (1 z) − 1 // mask of z 1’s in LSB
// round up using mask m to clear last z bits
0
L0 ← (L + m) & ∼m
103 104 105 106
original size L [B]
Fig. 7. Maximum multiplicative expansion overhead with respect

to the plaintext size L. The naïve approach to pad to the next Leakage and Overhead. By design, if the maximum
power of two has a constant maximum overhead of 100%, whereas plaintext size is M , Padmé’s leakage is O(log log M )
Padmé’s maximum overhead decreases with L, following 2 log1 L . bits, the length of the binary representation of the
2
largest plaintext. As we fix E −S bits to 0 and round up,
the maximum overhead is 2E−S − 1. We can estimate
the maximum multiplicative overhead as follows:
Algorithm. To compute the padded size L0 = f (L),
ensuring that its floating-point representation fits in at 2E−S − 1 2E−S
max overhead = <
most 2 × blog2 nc + 1 bits, we require the last E − S bits L L
blog Lc−blog log 2 Lc−1
of L0 to be 0. E = blog2 Lc is the value of the exponent, 2 2 2
≈
and S = blog2 Ec + 1 is the size of the exponent’s binary L
representation. The reason for the substraction will be- 1
≈
come clear later. For now, we demonstrate how E and 2 · 2log2 log2 L
1
S are computed in Table 1. = (1)
2 log2 L
Recall that Padmé requires the mantissa’s bit
length to be no longer than that of the exponent. In Ta- Thus, Padmé’s maximum multiplicative overhead
ble 1, for the value L = 9 the mantissa is longer than the decreases with respect to the file size L. The maximum
overhead is +11.11%, when padding a 9-byte file into 10
Table 1. The IEEE floating-point representations of 8, 9 and 10. bytes. For bigger files, the overhead is smaller.
The value 8 has 1 bit of mantissa (the initial 1 is omitted), and On Optimality. There is no clear sweet spot on the
2 bits of exponents; 9 has a 3-bits mantissa and a 2-bit exponent,
leakage-to-overhead curve. We could easily force the last
while the value 10 as 2 bits of mantissa and exponents. Padmé 1
enforces the mantissa to be no longer than the exponent, hence 9 2 (E − S) bits to be 0 instead of the last E − S bits, for
gets rounded up to the next permitted length 10. example, to reduce overhead and increase leakage. Still,
what matters in practice is the relationship between L
L L E S IEEE representation and the overhead. We show in §5.3 how this choice per-
8 0b1000 3 2 0b1.0 * 2^0b11 forms with various real-world datasets.
9 0b1001 3 2 0b1.001 * 2^0b11
10 0b1010 3 2 0b1.01 * 2^0b11
PGP variant as a comparison for PURBs, which also

5 Evaluation does not indicate key IDs in the header but uses a more
efficient structure. The hidden PGP variant still leaks
Our evaluation is two-fold. First, we show the perfor-
the cipher suites used, the total length, and other plain-
mance and overhead of the PURB encoding and decod-
text markers (version number, etc.).
ing. Second, using several datasets, we show how Padmé
facilitates hiding information about data length.
5.2.1 Methodology
5.1 Implementation
We ran the encoding experiments on a consumer-grade
laptop, with a quad-core 2.2 GHz Intel Core i7 processor
We implemented a prototype of the PURB encoding
and 16 GB of RAM, using Go 1.12.5. To compare with
and padding schemes in Go. The implementation follows
an OpenPGP implementation, we use and modify Key-
the algorithms in §3.7, and it consists of 2 kLOC. Our
base’s fork3 of the default Golang crypto library4 , as the
implementation relies on the open-source Kyber library1
fork adds support for the ECDH scheme on Curve25519.
for cryptographic operations. The code is designed to be
We further modify Keybase’s implementation to
easy to integrate with existing applications. The code
add the support for the anonymized OpenPGP scheme.
is still proof-of-concept, however, and has not yet gone
All the encoding experiments use a PURB suite based
through rigorous analysis and hardening, in particular
on the Curve25519 elliptic-curve group, AES128-GCM
against timing attacks.
for entry point encryption and SHA256 for hashing. We
Reproducibility. All the datasets, the source code for also apply the global nonce optimization, as discussed in
PURBs and Padmé, as well as scripts for reproducing §3.8. For experiments needing more than one suite, we
all experiments, are available in the main repository2 . use copies the above suite to ensure homogeneity across
timing experiments. The payload size in each experi-
ment is 1 KB. For each data point, we generate a new
5.2 Performance of the PURB Encoding set of keys, one per recipient. We measure each data
point 20 times, using fresh randomness each time, and
The main question we answer in the evaluation of the depict the median value and the standard deviation.
encoding scheme is whether it has a reasonable cost, in
terms of both time and space overhead, and whether it
scales gracefully with an increasing number of recipients 5.2.2 Results
and/or cipher suites. First, we measure the average CPU
time required to encode and decode a PURB. Then,
Encoding Performance. In this experiment, we first
we compare the decoding performance with the perfor-
evaluate how the time required to encode a PURB
mance of plain and anonymized OpenPGP schemes de-
changes with a growing number of recipients and cipher
scribed below. Finally, we show how the compactness of
suites, and second, how the main computational com-
the header changes with multiple recipients and suites,
ponents contribute to this duration. We divide the total
as a percentage of useful bits in the header.
encoding time into three components. The first is au-
Anonymized PGP. In standard PGP, the identity— thenticated encryption of entry points. The second is
more precisely, the public key ID—of the recipient is the generation and Elligator encoding of sender’s public
embedded in the header of the encrypted blob. This keys, one per suite. A public key is derived by multi-
plaintext marker speeds up decryption, but enables a plying a base point with a freshly generated private key
third party to enumerate all data recipients. In the so- (scalar). If the resultant public key is not encodable,
called anonymized or “hidden” version of PGP [14, Sec- which happens in half of the cases, a new key is gen-
tion 5.1], this key ID is substituted with zeros. In this erated. Point multiplication dominates this component,
case, the recipient sequentially tries the encrypted en- constituting ≈ 90% of the total time. The third is the
tries of the header with her keys. We use the hidden
1 https://github.com/dedis/kyber 3 https://github.com/keybase/go-crypto
2 https://github.com/dedis/purb 4 https://github.com/golang/crypto
104
EncHeader 1 suite PGP standard
KeyGen 3 suites PGP hidden
103 SharedSecrets 10 suites 102 PURBs flat
Total time
Decoding time [ms]

PURBs standard
102
CPU time [ms]
101
1
10
optimization
Assembly-
100 100
10−1
10−1
10−2
1 3 10 100 100 101 102 103 104
Number of Recipients Number of Recipients
(a) The CPU cost of encoding a PURB given the number of recip- (b) The worst-case CPU cost of decoding for PGP, PGP with hidden
ients and of cipher suites. EncHeader: encryption of entry points; recipients, PURBs without hash tables (flat), and standard PURBs.
KeyGen: generation and hiding of public keys; SharedSecrets: com-
putation of shared secrets.
Fig. 8. Performance of the PURBs encoding.
generation only depends on a number of cipher suites

Percentage of useful bits in the header [%]
100 used. The encoding is arguably efficient for most cases

of communication, as even with hundred recipients and
80 ten suites, the time for creating a PURB is 235 ms.
Decoding Performance. We measure the worst-case
60 CPU time required to decipher a standard PGP mes-
sage, a PGP message with hidden recipients, a flat
40 PURB that has a flat layout of entry points with-
1 Suite out hash tables, and a standard PURB. We use the
20 3 Suites Curve25519 suite in all the PGP and PURB schemes.
10 Suites Figure 8b shows the results. The OpenPGP library
0 uses the assembly-optimized Go elliptic library for point
100 101 102
Number of Recipients multiplication, hence the multiplication takes ≈ 0.05–
0.1 ms there, while it takes ≈ 2–3 ms in Kyber. This
Fig. 9. Compactness of the PURB header (% of non-random bits). results in a significant difference in absolute values for
small numbers of recipients. But our primary interest is
derivation of a shared secret with each recipient, essen- the dynamics of total duration. The time increase for
tially a single point-multiplication per recipient. Other anonymous PGP is linear because, in the worst case, a
significant components of the total encoding duration decoder has to derive as many shared secrets as there are
are payload encryption, MAC computation and layout recipients. PURBs in contrast exhibit almost constant
composition. We consider cases using one, three or ten time, requiring only a single multiplication regardless of
cipher suites. When more than one cipher suite is used, the number of recipients. A decoder still has to perform
the recipients are equally divided among them. multiple entry-point trial decryptions, but one such op-
Figure 8a shows that in the case of a single recipi- eration would account for only ≈ 0.3% of the total time
ent, the generation of a public key and the computation in the single-recipient, single-suite scenario. The advan-
of a shared secret dominate the total time and both tage of using hash tables, and hence logarithmically less
take ≈ 2 ms. As expected, computing shared secrets symmetric-key operations, is illustrated by the differ-
starts dominating the total time when the number of ence between PURBs standard and PURBs flat, which
recipients grows, whereas the duration of the public-key
is noticeable after 100 recipients and will become more Table 2. Datasets used in the evaluation of anonymity provided by
pronounced if point multiplication is optimized. Padmé.
Dataset # of objects
Header Compactness. Compared with placing the
Ubuntu packages 56,517
header elements linearly, our expanding hash table de- YouTube videos 191,250
sign is less compact, but enables more efficient decod- File collections 3,027,460
ing. Figure 8b shows an example of this trade-off, PGP Alexa top 1M Websites 2,627
hidden versus PURBs standard.
In Figure 9, we show the compactness, or the per-
centage of the PURB header that is filled with actual instance. As packages can be referenced in multiple
data, with respect to the number of recipients and cipher repositories, we filtered the list by name and architec-
suites. Not surprisingly, an increasing number of recip- ture. The reason for padding Ubuntu software updates
ients and/or suites increases the collisions and reduces is that the knowledge of updates enables a local eaves-
compactness: 45% for 100 recipients and 1 suite, 36% for dropper to build a list of packages and their versions
100 recipients and 10 suites. In the most common case of that are installed on a machine. If some of the packages
having one recipient in one suite, however, the header is are outdated and have known vulnerabilities, an adver-
perfectly compact. Finally, there is a trade-off between sary might use it as an attack vector. A percentage of
compactness and efficient decryption. We can easily in- software updates still occurs over un-encrypted connec-
crease compactness by resolving entry point hash table tions, which is still an issue; but encrypted connections
collisions linearly, instead of directly moving to the next to software-update repositories also expose which dis-
hash table. The downside is that the recipient has more tribution and the kind of update being done (security
entry points to try. / restricted5 / multiverse6 / etc). We hope that this
unnecessary leakage will disappear in the near future.
The YouTube dataset contains 191,250 unique
5.3 Performance of Padmé Padding videos, obtained by iteratively querying the YouTube
API. One semantic video is generally represented by
In evaluating a padding scheme, one important met- 2 − 5 .webm files, which corresponds to various video
ric is overhead incurred in terms of bits added to the qualities. Hence, each object in the dataset is a unique
plaintexts. By design, Padmé’s overhead is bounded (video, quality) pair. We use this dataset as if the videos
1
by 2·log . As discussed in §4.4, Padmé does not es- were downloaded in bulk rather than streamed; that
2L
cape the typical overhead-to-leakage trade-off, hence is, we pad the video as a single file. The argument
Padmé’s novelty does not lie in this tradeoff. Rather, for padding YouTube videos as whole files is that, as
the novelty lies in the practical relation between L and shown by related work [43, 44, 50], variable-bitrate en-
the overhead. Padmé’s overhead is moderate, at most coding combined with streaming leak which video is be-
+12% and much less for large PURBs. ing watched. If YouTube wanted to protect the privacy
A more interesting question is how effectively, given of its users, it could re-encode everything to constant-
an arbitrary collection of plaintexts P , Padmé hides bitrate encoding and still stream it, but then the total
which plaintext is padded. Padmé was designed to work length of the stream would still leak information. Alter-
with an arbritrary collection of plaintexts P . It remains natively, it could adopt a model similar to that of the
to be seen how Padmé performs when applied to a spe- iTunes store, where videos have variable bit-rate but
cific set of plaintexts P , i.e., with a distribution coming are bulk-downloaded; but again, the total downloaded
from the real world, and to establish how well it groups length would leak information, requiring some padding.
files into sets of identical length. In the next section, we Hence, we explore how unique the YouTube videos are
experiment with four datasets made of various objects: a by length with and without padding.
collection of Ubuntu packages, a set of YouTube videos, The files dataset was constituted by collecting
a set of user files, and a set of Alexa Top 1M websites. the file sizes in the home directories (‘~user/’) of 10
co-workers and contains 3,027,460 of both personal files
and configuration files. These files were collected on ma-
5.3.1 Datasets and Methodology
The Ubuntu dataset contains 56,517 unique packages, 5 Contains proprietary software and drivers.
parsed from the official repository of a live Ubuntu 16.04 6 Contains software restricted by copyright.
100 alexa For all these datasets, despite containing very differ-
youtube
80 files home
ent objects, a large percentage of objects have a unique
ubuntu packages size: 87% in the case of YouTube video (Figure 11a),
Percentile
60 45% in the case of files (Figure 11b), 83% in the case

of Ubuntu packages (Figure 11c), and 68% in the case
40
of Websites Figure 11d). These characteristics persist in
20 traditional block-cipher encryption (blue dashed curves)
where objects are padded only to a block size. Even af-
0 ter being padded to 512 bytes, the size of a Tor cell,
102 104 106 108 1010 1012 most object sizes remain as unique as in the unpadded
Size of objects [bits]
case. We observe similar results when padding to 256
Fig. 10. Distribution of the sizes of the objects in each dataset.
bits, the typical block size for AES (not plotted).
NextP2 (red dotted curves) provides the best
chines running Fedora, Arch, and Mac OS X. The argu- anonymity: in the YouTube and Ubuntu datasets (Fig-
ment for analyzing the uniqueness of those files is not ures 11a and 11c), there is no single object that remains
to encrypt each file individually – there is no point in unique with respect to its size; all belong to groups of at
hiding the metadata of a file if the file’s location exposes least 10 objects. We cannot generalize this statement, of
everything about it, e.g. ‘~user/.ssh’ – but rather to course, as shown by the other two datasets (Figures 11b
quantify the privacy gain when padding those objects. and 11d). In general, we see a massive improvement with
Finally, the Alexa dataset is made of 2,627 web- respect to the unpadded case. Recall that this padding
sites from the Alexa Top 1M list. The size of each web- scheme is impractically costly, adding +100% to the size
site is the sum of all the resources loaded by the web- in the worst case and +50% in mean. In Table 3, we see
page, which has been recorded by piloting a ‘chrome- that the mean overhead is of +45%.
headless’ instance with a script, mimicking real brows- Finally, we see the anonymity provided by Padmé
ing. One reason for padding whole websites – as op- (green solid curves). By design, Padmé has an accept-
posed to padding individual resources – is that related able maximum overhead (maximum +12% and decreas-
work in website fingerprinting showed the importance ing). In three of the four datasets, there is a constant dif-
of the total downloaded size [21]. The effectiveness of ference between our expensive reference point NextP2
Padmé when padding individual resources, or for in- and Padmé; despite having a decreasing overhead with
stance bursts [58], is left as interesting future work. respect to L, unlike NextP2. This means that although
larger files have proportionally less protection (i.e., less
padding in percentage) with Padmé, this is not critical,
5.3.2 Evaluation of Padmé as these files are more rare and are harder to protect effi-
ciently, even with a naïve and costly approach. When we
The distribution of the objects sizes for all the datasets observe the percentage of uniquely identifiable objects
is shown in Figure 10. Intuitively, it is harder for an effi- (objects that trivially reveal their plaintext given our
cient padding scheme to build groups of same-sized files perfect adversary), we see a significant drop by using
when there are large objects in the dataset. Therefore, Padmé: from 83% to 3% for the Ubuntu dataset, from
we expect the last 5% to 10% of the four datasets to 87% to 3% for the Youtube dataset, from 45% to 8%
remain somewhat unique, even after padding. for the files dataset and from 68% to 6% for the Alexa
For each dataset, we analyze the anonymity set size dataset. In Table 3, we see that the mean overhead of
of each object. To compute this metric, we group ob- Padmé is around 3%, more than an order of magnitude
jects by their size, and report the distribution of the smaller than NextP2. We also see how using a fixed
sizes of these groups. A large number of small groups block size can yield high overhead in percentage, in ad-
indicate that many objects are easily identifiable. For dition to insufficient protection.
each dataset, we compare three different approaches:
the NextP2 strawman, Padmé, and padding to a fixed
block size of 512B, like a Tor cell. The anonymity met-
rics are shown in Figure 11, and the respective overheads
are shown in Table 3.
(a) Dataset ‘YouTube’:

Table 3. Analysis of the overhead, in percentage, of various padding
approaches. In the first column, we use b = 512B as block size.
100
80 Dataset Fixed block size Next power of 2 Padmé

YouTube 0.01 44.12 2.23
Percentile
60 files 40.15 44.18 3.64

Ubuntu 14.09 43.21 3.12
40 Alexa 36.71 47.12 3.07
Unpadded
20 Tor cell (512B)
Padmé
Next power of 2
0
100 101 102 103 104
6 Related Work
Anonymity set size
(b) Dataset ‘files’: The closest related work PURBs build on is Broadcast
Encryption [4, 13, 19, 22, 24], which formalizes the se-
100 curity notion behind a ciphertext for multiple recipi-
80
ents. In particular, the most relevant notion in (Private)
Broadcast Encryption is Recipient Privacy [4], in which
Percentile
60 an adversary cannot tell whether a public key is a valid

40
recipient for a given ciphertext. PURBs goes further
Unpadded by enabling multiple simultaneous suites, while achiev-
20 Tor cell (512B)
Padmé
ing indistinguishably from random bits in the ind$-cca2
0
Next power of 2 model. PURBs also addresses size leakage.
100 101 102 103 104 105 Traffic morphing [62] is a method for hiding the
Anonymity set size traffic of a specific application by masking it as traffic
(c) Dataset ‘Ubuntu’: of another application and imitating the corresponding
packet distribution. The tools built upon this method
100
can be standalone [55] or use the concept of Tor plug-
80 gable transport [37, 59, 60] that is applied to prevent-
ing Tor traffic from being identified and censored [12].
Percentile
60
There are two fundamental differences with PURBs.
40 First, PURBs focus on a single unit of data; we do
Unpadded
not yet explore the question of the time distribution
20 Tor cell (512B)
Padmé of multiple PURBs. Second, traffic-morphing systems,
Next power of 2 in most cases, try to mimic a specific transport and
0
100 101 102 103 104 sometimes are designed to only hide the traffic of one
Anonymity set size
given tool, whereas PURBs are universal and arguably
(d) Dataset ‘Alexa’: adaptable to any underlying application. Moreover, it
has been argued that most traffic-morphing tools do
100
not achieve unobservability in real-world settings due
80 to discrepancies between their implementations and the
systems that they try to imitate, because of the un-
Percentile
60
covered behavior of side protocols, error handling, re-
40 sponses to probing, etc. [23, 29, 54]. We believe that for
Unpadded
Tor cell (512B) a wide class of applications, using pseudo-random uni-
20
Padmé form blobs, either alone or in combination with other
Next power of 2
0 lower-level tools, is a potential solution in a different
100 101 102
Anonymity set size
direction.
Traffic analysis aims at inferring the contents of
Fig. 11. Analysis of the anonymity provided by various padding encrypted communication by analyzing metadata. The
approaches: NextP2, Padmé, padding with a constant block size most well-studied application of it is website fingerprint-
and no padding. We measure for each object with how many other ing [21, 39, 56, 57], but it has also been applied to video
objects it becomes indistinguishable after being padded, and plot identification [43, 44, 50] and VoIP traffic [15, 61]. In
the distribution. NextP2 provides better anonymity, at the cost of
a drastically higher overhead (at most +100% instead of +12%).
Overheads are shown in Table 3.
website fingerprinting over Tor, research has repeatedly us to improve this paper. We also thank Enis Ceyhun
showed that the total website size is the feature that Alp, Cristina Basescu, Kelong Cong, Philipp Jovanovic,
helps an adversary the most [16, 21, 38]. In particular, Apostolos Pyrgelis and Henry Corrigan-Gibbs for their
Dyer et al. [21] show the necessity of padding the whole helpful comments and suggestions, and Holly B. Cogliati
website, as opposed to individual packets, to prevent for text editing. This project was supported in part by
an adversary from identifying a website by its observed grant #2017-201 of the Strategic Focal Area “Personal-
total size. They also systematized the existing padding ized Health and Related Technologies (PHRT)” of the
approaches. Wang et al. [58] propose deterministic and ETH Domain and by grants from the AXA Research
randomized padding strategies tailored for padding Tor Fund, Handshake, and the Swiss Data Science Center.
traffic against a perfect attacker, which inspired our §4.
Finally, Sphinx [18] is an encrypted packet format
for mix networks with the goal of minimizing the infor-
mation revealed to the adversary. Sphinx shares similar-
References
ities with PURBs in its binary format (e.g., the pres- [1] Ring-road: Leaking sensitive data in security protocols.
ence of a group element followed by a ciphertext). Un- http://www.ringroadbug.com/.
like PURBs, however, it supports only one cipher suite, [2] Michel Abdalla, Mihir Bellare, and Phillip Rogaway. The Or-
and one direct recipient (but several nested ones, due to acle Diffie-Hellman Assumptions and an Analysis of DHIES.
the nature of mix networks). To the best of our knowl- In Cryptographers’ Track at the RSA Conference, pages
143–158, 2001.
edge, PURBs is the first solution that hides all meta-
[3] Diego F Aranha, Pierre-Alain Fouque, Chen Qian, Mehdi
data while providing cryptographic agility. Tibouchi, and Jean-Christophe Zapalowicz. Binary Elligator
Squared. In International Workshop on Selected Areas in
Cryptography, pages 20–37, 2014.
[4] Adam Barth, Dan Boneh, and Brent Waters. Privacy in
7 Conclusion Encrypted Content Distribution Using Private Broadcast
Encryption. In International Conference on Financial Cryp-
Conventional encrypted data formats leak information, tography and Data Security, pages 52–64, 2006.
via both unencrypted metadata and ciphertext length, [5] Tal Be’ery and Amichai Shulman. A Perfect CRIME? Only
TIME Will Tell. Black Hat Europe, 2013.
that may be used by attackers to infer sensitive informa-
[6] Mihir Bellare, Alexandra Boldyreva, Anand Desai, and David
tion via techniques such as traffic analysis and website Pointcheval. Key-Privacy in Public-Key Encryption. In
fingerprinting. We have argued that this metadata leak- Advances in Cryptology – ASIACRYPT 2001, pages 566–
age is not necessary, and as evidence have presented 582, 2001.
PURBs, a generic approach for designing encrypted [7] Mihir Bellare, Alexandra Boldyreva, Kaoru Kurosawa, and
Jessica Staddon. Multi-Recipient Encryption Schemes: Effi-
data formats that do not leak anything at all, except for
cient Constructions and Their Security. IEEE Transactions
the padded length of the ciphertexts, to anyone with- on Information Theory, 53(11):3927–3943, 2007.
out the decryption keys. We have shown that despite [8] Mihir Bellare and Chanathip Namprempre. Authenticated
having no cleartext header, PURBs can be efficiently Encryption: Relations among Notions and Analysis of the
encoded and decoded, and can simultaneously support Generic Composition Paradigm. Journal of Cryptology,
multiple public keys and cipher suites. Finally, we have 21(4):469–491, 2008.
[9] Mihir Bellare and Björn Tackmann. The Multi-user Security
introduced Padmé, a padding scheme that reduces the
of Authenticated Encryption: AES-GCM in TLS 1.3. In
length leakage of ciphertexts and has a modest overhead Annual International Cryptology Conference, pages 247–276,
decreasing with file size. Padmé performs significantly 2016.
better than classic padding schemes with fixed block size [10] Daniel J Bernstein, Mike Hamburg, Anna Krasnova, and
in terms of anonymity, and its overhead is asymptoti- Tanja Lange. Elligator: Elliptic-curve points indistinguish-
able from uniform random strings. In ACM Conference on
cally lower than using exponentially increasing padding.
Computer and Communications Security, CCS ’13, 2013.
[11] Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich. Ar-
gon2: New Generation of Memory-Hard Functions for Pass-
word Hashing and Other Applications. Technical report,
Acknowledgments 2015.
[12] Tor Blog. Tor at the heart: Bridges and pluggable trans-
We are thankful to our anonymous reviewers and our ports. https://blog.torproject.org/tor-heart-bridges-and-
meticulous proof shepherd Markulf Kohlweiss for their pluggable-transports, Dec 2016.
constructive and thorough feedback that has helped
[13] Dan Boneh, Craig Gentry, and Brent Waters. Collusion nications. In IEEE Symposium on Security and Privacy, S&P
Resistant Broadcast Encryption with Short Ciphertexts and ’13, pages 65–79, 2013.
Private Keys. In Advances in Cryptology – CRYPTO, pages [30] IDRIX. Veracrypt. https://www.veracrypt.fr/en/Home.html.
258–275, 2005. [31] Jonathan Katz and Yehuda Lindell. Introduction to modern
[14] J. Callas, L. Donnerhacke, H. Finney, D. Shaw, and cryptography. CRC press, 2014.
R. Thayer. OpenPGP Message Format. RFC 4880, Nov [32] John Kelsey. Compression and information leakage of plain-
2007. text. In International Workshop on Fast Software Encryp-
[15] Yu-Chun Chang, Kuan-Ta Chen, Chen-Chi Wu, and Chin- tion, Lecture Notes in Computer Science, pages 263–276,
Laung Lei. Inferring speech activity from encrypted Skype 2002.
traffic. In IEEE Global Telecommunications Conference, [33] Hugo Krawczyk. Cryptographic extraction and key deriva-
GLOBECOM, pages 1–5, 2008. tion: The HKDF scheme. In Annual Cryptology Conference,
[16] Giovanni Cherubin, Jamie Hayes, and Marc Juarez. Website pages 631–648, 2010.
Fingerprinting Defenses at the Application Layer. In Pri- [34] Kaoru Kurosawa. Multi-recipient public-key encryption with
vacy Enhancing Technologies Symposium, PETS ’17, pages shortened ciphertext. In International Workshop on Public
2:186–2:203, 2017. Key Cryptography, pages 48–63, 2002.
[17] George Danezis and Richard Clayton. Introducing Traffic [35] Stevens Le Blond, Chao Zhang, Arnaud Legout, Keith Ross,
Analysis. 2007. and Walid Dabbous. I Know Where You Are and What
[18] George Danezis and Ian Goldberg. Sphinx: A Compact You Are Sharing: Exploiting P2P Communications to Invade
and Provably Secure Mix Format. In IEEE Symposium on Users’ Privacy. In ACM SIGCOMM Conference on Internet
Security and Privacy, S&P ’09, pages 269–282, 2009. Measurement Conference, IMC ’11, 2011.
[19] Cécile Delerablée. Identity-Based Broadcast Encryption with [36] Jonathan Mayer, Patrick Mutchler, and John C. Mitchell.
Constant Size Ciphertexts and Private Keys. In International Evaluating the privacy properties of telephone meta-
Conference on the Theory and Application of Cryptology data. Proceedings of the National Academy of Sciences,
and Information Security, pages 200–215, 2007. 113(20):5536–5541, 2016.
[20] T. Dierks and E. Rescorla. The Transport Layer Security [37] Hooman Mohajeri Moghaddam, Baiyu Li, Mohammad Der-
(TLS) Protocol Version 1.2. RFC 5246, Aug 2008. akhshani, and Ian Goldberg. SkypeMorph: Protocol Obfus-
[21] Kevin P Dyer, Scott E Coull, Thomas Ristenpart, and cation for Tor Bridges. In ACM Conference on Computer
Thomas Shrimpton. Peek-a-Boo, I Still See You: Why Effi- and Communications Security, CCS ’12, pages 97–108,
cient Traffic Analysis Countermeasures Fail. In IEEE Sym- 2012.
posium on Security and Privacy, S&P ’12, pages 332–346, [38] Rebekah Overdorf, Mark Juarez, Gunes Acar, Rachel Green-
2012. stadt, and Claudia Diaz. How Unique is Your .onion?: An
[22] Nelly Fazio and Irippuge Milinda Perera. Outsider- Analysis of the Fingerprintability of Tor Onion Services. In
Anonymous Broadcast Encryption with Sublinear Cipher- ACM Conference on Computer and Communications Secu-
texts. In International Workshop on Public Key Cryptogra- rity, CCS ’17, pages 2021–2036, 2017.
phy, pages 225–242, 2012. [39] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and
[23] Sergey Frolov and Eric Wustrow. The use of TLS in Cen- Thomas Engel. Website fingerprinting in onion routing based
sorship Circumvention. In Network and Distributed System anonymization networks. In ACM Workshop on Workshop
Security (NDSS) Symposium, 2019. on Privacy in the Electronic Society, pages 103–114, 2011.
[24] Craig Gentry and Brent Waters. Adaptive Security in Broad- [40] Jeffrey Pang, Ben Greenstein, Ramakrishna Gummadi, Srini-
cast Encryption Systems (with Short Ciphertexts). In An- vasan Seshan, and David Wetherall. 802.11 User Finger-
toine Joux, editor, Annual International Conference on the printing. In ACM International Conference on Mobile Com-
Theory and Applications of Cryptographic Techniques, pages puting and Networking, MobiCom ’07, pages 99–110, 2007.
171–188, 2009. [41] Colin Percival. Stronger key derivation via sequential
[25] Yoel Gluck, Neal Harris, and Angelo Prado. BREACH: reviv- memory-hard functions. Self-published, pages 1–16, 2009.
ing the CRIME attack. Black Hat USA, 2013. [42] Damian Poddebniak, Christian Dresen, Jens Müller, Fabian
[26] B. Greschbach, G. Kreitz, and S. Buchegger. The devil Ising, Sebastian Schinzel, Simon Friedberger, Juraj So-
is in the metadata 2014 – New privacy challenges in De- morovsky, and Jörg Schwenk. Efail: Breaking S/MIME and
centralised Online Social Networks. In IEEE International OpenPGP Email Encryption using Exfiltration Channels. In
Conference on Pervasive Computing and Communications USENIX Security Symposium, USENIX ’18, 2018.
Workshops, pages 333–339, March 2012. [43] Andrew Reed and Benjamin Klimkowski. Leaky streams:
[27] Dominik Herrmann, Rolf Wendolsky, and Hannes Feder- Identifying variable bitrate DASH videos streamed over en-
rath. Website Fingerprinting: Attacking Popular Privacy crypted 802.11n connections. In IEEE Consumer Commu-
Enhancing Technologies with the Multinomial Naïve-bayes nications & Networking Conference (CCNC), pages 1107–
Classifier. In ACM Workshop on Cloud Computing Security, 1112, 2016.
CCSW ’09, pages 31–42, 2009. [44] Andrew Reed and Michael Kranch. Identifying HTTPS-
[28] P. Hoffman and J. Schlyter. The DNS-Based Authentication protected Netflix videos in real-time. In ACM Conference on
of Named Entities (DANE) Transport Layer Security (TLS) Data and Application Security and Privacy, pages 361–368,
Protocol: TLSA. RFC 6698, August 2012. 2017.
[29] Amir Houmansadr, Chad Brubaker, and Vitaly Shmatikov. [45] E. Rescorla. The Transport Layer Security (TLS) Protocol
The parrot is dead: Observing unobservable network commu- Version 1.3. RFC 8446, Aug 2018.
[46] Ivan Ristić. HTTP client fingerprinting using ssl handshake [63] Fan Zhang, Wenbo He, Xue Liu, and Patrick G. Bridges.
analysis. https://blog.ivanristic.com/2009/06/http-client- Inferring Users’ Online Activities Through Traffic Analysis.
fingerprinting-using-ssl-handshake-analysis.html, Jun 2009. In ACM Conference on Wireless Network Security, WiSec
[47] Tom Ritter and Daniel Kahn Gillmor. Protecting the TLS ’11, pages 59–70, 2011.
Handshake. IETF Interim, May 2014. [64] Philip R. Zimmermann. The Official PGP User’s Guide. MIT
[48] Juliano Rizzo and Thai Duong. The CRIME attack. Press, Cambridge, MA, USA, 1995.
Ekoparty, 2012.
[49] Phillip Rogaway. Nonce-based symmetric encryption. In
International Workshop on Fast Software Encryption, pages
348–358, 2004. A Layout
[50] Roei Schuster, Vitaly Shmatikov, and Eran Tromer. Beauty
and the Burst: Remote Identification of Encrypted Video
Algorithm 2 presents the Layout algorithm a sender
Streams. In USENIX Security Symposium, USENIX ’17,
pages 1357–1374, 2017. uses in step (8) of MsPURB.Enc. Layout arranges
[51] Mehdi Tibouchi. Elligator squared: Uniform points on elliptic PURB’s components in a continuous byte array.
curves of prime order as uniform random strings. In Inter-
Notation. We denote by a[i : j] ← b, the operation of
national Conference on Financial Cryptography and Data
Security, pages 139–156, 2014.
copying the bits of b at the positions a[i], a[i+1], · · · a[j−
[52] Thiago Valverde. Bad life advice - replay attacks against 1]. When written like this, b always has correct length of
https. http://blog.valverde.me/2015/12/07/bad-life- j − i bits, and we assume i < j. If, before an operation
advice/, Dec 2015. a[i : j] ← b, |a| < j, we first grow a to length j. We
[53] Guido Vranken. HTTPS Bicycle Attack. sometimes write a[i :] ← b instead of a[i : |b|] ← b.
https://guidovranken.com/2015/12/30/https-bicycle-
We use a “reservation array”, which is an array with a
attack/, Dec 2015.
[54] Liang Wang, Kevin P Dyer, Aditya Akella, Thomas Risten- method array.isFree(start,end) that returns True if and
part, and Thomas Shrimpton. Seeing through network- only if none of the bits array[i], array[i+1], · · · array[j−1]
protocol obfuscation. In ACM Conference on Computer and were previously assigned a value, and False otherwise.
Communications Security, CCS ’15, 2015.
[55] Qiyan Wang, Xun Gong, Giang TK Nguyen, Amir
Houmansadr, and Nikita Borisov. Censorspoofer: asymmetric
communication using IP spoofing for censorship-resistant B Positions for Public Keys
web browsing. In ACM Conference on Computer and Com-
munications Security, pages 121–132, 2012.
This section provides an example of possible sets of al-
[56] Tao Wang and Ian Goldberg. Improved Website Fingerprint-
ing on Tor. In ACM Workshop on Workshop on Privacy in lowed public key positions for the suites in the PURB
the Electronic Society, pages 201–212, 2013. encoding. We emphasize that finding an optimal set of
[57] Tao Wang and Ian Goldberg. On realistically attacking Tor positions was not the focus of this work. The intention
with website fingerprinting. In Privacy Enhancing Technolo- is merely to show that such sets exist and to offer a
gies Symposium, PETS ’16, pages 4:21–4:36, 2016.
concrete example (which is used for the compactness
[58] Tao Wang and Ian Goldberg. Walkie-Talkie: An Efficient
Defense Against Passive Website Fingerprinting Attacks. In
experiment, Figure 9).
USENIX Security Symposium, USENIX ’17, pages 1375– Example. We use the required and recommended suites
1390, 2017. in the latest draft of TLS 1.3 [45] as an example of suites
[59] Zachary Weinberg, Jeffrey Wang, Vinod Yegneswaran,
a PURB could theoretically support. The suites and
Linda Briesemeister, Steven Cheung, Frank Wang, and
Dan Boneh. StegoTorus: a camouflage proxy for the Tor groups are shown in Table 4.
anonymity system. In ACM Conference on Computer and The PURB concept of “suite” combines both
Communications Security, pages 109–120, 2012. “suite” and “group” in TLS. For instance, a PURB
[60] Philipp Winter, Tobias Pulls, and Juergen Fuss. Scramble- suite could be PURB_AES_128_GCM_SHA_256-
Suit: A polymorphic network protocol to circumvent cen-
_SECP256R1. We show possible PURB suites in Ta-
sorship. In ACM Workshop on Workshop on Privacy in the
Electronic Society, pages 213–224, 2013.
ble 6. For the sake of simplicity, we introduce aliases in
[61] Charles V Wright, Lucas Ballard, Fabian Monrose, and Ger- the table, and will further refer to those suites as suite
ald M Masson. Language identification of encrypted VoIP A-F. In Table 5, we show a possible assignment. For in-
traffic: Alejandra y Roberto or Alice and Bob? In USENIX stance, if only suites A and C are used, the public key
Security Symposium, USENIX ’07, pages 43–54, 2007. for A would be placed in [0, 64], while value in [96, 160]
[62] Charles V. Wright, Scott E. Coull, and Fabian Monrose.
is changed so that the XOR of [0, 64] and [96, 160] equals
Traffic Morphing: An Efficient Defense Against Statistical
Traffic Analysis. In Network and Distributed Security Sym- the key for B. Note that a sender must respect the suite
posium, pages 237–250, 2009.
Algorithm 2: Layout
// τi is an encoded public key of a suite Si // fill empty space in the layout with random bits
// keysi = hZ1 , . . . , Zr i are entry-point keys 36 foreach start, end < layout.end do
// auxi = hP1 , . . . , Pr i are entry-point positions 37 if layout.isFree(start, end) then

$
// SuiteAllowedPositions are public values 38 layout[start:end] ← {0, 1}end-start
Input : hτ1 , . . . , τn i, hkeys1 , . . . , keysn i, haux1 , . . . , auxn i, 39 end
hS1 , . . . , Sn i, K, meta, cpayload , SuiteAllowedPositions 40 end
Output: byte[]
// place the payload just past the header layout
// determine public-key positions for each suite 41 meta.payload_start = |layout|;
1 layout = []; // public-key and entry-point assignments 42 meta.payload_end = |layout| + |cpayload |;
2 pubkey_pos = []; // chosen primary position per suite // fill entry-point reservations with ciphertexts
3 pubkey_fixed = []; // all positions fixed so far 43 foreach keysi in hkeys1 , . . . , keysn i do
4 foreach τi in hτ1 , . . . , τn i do 44 while keysi not empty do
// decide suite’s primary public key position 45 Z = keysi .pop();
5 for pos ∈ SuiteAllowedPositions(Si ) do 46 hstart, end, Si ← entrypoints.pop();
6 if pubkey_fixed.isFree(pos.start, pos.end) then // Encrypt an entry point
7 pubkey_pos.append(hτi , posi); 47 e ← EZ (K k meta);
8 layout[pos.start:pos.end] ← τi ; 48 layout[start:end] ← e;
9 break; 49 end
10 end 50 end
11 end
// compute the padding and append it to layout
// later suites cannot modify these positions 51 purb_len ← Padmé (|layout| + |cpayload | + mac_len);
// without disrupting this suite’s XOR 52 mac_pos ← purb_len - mac_len;
12 for pos ∈ SuiteAllowedPositions(Si ) do 53 while not pubkey_fixed.isFree(mac_pos, purb_len) do
13 pubkey_fixed[pos.start:pos.end] ← ‘F’; // MAC mustn’t overlap public-key positions:
14 end // if so, we pad to the next Padmé size
15 end 54 purb_len ← Padmé (purb_len + 1);
// reserve entry-point positions in hash tables 55 mac_pos ← purb_len - mac_len;
16 entrypoints = []; 56 end
17 foreach auxi in haux1 , . . . , auxn i do 57 padding_len ← mac_pos - meta.payload_end;
18 while auxi not empty do $ padding_len

58 padding ← {0, 1} ; // random padding
19 P ← auxi .pop(); 59 layout.append(cpayload k padding);
20 ht_len = 1; // length of current hash table
// XOR suites’ public key positions into primary
21 ht_pos = 0; // position of this hash table
60 for (τi , pos) ∈ pubkey_pos do
22 while True do
61 buffer = τi ;
23 index = P mod ht_len; // selected entry
62 for altpos ∈ SuiteAllowedPositions(Si ) do
24 start = ht_pos + index * entrypoint_len; 63 buffer = buffer ⊕ layout[altpos.start : altpos.end];
25 end = start + entrypoint_len; 64 end
26 if layout.isFree(start, end) then 65 layout[pos.start:pos.end] ← buffer;
$
layout[start:end] ← {0, 1}end-start ;
L
27 // now SuiteAllowedPositions(Si ) = τi
28 entrypoints.append(hstart, end, Si i); 66 end
29 break;
67 return layout
30 end
// if not free, double table size
31 ht_pos += ht_len * entrypoint_len;
32 ht_len *= 2;
33 end
34 end
35 end
Table 4. Suites and groups described in the latest draft of TLS 1.3.
D Security Proofs
Symmetric/Hash Algorithms
TLS_AES_128_GCM_SHA256 Required This section contains the proofs of the security proper-
TLS_AES_256_GCM_SHA384 Recommended ties provided by MsPURB.
TLS_CHACHA20_POLY1305_SHA256 Recommended
TLS_AES_128_CCM_SHA256 Optional
TLS_AES_128_CCM_8_SHA256 Optional
Key Exchange Groups D.1 Preliminaries
secp256r1 Required
x25519 Recommended Before diving into proving the security of our scheme,
secp384r1 Optional we define what it means to be ind-cca2- and ind$-cca2-
secp521r1 Optional secure for the primitives that MsPURB builds upon.
x448 Optional
ffdhe2048 Optional Key-Encapsulation Mechanism (KEM). Follow-
ffdhe3072 Optional ing the definition from Katz & Lindell [31], we begin by
ffdhe4096 Optional defining KEM as a tuple of PPT algorithms.
ffdhe6144 Optional
ffdhe8192 Optional Syntax KEM.
KEM.Setup(1λ ) → S: Given a security parameter λ, ini-
Table 5. Example of Allowed Positions per suite. Here, the algo- tialize a cipher suite S.
rithm simply finds any mapping so that each suite can coexist in a KEM.KeyGen(S) → (sk, pk): Given a cipher suite S, gen-
PURB. The receiver must XOR the values at all possible positions erate a (private, public) key pair.
of a suite to obtain an encoded public key.. KEM.Encap(pk) → (c, k): Given a public key pk, output
a ciphertext c and a key k.
Suite Possible positions
KEM.Decap(sk, c) → k/⊥: Given a private key sk and
A {0}
B {0, 64} a ciphertext c, output a key k or a special symbol ⊥
C {0, 96} denoting failure.
D {0, 32, 64, 160}
E {0, 64, 128, 192} Consider an ind-cca2 security game against an adaptive
F {0, 32, 64, 96, 128, 256} adversary A:
Game KEM.
order A-F during encoding. We provide a simple python The KEM ind-cca2 game for a security parameter λ is
script to design such sets in the code repository. between a challenger and an adaptive adversary A. It
proceeds along the following phases.
Init: The challenger and adversary take λ as input. The
adversary outputs a cipher suite S it wants to attack.
C Default Schemes for Payload The challenger verifies that S is a valid cipher suite,
i.e., that it a valid output of KEM.Setup(1λ ). The chal-
In addition to PURB suites, a list of suitable candidates $
lenger aborts, and sets b? ← {0, 1} if S is not valid.
for a payload encryption scheme (Enc, Dec), a MAC al-
Setup: The challenger runs (sk, pk) ← KEM.KeyGen(S)
gorithm MAC, and a hash function H0 must be deter-
and gives pk to A.
mined and standardized. This list can be seamlessly
Phase 1: A can make decapsulation queries qDecap(c)
updated with time, as an encoder makes the choice
with ciphertexts c of its choice, to the challenger who
and records it in meta on per-PURB basis. The cho-
responds with KEM.Decap(sk, c).
sen schemes are shared by all the suites included in
Challenge: The challenger runs (c? , k0 ) ←
the PURB, hence these schemes must match the se- $
curity level of the suite with the highest bit-wise se- KEM.Encap(pk) and generates k1 ← {0, 1}|k0 | . The
$
curity. An example of suitable candidates, given the challenger picks b ← {0, 1} and sends hc? , kb i to A.
suites from Table 6, is (Enc, Dec) = AES256-CBC, Phase 2: A continues querying qDecap(c) with the re-
MAC = HMAC-SHA384, and H0 = SHA3-384. striction that c 6= c? .
Guess: A outputs its guess b? for b and wins if b? = b.
Table 6. PURB Suites. “Suite A” is a shorthand for the first suite.
Alias PURB Suite |Public key| [B] |EntryPoint| [B]

A PURB_AES_128_GCM_SHA_256_SECP256R1 64 48
B PURB_AES_128_GCM_SHA_256_X25519 32 48
C PURB_AES_256_GCM_SHA_384_SECP256R1 64 80
D PURB_AES_256_GCM_SHA_384_X25519 32 80
E PURB_CHACHA20_POLY1305_SHA_256_SECP256R1 64 64
F PURB_CHACHA20_POLY1305_SHA_256_X25519 32 64
We define A’s advantage in this game as: MSBE.Dec(sk, c) → m/⊥: Given a private key sk and
the ciphertext c, return a message m or ⊥ if c does
Advcca2 λ ?

1
KEM,A (1 ) = 2 Pr[b = b ] − 2 . not decrypt correctly.

We say that a KEM is ind-cca2-secure if Advcca2 λ

KEM,A (1 ) Note that MsPURB as described in §3.7 satisfies the
is negligible in the security parameter.
syntax of a multi-suite broadcast encryption scheme.
Barth et al. [4] define the security of broadcast en-
Definition 3. We that a KEM is perfectly correct if
cryption schemes under adaptive chosen-chiphertext at-
for all (sk, pk) ← KEM.KeyGen(S) and for all (c, k) ←
tack for single-suite schemes. Here, we adjust this defini-
KEM.Encap(pk) we have k = KEM.Decap(sk, c).
tion to the multi-suite setting, and instead require that
Instantiation IES-KEM. the ciphertext is indistinguishable from a random string
We instantiate a KEM based on the Integrated Encryp- (ind$-cca2).
tion Scheme [2] (see §2.2 for details). Game MSBE.
IES.Setup(1λ ): Initialize a cipher suite S = hG, p, g, Hi, The MSBE ind$-cca2 game for a security parameter λ
where G is a cyclic group of order p and generated by is between a challenger and an adversary A. It proceeds
g, and H : G → {0, 1}2λ is a hash function. along the following phases.
IES.KeyGen(S): Pick x ∈ Zp , compute X = g x , and out- Init: The challenger and adversary take λ as input.
put (sk = x, pk = X). The adversary outputs a number of recipients r and
IES.Encap(pk): Given pk = Y , pick x ∈ Zp , compute corresponding cipher suites S1 , . . . , Sr it wants to at-
X = g x , and output hc = X, k = H(Y x )i. tack. Let s be the number of unique cipher suites.
IES.Decap(sk, c): Given sk = y and c = X, output a key The challenger verifies, for each i ∈ {1, . . . , r}, that Si
k = H(X y ). is a valid cipher suite, i.e., that it is a valid output
of MSBE.Setup(1λ ). The challenger aborts, and sets
Theorem 4 (Theorem 11.22 [31] and Section 7 [2]). If $
b? ← {0, 1} if the suites are not all valid.
the gap-CDH problem is hard relative to G, and H is Setup: The challenger generates private-public key
modeled as a random oracle, then IES-KEM is an ind- pairs for each recipient i given by A by run-
cca2-secure KEM. ning (ski , pki ) ← MSBE.KeyGen(Si ) and gives R =
{pk1 , . . . , pkr } to A.
Multi-Suite Broadcast Encryption. We consider Phase 1: A can make decryption queries qDec(pki , c)
MsPURB as a multi-suite broadcast encryption to the challenger for any pki ∈ R and any ci-
(MSBE) scheme extending the single-suite setting by phertext c of its choice. The challenger replies with
Barth et al. [13]. MSBE.Dec(ski , c).
Syntax MSBE. Challenge: A outputs m? . The challenger generates
$
MSBE.Setup(1λ ) → S: Given a security parameter λ, c0 = MSBE.Enc(R, m? ) and c1 ← {0, 1}|c0 | . The chal-
$
initialize a cipher suite S. lenger picks b ← {0, 1} and sends c? = cb to A.
MSBE.KeyGen(S) → (sk, pk): Given a cipher suite S, Phase 2: A continues making decryption queries
generate a (private, public) key pair. qDec(pki , c) with a restriction that c 6= c? .
MSBE.Enc(R, m) → c: Given a set of public keys Guess: A outputs its guess b? for b and wins if b? = b.
R = {pk1 , . . . , pkr } with corresponding cipher suites
S1 , . . . , Sr and a message m, generate a ciphertext c.
We define A’s advantage in this game as: We say that the tags of a MAC scheme are indistin-
guishable from random if Advind$ λ
MAC,A (1 ) is negligible in
Advcca2-out λ ?

1
msbe,A (1 ) = 2 Pr[b = b ] − 2 .

the security parameter.
We say that a MSBE scheme is ind$-cca2-secure if
Advcca2-out λ
msbe,A (1 ) is negligible in the security parameter.
Finally, we require that the MAC scheme is strongly D.2 Proof of Theorem 1
unforgeable under an adaptive chosen-message attack
and outputs tags that are indistinguishable from ran- We prove the ind$-cca2 security of MsPURB as
dom. A MAC scheme is given by the algorithms an MSBE scheme. More precisely, we will show that
MAC.KeyGen, M, and V, where MAC.KeyGen(1λ ) out- there exists adversaries B1 , . . . , B5 such that
puts a key Kmac . To compute a tag on the message
m, run σ = MKmac (m). The verification algorithm Advcca2-out λ cca2 λ
msbe,A (1 ) ≤ r AdvKEM,B1 (1 ) + AdvΠ,B2
ind$-cca2 λ
(1 ) +
VKmac (m, σ) outputs > if σ is a valid tag on the mes- Advsuf λ ind$ λ
MAC,B3 (1 ) + AdvMAC,B4 (1 )+
sage m and ⊥ otherwise. We formalize the strong un-
Advind$-cpa (1λ ).
forgeability and indistinguishability properties using the (Enc,Dec),B5
following simple games.

Thus, given our assumptions, Advcca2-out λ
msbe,A (1 ) is indeed
Game MAC-sforge. negligible in λ. To do so we use a sequence of games.
The MAC-sforge game for a security parameter λ is be- This sequence of games step by step transforms from the
tween a challenger and an adversary A. situation where b = 0 in the ind$-cca2 game of MSBE,
Setup: The challenger and adversary take λ as in- i.e., the adversary receives the real ciphertext, to b = 1,
put. The challenger generates a MAC key Kmac ← i.e., the adversary receives a random string.
MAC.KeyGen(1λ ).
Game G0 .
Challenge: The adversary A is given oracle access to
This game is as the original MSBE ind$-cca2 game
the oracles M(·) and V(·). On a query M(m) the chal-
where b = 0.
lenger returns σ = MKmac (m). On a query V(m, σ)
the challenger returns VKmac (m, σ).
Output: A eventually outputs a message-tag pair Game G1 .
(m, σ). A wins if VKmac (m, σ) = 1 and A has not As in G0 , but the challenger will no longer call
made a query M(m) that returned σ. HdrPURB.Decap to derive the keys ki on ciphertexts
derived from the challenge ciphertext c? . In particu-
We define A’s advantage in this game as:
lar, for every recipient pki using a suite Sj , we store
Advsuf λ
MAC,A (1 ) = Pr[A wins]. (Xj? , ki? ) when constructing the PURB headers for the
We say that a MAC scheme is strongly unforgeable un- challenge ciphertext. Then, when receiving a decryp-
der adaptive chosen-message attacks if Advsuf λ tion query for a recipient qDec(pki (Sj ), c), we proceed
MAC,A (1 ) is
negligible in the security parameter. by following MsPURB.Dec. If the encoded public key
τ recovered in step (1) of MsPURB.Dec is such that
Game MAC-IND$.
Unhide(τ ) = Xj? , then we use ki = ki? (as stored when
The MAC-IND$ game is between a challenger and an
creating the challenge ciphertext) directly, rather than
adversary A.
computing ki = HdrPURB.Decap(yi , τ ) in step (3) of
Setup: The challenger and adversary take λ as in-
MsPURB.Dec. If the encoded public key τ does not
put. The challenger generates a MAC key Kmac ←
$
match Xj? , then the challenger proceeds as before.
MAC.KeyGen(1λ ) and picks a bit b ← {0, 1}.
Challenge: The adversary outputs a message m. Game G2 .
The challenger computes σ0 = MKmac (m) and As in G1 , but we change how the keys k1? , . . . , kr? for the
$
σ1 ← {0, 1}|σ0 | and returns σb . challenge ciphertext are computed in HdrPURB.Encap.
Output: The adversary outputs its guess b? of b, and Rather than computing ki? = H(Yix ) as in step (2) of
$
wins if b? = b. HdrPURB.Encap, we set ki? ← {0, 1}λH for all the keys,
where λH is the bit-length of the corresponding hash
We define A’s advantage in this game as: function H. Recall that as per the changes in G1 , the
Advind$ λ ? challenger will store ki? generated in this way, and use

1
MAC,A (1 ) = 2 Pr[b = b ] − 2 .

them directly (without calling HdrPURB.Decap) when Proof. Let Wi be the event that A outputs b? = 1 in
asked to decrypt variants of the challenge ciphertext. game Gi . We aim to show that
Advcca2-out λ ? ?

Game G3 . msbe,A (1 ) = Pr[b = 1 | b = 0] − Pr[b = 1 | b = 1]

Let ei be the encrypted entry point under key Zi (de- = Pr[W0 ] − Pr[W7 ]
rived from ki ) for recipient i computed in line 47 of
Layout (step (8) of MsPURB.Enc). The game goes is negligible. To do so, we show that each of the steps in
the sequence of games is negligible, i.e., that Pr[Wi ] −
as in G2 , but for the challenge ciphertext, the chal-
lenger saves the mapping of the challenge entry points Pr[Wi+1 ] is negligible. The result then follows from the

and the encapsulated key K ? with metadata meta? : triangle inequality.

(e?i , ki? , K ? k meta? ). If the challenger receives a decryp- G0 <–> G1 .
tion query qDec(pki (Si ), c) it proceeds as before, except As long as the KEMs are perfectly correct, the games
when it should decrypt e?i using key ki? in step (4) of G0 and G1 are identical. Therefore:
MsPURB.Dec. In that case, it acts as if the decryption
Pr[W0 ] − Pr[W1 ] = 0.
returned K ? k meta? .
Game G4 . G1 <–> G2 .
As in G3 , but the challenger replaces e?1 , . . . , e?r in the We show that the games G1 and G2 are indistinguish-
challenge ciphertext with random strings of the appro- able using a hybrid argument on the number of recipi-
priate length. Note that per the change in G3 , the chal- ents r. Consider the hybrid games Hi where the first i re-
lenger will not try to decrypt these e?i , but will recover cipients use random keys k1 , . . . , ki as in G2 , whereas the
K ? and meta? directly instead. remaining r − i recipients use the real keys ki+1 , . . . , kr
as in G1 . Then G1 = H0 and G2 = Hr .
Game G5 . We prove that A cannot distinguish Hj−1 from Hj .
As in G4 , but the challenger replies differently to Let Sj = hG, p, g, Hide(·), Π, H, Ĥi, be the suite corre-
the queries qDec(pki (Si ), c) where c is not equal the sponding to recipient j. Suppose A can distinguish Hj−1
challenge ciphertext c? but the encoded public key from Hj , then we can build a distinguisher B against
τ recovered in step (1) of MsPURB.Dec is such that the ind$-cca2 security of the IES KEM for the suite
Unhide(τ ) = Xj? and ei = e?i . In this case, the challenger Sj0 = hG, p, g, Hi. Recall that B receives, from its ind$-
replies with ⊥ directly, without running VKmac (·) (step cca2-KEM challenger,
(5) of MsPURB.Dec). – a public key Y ;
$
– a challenge hX ? , k ? i, where depending on bit b ←
Game G6 . ?
{0, 1}, we have k ? = H(Y x ) if b = 0 or k ? ←
$
As in G5 , but the challenger replaces the integrity tag {0, 1}λH if b = 1 (where λH is the bit-length of H);
in the challenge ciphertext in step (9) of MsPURB.Enc – access to a Decap(·) oracle for all but X ? .
with a random string of the same length. At the start of the game, B will set pkj = Y , so that
the public key of recipient j matches that of its IES
Game G7 .
KEM challenger. Note that B does not know the cor-
As in G6 , but the challenger replaces the encrypted pay-
responding private key yj . For all other recipients i, B
load cpayload in the challenge ciphertext in step (7) of
sets (ski = yi , pki = Yi ) = MsPURB.KeyGen(Si ).
MsPURB.Enc with a random string of the same length.
The distinguisher B will use its challenge (X ? , k ? )
to construct the challenge ciphertext for A. In particu-
Conclusion. As of G7 , all ciphertexts in the PURBs lar, when running HdrPURB.Encap for a suite Sj , it sets
header, the payload encryption and the MAC have been X = X ? in step (1) of HdrPURB.Encap. Moreover, for
replaced by random strings. The open slots in the hash recipient j it will use kj = k ? . For all other recipients i
tables are always filled with random bits. Finally, the with corresponding suites Si it proceeds as follows when
encoded keys τ = Hide(X) are indistinguishable from computing ki in HdrPURB.Encap.
random strings as well, since the keys X are random. $
– If i < j, then it sets ki ← {0, 1}λH for appropriate
Therefore, the PURB ciphertexts c are indeed indistin-
λH ;
guishable from random strings, as in the MSBE game
– If i > j and the suite Si for user i is the same as
with b = 1.
suite Sj for user j, then it sets ki = H(X ? yi ); and
– If i > j, but Sj 6= Si , then it computes ki as per random key of the ind$-cca2 challenger. For other users
steps (1) and (2) of HdrPURB.Encap. i it proceeds as follows:
Thereafter, B continues running MsPURB.Enc as before. – If i < j, it sets e?i to a random string of appropriate
Whenever B receives a decryption query for a user length.
pki , it proceeds as before. When it receives a decryption – If i > j, it computes e?i as per line 47 of Layout.
query for user pkj , it uses its IES-KEM Decap oracle in Thereafter, B answers decryption queries as before.
step (2) of HdrPURB.Decap. Note that B is not allowed Except that whenever, B derives key kj for user j, it will
to call Decap(·) on X ? , but as per the changes in G1 , use its decryption oracle DZ (·). Note that in particular,
it will directly use k ? for user pkj if HdrPURB.Decap because of the changes in G3 , B will not make DZ (·)
recovers X ? in step (1). queries on e?i from the challenge ciphertext c? .
If b = 0 in B’s IES KEM challenge, then recipient If b = 0, B simulates Hj−1 , and if b = 1, it simulates
?
j’s key kj = H(Y x ), and hence B perfectly simulates Hj . Therefore, if A distinguishes between Hj−1 and Hj ,
Hj−1 . If b = 1 in B’s IES KEM challenge, then j’s key then B breaks the ind$-cca2 security of Π. To show that
$ G3 is indistinguishable from G4 , repeat this argument r
kj ← {0, 1}λH and, hence, B perfectly simulates Hj . If
A distinguishes Hj−1 from Hj then B breaks the ind$- times. More precisely:
cca2-KEM security of IES. Hence, Hj−1 and Hj are in- Pr[W3 ] − Pr[W4 ] ≤ r · Advind$-cca2 (1λ ).

distinguishable. Repeating this argument r times shows Π,A
that G1 and G2 are indistinguishable. More precisely:

G4 <–> G5 .
Pr[W1 ] − Pr[W2 ] ≤ r · Advcca2
λ The challenger’s actions in G4 and G5 only differ if A
KEM,A (1 ).
could create a decryption request qDec(pki (Si ), c) where
G0 <–> G1 . Unhide(τ ) = Xi? , ei = e?i , and the integrity tag σ is valid
By perfect correctness of the authentication encryption but c is different from c? (recall A is not allowed to query
scheme, we have that for all keys k and messages m that c? itself). We show that if A can cause the challenger to
Dk (Ek (m)) = m, thus, games G2 and G3 are identical. output ⊥ incorrectly, then we can build a simulator B
Therefore: that breaks the strong unforgeability of MAC.
Assume a simulator B that tries to win an unforge-

Pr[W2 ] − Pr[W3 ] = 0.
ability game. Simulator B receives access to the oracles
G3 <–> G4 . M(·) and V(·), and needs to output a pair (c, σ), such
Similarly to the proof above, consider the hybrid games that VKmac (c, σ) returns true.
Hi where the first i entry points are substituted with Simulator B now proceeds as follows. When creating
random strings e1 , . . . , ei as in G4 , whereas the remain- the challenge ciphertext c? , it does not compute σ in
ing r − i are the actual encryptions as in G3 . Then step (9) of MsPURB.Enc using K ? , but instead uses its
G3 = H0 and G4 = Hr . We show that A cannot distin- oracle M and sets σ = M(c0 ). Note that because of
guish Hj−1 from Hj . Let Sj = hG, p, g, Hide(·), Π, H, Ĥi, the random oracle model for H0 and the fact that A’s
be the suite corresponding to recipient j. We show that view is independent of K ? , this change of Kmac remains
if A distinguishes Hj−1 from Hj then we can build a undetected.
distinguisher B against the ind$-cca2 security of Π. B Whenever A makes a decryption query
receives from its ind$-cca2 challenger: qDec(pki (Si ), c) B proceeds as before, except when it
– a challenge ciphertext e? , in response to an encryp- derives the key K ∗ . In that case it runs V(c0 , σ) to use
tion call with a message m such that, depending on its oracle to verify the MAC in step (5) of MsPURB.Dec.
the bit b ∈ {0, 1}, we have that e? = EZ (m) if b = 0 If V(c0 , σ) returns > then B outputs (c0 , σ) as its forgery
or e? is a random string if b = 1; (by construction, c0 was not queried to the MAC oracle
– a decryption oracle DZ (·). M(·)).
When constructing the challenge ciphertext, B calls Therefore, A cannot make queries that cause the
its challenge oracle with K k meta to obtain e? , and challenger to incorrectly output ⊥, and therefore the two
then sets e?j = e? for user j’s entry point (in line 47 games are indistinguishable, provided MAC is strongly
of Layout). We note that in the random oracle the unforgeable. More precisely:
real encryption key Zj = Ĥ(“key” k kj ) is independent
Pr[W4 ] − Pr[W5 ] ≤ Advsuf λ

from adversary A’s view, so we can replace it with the MAC,A (1 ).
G5 <–> G6 . D.3 Proof of Theorem 2

If A can distinguish between G5 and G6 , then we can
build a distinguisher B that breaks the indistinguisha- For our MsPURB ind$-cpa recipient-privacy game, we
bility from random bits (MAC-IND$) of MAC. take inspiration from the single-suite recipient-privacy
Distinguisher B proceeds as follows to compute the game defined by Barth et al. [4], but we restate it in the
challenge ciphertext c? . It proceeds as before, except ind$-cpa setting.
that in step (9) of MsPURB.Enc, it submits c0 to its
Game Recipient-Privacy.
challenge oracle to receive a tag τ ? . It then sets τ = τ ?
The game is between a challenger and an adversary A,
and proceeds to construct the PURB ciphertext.
and proceeds along the following phases:
Note that as per the changes before, B never needs
Init: The challenger and adversary take λ as input.
to verify a MAC under the key that was used to create
The adversary outputs a number of recipients r and
τ ? for the challenge ciphertext. Moreover, as before, A’s
corresponding cipher suites S1 , . . . , Sr it wants to at-
view is independent of the K ? , so also this change of
tack. Let s be the number of unique cipher suites.
Kmac remains undetected.
The challenger verifies, for each i ∈ {1, . . . , r}, that
If b = 0, B simulates G5 , and if b = 1, B simu-
Si is a valid cipher suite, i.e., that it a valid out-
lates G6 . Hence, if A can distinguish between these two
put of MSBE.Setup(1λ ). The challenger aborts, and
games, B breaks the MAC-IND$ game. More precisely: $
sets b? ← {0, 1} if the suites are not all valid. Adver-
Pr[W5 ] − Pr[W6 ] ≤ Advind$ (1λ ).

MAC,A sary A then outputs two sets of recipients N0 , N1 ⊆
{1, . . . , n} such that |N0 | = |N1 | = r, and the number
G6 <–> G7 . of users in N0 and N1 using suite Sj is the same.
If A can distinguish between G6 and G7 , then we can Setup: For each i ∈ 1, . . . , n given by A, the challenger
build a distinguisher B that breaks the ind$-cpa prop- runs (ski , pki ) ← MsPURB.KeyGen(Si ), where Si is
erty of (Enc, Dec). In the ind$-cpa game [49], B receives: previously chosen by A. The challenger gives two sets
– a challenge ciphertext cpayload = cb , s.t. c0 = R0 = {pk10 , . . . , pkr0 } and R1 = {pk11 , . . . , pkr1 } to A,
$
EncKenc (m) on a chosen-by-B m, c1 ← {0, 1}|c0 | , where R0 , R1 are the generated public keys of the re-
$ cipients N0 , N1 respectively. The challenger also gives
and b ← {0, 1}.
B runs MsPURB.Dec as before to create a challenge for to A all ski that correspond to i ∈ N0 ∩ N1 .
A, except that B uses the ind$-cpa challenge ciphertext Challenge: A outputs m? . The challenger gen-
cpayload in step (7), instead of encrypting, as B does not erates c0 = MsPURB.Enc(R0 , m? ) and c1 =
$
know Kenc . As before, A’s view is independent of K ? , MsPURB.Enc(R1 , m? ). The challenger flips a coin b ←
so also this change of Kenc remains undetected. {0, 1} and sends c? = cb to A.
B answers decryption queries qDec(pki (Si ), c) from Guess: A outputs its guess b? for b and wins if b? = b.
A as before. In particular
– if Unhide(τ ) = Xi? and ei = e?i , B returns ⊥ as per We define A’s advantage in this game as:
the changes in G5 ;
Advcpa-in λ ?

1
msbe,A (1 ) = 2 Pr[b = b ] − 2 .

– Otherwise, B runs MsPURB.Dec(·).
If b = 0, B simulates G6 , and, if b = 1, B simulates G7 . We say that a MSBE scheme is cpa-secure against in-
Hence, if A can distinguish between these two games, B siders if Advcpa-in λ
msbe,A (1 ) is negligible in the security pa-
can break the the ind$-cpa property of (Enc, Dec). More rameter.
precisely: The conditions on N0 and N1 in the game ensure
Pr[W6 ] − Pr[W7 ] ≤ Advind$-cpa (1λ ). that A cannot trivially win by looking at the size of

(Enc,Dec),A
the ciphertext. PURBs allows for suites with different
Combining the individual inequalities we find that there groups (resulting in different size encodings of the corre-
exists adversaries B1 , . . . , B5 such that sponding IES public key) and for suites to use different
authenticated encryption schemes (that could result in
Advcca2-out λ cca2 λ
msbe,A (1 ) ≤ r AdvKEM,B1 (1 ) + AdvΠ,B2
ind$-cca2 λ
(1 ) +
different sizes of encrypted entry points). Since PURBs
Advsuf λ ind$ λ
MAC,B3 (1 ) + AdvMAC,B4 (1 )+
must encode groups and entry points into the header, we
mandate that for each suite the number of recipients is
Advind$-cpa
(Enc,Dec),B5
(1λ ),
the same in N0 and N1 . This assumption is similar to re-
completing the proof. quiring equal-size sets of recipients in a challenge game
for single-suite broadcast encryption [4]. As in broad- $

– a challenge hX ? , k ? i, where depending on bit b ←
cast encryption, if this requirement is an issue, a sender ?
{0, 1}, we have k ? = H(Y x ) if b = 0 or k ? ←
$
can add dummy recipients to avoid structural leakage {0, 1}l(λ) if b = 1;

to an insider adversary. – access to a Decap(·) oracle for all but X ? .
We will show that At the start of the game, B will set pki = Y , so that
the public key of recipient i matches that of its IES
Advcpa-in λ cca2 λ
msbe,A (1 ) ≤ 2d · AdvKEM,B (1 ),
KEM challenger. Note that B does not know the cor-
where d is the number of recipients in which N0 and N1 responding private key yi . For all other recipients h, B
differ. sets (skh = yh , pkh = Yh ) = MsPURB.KeyGen(Sh ). As
A plays an ind$-cpa game, B does not need to use the
Proof. Similarly to Barth et al. [4], we prove recipient
Decap(·) oracle (in fact, for ind$-cpa recipient privacy
privacy when the sets R0 and R1 differ only by one
ind$-cpa security of the IES KEM suffices).
public key in one suite. The general case follows by a
If b = 0 in the IES-KEM challenge, then B simulates
hybrid argument. Consider the following games:
G0 , and, If b = 1, B simulates G1 . Hence, if A distin-
Game G0 . guishes between G0 and G1 , B wins in the ind$-cca2
This game is as the original recipient-privacy ind$-cpa IES-KEM game. Therefore:
game where b = 0 and pki = R0 \ R1 , pkj = R1 \ R0 ,
where the public keys pki and pkj are of the same suite |Pr[W0 ] − Pr[W1 ]| ≤ Advcca2 λ
KEM,B (1 )
S.
G1 <–> G2 .
Game G1 . The proof follows the same steps as the proof of G0 <–>
As in G0 , but we change how a key ki? corresponding to G1 . Therefore:
the recipient i is computed in HdrPURB.Encap for the
|Pr[W0 ] − Pr[W1 ]| ≤ Advcca2 λ
KEM,B (1 ).
challenge ciphertext. Instead of computing ki? = H(Yix )
(where Yi = pki ) as in step (2) of HdrPURB.Encap, we Let d be the number of recipients that differ in N0 and
$
set ki? ← {0, 1}λH . As the challenger generates fresh N1 . Then by repeating the above two steps d times in a
public keys for each encryption query and thus a fresh hybrid argument, we find that:
key ki , and does not have to answer decryption queries,
it does not need to memorize ki? . Advcpa-in λ cca2 λ
msbe,A (1 ) ≤ 2d · AdvKEM,B (1 ),
Game G2 . as desired.
As in G1 , but we change the random sampling ki?
in HdrPURB.Encap for the challenge ciphertext with
ki? = H(Yjx ) = kj? where Yj = pkj . The game now is the
original recipient-privacy ind$-cpa game where b = 1.
Conclusion. G0 represents the recipient-privacy game

with b = 0 and G2 recipient-privacy game with b = 1.
If A cannot distinguish between G0 and G2 , A does
not have an advantage in winning the recipient-privacy
game.
Let Wi be the event that A outputs b? = 1 in game
Gi .
G0 <–> G1 .
If A can distinguish between G0 and G1 , we can build a
distinguisher B against the ind$-cca2 security of the IES
KEM. Recall that B receives, from its ind$-cca2-KEM
challenger,
– a public key Y ;

Popets 2019 0056

Uploaded by

Copyright:

Available Formats

Popets 2019 0056

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Popets 2019 0056

Uploaded by

Copyright:

Available Formats

Proceedings on Privacy Enhancing Technologies ; 2019 (4):6–33

Reducing Metadata Leakage from Encrypted

3.1.1 System Model 3.1.3 System Goals

Overlapping Layers. One challenge is that suites

E ← blog2 Lc // L’s floating-point exponent

Fig. 7. Maximum multiplicative expansion overhead with respect

PGP variant as a comparison for PURBs, which also

Decoding time [ms]

Fig. 8. Performance of the PURBs encoding.

generation only depends on a number of cipher suites

100 used. The encoding is arguably efficient for most cases

60 45% in the case of files (Figure 11b), 83% in the case

(a) Dataset ‘YouTube’:

80 Dataset Fixed block size Next power of 2 Padmé

60 files 40.15 44.18 3.64

60 an adversary cannot tell whether a public key is a valid

// auxi = hP1 , . . . , Pr i are entry-point positions 37 if layout.isFree(start, end) then

18 while auxi not empty do $ padding_len

Table 6. PURB Suites. “Suite A” is a shorthand for the first suite.

Alias PURB Suite |Public key| [B] |EntryPoint| [B]

We say that a KEM is ind-cca2-secure if Advcca2 λ

following simple games.

and the encapsulated key K ? with metadata meta? : triangle inequality.

that G1 and G2 are indistinguishable. More precisely:

G5 <–> G6 . D.3 Proof of Theorem 2

for single-suite broadcast encryption [4]. As in broad- $

can add dummy recipients to avoid structural leakage {0, 1}l(λ) if b = 1;

Conclusion. G0 represents the recipient-privacy game

You might also like