HTTP/2.0 Header Compression

2. Overview

In HTTP/1.X, HTTP headers, which are necessary for the functioning of the protocol, are transmitted with no transformations. Unfortunately, the amount of redundancy in both the keys and the values of these headers is astonishingly high, and is the cause of increased latency on lower-bandwidth links. This indicates that an alternate encoding for headers would be beneficial to latency, and that is what is proposed here.

As shown by SPDY [SPDY], Deflate compresses HTTP very effectively. However, the use of a compression scheme which allows for arbitrary matches against the previously encoded data (such as Deflate) exposes users to security issues. In particular, the compression of sensitive data, together with other data controlled by an attacker, may lead to leakage of that sensitive data, even when the resultant bytes are transmitted over an encrypted channel.

Another consideration is that the processing and memory costs of a compressor such as Deflate may also be too high for some classes of devices, for example when doing forward or reverse proxying.

2.1. Outline

The HTTP header representation described in this document is based on indexing tables that store (name, value) pairs, called header tables in the remainder of this document. This scheme is believed to be safe against all currently known attacks on the compression context. Header tables are incrementally updated during the whole HTTP/2.0 session. Two independent header tables are used during an HTTP/2.0 session, one for HTTP request headers and one for HTTP response headers.

The encoder is responsible for deciding which headers to insert as (name, value) pairs in the header table. The decoder then does exactly what the encoder prescribes, ending in a state that exactly matches the encoder's state. This enables decoders to remain simple and understand a wide variety of encoders.

A header may be represented as a literal or as an index. If represented as a literal, the representation specifies whether this header is used to update the header table. The different representations are described in Section 3.3.

A set of headers is coded as a difference from the previous set of headers.

An example illustrating the use of these different mechanisms to represent headers is available in Appendix B.
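As a rough illustration of coding a header set as a difference from the previous one, the Python sketch below compares a new header set with the previous set and lists what has to be dropped and what has to be encoded. The function name `diff_header_sets` is hypothetical, and the sketch only models the comparison, not the wire format or the exact differential coding rules of this document. The data is taken from Appendix B.

```python
from typing import List, Tuple

Header = Tuple[str, str]


def diff_header_sets(previous: List[Header],
                     new: List[Header]) -> Tuple[List[Header], List[Header]]:
    """Return (to_remove, to_encode) when moving from `previous` to `new`.

    Hypothetical helper: headers present in both sets need no new encoding.
    """
    to_remove = [h for h in previous if h not in new]
    to_encode = [h for h in new if h not in previous]
    return to_remove, to_encode


previous_set = [
    (":path", "/my-example/index.html"),
    ("user-agent", "my-user-agent"),
    ("x-my-header", "first"),
]
new_set = [
    (":path", "/my-example/resources/script.js"),
    ("user-agent", "my-user-agent"),
    ("x-my-header", "second"),
]

to_remove, to_encode = diff_header_sets(previous_set, new_set)
print(to_remove)   # the :path and x-my-header entries of the previous set
print(to_encode)   # the :path and x-my-header entries of the new set
```

The unchanged user-agent header appears in neither list, which is where the saving comes from.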

4. Detailed Format

4.2. Low-level representations

4.2.1. Integer representation

Integers are used to represent name indexes, pair indexes or string lengths. The integer representation keeps byte-alignment as much as possible as this allows various processing optimizations as well as efficient use of DEFLATE. For that purpose, an integer representation always finishes at the end of a byte.

An integer is represented in two parts: a prefix that fills the current byte and an optional list of bytes that are used if the integer value does not fit in the prefix. The number of bits of the prefix (called N) is a parameter of the integer representation.

The N-bit prefix allows filling the current byte. If the value is small enough (strictly less than 2^N-1), it is encoded within the N-bit prefix. Otherwise all the bits of the prefix are set to 1 and the value is encoded using an unsigned variable length integer representation.

The algorithm to represent an integer I is as follows:

If I < 2^N - 1, encode I on N bits
Else, encode 2^N - 1 on N bits and do the following steps:
  1. Set I to (I - (2^N - 1)) and Q to 1
  2. While Q > 0
     1. Compute Q and R, quotient and remainder of I divided by 2^7
     2. If Q is strictly greater than 0, write one 1 bit; otherwise, write one 0 bit
     3. Encode R on the next 7 bits
     4. I = Q
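The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `encode_integer` and the `pattern` parameter (the leading bits that share the first byte with the prefix) are assumptions made for this example.

```python
def encode_integer(value: int, prefix_bits: int, pattern: int = 0) -> bytes:
    """Encode a non-negative integer with an N-bit prefix.

    `pattern` (hypothetical parameter) holds the bits that precede the
    prefix in the first byte, already shifted into position.
    """
    if value < 0 or not 0 <= prefix_bits <= 8:
        raise ValueError("value must be >= 0 and prefix_bits in 0..8")
    limit = (1 << prefix_bits) - 1          # 2^N - 1
    if value < limit:
        return bytes([pattern | value])     # fits in the prefix
    # Prefix filled with ones (no prefix byte at all when N is 0).
    out = bytearray([pattern | limit] if prefix_bits else [])
    value -= limit
    while True:
        value, remainder = divmod(value, 128)   # Q and R
        # Continuation bit is 1 while the quotient is non-zero.
        out.append((0x80 if value else 0x00) | remainder)
        if not value:
            return bytes(out)


print(encode_integer(10, 5).hex())     # 0a
print(encode_integer(1337, 5).hex())   # 1f9a0a
```

The loop always runs at least once, matching the initial Q = 1 in the algorithm, so a value equal to 2^N - 1 is followed by a single zero byte.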

4.2.1.1. Example 1: Encoding 10 using a 5-bit prefix

The value 10 is to be encoded with a 5-bit prefix.

10 is less than 31 (= 2^5 - 1) and is represented using the 5-bit prefix.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 |   10 stored on 5 bits
+---+---+---+---+---+---+---+---+

4.2.1.2. Example 2: Encoding 1337 using a 5-bit prefix

The value I=1337 is to be encoded with a 5-bit prefix.

1337 is greater than 31 (= 2^5 - 1).
- The 5-bit prefix is filled with its max value (31).
The value to represent on next bytes is I = 1337 - (2^5 - 1) = 1306.
- 1306 = 128*10 + 26, i.e. Q=10 and R=26.
- Q is strictly greater than 0, bit 8 is set to 1.
- The remainder R=26 is encoded on next 7 bits.
- I is replaced by the quotient Q=10.
The value to represent on next bytes is I = 10.
- 10 = 128*0 + 10, i.e. Q=0 and R=10.
- Q is equal to 0, bit 16 is set to 0.
- The remainder R=10 is encoded on next 7 bits.
- I is replaced by the quotient Q=0.
The process ends.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 |   Prefix = 31
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |   Q>=1, R=26
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |   Q=0 , R=10
+---+---+---+---+---+---+---+---+
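This section only describes encoding. As a hedged sketch, the inverse operation could look like the Python below; the name `decode_integer` and its `(value, next_position)` return convention are assumptions made for this example.

```python
from typing import Tuple


def decode_integer(data: bytes, prefix_bits: int, pos: int = 0) -> Tuple[int, int]:
    """Decode an N-bit-prefix integer starting at data[pos].

    Returns the value and the position of the first byte after it.
    Bits above the prefix in the first byte are ignored.
    """
    limit = (1 << prefix_bits) - 1          # 2^N - 1
    value = 0
    if prefix_bits:
        value = data[pos] & limit
        pos += 1
        if value < limit:
            return value, pos               # value fit in the prefix
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift     # 7 payload bits, least significant group first
        shift += 7
        if not byte & 0x80:                 # a 0 flag bit ends the integer
            return value, pos


print(decode_integer(bytes([0x1F, 0x9A, 0x0A]), 5))   # (1337, 3)
print(decode_integer(bytes([0xEA]), 5))               # (10, 1)
```

The second call uses 1 for the three "X" bits of Example 1 to show that they do not affect the decoded value.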

4.2.2. String literal representation

Literal strings can represent header names or header values. They are encoded in two parts:

1. The string length, defined as the number of bytes needed to store its UTF-8 representation, is represented as an integer with a zero bits prefix. If the string length is strictly less than 128, it is represented as one byte.
2. The string value, represented as a list of UTF-8 characters.
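A minimal Python sketch of this two-part encoding follows. The names `encode_integer` and `encode_string` are illustrative only, and the integer helper is a compact version of the algorithm in Section 4.2.1.

```python
def encode_integer(value: int, prefix_bits: int, pattern: int = 0) -> bytes:
    """N-bit-prefix integer (Section 4.2.1); `pattern` is OR-ed into the first byte."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([pattern | value])
    out = bytearray([pattern | limit] if prefix_bits else [])
    value -= limit
    while True:
        value, remainder = divmod(value, 128)
        out.append((0x80 if value else 0x00) | remainder)
        if not value:
            return bytes(out)


def encode_string(text: str) -> bytes:
    """Length in bytes of the UTF-8 form (zero-bit prefix integer), then the bytes."""
    raw = text.encode("utf-8")
    return encode_integer(len(raw), 0) + raw


print(encode_string("first"))               # b'\x05first'
print(encode_string("\u00e9").hex())        # 02c3a9 (one character, two UTF-8 bytes)
print(encode_string("a" * 200)[:2].hex())   # c801 (lengths of 128 or more need a second byte)
```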

4.3. Indexed Header Representation

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 1 |        Index (7+)         |
+---+---------------------------+

Figure 1: Indexed Header

This representation starts with the '1' 1-bit pattern, followed by the index of the matching pair, represented as an integer with a 7-bit prefix.
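As an illustration, the Python sketch below builds an indexed header by setting the top bit and encoding the index with a 7-bit prefix. The function names are hypothetical; the integer helper is a compact version of the algorithm in Section 4.2.1.

```python
def encode_integer(value: int, prefix_bits: int, pattern: int = 0) -> bytes:
    """N-bit-prefix integer (Section 4.2.1); `pattern` is OR-ed into the first byte."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([pattern | value])
    out = bytearray([pattern | limit] if prefix_bits else [])
    value -= limit
    while True:
        value, remainder = divmod(value, 128)
        out.append((0x80 if value else 0x00) | remainder)
        if not value:
            return bytes(out)


def encode_indexed(index: int) -> bytes:
    """'1' pattern in the top bit, then the pair index on a 7-bit prefix."""
    return encode_integer(index, 7, 0x80)


print(encode_indexed(38).hex())    # a6
print(encode_indexed(40).hex())    # a8
print(encode_indexed(127).hex())   # ff00 (127 no longer fits in the 7-bit prefix)
```

The first two results are the indexed headers that appear in Appendix B.2.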

4.4. Literal Header Representation

4.4.1. Literal Header without Indexing

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 |    Index (5+)     |
+---+---+---+-------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 2: Literal Header without Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 |         0         |
+---+---+---+-------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 3: Literal Header without Indexing - New Name

This representation, which does not involve updating the header table, starts with the '011' 3-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 5-bit prefix. Note that if the index is strictly below 30, one byte is used.

If the header name does not match a header name entry, the value 0 is represented on 5 bits followed by the header name, represented as a literal string.

Header name representation is followed by the header value represented as a literal string as described in Section 4.2.2.
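The two layouts of Figures 2 and 3 can be sketched in Python as below. The function names and the convention of passing `name_index=None` for a new name are assumptions made for this example.

```python
from typing import Optional


def encode_integer(value: int, prefix_bits: int, pattern: int = 0) -> bytes:
    """N-bit-prefix integer (Section 4.2.1); `pattern` is OR-ed into the first byte."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([pattern | value])
    out = bytearray([pattern | limit] if prefix_bits else [])
    value -= limit
    while True:
        value, remainder = divmod(value, 128)
        out.append((0x80 if value else 0x00) | remainder)
        if not value:
            return bytes(out)


def encode_string(text: str) -> bytes:
    """String literal (Section 4.2.2): byte length, then the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return encode_integer(len(raw), 0) + raw


def encode_literal_without_indexing(name: str, value: str,
                                    name_index: Optional[int] = None) -> bytes:
    """'011' pattern (0x60), then index + 1 on 5 bits, or 0 and the literal name."""
    if name_index is None:
        head = bytes([0x60]) + encode_string(name)
    else:
        head = encode_integer(name_index + 1, 5, 0x60)
    return head + encode_string(value)


print(encode_literal_without_indexing(":path", "/", name_index=3))   # b'd\x01/' (0x64 0x01 '/')
print(encode_literal_without_indexing("x-my-header", "first"))
# Index 30 gives index + 1 = 31, which fills the prefix and spills into a second byte.
print(encode_literal_without_indexing("max-forwards", "5", name_index=30)[:2].hex())   # 7f00
```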

4.4.2. Literal Header with Incremental Indexing

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 0 |    Index (5+)     |
+---+---+---+-------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 4: Literal Header with Incremental Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 | 0 |         0         |
+---+---+---+-------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 5: Literal Header with Incremental Indexing - New Name

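This representation starts with the '010' 3-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 5-bit prefix. Note that if the index is strictly below 30, one byte is used.

If the header name does not match a header name entry, the value 0 is represented on 5 bits followed by the header name, represented as a literal string.

Header name representation is followed by the header value represented as a literal string as described in Section 4.2.2.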
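The Python sketch below encodes one header with incremental indexing and records the new entry in the header table. It assumes, as the tables in Appendix B suggest, that the new pair is appended at the end of the table and that the first entry with a matching name supplies the name index. The function names are hypothetical, and the table is truncated to the first five entries of Table 1 to keep the example short.

```python
from typing import List, Tuple


def encode_integer(value: int, prefix_bits: int, pattern: int = 0) -> bytes:
    """N-bit-prefix integer (Section 4.2.1); `pattern` is OR-ed into the first byte."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([pattern | value])
    out = bytearray([pattern | limit] if prefix_bits else [])
    value -= limit
    while True:
        value, remainder = divmod(value, 128)
        out.append((0x80 if value else 0x00) | remainder)
        if not value:
            return bytes(out)


def encode_string(text: str) -> bytes:
    """String literal (Section 4.2.2): byte length, then the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return encode_integer(len(raw), 0) + raw


def encode_incremental(table: List[Tuple[str, str]], name: str, value: str) -> bytes:
    """'010' pattern (0x40); appends (name, value) to `table` (assumed behaviour)."""
    name_index = next((i for i, (n, _) in enumerate(table) if n == name), None)
    if name_index is None:
        head = bytes([0x40]) + encode_string(name)      # new name
    else:
        head = encode_integer(name_index + 1, 5, 0x40)  # indexed name
    table.append((name, value))
    return head + encode_string(value)


# Truncated table: only the first five entries of Table 1.
table = [(":scheme", "http"), (":scheme", "https"), (":host", ""),
         (":path", "/"), (":method", "GET")]
encoded = encode_incremental(table, ":path", "/my-example/index.html")
print(encoded[:2].hex())   # 4416
print(table[-1])           # (':path', '/my-example/index.html')
```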

4.4.3. Literal Header with Substitution Indexing

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 0 |      Index (6+)       |
+---+---+-----------------------+
|    Substituted Index (8+)     |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 6: Literal Header with Substitution Indexing - Indexed Name

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 0 |           0           |
+---+---+-----------------------+
|       Name Length (8+)        |
+-------------------------------+
|  Name String (Length octets)  |
+-------------------------------+
|    Substituted Index (8+)     |
+-------------------------------+
|       Value Length (8+)       |
+-------------------------------+
| Value String (Length octets)  |
+-------------------------------+

Figure 7: Literal Header with Substitution Indexing - New Name

This representation starts with the '00' 2-bit pattern.

If the header name matches the header name of a (name, value) pair stored in the Header Table, the index of the pair increased by one (index + 1) is represented as an integer with a 6-bit prefix. Note that if the index is strictly below 62, one byte is used.

If the header name does not match a header name entry, the value 0 is represented on 6 bits followed by the header name, represented as a literal string.

The index of the substituted (name, value) pair is inserted after the header name representation as a 0-bit prefix integer.

The index of the substituted pair MUST correspond to a position in the header table containing a non-void entry. An index for the substituted pair that corresponds to an empty position in the header table MUST be treated as an error.

This index is followed by the header value represented as a literal string as described in Section 4.2.2.
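The layouts of Figures 6 and 7 can be sketched in Python as follows. The function names are hypothetical, and modelling a void table position as `None` or as an out-of-range index is an assumption of this example, not something this document prescribes.

```python
from typing import List, Optional, Tuple


def encode_integer(value: int, prefix_bits: int, pattern: int = 0) -> bytes:
    """N-bit-prefix integer (Section 4.2.1); `pattern` is OR-ed into the first byte."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([pattern | value])
    out = bytearray([pattern | limit] if prefix_bits else [])
    value -= limit
    while True:
        value, remainder = divmod(value, 128)
        out.append((0x80 if value else 0x00) | remainder)
        if not value:
            return bytes(out)


def encode_string(text: str) -> bytes:
    """String literal (Section 4.2.2): byte length, then the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return encode_integer(len(raw), 0) + raw


def encode_substitution(name: str, value: str, substituted_index: int,
                        name_index: Optional[int] = None) -> bytes:
    """'00' pattern, name (index + 1 on 6 bits, or 0 and a literal),
    substituted index (0-bit prefix), then the value."""
    if name_index is None:
        head = bytes([0x00]) + encode_string(name)
    else:
        head = encode_integer(name_index + 1, 6, 0x00)
    return head + encode_integer(substituted_index, 0) + encode_string(value)


def check_substituted_index(table: List[Optional[Tuple[str, str]]], index: int) -> None:
    """Reject a substituted index that does not point at a non-void entry."""
    if not 0 <= index < len(table) or table[index] is None:
        raise ValueError("substituted index points at an empty position")


encoded = encode_substitution(":path", "/my-example/resources/script.js",
                              substituted_index=38, name_index=3)
print(encoded[:3].hex())   # 04261f
```

The three leading bytes match the substitution header shown in Appendix B.2.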

Appendix A. Initial header names

[rfc.comment.5: The tables in this section should be updated based on statistical analysis of header names frequency and specific HTTP 2.0 header rules (like removal of some headers).]
[rfc.comment.6: These tables are not adapted for headers contained in PUSH_PROMISE frames. Either the tables can be merged, or the table for responses can be updated.]

A.1. Requests

The following table lists the pre-defined headers that make up the initial header table used to represent requests sent from a client to a server.

Table 1
Index	Header Name	Header Value
0	:scheme	http
1	:scheme	https
2	:host
3	:path	/
4	:method	GET
5	accept
6	accept-charset
7	accept-encoding
8	accept-language
9	cookie
10	if-modified-since
11	keep-alive
12	user-agent
13	proxy-connection
14	referer
15	accept-datetime
16	authorization
17	allow
18	cache-control
19	connection
20	content-length
21	content-md5
22	content-type
23	date
24	expect
25	from
26	if-match
27	if-none-match
28	if-range
29	if-unmodified-since
30	max-forwards
31	pragma
32	proxy-authorization
33	range
34	te
35	upgrade
36	via
37	warning

A.2. Responses

The following table lists the pre-defined headers that make up the initial header table used to represent responses sent from a server to a client. The same header table is also used to represent request headers sent from a server to a client in a PUSH_PROMISE frame.

Table 2
Index	Header Name	Header Value
0	:status	200
1	age
2	cache-control
3	content-length
4	content-type
5	date
6	etag
7	expires
8	last-modified
9	server
10	set-cookie
11	vary
12	via
13	access-control-allow-origin
14	accept-ranges
15	allow
16	connection
17	content-disposition
18	content-encoding
19	content-language
20	content-location
21	content-md5
22	content-range
23	link
24	location
25	p3p
26	pragma
27	proxy-authenticate
28	refresh
29	retry-after
30	strict-transport-security
31	trailer
32	transfer-encoding
33	warning
34	www-authenticate

Appendix B. Example

Here is an example that illustrates different representations and how tables are updated. [rfc.comment.7: This section needs to be updated to integrate differential coding.]

B.1. First header set

The first header set to represent is the following:

:path: /my-example/index.html
user-agent: my-user-agent
x-my-header: first

The header table holds only the pre-defined entries of Appendix A.1, so all headers are represented as literal headers with incremental indexing. The 'x-my-header' header name is not in the header table and is encoded literally. This gives the following representation:

0x44      (literal header with incremental indexing, name index = 3)
0x16      (header value string length = 22)
/my-example/index.html
0x4D      (literal header with incremental indexing, name index = 12)
0x0D      (header value string length = 13)
my-user-agent
0x40      (literal header with incremental indexing, new name)
0x0B      (header name string length = 11)
x-my-header
0x05      (header value string length = 5)
first

The header table is as follows after the processing of these headers:

Header table
+---------+----------------+---------------------------+
|  Index  | Header Name    | Header Value              |
+---------+----------------+---------------------------+
|    0    | :scheme        | http                      |
+---------+----------------+---------------------------+
|    1    | :scheme        | https                     |
+---------+----------------+---------------------------+
|   ...   | ...            | ...                       |
+---------+----------------+---------------------------+
|   37    | warning        |                           |
+---------+----------------+---------------------------+
|   38    | :path          | /my-example/index.html    | added header
+---------+----------------+---------------------------+
|   39    | user-agent     | my-user-agent             | added header
+---------+----------------+---------------------------+
|   40    | x-my-header    | first                     | added header
+---------+----------------+---------------------------+

As all the headers in the first header set are indexed in the header table, all are kept in the reference set of headers, which is:

Reference Set:
:path, /my-example/index.html
user-agent, my-user-agent
x-my-header, first

B.2. Second header set

The second header set to represent is the following:

:path: /my-example/resources/script.js
user-agent: my-user-agent
x-my-header: second

Comparing this second header set to the reference set, the first and third headers from the reference set are not present in this second header set and must be removed. In addition, in this new set, the first and third headers have to be encoded. The :path header is represented as a literal header with substitution indexing. The x-my-header header is represented as a literal header with incremental indexing.

0xa6       (indexed header, index = 38: removal from reference set)
0xa8       (indexed header, index = 40: removal from reference set)
0x04       (literal header, substitution indexing, name index = 3)
0x26       (replaced entry index = 38)
0x1f       (header value string length = 31)
/my-example/resources/script.js
0x5f 0x0a  (literal header, incremental indexing, name index = 40)
0x06       (header value string length = 6)
second

The header table is updated as follows:

Header table
+---------+----------------+---------------------------+
|  Index  | Header Name    | Header Value              |
+---------+----------------+---------------------------+
|    0    | :scheme        | http                      |
+---------+----------------+---------------------------+
|    1    | :scheme        | https                     |
+---------+----------------+---------------------------+
|   ...   | ...            | ...                       |
+---------+----------------+---------------------------+
|   37    | warning        |                           |
+---------+----------------+---------------------------+
|   38    | :path          | /my-example/resources/    | replaced
|         |                |     script.js             | header
+---------+----------------+---------------------------+
|   39    | user-agent     | my-user-agent             |
+---------+----------------+---------------------------+
|   40    | x-my-header    | first                     |
+---------+----------------+---------------------------+
|   41    | x-my-header    | second                    | added header
+---------+----------------+---------------------------+

All the headers in this second header set are indexed in the header table; therefore, all are kept in the reference set of headers, which becomes:

Reference Set:
:path, /my-example/resources/script.js
user-agent, my-user-agent
x-my-header, second
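To check the two byte sequences of this appendix end to end, the Python sketch below decodes them in order, starting from the request table of Appendix A.1. It is an illustration under stated assumptions, not a complete decoder: all function and variable names are hypothetical, the reference set is modelled as a set of table indexes, and an indexed header is assumed to toggle membership in that set (the example only shows it removing entries).

```python
from typing import List, Set, Tuple

NAMES = ("accept accept-charset accept-encoding accept-language cookie "
         "if-modified-since keep-alive user-agent proxy-connection referer "
         "accept-datetime authorization allow cache-control connection "
         "content-length content-md5 content-type date expect from if-match "
         "if-none-match if-range if-unmodified-since max-forwards pragma "
         "proxy-authorization range te upgrade via warning").split()
table: List[Tuple[str, str]] = [(":scheme", "http"), (":scheme", "https"), (":host", ""),
                                (":path", "/"), (":method", "GET")] + [(n, "") for n in NAMES]


def read_int(data: bytes, pos: int, prefix_bits: int) -> Tuple[int, int]:
    """Read an N-bit-prefix integer (Section 4.2.1); returns (value, next position)."""
    limit = (1 << prefix_bits) - 1
    value = 0
    if prefix_bits:
        value = data[pos] & limit
        pos += 1
        if value < limit:
            return value, pos
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def read_str(data: bytes, pos: int) -> Tuple[str, int]:
    """Read a string literal (Section 4.2.2)."""
    length, pos = read_int(data, pos, 0)
    return data[pos:pos + length].decode("utf-8"), pos + length


def decode_block(data: bytes, table: List[Tuple[str, str]],
                 reference: Set[int]) -> List[Tuple[str, str]]:
    """Apply one encoded header set to `table` and `reference`; return the headers."""
    pos, unindexed = 0, []
    while pos < len(data):
        first = data[pos]
        if first & 0x80:                                  # '1': indexed header
            index, pos = read_int(data, pos, 7)
            reference.symmetric_difference_update({index})   # assumed toggle
            continue
        substitution = (first & 0xC0) == 0x00             # '00' pattern
        name_ref, pos = read_int(data, pos, 6 if substitution else 5)
        if name_ref == 0:
            name, pos = read_str(data, pos)               # new name
        else:
            name = table[name_ref - 1][0]                 # wire value is index + 1
        if substitution:
            slot, pos = read_int(data, pos, 0)
            if slot >= len(table):
                raise ValueError("substituted index points at an empty position")
        value, pos = read_str(data, pos)
        if substitution:
            table[slot] = (name, value)
            reference.add(slot)
        elif first & 0x20:                                # '011': without indexing
            unindexed.append((name, value))
        else:                                             # '010': incremental indexing
            table.append((name, value))
            reference.add(len(table) - 1)
    return [table[i] for i in sorted(reference)] + unindexed


first_block = (bytes([0x44, 0x16]) + b"/my-example/index.html" +
               bytes([0x4D, 0x0D]) + b"my-user-agent" +
               bytes([0x40, 0x0B]) + b"x-my-header" + bytes([0x05]) + b"first")
second_block = (bytes([0xA6, 0xA8, 0x04, 0x26, 0x1F]) + b"/my-example/resources/script.js" +
                bytes([0x5F, 0x0A, 0x06]) + b"second")

reference: Set[int] = set()
first_headers = decode_block(first_block, table, reference)
second_headers = decode_block(second_block, table, reference)
print(second_headers)
print(sorted(reference))   # [38, 39, 41]
```

Under these assumptions the second call yields the three headers of the second header set, and the table ends with 42 entries as in the table above.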
