Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #36023 -- Updated content_disposition_header to handle control chars. #18890

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

alexmv
Copy link
Contributor

@alexmv alexmv commented Dec 6, 2024

Trac ticket number

36023

Branch description

Being US-ASCII is not sufficient to use the simple quoted-string attribute value -- it must also not have any control characters.

Checklist

  • This PR targets the main branch.
  • The commit message:
    • is written in past tense
    • mentions the ticket number
    • ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@alexmv alexmv force-pushed the disposition-specials branch from c83a182 to 6bb4f62 Compare December 6, 2024 16:05
@jacobtylerwalls
Copy link
Member

Hi @alexmv, could you take a second to open a ticket? This bit of process helps us prioritize PRs. Thanks!

@alexmv alexmv force-pushed the disposition-specials branch from 6bb4f62 to 13ccbb5 Compare December 17, 2024 16:51
@alexmv
Copy link
Contributor Author

alexmv commented Dec 17, 2024

@sarahboyce sarahboyce changed the title Update content_disposition_header to handle control chars. Fixed #36023 -- Updated content_disposition_header to handle control chars. Dec 19, 2024
except UnicodeEncodeError:
is_ascii = False
# https://datatracker.ietf.org/doc/html/rfc9110#name-quoted-strings
if is_ascii and re.match(r"^[\t \x21-\x7e]*$", filename):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand the regex and that it's equivalent to the quoted string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added more explanation to the commit message. Does that help clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gentle ping on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you - that helped a lot ⭐

@alexmv alexmv force-pushed the disposition-specials branch from 13ccbb5 to ca0ac66 Compare December 20, 2024 14:51
To use the simple `filename="..."` form, the value must conform to the
official grammar from RFC6266[^1]:

    filename-parm       = "filename" "=" value
    value               = <value, defined in [RFC2616], Section 3.6>
                        ; token | quoted-string

The `quoted-string` definition comes from RFC 9110[^2]:

```
    quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
    qdtext         = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text

The backslash octet ("\") can be used as a single-octet quoting
mechanism within quoted-string and comment constructs. Recipients that
process the value of a quoted-string MUST handle a quoted-pair as if
it were replaced by the octet following the backslash.

    quoted-pair    = "\" ( HTAB / SP / VCHAR / obs-text )

A sender SHOULD NOT generate a quoted-pair in a quoted-string except
where necessary to quote DQUOTE and backslash octets occurring within
that string.
```

That is, quoted strings are able to express horizontal tabs, space
characters, and everything in the range from 0x21 to 0x7e, expect for
0x22 (`"`) and 0x5C (`\`), which can still be expressed but must be
escaped with their own `\`.

We ignore the case of `obs-text`, which is defined as the range
0x80-0xFF, since its presence is there for permissive parsing of
accidental high-bit characters, and it should not be generated by
conforming implementations.

Transform this character range into a regex and apply it in addition
to the "is ASCII" check.  This ensures that all simple filenames are
expressed in the simple format, and that all filenames with newlines
and other control characters are properly expressed with the
percent-encoded `filename*=...`form.

[^1]: https://datatracker.ietf.org/doc/html/rfc6266#section-4.1
[^2]: https://datatracker.ietf.org/doc/html/rfc9110#name-quoted-strings
@sarahboyce sarahboyce force-pushed the disposition-specials branch from ca0ac66 to 4efa613 Compare January 6, 2025 16:20
Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@sarahboyce sarahboyce merged commit 8914b57 into django:main Jan 7, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants