Description
Bug report
http.cookies.SimpleCookie() takes a string and should return a dict-like parse of the result. On some parse errors it returns an empty dict, or one only sparsely populated with values. For comparison, a successful parse of a cookie with two name-value pairs looks like this:
>>> import http.cookies
>>> http.cookies.SimpleCookie('a1=b;a2="c d";')
<SimpleCookie: a1='b' a2='c d'>
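As a minimal sketch of the "dict-like" behaviour described above, the same two-pair cookie can be read back by name. Each key maps to a Morsel object whose decoded string is on its value attribute. The variable names here are illustrative only.

```python
from http.cookies import SimpleCookie

# Parse the same two-pair cookie string used in the session above.
cookie = SimpleCookie('a1=b;a2="c d";')

# SimpleCookie behaves like a dict of Morsel objects keyed by cookie name.
names = sorted(cookie.keys())

# The surrounding double quotes of a quoted value are stripped on parse.
values = {name: morsel.value for name, morsel in cookie.items()}

print(names)
print(values)
```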
Cookies consist of name-value pairs, both of which have legal character subsets as defined in RFC2068 and RFC2109. Actual browser / server implementations are more lenient, and the cookies.py source includes an acknowledgement of this:
# Pattern for finding cookie
#
# This used to be strict parsing based on the RFC2109 and RFC2068
# specifications. I have since discovered that MSIE 3.0x doesn't
# follow the character rules outlined in those specs. As a
# result, the parsing rules here are less strict.
Cool. Modern times bring more exceptions. Specifically, Google's OAuth implementation now includes a cookie (g_state) whose value appears to be JSON, and embedded double quotes cause SimpleCookie() to fail (or actually "succeed" in a useless way):
>>> import http.cookies
>>> http.cookies.SimpleCookie('a1={"b":c};')
<SimpleCookie: >
Bug Fix/Modest Proposal
Rather than trying to get Google to change their cookie format (which is happily supported by common browsers), or requiring users of the http.cookies module to write their own parsers, I suggest a simple augmentation of the regular expression used to "find cookies".
The change would simply allow any number of embedded double quotes. The following snippet adds two lines to the existing _CookiePattern as found in cpython/Lib/http/cookies.py:
_CookiePattern = re.compile(r"""
    \s*                            # Optional whitespace at start of cookie
    (?P<key>                       # Start of group 'key'
    [""" + _LegalKeyChars + r"""]+?   # Any word of at least one letter
    )                              # End of group 'key'
    (                              # Optional group: there may not be a value.
    \s*=\s*                          # Equal Sign
    (?P<val>                         # Start of group 'val'
    "(?:[^\\"]|\\.)*"                  # Any doublequoted string
    |                                  # or
    \w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT  # Special case for "expires" attr
    |                                  # or
    [""" + _LegalValueChars + r"""]*      # Any word or empty string
    # additional clause vvvvv
    |
    [""" + _LegalValueChars + r"""]+[""" + _LegalValueChars + r'"]*[' + _LegalValueChars + r"""]+  # Any word with internal quotes
    # end ^^^^
    )                                # End of group 'val'
    )?                             # End of optional value group
    \s*                            # Any number of spaces.
    (\s+|;|$)                      # Ending either at space, semicolon, or EOS.
    """, re.ASCII | re.VERBOSE)    # re.ASCII may be removed if safe.
The two added lines merely permit cookie values to contain any number of double quotes, as long as neither the first nor the last character of the value is a double quote. No further interpretation of the cookie value (such as JSON validation) is attempted or warranted.
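To illustrate the effect of the extra clause without touching the standard library, here is a self-contained sketch that builds the pattern twice, once as it stands and once with the proposed alternative. LEGAL_KEY_CHARS and LEGAL_VALUE_CHARS are local stand-ins assumed to mirror the private _LegalKeyChars and _LegalValueChars in cookies.py, and build_pattern is a hypothetical helper, not part of http.cookies. The pattern is written without re.VERBOSE for brevity.

```python
import re

# Local stand-ins for the private character classes in http/cookies.py.
# Assumption: these mirror the definitions in the versions named in this
# report; check them against your own copy of the module.
LEGAL_KEY_CHARS = r"\w\d!#%&'~_`><@,:/\$\*\+\-\.\^\|\)\(\?\}\{\="
LEGAL_VALUE_CHARS = LEGAL_KEY_CHARS + r"\[\]"


def build_pattern(allow_inner_quotes):
    """Hypothetical helper: cookie pattern with or without the new clause."""
    extra = ""
    if allow_inner_quotes:
        # Proposed clause: legal chars, then legal chars or '"', then legal chars.
        extra = (
            "|[" + LEGAL_VALUE_CHARS + "]+"
            "[" + LEGAL_VALUE_CHARS + '"]*'
            "[" + LEGAL_VALUE_CHARS + "]+"
        )
    return re.compile(
        r"\s*(?P<key>[" + LEGAL_KEY_CHARS + r"]+?)"   # key
        r"(\s*=\s*(?P<val>"                           # optional '=' and value
        r'"(?:[^\\"]|\\.)*"'                          # double-quoted string
        r"|\w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT"    # "expires" date
        r"|[" + LEGAL_VALUE_CHARS + r"]*"             # plain word or empty
        + extra +
        r"))?\s*(\s+|;|$)",                           # terminator
        re.ASCII,
    )


original = build_pattern(allow_inner_quotes=False)
augmented = build_pattern(allow_inner_quotes=True)

raw = 'a1={"b":c};'
before = original.match(raw)   # no match: the embedded quote blocks the value
after = augmented.match(raw)   # matches, keeping the quotes inside the value

print(before)
print(after.group("key"), after.group("val"))

# A plain value is still handled by the existing alternatives.
plain = augmented.match("a1=b;")
print(plain.group("val"))
```

Because the new alternative comes last, it is only tried after the existing alternatives fail to reach a terminator, so values that already parse keep their current interpretation.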
Your environment
Observed initially in 3.8.10 (Linux) and confirmed in 3.10.1 (Mac). The relevant code is at cpython/Lib/http/cookies.py:437.