python-regex-cheatsheet


Python 2.7 Regular Expressions

Non-special chars match themselves. Exceptions are special characters:

\        Escape special char or start a sequence.
.        Match any char except newline, see re.DOTALL
^        Match start of the string, see re.MULTILINE
$        Match end of the string, see re.MULTILINE
[]       Enclose a set of matchable chars
R|S      Match either regex R or regex S.
()       Create capture group, & indicate precedence

After '[', enclose a set, the only special chars are:

]        End the set, if not the 1st char
-        A range, e.g. a-c matches a, b or c
^        Negate the set only if it is the 1st char

Quantifiers (append '?' for non-greedy):

{m}      Exactly m repetitions
{m,n}    From m (default 0) to n (default infinity)
*        0 or more. Same as {,}
+        1 or more. Same as {1,}
?        0 or 1. Same as {,1}

Extensions. Do not cause grouping, except 'P<name>':

(?iLmsux)      Match empty string, set the corresponding re.I, re.L,
               re.M, re.S, re.U, re.X flags
(?:...)        Non-capturing version of regular parens
(?P<name>...)  Create a named capturing group.
(?P=name)      Match whatever matched prev named group
(?#...)        A comment; ignored.
(?=...)        Lookahead assertion, match without consuming
(?!...)        Negative lookahead assertion
(?<=...)       Lookbehind assertion, match if preceded
(?<!...)       Negative lookbehind assertion
(?(id)y|n)     Match 'y' if group 'id' matched, else 'n'

Flags for re.compile(), etc. Combine with '|':

re.I == re.IGNORECASE  Ignore case
re.L == re.LOCALE      Make \w, \b, and \s locale dependent
re.M == re.MULTILINE   Multiline ('^' and '$' also match at line breaks)
re.S == re.DOTALL      Dot matches all (including newline)
re.U == re.UNICODE     Make \w, \b, \d, and \s unicode dependent
re.X == re.VERBOSE     Verbose (unescaped whitespace in pattern
                       is ignored, and '#' marks comment lines)

Module level functions:

compile(pattern[, flags]) -> RegexObject
match(pattern, string[, flags]) -> MatchObject
search(pattern, string[, flags]) -> MatchObject
findall(pattern, string[, flags]) -> list of strings
finditer(pattern, string[, flags]) -> iter of MatchObjects
split(pattern, string[, maxsplit, flags]) -> list of strings
sub(pattern, repl, string[, count, flags]) -> string
subn(pattern, repl, string[, count, flags]) -> (string, int)
escape(string) -> string
purge()  # clears the re cache
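
A minimal sketch of how the module level functions, named groups, and flags listed above fit together. The sample text and every variable name are invented for illustration; only the 're' calls come from the cheatsheet.

```python
import re

# Hypothetical sample text, invented for this sketch.
text = "alice=30\nBob=25\ncarol=41"

# re.VERBOSE lets the pattern carry whitespace and '#' comments;
# re.MULTILINE makes '^' and '$' match at every line break;
# re.IGNORECASE lets [a-z] match the capital 'B' in "Bob".
pair = re.compile(r"""
    ^(?P<name>[a-z]+)   # named group: letters
    =
    (?P<age>\d+)$       # named group: one or more digits
""", re.VERBOSE | re.MULTILINE | re.IGNORECASE)

first = pair.search(text)                        # first MatchObject, or None
names = [m.group("name") for m in pair.finditer(text)]
pairs = re.findall(r"(\w+)=(\d+)", text)         # two groups -> list of tuples
masked, count = re.subn(r"\d+", "#", text)       # new string plus replacement count
fields = re.split(r"[=\n]", text, maxsplit=2)    # stop after two splits
```

With several capturing groups, findall() returns a list of tuples rather than a list of plain strings, which is easy to overlook when reading the signature alone.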

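The quantifier and extension entries above are terse, so here is a short sketch contrasting greedy and non-greedy matching and exercising the lookaround and conditional forms. The input strings are hypothetical and chosen only to make the differences visible.

```python
import re

html = "<b>bold</b> and <i>italic</i>"     # hypothetical sample input

greedy = re.findall(r"<.+>", html)    # '+' grabs as much as it can
lazy = re.findall(r"<.+?>", html)     # '+?' grabs as little as it can

prices = "cost: $15, qty: 20, fee: $7"     # hypothetical sample input

dollars = re.findall(r"(?<=\$)\d+", prices)          # digits preceded by '$'
plain = re.findall(r"(?<![\$\d])\d+", prices)        # digit runs with no '$' in front
before_comma = re.findall(r"\d+(?=,)", prices)       # digits followed by ','

# (?(1)>) requires a closing '>' only if group 1 (the '<') matched.
bracketed = re.compile(r"(<)?\w+(?(1)>)$")
ok_plain = bracketed.match("user") is not None
ok_wrapped = bracketed.match("<user>") is not None
ok_broken = bracketed.match("<user") is not None
```

The lookaround assertions test the surrounding text without consuming it, which is why the '$' and ',' characters never appear in the results.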
Special sequences:

\A       Start of string
\b       Match empty string at word (\w+) boundary
\B       Match empty string not at word boundary
\d       Digit
\D       Non-digit
\s       Whitespace [ \t\n\r\f\v], see LOCALE, UNICODE
\S       Non-whitespace
\w       Alphanumeric: [0-9a-zA-Z_], see LOCALE
\W       Non-alphanumeric
\Z       End of string
\g<id>   Match prev named or numbered group (sub() replacement
         templates only), '<' & '>' are literal, e.g. \g<0>
         or \g<name> (not \g0 or \gname)

Special character escapes are much like those already
escaped in Python string literals. Hence regex '\n' is
same as regex '\\n':

\a       ASCII Bell (BEL)
\f       ASCII Formfeed
\n       ASCII Linefeed
\r       ASCII Carriage return
\t       ASCII Tab
\v       ASCII Vertical tab
\\       A single backslash
\xHH     Character with two digit hexadecimal value HH
\OOO     Three digit octal char (or just use an
         initial zero, e.g. \0, \07)
\DD      Decimal number 1 to 99, match
         previous numbered group

RegexObjects (returned from compile()):

.match(string[, pos, endpos]) -> MatchObject
.search(string[, pos, endpos]) -> MatchObject
.findall(string[, pos, endpos]) -> list of strings
.finditer(string[, pos, endpos]) -> iter of MatchObjects
.split(string[, maxsplit]) -> list of strings
.sub(repl, string[, count]) -> string
.subn(repl, string[, count]) -> (string, int)
.flags       # int, Passed to compile()
.groups      # int, Number of capturing groups
.groupindex  # {}, Maps group names to ints
.pattern     # string, Passed to compile()

MatchObjects (returned from match() and search()):

.expand(template) -> string, Backslash & group expansion
.group([group1...]) -> string or tuple of strings, 1 per arg
.groups([default]) -> tuple of all groups, non-matching=default
.groupdict([default]) -> {}, Named groups, non-matching=default
.start([group]) -> int, Start/end of substring match by group
.end([group]) -> int, Group defaults to 0, the whole match
.span([group]) -> tuple (match.start(group), match.end(group))
.pos        int, Passed to search() or match()
.endpos     int, Passed to search() or match()
.lastindex  int, Index of last matched capturing group
.lastgroup  string, Name of last matched capturing group
.re         regex, The RegexObject whose search() or match() produced this match
.string     string, Passed to search() or match()

Gleaned from the python 2.7 're' docs.
http://docs.python.org/library/re.html

https://github.com/tartley/python-regex-cheatsheet
Version: v0.3.3
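
To make the RegexObject and MatchObject listings more concrete, the following sketch walks one compiled pattern over a hypothetical log line. The pattern, the input, and the variable names are illustrative assumptions, not part of the original cheatsheet.

```python
import re

# Day is optional, so its group may not take part in the match.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?")
line = "released 2010-07, patched 2011-06-12"   # hypothetical sample input

m = date.search(line)
whole = m.group(0)                     # the entire match
year_month = m.group("year", "month")  # several args -> tuple
all_groups = m.groups()                # unmatched groups come back as None
as_dict = m.groupdict("01")            # default fills in unmatched groups
where = m.span()                       # (start, end) of the whole match
month_at = m.span("month")             # (start, end) of one group
us_style = m.expand(r"\g<month>/\g<year>")

# RegexObject.search() accepts a start position; resume after the first hit.
second = date.search(line, m.end())

group_count = date.groups
name_to_index = dict(date.groupindex)
```

Passing m.end() as pos is one simple way to step through matches by hand; finditer() does the same job when no custom stepping is needed.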

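The backreference and escape entries above are easy to mix up, so this sketch separates the forms used inside a pattern, (?P=name) and \DD, from the forms used inside a sub() replacement template, \g<name> and \DD. All inputs are hypothetical.

```python
import re

# Raw strings keep backslashes intact, so the regex engine sees '\b' as a
# word boundary instead of Python's backspace character.
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
text = "it was the the best of of times"     # hypothetical sample input

repeats = [m.group("word") for m in doubled.finditer(text)]
fixed = doubled.sub(r"\g<word>", text)       # \g<name> in the replacement

letters = re.findall(r"(\w)\1", "bookkeeper")            # \1 inside the pattern
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")  # \1, \2 in the replacement

# A real newline character and the two-character escape '\\n' are
# equivalent as patterns; both match a newline.
newline_literal = re.match("\n", "\n") is not None
newline_escape = re.match("\\n", "\n") is not None
hex_escape = re.match(r"\x41", "A") is not None          # \x41 is 'A'
```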