Regular Expressions

Code & Debug
Regular Expression in Python

A regular expression, often shortened to regex, is a sequence of characters
that defines a search pattern, used to check whether that pattern occurs in a
given text (string). If you have ever used a search engine, or the search and
replace tool of a word processor or text editor, you have already seen regular
expressions in use. They are used on the server side to validate the format of
email addresses or passwords during registration, and to parse text data files
in order to find, replace, or delete certain strings. They help in manipulating
textual data, which is often a prerequisite for data science projects involving
text mining.
In Python, regular expressions are supported by the re module. To use them in
your Python scripts, import this module first:
import re
The re module provides several functions for working with regular expressions,
which makes it well worth mastering.
This chapter will walk you through the important concepts of regular
expressions in Python. You will start by importing re, the Python library that
supports regular expressions. Then you will see how basic (ordinary) characters
are used for performing matches, followed by wildcard or special characters.
Next, you will learn about using repetitions in your regular expressions. You
will also learn how to create groups and named groups within your search for
ease of access to matches.
Contact: +91-9712928220 | Mail: [email protected]

Website: codeanddebug.in
Basic Patterns: Ordinary Characters

You can easily tackle many basic patterns in Python using ordinary characters.
Ordinary characters are the simplest regular expressions. They match
themselves exactly and have no special meaning in regular expression syntax.
Examples are 'A', 'a', 'X', '5'.
Ordinary characters can be used to perform simple exact matches:
import re
pattern = r"Cookie"
sequence = "Cookie"
if re.match(pattern, sequence):
    print("Match!")
else:
    print("Not a match!")
Output
Match!
Most letters and other ordinary characters match themselves, as you saw in the
example.
The match() function returns a match object if the pattern matches at the
beginning of the text. Otherwise, it returns None.
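The return values of match() can be checked directly. The following is a minimal sketch; the sample strings are made up for illustration and are not part of the original example.

```python
import re

# match() only tries the pattern at the very start of the string.
at_start = re.match(r"Cookie", "Cookie jar")
not_at_start = re.match(r"Cookie", "My Cookie jar")

print(at_start is not None)   # True: "Cookie" begins the string
print(not_at_start)           # None: "Cookie" is present, but not at position 0
```

Because a failed match gives None, it is usually safest to test the result before calling methods such as group() on it.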

Wild Card Characters: Special Characters

Special characters are characters that do not match themselves literally but
have a special meaning when used in a regular expression. Think of them as
reserved metacharacters that denote something other than what they look like.
Let us look at some examples to see the special characters in action.
Before you do, note that the examples below use two functions: search() and
group().
The search() function scans through the given string, looking for the first
location where the regular expression produces a match.
The group() function returns the string matched by the regular expression.
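As a rough illustration of how search() and group() fit together, the sketch below uses a made-up sentence. The start() method of the match object is not covered in this chapter; it is used here only to show where the first match was found.

```python
import re

text = "I like Cookie and Cookie dough"   # hypothetical sample text

# search() scans the whole string and stops at the first match.
first = re.search(r"Cookie", text)
print(first.group())   # Cookie
print(first.start())   # 7 (index of the first occurrence)

# When nothing matches, search() returns None, so guard before calling group().
missing = re.search(r"Cake", text)
if missing is None:
    print("No match found")
```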
. - A period. Matches any single character except the newline character.
import re
print(re.search(r'Co.k.e', 'Cookie').group())
Output
Cookie
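A short sketch of the period's behaviour, including the newline exception. The re.DOTALL flag is not discussed in this chapter; it is shown only as the standard way to lift that exception, and the sample strings are invented.

```python
import re

# "." accepts any single character in that position.
dashed = re.search(r"Co.k.e", "Co-k_e")
print(dashed.group())            # Co-k_e

# By default "." does not match a newline character.
no_newline = re.search(r"a.b", "a\nb")
print(no_newline)                # None

# With the re.DOTALL flag, "." matches a newline as well.
with_dotall = re.search(r"a.b", "a\nb", re.DOTALL)
print(with_dotall is not None)   # True
```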
^ - A caret. Matches the start of the string.

import re
print(re.search(r'^Hello', "Hello World!").group())
Output
Hello

$ - A dollar sign. Matches the end of the string.
This is helpful if you want to make sure a document or sentence ends with
certain characters.
import re
print(re.search(r'python$', "Python! Let's learn python").group())
Output
python
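The two anchors can be compared side by side. This is a small sketch reusing the "Hello World!" string from the caret example; the variable names are illustrative.

```python
import re

greeting = "Hello World!"

# "^" pins the pattern to the start of the string.
starts_with_hello = re.search(r"^Hello", greeting)
starts_with_world = re.search(r"^World", greeting)

# "$" pins the pattern to the end of the string.
ends_with_world = re.search(r"World!$", greeting)
ends_with_hello = re.search(r"Hello$", greeting)

print(starts_with_hello is not None)   # True
print(starts_with_world)               # None
print(ends_with_world is not None)     # True
print(ends_with_hello)                 # None
```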
[abc] - Matches a or b or c.
[a-zA-Z0-9] - Matches any single lowercase letter (a to z), uppercase letter
(A to Z), or digit (0 to 9).
import re
print(re.search(r'[0-6]', 'Number: 5').group())
Output
5
[^5] - Matches any single character except 5.
import re
# Matches any character except 5
print(re.search(r'Number: [^5]', 'Number: 0').group())
Output
Number: 0
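The bracket forms above can be tried on a few invented strings. This is only a sketch; each call returns the first character in the string that belongs to (or, for the negated form, does not belong to) the set.

```python
import re

# [abc]: the first character that is a, b, or c
from_set = re.search(r"[abc]", "xyzb").group()

# [a-zA-Z0-9]: the first letter or digit
alnum = re.search(r"[a-zA-Z0-9]", "-- 9 lives").group()

# [^5]: the first character that is NOT 5
not_five = re.search(r"[^5]", "5557").group()

print(from_set)   # b
print(alnum)      # 9
print(not_five)   # 7
```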

\w - Lowercase 'w'. Matches any single letter, digit, or underscore.

\W - Uppercase 'W'. Matches any character not part of \w (lowercase w).
import re
print("Lowercase w:", re.search(r'Co\wk\we', 'Cookie').group())
# Matches any character except a letter, digit, or underscore
print("Uppercase W:", re.search(r'C\Wke', 'C@ke').group())
# Uppercase W won't match a letter or digit
print("Uppercase W won't match, and return:", re.search(r'Co\Wk\We', 'Cookie'))
Output
Lowercase w: Cookie
Uppercase W: C@ke
Uppercase W won't match, and return: None
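To see which characters fall on each side of the \w and \W split, the sketch below uses re.findall(), a function not covered in this chapter that returns every non-overlapping match as a list. The sample string is invented.

```python
import re

sample = "a_1!"

word_chars = re.findall(r"\w", sample)       # letters, digits, underscore
non_word_chars = re.findall(r"\W", sample)   # everything else

print(word_chars)       # ['a', '_', '1']
print(non_word_chars)   # ['!']
```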
\s - Lowercase 's'. Matches a single whitespace character such as a space,
newline, tab, or return.
\S - Uppercase 'S'. Matches any character not part of \s (lowercase s).
import re
print("Lowercase s:", re.search(r'Learn\spython', 'Learn python').group())
print("Uppercase S:", re.search(r'cook\Se', "Let's eat cookie").group())
Output
Lowercase s: Learn python
Uppercase S: cookie
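A quick way to confirm which separators count as whitespace is to test several of them against the same pattern. The sample strings below are made up for this sketch.

```python
import re

samples = ["a b", "a\tb", "a\nb", "a-b"]

# True where the character between "a" and "b" is whitespace.
results = [re.search(r"a\sb", s) is not None for s in samples]

print(results)   # [True, True, True, False]
```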

\d - Lowercase d. Matches a decimal digit 0-9.
\D - Uppercase D. Matches any character that is not a decimal digit.
import re
print("How many python courses do you want?", re.search(r'\d+', '2 python').group())
Output
How many python courses do you want? 2
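The example above only exercises \d. A small sketch covering both \d and \D, using invented strings:

```python
import re

# Three digits in a row
room = re.search(r"\d\d\d", "Room 404").group()

# The first character that is not a digit
separator = re.search(r"\D", "2024-01").group()

print(room)        # 404
print(separator)   # -
```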
\t - Lowercase t. Matches tab.
\n - Lowercase n. Matches newline.
\r - Lowercase r. Matches carriage return.
\A - Uppercase A. Matches only at the start of the string. Unlike ^, it does
not match at the start of each line in multiline mode.
\Z - Uppercase Z. Matches only at the end of the string.
\b - Lowercase b. Matches a word boundary, that is, the position at the
beginning or end of a word.
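The difference between \b, ^, and \A is easiest to see by counting matches. The sketch below uses re.findall() and the re.MULTILINE flag, neither of which is covered in this chapter; the two-line sample text is invented.

```python
import re

text = "cat concatenate\ncat"

# Plain "cat" also matches inside "concatenate".
anywhere = re.findall(r"cat", text)

# \b on both sides keeps only the standalone word.
whole_words = re.findall(r"\bcat\b", text)

# In multiline mode "^" matches at the start of every line...
line_starts = re.findall(r"^cat", text, re.MULTILINE)

# ...while \A still matches only at the start of the whole string.
string_start = re.findall(r"\Acat", text, re.MULTILINE)

print(len(anywhere))       # 3
print(len(whole_words))    # 2
print(len(line_starts))    # 2
print(len(string_start))   # 1
```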

Repetitions
Writing out a long pattern character by character quickly becomes tedious.
Fortunately, the re module handles repetitions using the following special
characters:
+ - Checks if the preceding character appears one or more times starting from
that position.
import re
print(re.search(r'Py+thon', 'Pyyyython').group())
Output
Pyyyython
* - Checks if the preceding character appears zero or more times starting
from that position.
import re
print(re.search(r'Ca*o*kie', 'Cookie').group())
Output
Cookie
? - Checks if the preceding character appears zero or one time starting from
that position.
import re
print(re.search(r'Pythoi?n', 'Python').group())
Output
Python
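The three repetition characters can be compared on the same set of words. This sketch uses re.fullmatch(), which is not covered in this chapter and requires the whole string to match the pattern; the word list and the helper name which_match are invented for illustration.

```python
import re

words = ["Pthon", "Python", "Pyyython"]

def which_match(pattern):
    """Return the words that the pattern matches in full (illustrative helper)."""
    return [w for w in words if re.fullmatch(pattern, w)]

plus = which_match(r"Py+thon")       # one or more "y"
star = which_match(r"Py*thon")       # zero or more "y"
optional = which_match(r"Py?thon")   # zero or one "y"

print(plus)       # ['Python', 'Pyyython']
print(star)       # ['Pthon', 'Python', 'Pyyython']
print(optional)   # ['Pthon', 'Python']
```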

But what if you want to check for an exact number of repetitions, for example
when checking the validity of a phone number in an application? The re module
handles this gracefully as well, using the following syntax:
{x} - Repeat exactly x times.
{x,} - Repeat at least x times.
{x,y} - Repeat at least x times but no more than y times. Do not put a space
after the comma, or the braces are treated as literal characters.
import re
print(re.search(r'\d{9,10}', '0987654321').group())
Output
0987654321
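A minimal sketch of the phone number check mentioned above, assuming for illustration that a valid number is exactly ten digits. The function name is_ten_digit_number is hypothetical, and re.fullmatch() (not covered in this chapter) is used so that extra digits before or after are rejected.

```python
import re

def is_ten_digit_number(text):
    """Return True if text consists of exactly ten digits (assumed rule)."""
    return re.fullmatch(r"\d{10}", text) is not None

print(is_ten_digit_number("0987654321"))    # True
print(is_ten_digit_number("09876"))         # False: too short
print(is_ten_digit_number("09876543210"))   # False: too long

# With search(), {x,y} is greedy: it takes as many repetitions as allowed.
greedy = re.search(r"\d{2,4}", "1234567").group()
print(greedy)                               # 1234
```

Real phone number formats vary by country, so a production check would need a pattern suited to the numbers it expects.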

Grouping in Regular Expressions

The group feature of regular expressions allows you to pick out parts of the
matching text. Parts of a regular expression pattern enclosed in parentheses ()
are called groups. The parentheses do not change what the expression matches,
but rather form groups within the matched sequence. You have been using the
group() function all along in this tutorial's examples. The plain match.group()
without any argument still returns the whole matched text, as usual.
Let us understand this concept with a simple example. Imagine you were
validating email addresses and wanted to check the user’s name and host. This
is when you would want to create separate groups within your matched text.
import re
statement = 'Please contact us at: [email protected]'
match = re.search(r'([\w\.-]+)@([\w\.-]+)', statement)
if match:
    print("Email address:", match.group())  # The whole matched text
    print("Username:", match.group(1))  # The username (group 1)
    print("Host:", match.group(2))  # The host (group 2)
Output
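Groups can also be given names with the (?P<name>...) syntax, which lets you fetch each part by name instead of by number. The following is a sketch only: the address support@example.com and the group names user and host are made up for illustration.

```python
import re

# Hypothetical address, used only for this sketch.
statement = "Please contact us at: support@example.com"

# (?P<name>...) creates a named group.
pattern = r"(?P<user>[\w.-]+)@(?P<host>[\w.-]+)"
match = re.search(pattern, statement)

if match:
    print("Email address:", match.group())      # support@example.com
    print("Username:", match.group("user"))     # support
    print("Host:", match.group("host"))         # example.com
    print(match.groupdict())                    # both named groups as a dict
```

Named groups remain available by position too, so match.group(1) and match.group("user") return the same text.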

