Regular Expressions
A Regular Expression (RegEx) is a sequence of characters that defines a
search pattern.
Regular Expressions are used in programming languages to match text
patterns.
It is possible to check whether a text or a string matches a regular expression.
A great thing about regular expressions: the core syntax is largely the same
across programming and scripting languages, e.g. Python, Perl, Java etc., although
each language adds its own extensions.
Python has a module named re to work with RegEx.
MetaCharacters
[Pp]ython
Match "Python" or "python"
rub[ye]
Match "ruby" or "rube"
[aeiou]
Match any one lowercase vowel
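The three character sets above can be tried out with a minimal sketch like the following. It uses Python's re module; the sample strings are made up for illustration and are not part of the slides.

```python
import re

# Sample strings are illustrative assumptions.
python_hits = re.findall(r"[Pp]ython", "Python and python")
ruby_hits = re.findall(r"rub[ye]", "ruby rube rubx")
vowel_hits = re.findall(r"[aeiou]", "Regex")

print(python_hits)  # ['Python', 'python']
print(ruby_hits)    # ['ruby', 'rube']
print(vowel_hits)   # ['e', 'e']
```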
. Period
A period matches any single character (except newline '\n').
^ Caret
The caret symbol ^ is used to check if a string starts with a certain
character.
$ Dollar
The dollar symbol $ is used to check if a string ends with a certain
character.
* Star
The star symbol * matches zero or more occurrences of the pattern to its
left.
+ Plus
The plus symbol + matches one or more occurrences of the pattern to its
left.
? Question Mark
The question mark symbol ? matches zero or one occurrence of the
pattern to its left.
{} Braces
Consider this pattern: {n,m}.
This means at least n, and at most m, repetitions of the pattern to its left.
| Alternation
Vertical bar | is used for alternation (or operator).
() Group
Parentheses () are used to group sub-patterns.
For example, (a|b|c)xz matches any string that contains
either a or b or c followed by xz.
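As a small sketch of grouping combined with alternation, the pattern (a|b|c)xz can be checked against a few candidate strings. The candidates are assumptions chosen for illustration.

```python
import re

pattern = r"(a|b|c)xz"
candidates = ["axz", "bxz", "cxz", "dxz", "xz"]  # illustrative inputs

# Map each candidate to True/False depending on whether the pattern is found.
results = {s: re.search(pattern, s) is not None for s in candidates}
print(results)
# {'axz': True, 'bxz': True, 'cxz': True, 'dxz': False, 'xz': False}
```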
Summary
\ Backslash
Backslash \ is used to escape various characters, including all
metacharacters.
For example,
\$a matches if a string contains $ followed by a.
Here, $ is not interpreted by a RegEx engine in a special way.
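A minimal sketch of the difference escaping makes, using an assumed sample sentence: with the backslash, $ is a literal dollar sign; without it, $ means "end of string", so nothing can follow it.

```python
import re

text = "cost: $a per unit"  # illustrative input

escaped = re.search(r"\$a", text)   # literal "$" followed by "a"
unescaped = re.search(r"$a", text)  # "end of string" followed by "a": impossible

print(escaped.group())  # $a
print(unescaped)        # None
```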
search() vs. match()
match() checks for a match only at the beginning of the string.
search() checks for a match anywhere in the string.
Ex:
import re
x = re.search("cat","A cat and a rat can't be friends.")
print (x)
y = re.search("dog","A cat and a rat can't be friends.")
print (y)
x = re.match("cat","cat and a rat can't be friends.")
print (x)
y = re.match("cat","A cat and a rat can't be friends.")
print (y)
Output:
<re.Match object; span=(2, 5), match='cat'>
None
<re.Match object; span=(0, 3), match='cat'>
None
Ex:
import re
if re.search("cat", "A cat and a rat can't be friends."):
    print("Cat has been found.")
else:
    print("No cat has been found.")
Output:
Cat has been found.
group() and groups()
Example:
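import re
# The input string and pattern are assumed; they are chosen to be consistent with the output below.
string = '39801 356, 2102 1111'
pattern = r'(\d{3}) (\d{2})'
match = re.search(pattern, string)
find = re.findall(pattern, string)
if match:
    print(match.group())    # the whole match
    print(match.group(1))   # the first parenthesized group
    print(match.groups())   # all groups as a tuple
else:
    print("pattern not found")
print(find)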
Output:
801 35
801
('801', '35')
[('801', '35'), ('102', '11')]
Using r prefix before RegEx
When r or R prefix is used before a regular expression, it means
raw string. For example, '\n' is a new line whereas r'\n' means
two characters: a backslash \ followed by n.
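A short sketch of what the r prefix changes. The sample text 'room 42' is an assumption for illustration.

```python
import re

newline = '\n'    # one character: a line break
raw_text = r'\n'  # two characters: a backslash and the letter n
print(len(newline), len(raw_text))  # 1 2

# Without the r prefix, the backslash in \d has to be doubled.
plain_pattern = '\\d+'
raw_pattern = r'\d+'
print(plain_pattern == raw_pattern)               # True
found = re.search(raw_pattern, 'room 42').group()
print(found)                                      # 42
```

Writing patterns as raw strings keeps them readable, because each backslash in the pattern is typed only once.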
Output:
output 1:
A and
output 2:
A cat and
output 3:
A
Ex:
import re
x = re.search(r'(^A) (and)',"A and a rat can't be friends.")
print ('output 4:\n',x.group(2))
x = re.search(r'(^A) ([a-z])nd',"A and a rat can't be friends.")
print ('output 5:\n',x.group(2))
x = re.search(r'(^A) (and)',"A and a rat can't be friends.")
print ('output 6:\n',x.groups())
Output:
output 4:
and
output 5:
a
output 6:
('A', 'and')
split() Function
The split() function returns a list where the string has been split at
each match:
import re
Output:
['The', 'rain', 'in', 'Spain']
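A minimal sketch that produces the list above, assuming the input string is "The rain in Spain" and the split happens at each whitespace character:

```python
import re

txt = "The rain in Spain"     # assumed input
parts = re.split(r"\s", txt)  # split at every whitespace character
print(parts)                  # ['The', 'rain', 'in', 'Spain']
```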
sub() Function
The sub() function replaces the matches with the text of your choice
import re
Output:
The9rain9in9Spain
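A minimal sketch that produces the string above, assuming the input is "The rain in Spain" and every whitespace character is replaced with the digit 9:

```python
import re

txt = "The rain in Spain"           # assumed input
replaced = re.sub(r"\s", "9", txt)  # replace each whitespace character with "9"
print(replaced)                     # The9rain9in9Spain
```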
Ex:
import re
x = re.search(".","ab\n")
print (x.group())
x = re.search(".+","ab\na")
print (x.group())
x = re.search(".*","")
print (x.group())   # .* can match the empty string, so an empty line is printed
x = re.search("^a+","aabc")
print (x.group())
x = re.search("bc$","abc")
print (x.group())
Output:
a
ab

aa
bc
Ex:
import re
match = re.search(r'iii', 'piiig')
print (match.group())
match = re.search(r'..g', 'piiig')
print (match.group())
match = re.search(r'\d\d\d', 'p123g')
print (match.group())
match = re.search(r'\w\w\w', '@@aB0d!!')
print (match.group())
match = re.search(r'\s', 'ab\nc')
print (match.group())   # prints the matched newline character
match = re.search(r'\s', 'ab c')
print (match.group())   # prints the matched space
match = re.search(r'\S+', 'ab\nc')
print (match.group())
Output:
iii
iig
123
aB0
(the two \s matches print only whitespace: a newline, then a space)
ab
Ex:
import re
str="abc-xyz@yahoo.ac.in"
match = re.search(r'[\w-]+@[\w-]+', str)
print(match.group())
match = re.search(r'[\w-]+@[\w.-]+', str)
print(match.group())
match = re.search(r'[\w]+@[\w.-]+', str)
print(match.group())
Output:
abc-xyz@yahoo
abc-xyz@yahoo.ac.in
xyz@yahoo.ac.in
Ex:
import re
x = re.search('[^a-z]+',"ABC")
print ('output 1:\n',x.group())
x = re.search('[^a-z]',"ABC")
print ('output 2:\n',x.group())
Output:
output 1:
ABC
output 2:
A
Search and Replace
Syntax:
re.sub(pattern, repl, string, count=0)
Output:
Phone Num : 2004-959-559 a This is #Phone Number
Phone Num : 2004-959-559
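The code for this slide is not shown, so the following is only a sketch of typical search-and-replace calls. The phone string is an assumed sample input, and the sketch is not claimed to reproduce the output above exactly. In Python's re module the optional fourth argument of re.sub() is named count and limits how many matches are replaced.

```python
import re

phone = "2004-959-559 # This is Phone Number"  # assumed sample input

no_comment = re.sub(r"\s*#.*$", "", phone)       # drop the trailing comment
digits_only = re.sub(r"\D", "", no_comment)      # drop every non-digit
first_dash = re.sub(r"-", "", no_comment, count=1)  # replace only the first match

print("Phone Num :", no_comment)   # Phone Num : 2004-959-559
print("Phone Num :", digits_only)  # Phone Num : 2004959559
print("Phone Num :", first_dash)   # Phone Num : 2004959-559
```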
Repetition Cases
Ex:
import re
x = re.search('ruby?',"rub")
print('output 1:\n',x.group())
x = re.search('ruby?',"ruby")
print('output 2:\n',x.group())
x = re.search('ruby*',"rub")
print('output 3:\n',x.group())
x = re.search('ruby+',"rubyyy")
print('output 4:\n',x.group())
Output:
output 1:
rub
output 2:
ruby
output 3:
rub
output 4:
rubyyy
Ex:
import re
x = re.search(r'\d{3}',"0Ab456")
print ('output 1:\n',x.group())
x = re.search(r'\d{3,}',"0Ab456789")
print ('output 2:\n',x.group())
x = re.search(r'\d{3,5}',"0Ab456789")
print ('output 3:\n',x.group())
x = re.search(r'\d{3}',"0Ab4c56")
# No run of three digits exists, so search() returns None and x.group() raises AttributeError.
print ('output 4:\n',x.group())
Output:
output 1:
456
output 2:
456789
output 3:
45678
AttributeError
Grouping with Parentheses
Ex:
import re
x = re.search(r'\D{3}',"0Abc456")
print('output 1:\n',x.group())
x = re.search(r'\D',"abc0123012")
print('output 2:\n',x.group())
x = re.search(r'\D\d+',"abc012301a2")
print('output 3:\n',x.group())
x = re.search(r'(\D\d)+',"abc012ef34")
print('output 4:\n',x.group())
x = re.search(r'(\D\d)+',"c0s234a")
print('output 5:\n',x.group())
Output:
output 1:
Abc
output 2:
a
output 3:
c012301
output 4:
c0
output 5:
c0s2
Ex:
import re
x = re.search('([Pp]ython?)+',"Python and Java")
print('output 1:\n',x.group())
x = re.search ('([Pp]ython,?)+',"Python,python,Python")
print('output 2:\n',x.group())
Output:
output 1:
Python
output 2:
Python,python,Python
findall() method
findall() finds all the matches and returns them as a list of strings, with
each string representing one match.
If the pattern is not found, re.findall() returns an empty list.
Ex:
import re
str="abc-xyz@yahoo.ac.in, qwe@gmail.com, iop"
match = re.findall(r'[\w-]+@[\w.-]+', str)
print(match)
for i in match:
    print(i)
Output:
['abc-xyz@yahoo.ac.in', 'qwe@gmail.com']
abc-xyz@yahoo.ac.in
qwe@gmail.com
findall() method
import re
string = 'hello 1 hi 89. Howdy 34'
pattern = r'\d+'
result = re.findall(pattern, string)
print(result)
Output:
['1', '89', '34']
findall() and groups
Ex:
import re
str="abc-xyz@yahoo.ac.in, qwe@gmail.com, iop"
match = re.findall(r'([\w-]+)@([\w.-]+)', str)
print(match)
for i in match:
    print (i)
    print (i[0])
    print (i[1])
Output:
[('abc-xyz', 'yahoo.ac.in'), ('qwe', 'gmail.com')]
('abc-xyz', 'yahoo.ac.in')
abc-xyz
yahoo.ac.in
('qwe', 'gmail.com')
qwe
gmail.com
Python program to search for some literal strings in a string
Program:
import re
patterns = [ 'fox', 'dog', 'horse' ]
text = 'The quick brown fox jumps over the lazy dog.'
for pattern in patterns:
    print('Searching for "%s" in "%s" ->' % (pattern, text))
    if re.search(pattern, text):
        print('Matched!')
    else:
        print('Not Matched!')
Output:
Searching for "fox" in "The quick brown fox jumps over the lazy dog." ->
Matched!
Searching for "dog" in "The quick brown fox jumps over the lazy dog." ->
Matched!
Searching for "horse" in "The quick brown fox jumps over the lazy dog." ->
Not Matched!
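Regex for the language that accepts strings
containing 'ab' as a substring:
import re
inp = input("Enter string: ")
# In formal-language notation (a+b) means "a or b"; Python's re writes that as (a|b).
# fullmatch() requires the whole input to fit the pattern.
if re.fullmatch(r"(a|b)*ab(a|b)*", inp):
    print("Match found.")
else:
    print("Match not found.")
Output:
aaabab Match found.
ab Match found.
aaa Match not found.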
Problem
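Write a Python program to remove leading zeros
from an IP address.
Code:
import re
ip = input("Enter IP address: ")
ip = ip.lstrip('0')  # strip zeros at the very start of the string
# Drop zeros after each dot, but only when another digit follows, so an octet of "0" is kept.
ip2 = re.sub(r'\.0+(?=\d)', '.', ip)
print(ip)
print(ip2)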
Output:
Enter IP address: 0014.01.0003.3
14.01.0003.3
14.1.3.3
Example:
import re
x = re.search('[-a-z]+',"a-b@c")
print('output 1:\n',x.group())
Output:
output 1:
a-b
Problem
Write a regular expression which matches
strings that start with a sequence of digits
(at least one digit), followed by a blank and then
arbitrary characters.
Code:
import re
x = re.search('^[0-9]+ .*',"0123 abc")
print('output:\n',x.group())
Output:
0123 abc
#re.start() method
Problem
We have an imaginary phone list. Not all entries
contain a phone number, but if a phone number
exists it is the first part of an entry. It is followed
by a blank and the surname, which is
followed by the first names. Surname and first
names are separated by a comma and a space.
The task is to print the list in the following order:
first_name last_name phone_number
Ex:
555-8396 Neu, Allison
555-5299 Putz, Lionel
555-7334 Simpson, Homer Jay
Expected output:
Allison Neu 555-8396
Lionel Putz 555-5299
Homer Jay Simpson 555-7334
Code:
import re
l = ["555-8396 Neu, Allison", "555-5299 Putz, Lionel", "555-7334 Simpson, Homer Jay"]
for i in l:
    # The last group allows spaces so that first names such as "Homer Jay" are captured whole.
    res = re.search(r"([0-9-]+)\s([A-Za-z]+),\s([A-Za-z ]+)", i)
    print(res.group(3) + " " + res.group(2) + " " + res.group(1))
Output:
Allison Neu 555-8396
Lionel Putz 555-5299
Homer Jay Simpson 555-7334
Ex:
555-8396 Neu, Allison
Burns, Montgomery
555-5299 Putz, Lionel
555-7334 Simpson, Homer Jay
Expected Output:
Allison Neu 555-8396
Montgomery Burns
Lionel Putz 555-5299
Homer Jay Simpson 555-7334
Code:
import re
l = ["555-8396 Neu, Allison", "Burns, Montgomery",
"555-5299 Putz, Lionel", "555-7334 Simpson, Homer Jay"]
for i in l:
    # The phone group is optional ([0-9-]*); the first-name group allows spaces ("Homer Jay").
    res = re.search(r"([0-9-]*)\s*([A-Za-z]+),\s+([A-Za-z ]+)", i)
    print(res.group(3) + " " + res.group(2) + " " + res.group(1))
RE that matches a string that has an 'a' followed by one or more 'b's
re.search(r"ab+", inp)
RE that matches a string that has an 'a' followed by two 'b's
re.search(r"ab{2}", inp)
RE that matches a string that has an 'a' followed by two or three 'b's
re.search(r"ab{2,3}", inp)
RE that matches a string that has an 'a' followed by anything, ending in
'b'
re.search(r"a.*b$", inp)
RE that matches a word at the beginning of a string
re.search(r"^\w+", inp)
RE that matches a word containing 'z'
re.search(r"\w*z\w*", inp)
RE to check for a number at the end of a string
re.search(r".*[0-9]$", inp)
Write a Python program to extract year, month and date from a URL.
import re
def extract_date(url):
    # Each pair of parentheses captures one part: year, month, day.
    return re.findall(r'/(\d{4})/(\d{1,2})/(\d{1,2})/', url)
url1 = "https://www.washingtonpost.com/news/football-insider/wp/2016/09/02/odell-beckhams-fame-rests-on-one-stupid-little-ball-josh-norman-tells-author/"
print(extract_date(url1))
Output:
[('2016', '09', '02')]
Write a Python program to convert a date of yyyy-mm-dd
format to dd-mm-yyyy format.
dt1 = "2026-01-02"
Output:
Original date in YYYY-MM-DD Format: 2026-01-02
New date in DD-MM-YYYY Format: 02-01-2026
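One possible solution, sketched with re.sub() and backreferences. The function name change_date_format is a hypothetical helper introduced here, not something from the slides.

```python
import re

def change_date_format(dt):
    # Capture year, month and day, then emit them in reverse order.
    return re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", r"\3-\2-\1", dt)

dt1 = "2026-01-02"
new_dt1 = change_date_format(dt1)
print("Original date in YYYY-MM-DD Format:", dt1)
print("New date in DD-MM-YYYY Format:", new_dt1)
```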
Write a Python program to print the numbers of
a given string.
Output:
10
20
30
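A minimal sketch that prints these numbers. The input string is not shown on the slide, so the one below is an assumption chosen to match the output.

```python
import re

text = "Ten 10, Twenty 20, Thirty 30"  # assumed input
numbers = re.findall(r"\d+", text)     # every run of one or more digits
for number in numbers:
    print(number)
```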
Write a Python program to search for numbers (0-9) of
length 1 to 3 in a given string.
Output:
Number of length 1 to 3
1
12
13
345
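A possible sketch for this exercise, with an assumed input sentence chosen to match the output. Note that the pattern [0-9]{1,3} would split a longer number such as 4567 into 456 and 7, so it only behaves as intended when the numbers have at most three digits.

```python
import re

text = "Exercises number 1, 12, 13, and 345 are important"  # assumed input
short_numbers = re.findall(r"[0-9]{1,3}", text)  # runs of one to three digits

print("Number of length 1 to 3")
for number in short_numbers:
    print(number)
```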
Write a Python program to find all words
starting with 'a' or 'e' in a given string.
Output:
['e', 'example', 'eates', 'an', 'ayList', 'a', 'apacity',
'elements']
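The output above includes fragments such as 'eates' and 'ayList', which suggests a pattern without a word boundary. The sketch below uses an assumed input sentence and compares that pattern with a stricter one that uses \b, so that only whole words beginning with 'a' or 'e' are returned.

```python
import re

# Assumed input, chosen to be consistent with the output above.
text = "The following example creates an ArrayList with a capacity of 50 elements."

loose = re.findall(r"[ae]\w*", text)     # starts at any 'a' or 'e', even mid-word
strict = re.findall(r"\b[ae]\w*", text)  # only words that begin with 'a' or 'e'

print(loose)
# ['e', 'example', 'eates', 'an', 'ayList', 'a', 'apacity', 'elements']
print(strict)
# ['example', 'an', 'a', 'elements']
```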
Write a Python program to replace at most 2
occurrences of space, comma, or dot with a
colon.
import re
text = 'Python Exercises, PHP exercises.'
print(re.sub("[ ,.]", ":", text, count=2))
Output:
Python:Exercises: PHP exercises.
Write a Python program to find all five-character
words in a string.
Output:
['quick', 'brown', 'jumps']
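A minimal sketch, assuming the input is the same sentence used in the literal-search program earlier. The word boundaries \b keep the pattern from matching five characters inside a longer word.

```python
import re

text = 'The quick brown fox jumps over the lazy dog.'  # assumed input
five_letter_words = re.findall(r"\b\w{5}\b", text)
print(five_letter_words)  # ['quick', 'brown', 'jumps']
```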
Write a Python program to replace all occurrences
of space, comma, or dot with a colon.
Output:
Python:Exercises::PHP:exercises:
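A minimal sketch, assuming the same input text as in the "at most 2 occurrences" exercise. Leaving out the count argument replaces every match.

```python
import re

text = 'Python Exercises, PHP exercises.'  # assumed input
all_replaced = re.sub(r"[ ,.]", ":", text)  # no count: replace every match
print(all_replaced)  # Python:Exercises::PHP:exercises:
```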
Write a Python program to remove multiple
spaces in a string.
Output:
Original string: Python Exercises
Without extra spaces: Python Exercises
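A possible sketch for this exercise. The exact number of spaces in the original string is not recoverable from the slide, so the input below is an assumption.

```python
import re

text = 'Python      Exercises'       # assumed input with several spaces
collapsed = re.sub(r" +", " ", text)  # replace each run of spaces with one space
print("Original string:", text)
print("Without extra spaces:", collapsed)
```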
Write a Python program to remove everything
except alphanumeric characters from a string.
Output:
Original string: **//Python Exercises// - 12.
Without extra spaces: PythonExercises12
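A possible sketch for this exercise, with the input taken from the output above. The character class [\W_] covers everything that is not a letter or digit; the underscore is listed separately because \w treats it as a word character.

```python
import re

text = '**//Python Exercises// - 12. '  # input as shown in the output above
cleaned = re.sub(r"[\W_]+", "", text)    # remove every non-alphanumeric run
print("Original string:", text)
print("Alphanumeric only:", cleaned)
```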
Write a Python program to split a string at
uppercase letters.
import re
text = "PythonTutorialAndExercises"
print(re.findall('[A-Z][^A-Z]*', text))
Output:
['Python', 'Tutorial', 'And', 'Exercises']
Write a Python program to remove the
parenthesized part of a string.
import re
items = ["example(.com)", "w3resource", "github(.com)",
"stackoverflow(.com)"]
for item in items:
    print(re.sub(r"\([^)]+\)", "", item))
Output:
example
w3resource
github
stackoverflow
MCQ
1) What does the function re.match do?
a) matches a pattern at the start of the string
b) matches a pattern at any position in the string
c) such a function does not exist
d) none of the mentioned
2) What does the function re.search do?
a) matches a pattern at the start of the string
b) matches a pattern at any position in the string
c) such a function does not exist
d) none of the mentioned
3) What will be the output of the following Python code?
sentence = 'we are humans'
matched = re.match(r'(.*) (.*?) (.*)', sentence)
print(matched.groups())
a)(‘we’, ‘are’, ‘humans’)
b)(we, are, humans)
c)(‘we’, ‘humans’)
d) ‘we are humans’
4). ________ matches the start of the string.
________ matches the end of the string.
a) ‘^’, ‘$’
b) ‘$’, ‘^’
c) ‘$’, ‘?’
d) ‘?’, ‘^’
5) What will be the output of the following Python function?
re.findall("hello world", "hello", 1)
a) [“hello”]
b) [ ]
c) hello
d) hello world
6) What will be the output of the following Python code?
re.sub('morning', 'evening', 'good morning')
a) ‘good evening’
b) ‘good’
c) ‘morning’
d) ‘evening’
7) What will be the output of the following Python code?
re.findall('good', 'good is good')
re.findall('good', 'bad is good')
a)
[‘good’, ‘good’]
[‘good’]
b)
(‘good’, ‘good’)
(good)
c)
(‘good’)
(‘good’)
d)
[‘good’]
[‘good’]
8) What will be the output of the following Python code?
import re
s = 'abc123 xyz666 lmn-11 def77'
re.sub(r'\b([a-z]+)(\d+)', r'\2\1:', s)
a) ‘123abc: 666xyz: lmn-11 77def:’
b) ‘77def: lmn-11: 666xyz: 123abc’
c) ‘abc123:’, ‘xyz666:’, ‘lmn-11:’, ‘def77:’
d) ‘abc123: xyz666: lmn-11: def77’
10) Which of the following statements regarding the output of
the function re.match is incorrect?
a) ‘pq*’ will match ‘pq’
b) ‘pq?’ matches ‘p’
c) ‘p{4}, q’ does not match ‘pppq’
d) ‘pq+’ matches ‘p’
11) Which of the following lines of code will not show a
match?
a) re.match(‘ab*’, ‘a’)
b) re.match(‘ab*’, ‘ab’)
c) re.match(‘ab*’, ‘abb’)
d) re.match(‘ab*’, ‘ba’)
1) a
2) b
3) a
4) a
5) b
6) a
7) a
8) a
10) d
11) d