Data Science With Python - Intermediate Level V2.0
INDEX
PYTHON OOP
METACHARACTERS
SPECIAL SEQUENCES
REGEX MODULE IN PYTHON
MATCH OBJECT
SEARCH, MATCH AND FIND ALL
VERBOSE IN PYTHON REGEX
PASSWORD VALIDATION IN PYTHON
PYTHON COLLECTIONS
COUNTERS
ORDEREDDICT IN PYTHON
DEFAULTDICT IN PYTHON
CHAINMAP IN PYTHON
NAMEDTUPLE IN PYTHON
DEQUE IN PYTHON
HEAP QUEUE (OR HEAPQ) IN PYTHON
COLLECTIONS.USERDICT IN PYTHON
COLLECTIONS.USERLIST IN PYTHON
COLLECTIONS.USERSTRING IN PYTHON
A class is a user-defined blueprint or prototype from which objects are created. Classes
provide a means of bundling data and functionality together. Creating a new class creates a
new type of object, allowing new instances of that type to be made. Each class instance can
have attributes attached to it for maintaining its state. Class instances can also have methods
(defined by their class) for modifying their state.
To understand why classes are needed, consider an example: suppose you want to track a
number of dogs, each with attributes such as breed and age. If a list is used, the first
element could be the dog’s breed while the second element could represent its age. With
100 different dogs, how would you know which element is supposed to be which? What if
you wanted to add other properties to these dogs? A plain list lacks organization, and this
is exactly the need that classes address.
Class creates a user-defined data structure, which holds its own data members and member
functions, which can be accessed and used by creating an instance of that class. A class is like
a blueprint for an object.
# Python3 program to
# demonstrate defining
# a class
class Dog:
    pass
In the above example, the class keyword indicates that you are creating a class followed by
the name of the class (Dog in this case).
Class Objects
An Object is an instance of a Class. A class is like a blueprint while an instance is a copy of the
class with actual values. It’s not an idea anymore, it’s an actual dog, like a dog of breed pug
who’s seven years old. You can have many dogs to create many different instances, but
without the class as a guide, you would be lost, not knowing what information is required.
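An object consists of state (its attributes), behavior (its methods), and identity (the name
that distinguishes it from other objects).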
When an object of a class is created, the class is said to be instantiated. All the instances share
the attributes and the behavior of the class. But the values of those attributes, i.e. the state
are unique for each object. A single class may have any number of instances.
Example:
Declaring an object –
# Python3 program to
# demonstrate instantiating
# a class
class Dog:
    # A simple class
    # attribute
    attr1 = "mammal"
    attr2 = "dog"
    # A sample method
    def fun(self):
        print("I'm a", self.attr1)
        print("I'm a", self.attr2)
# Driver code
# Object instantiation
Rodger = Dog()
# Accessing a class attribute
# and a method through the object
print(Rodger.attr1)
Rodger.fun()
Output:
mammal
I'm a mammal
I'm a dog
In the above example, an object is created which is basically a dog named Rodger. This class
only has two class attributes that tell us that Rodger is a dog and a mammal.
The self
• Methods defined in a class must have an extra first parameter, conventionally named self,
in the method definition. We do not give a value for this parameter when we call the
method; Python provides it.
• Even a method that takes no arguments must still declare this one parameter.
• This is similar to the this pointer in C++ and the this reference in Java.
When we call a method of this object as myobject.method(arg1, arg2), this is automatically
converted by Python into MyClass.method(myobject, arg1, arg2) – this is all the special self is
about.
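The equivalence described above can be checked directly. The following is a minimal sketch; the class Greeter and its method greet are hypothetical names introduced only for this illustration.

```python
class Greeter:
    """Hypothetical class used only to illustrate how self is passed."""

    def __init__(self, name):
        self.name = name

    def greet(self, greeting, punctuation):
        # self is the instance the method was called on
        return f"{greeting}, {self.name}{punctuation}"


g = Greeter("Rodger")

# Normal call: Python supplies g as self automatically
via_instance = g.greet("Hello", "!")

# Equivalent explicit call through the class
via_class = Greeter.greet(g, "Hello", "!")

print(via_instance)
print(via_instance == via_class)
```

Both calls run the same function with the same arguments, so the two results compare equal.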
__init__ method
The __init__ method is similar to constructors in C++ and Java. Constructors are used to
initialize the object’s state. Like other methods, a constructor contains a collection of
statements (i.e. instructions) that are executed at the time of object creation. It runs as soon
as an object of a class is instantiated, which makes it the place for any initialization you want
to do with your object.
class Person:
    # init method or constructor
    def __init__(self, name):
        self.name = name
    # Sample Method
    def say_hi(self):
        print('Hello, my name is', self.name)
p = Person('Nikhil')
p.say_hi()
Output:
Hello, my name is Nikhil
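# Variables assigned in the class body are class variables;
# variables assigned through self are instance variables
class Dog:
    # Class Variable
    animal = 'dog'
    # The init method or constructor
    def __init__(self, breed, color):
        # Instance Variable
        self.breed = breed
        self.color = color
# Objects of Dog class
Rodger = Dog("Pug", "brown")
Buzo = Dog("Bulldog", "black")
print('Rodger details:')
print('Rodger is a', Rodger.animal)
print('Breed: ', Rodger.breed)
print('Color: ', Rodger.color)
print('\nBuzo details:')
print('Buzo is a', Buzo.animal)
print('Breed: ', Buzo.breed)
print('Color: ', Buzo.color)
# Class variables can be accessed using class
# name also
print("\nAccessing class variable using class name")
print(Dog.animal)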
Output:
Rodger details:
Rodger is a dog
Breed: Pug
Color: brown
Buzo details:
Buzo is a dog
Breed: Bulldog
Color: black
Accessing class variable using class name
dog
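class Dog:
    # Class Variable
    animal = 'dog'
    def __init__(self, breed):
        # Instance Variable
        self.breed = breed
    # Adds an instance variable outside the constructor
    def setColor(self, color):
        self.color = color
    def getColor(self):
        return self.color
# Driver Code
Rodger = Dog("pug")
Rodger.setColor("brown")
print(Rodger.getColor())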
Constructors in Python
Constructors are generally used for instantiating an object. The task of a constructor is to
initialize (assign values to) the data members of the class when an object of the class is
created. In Python the __init__() method plays the role of the constructor and is always called
when an object is created.
Syntax of constructor declaration:
def __init__(self):
    # body of the constructor
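Types of constructors:
• Default constructor: takes no arguments other than self.
• Parameterized constructor: takes arguments in addition to self.
class GeekforGeeks:
    # default constructor
    def __init__(self):
        self.geek = "GeekforGeeks"
    def print_Geek(self):
        print(self.geek)
obj = GeekforGeeks()
obj.print_Geek()
Output: GeekforGeeks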
class Addition:
    # parameterized constructor
    def __init__(self, f, s):
        self.first = f
        self.second = s
    def display(self):
        print("First number = " + str(self.first))
        print("Second number = " + str(self.second))
        print("Addition of two numbers = " + str(self.answer))
    def calculate(self):
        self.answer = self.first + self.second
# creating an object invokes the parameterized constructor
obj = Addition(1000, 2000)
# perform Addition
obj.calculate()
# display result
obj.display()
Output:
First number = 1000
Second number = 2000
Addition of two numbers = 3000
Destructors in Python
Destructors are called when an object gets destroyed. In Python, destructors are not needed
as much as in C++ because Python has a garbage collector that handles memory management
automatically.
The __del__() method is known as a destructor method in Python. It is called when all
references to the object have been deleted, i.e. when the object is garbage collected.
Syntax of destructor declaration:
def __del__(self):
    # body of destructor
Note: A reference to an object is also deleted when the variable holding it goes out of scope
or when the program ends.
Example 1: Here is a simple example of a destructor. Using the del keyword we delete the
only reference to the object ‘obj’, so the destructor is invoked automatically.
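class Employee:
    # Initializing
    def __init__(self):
        print('Employee created.')
    # Deleting (calling destructor)
    def __del__(self):
        print('Destructor called, Employee deleted.')
obj = Employee()
del obj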
Output:
Employee created.
Destructor called, Employee deleted.
Note: The destructor is called when all references to the object are deleted, i.e. when the
reference count becomes zero (which may be as late as program end), not merely when a
variable goes out of scope.
Example 2: This example illustrates the note above. Notice that the destructor is called
after ‘Program End...’ is printed.
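The reference-count behaviour in the note can be observed with two names bound to the same object. This is a minimal sketch that assumes CPython, where __del__ runs as soon as the last reference disappears; the list events and the names first and second are introduced only for this illustration.

```python
events = []  # records destructor calls so they can be inspected


class Employee:
    def __del__(self):
        events.append("destructor called")


first = Employee()
second = first          # a second reference to the same object

del first               # removes one name; the object is still referenced
after_first_del = list(events)

del second              # last reference gone; CPython runs __del__ here
after_second_del = list(events)

print(after_first_del)
print(after_second_del)
```

The first del only removes a name, so nothing is recorded until the second del drops the reference count to zero.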
class Employee:
    # Initializing
    def __init__(self):
        print('Employee created')
    # Calling destructor
    def __del__(self):
        print("Destructor called")
def Create_obj():
    print('Making Object...')
    obj = Employee()
    print('function end...')
    return obj
print('Calling Create_obj() function...')
obj = Create_obj()
print('Program End...')
Output:
Calling Create_obj() function...
Making Object...
Employee created
function end...
Program End...
Destructor called
Example 3: Now, consider the following example:
class A:
    def __init__(self, bb):
        self.b = bb
class B:
    def __init__(self):
        self.a = A(self)
    def __del__(self):
        print("die")
def fun():
    b = B()
fun()
Output:
die
In this example, when the function fun() is called, it creates an instance of class B, which
passes itself to class A; A then stores a reference back to the B instance, resulting in a
circular reference. Reference counting alone can never free such a cycle, so Python relies on
its cyclic garbage collector to detect and remove it. Before Python 3.4, a custom __del__
method on an object in the cycle marked the cycle as “uncollectable”: the collector did not
know the order in which to destroy the objects, so it left them in memory for as long as the
application ran. Since Python 3.4 the collector can reclaim such cycles and call __del__,
which is why “die” appears in the output, but the destructor runs only at a later collection
(typically the next collector run or interpreter shutdown), not when fun() returns.
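On CPython 3.4 and later, the cyclic collector can reclaim a cycle like the one in Example 3 even though __del__ is defined. The sketch below makes the timing visible by recording destructor calls in a list (events, a name introduced here) and triggering a collection explicitly with gc.collect(). Automatic collection is switched off during the experiment so the result does not depend on when the collector happens to run.

```python
import gc

events = []  # records destructor calls


class A:
    def __init__(self, owner):
        self.owner = owner      # reference back to the B instance


class B:
    def __init__(self):
        self.a = A(self)        # B -> A -> B forms a reference cycle

    def __del__(self):
        events.append("die")


def fun():
    b = B()                     # the name b disappears when fun() returns


gc.disable()                    # keep the collector from running on its own
fun()
before = list(events)           # reference counting alone did not free the cycle

gc.collect()                    # run the cyclic collector explicitly
after = list(events)
gc.enable()

print(before)
print(after)
```

The destructor is recorded only after the explicit collection, which shows that objects in a cycle are not destroyed at the moment they become unreachable.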
Inheritance in Python
Inheritance is the capability of one class to derive or inherit the properties from another class.
The benefits of inheritance are:
• It represents real-world relationships well.
• It provides code reusability. We don’t have to write the same code again and
again. Also, it allows us to add more features to a class without modifying it.
• It is transitive in nature, which means that if class B inherits from another class A, then
all the subclasses of B would automatically inherit from class A.
Below is a simple example of inheritance in Python.
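class Person(object):
    # Constructor
    def __init__(self, name):
        self.name = name
    # To get name
    def getName(self):
        return self.name
    # To check if this person is an employee
    def isEmployee(self):
        return False
# Inherited or Subclass (Note Person in bracket)
class Employee(Person):
    # Here we return True
    def isEmployee(self):
        return True
# Driver code
emp = Person("Geek1")  # An object of Person
print(emp.getName(), emp.isEmployee())
emp = Employee("Geek2")  # An object of Employee
print(emp.getName(), emp.isEmployee())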
Output:
Geek1 False
Geek2 True
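The transitive nature of inheritance mentioned in the list of benefits can be checked with issubclass() and isinstance(). A minimal sketch, using placeholder classes A, B and C that are not part of the example above:

```python
class A:
    def hello(self):
        return "hello from A"


class B(A):      # B inherits from A
    pass


class C(B):      # C inherits from B, and therefore also from A
    pass


c = C()
greeting = c.hello()            # method found on A through B
c_is_sub_of_a = issubclass(C, A)
c_is_instance_of_a = isinstance(c, A)

print(greeting)
print(c_is_sub_of_a, c_is_instance_of_a)
```

C never mentions A directly, yet it inherits hello() and is treated as a subclass of A.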
Example:
class subclass_name(superclass_name):
    ___
    ___
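# Python code to demonstrate how parent constructors
# are called.
# parent class
class Person(object):
    # __init__ is known as the constructor
    def __init__(self, name, idnumber):
        self.name = name
        self.idnumber = idnumber
    def display(self):
        print(self.name)
        print(self.idnumber)
# child class
class Employee(Person):
    def __init__(self, name, idnumber, salary, post):
        self.salary = salary
        self.post = post
        # invoking the __init__ of the parent class
        Person.__init__(self, name, idnumber)
# creating an instance (salary and post values are illustrative)
a = Employee('Rahul', 886012, 200000, "Intern")
# calling a function of the class Person using its instance
a.display()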
Output:
Rahul
886012
‘a’ is the instance created from the child class Employee, whose __init__() invokes the
__init__() of the parent class Person. You can see ‘object’ written in the declaration of the
class Person. In Python, every class inherits from a built-in basic class called ‘object’. The
constructor, i.e. the ‘__init__’ function of a class, is invoked when we create an instance of
the class.
The variables defined within __init__() are called instance variables. Hence, ‘name’ and
‘idnumber’ are instance variables of the class Person. Similarly, ‘salary’ and ‘post’ are
instance variables of the class Employee. Since the class Employee inherits from class Person,
‘name’ and ‘idnumber’ are also instance variables of class Employee.
If you forget to invoke the __init__() of the parent class then its instance variables would not
be available to the child class.
The following code produces an error for the same reason.
class A:
    def __init__(self, n="Rahul"):
        self.name = n
class B(A):
    def __init__(self, roll):
        self.roll = roll
object = B(23)
print (object.name)
Output:
Traceback (most recent call last):
File "/home/de4570cca20263ac2c4149f435dba22c.py", line 12, in
print (object.name)
AttributeError: 'B' object has no attribute 'name'
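One way to avoid the AttributeError shown above is to call the parent constructor from the child, for example through super(). This is a minimal sketch; the default name "Rahul" is an assumption made for illustration.

```python
class A:
    def __init__(self, n="Rahul"):   # default value assumed for this sketch
        self.name = n


class B(A):
    def __init__(self, roll):
        super().__init__()           # runs A.__init__, which sets self.name
        self.roll = roll


obj = B(23)
print(obj.name, obj.roll)
```

Because A.__init__() now runs for every B instance, obj.name exists alongside obj.roll.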
Different forms of Inheritance:
Inheritance is defined as the capability of one class to derive or inherit the properties from
some other class and use them whenever needed. As listed earlier, it models real-world
relationships, makes code reusable, and is transitive.
Example:
# A Python program to demonstrate
# inheritance
class Child:
    # Constructor
    def __init__(self, name):
        self.name = name
    # To get name
    def getName(self):
        return self.name
    # To check if this person is a student
    def isStudent(self):
        return False
# Derived class or Sub class
class Student(Child):
    # True is returned
    def isStudent(self):
        return True
# Driver code
# An Object of Child
std = Child("Ram")
print(std.getName(), std.isStudent())
# An Object of Student
std = Student("Shivam")
print(std.getName(), std.isStudent())
Output:
Ram False
Shivam True
Types of Inheritance in Python
The type of inheritance depends on the number of child and parent classes involved. Five
types of inheritance are described below:
Single Inheritance: Single inheritance enables a derived class to inherit properties from a
single parent class, thus enabling code reusability and the addition of new features to existing
code.
Example:
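# Python program to demonstrate
# single inheritance
# Base class
class Parent:
    def func1(self):
        print("This function is in parent class.")
# Derived class
class Child(Parent):
    def func2(self):
        print("This function is in child class.")
# Driver's code
obj = Child()
obj.func1()
obj.func2()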
Output:
This function is in parent class.
This function is in child class.
Multiple Inheritance: When a class can be derived from more than one base class this type
of inheritance is called multiple inheritance. In multiple inheritance, all the features of the
base classes are inherited into the derived class.
Example:
# Python program to demonstrate
# multiple inheritance
# Base class1
class Mother:
    mothername = ""
    def mother(self):
        print(self.mothername)
# Base class2
class Father:
    fathername = ""
    def father(self):
        print(self.fathername)
# Derived class
class Son(Mother, Father):
    def parents(self):
        print("Father :", self.fathername)
        print("Mother :", self.mothername)
# Driver's code
s1 = Son()
s1.fathername = "RAM"
s1.mothername = "SITA"
s1.parents()
Output:
Father : RAM
Mother : SITA
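When two base classes define a method with the same name, Python resolves the call using the method resolution order (MRO), which follows the order in which the bases are listed. A minimal sketch, where the method who() is a hypothetical addition used only to make the lookup order visible:

```python
class Mother:
    def who(self):
        return "Mother"


class Father:
    def who(self):
        return "Father"


class Son(Mother, Father):   # Mother is listed first
    pass


# Names of the classes in the order Python searches them
order = [cls.__name__ for cls in Son.__mro__]
chosen = Son().who()

print(order)
print(chosen)
```

Listing the bases as (Father, Mother) instead would make Father.who() the one that is found first.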
Multilevel Inheritance: In multilevel inheritance, features of the base class and the derived
class are further inherited into the new derived class. This is similar to a relationship
representing a child and grandfather.
Example:
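# Python program to demonstrate
# multilevel inheritance
# Base class
class Grandfather:
    def __init__(self, grandfathername):
        self.grandfathername = grandfathername
# Intermediate class
class Father(Grandfather):
    def __init__(self, fathername, grandfathername):
        self.fathername = fathername
        # invoking constructor of Grandfather class
        Grandfather.__init__(self, grandfathername)
# Derived class
class Son(Father):
    def __init__(self, sonname, fathername, grandfathername):
        self.sonname = sonname
        # invoking constructor of Father class
        Father.__init__(self, fathername, grandfathername)
    def print_name(self):
        print('Grandfather name :', self.grandfathername)
        print("Father name :", self.fathername)
        print("Son name :", self.sonname)
# Driver code
s1 = Son('Prince', 'Rampal', 'Lal mani')
print(s1.grandfathername)
s1.print_name()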
Output:
Lal mani
Grandfather name : Lal mani
Father name : Rampal
Son name : Prince
Hierarchical Inheritance: When more than one derived class is created from a single base
class, this type of inheritance is called hierarchical inheritance. In this program, we have a
parent (base) class and two child (derived) classes.
Example:
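# Python program to demonstrate
# hierarchical inheritance
# Base class
class Parent:
    def func1(self):
        print("This function is in parent class.")
# Derived class1
class Child1(Parent):
    def func2(self):
        print("This function is in child 1.")
# Derived class2
class Child2(Parent):
    def func3(self):
        print("This function is in child 2.")
# Driver's code
object1 = Child1()
object2 = Child2()
object1.func1()
object1.func2()
object2.func1()
object2.func3()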
Output:
This function is in parent class.
This function is in child 1.
This function is in parent class.
This function is in child 2.
Hybrid Inheritance: Inheritance consisting of multiple types of inheritance is called hybrid
inheritance.
Example:
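# Python program to demonstrate
# hybrid inheritance
class School:
    def func1(self):
        print("This function is in school.")
class Student1(School):
    def func2(self):
        print("This function is in student 1. ")
class Student2(School):
    def func3(self):
        print("This function is in student 2.")
# Base classes of Student3 are assumed here
class Student3(Student1, School):
    def func4(self):
        print("This function is in student 3.")
# Driver's code
obj = Student3()
obj.func1()
obj.func2()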
Output:
This function is in school.
This function is in student 1.
Encapsulation in Python
Encapsulation means wrapping data and the methods that work on that data within a single
unit. Consider a real-life example: in a company, there are different sections such as the
accounts section, finance section, and sales section. The finance section handles all the
financial transactions and keeps records of all the data related to finance. Similarly, the sales
section handles all the sales-related activities and keeps records of all the sales. Now suppose
an official from the finance section needs all the sales data for a particular month. The
official is not allowed to access the data of the sales section directly, but must first contact
an officer in the sales section and request the data. This is what encapsulation is: the data
of the sales section and the employees who can manipulate it are wrapped under a single
name, “sales section”. Encapsulation also hides the data. In this example, the data of each
section (sales, finance, or accounts) is hidden from every other section.
Protected members
Protected members (in C++ and JAVA) are those members of the class that cannot be accessed
outside the class but can be accessed from within the class and its subclasses. To accomplish
this in Python, just follow the convention by prefixing the name of the member by a single
underscore “_”.
Although a protected variable can still be accessed outside the class as well as in a derived
class (and modified there too), it is customary (a convention, not a rule) not to access
protected members outside the class body.
Note: The __init__ method is a constructor and runs as soon as an object of a class is
instantiated.
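# Python program to
# demonstrate protected members
class Base:
    def __init__(self):
        # Protected member
        self._a = 2
class Derived(Base):
    def __init__(self):
        # Calling constructor of
        # Base class
        Base.__init__(self)
        print("Calling protected member of base class: ",
              self._a)
        # Modify the protected variable
        self._a = 3
        print("Calling modified protected member outside class: ",
              self._a)
obj1 = Derived()
obj2 = Base()
# Protected members can still be accessed from outside the class
print("Accessing protected member of obj1: ", obj1._a)
print("Accessing protected member of obj2: ", obj2._a)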
Output:
Calling protected member of base class: 2
Calling modified protected member outside class: 3
Accessing protected member of obj1: 3
Accessing protected member of obj2: 2
Private members
Private members are similar to protected members; the difference is that class members
declared private should be accessed neither outside the class nor by any derived class.
Python has no truly private instance variables that cannot be accessed except from inside a
class.
However, to mark a member as private, prefix the member name with a double underscore “__”.
Note: Python applies name mangling to such names (a member __c of class Base is stored as
_Base__c), so private members can still be reached from outside the class through the
mangled name.
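# Python program to
# demonstrate private members
class Base:
    def __init__(self):
        self.a = "GeeksforGeeks"
        self.__c = "GeeksforGeeks"
class Derived(Base):
    def __init__(self):
        # Calling constructor of
        # Base class
        Base.__init__(self)
        # Would fail: inside Derived the name is mangled to
        # _Derived__c, but the attribute was stored as _Base__c
        print(self.__c)
# Driver code
obj1 = Base()
print(obj1.a)
# There is no attribute named 'c', so this line will
# raise an AttributeError
print(obj1.c)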
Output:
GeeksforGeeks
Traceback (most recent call last):
File "/home/f4905b43bfcf29567e360c709d3c52bd.py", line 25, in <module>
print(obj1.c)
AttributeError: 'Base' object has no attribute 'c'
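The name mangling mentioned in the note can be observed directly: inside class Base, an attribute written as __c is stored under the name _Base__c. A minimal sketch, in which the method reveal() and the stored string are hypothetical and used only for illustration:

```python
class Base:
    def __init__(self):
        self.__c = "private value"   # stored on the instance as _Base__c

    def reveal(self):
        # Inside the class, __c is rewritten to _Base__c automatically
        return self.__c


obj = Base()

inside = obj.reveal()                # access from within the class
mangled = obj._Base__c               # access through the mangled name
has_plain_name = hasattr(obj, "__c") # the unmangled name does not exist

print(inside)
print(mangled)
print(has_plain_name)
```

This shows that the double underscore makes accidental access harder but does not make the attribute inaccessible.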
Polymorphism in Python
The word polymorphism means having many forms. In programming, polymorphism means
the same function name (but different signatures) being used for different types.
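Example of built-in polymorphic functions:
# len() being used for a string
print(len("geeks"))
# len() being used for a list
print(len([10, 20, 30]))
Output:
5
3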
Examples of user-defined polymorphic functions:
def add(x, y, z=0):
    return x + y + z
# Driver code
print(add(2, 3))
print(add(2, 3, 4))
Output:
5
9
Polymorphism with class methods:
The below code shows how Python can use two different class types, in the same way. We
create a for loop that iterates through a tuple of objects. Then call the methods without being
concerned about which class type each object is. We assume that these methods actually
exist in each class.
class India():
    def capital(self):
        print("New Delhi is the capital of India.")
    def language(self):
        print("Hindi is the most widely spoken language of India.")
    def type(self):
        print("India is a developing country.")
class USA():
    def capital(self):
        print("Washington, D.C. is the capital of USA.")
    def language(self):
        print("English is the primary language of USA.")
    def type(self):
        print("USA is a developed country.")
obj_ind = India()
obj_usa = USA()
for country in (obj_ind, obj_usa):
    country.capital()
    country.language()
    country.type()
Output:
New Delhi is the capital of India.
Hindi is the most widely spoken language of India.
India is a developing country.
Washington, D.C. is the capital of USA.
English is the primary language of USA.
USA is a developed country.
Polymorphism with inheritance:
A child class can redefine a method it inherits from its parent class; this is known as
method overriding.
class Bird:
    def intro(self):
        print("There are many types of birds.")
    def flight(self):
        print("Most of the birds can fly but some cannot.")
class sparrow(Bird):
    def flight(self):
        print("Sparrows can fly.")
class ostrich(Bird):
    def flight(self):
        print("Ostriches cannot fly.")
obj_bird = Bird()
obj_spr = sparrow()
obj_ost = ostrich()
obj_bird.intro()
obj_bird.flight()
obj_spr.intro()
obj_spr.flight()
obj_ost.intro()
obj_ost.flight()
Output:
There are many types of birds.
Most of the birds can fly but some cannot.
There are many types of birds.
Sparrows can fly.
There are many types of birds.
Ostriches cannot fly.
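Polymorphism with a function and objects:
It is also possible to create a function that can take any object, allowing for polymorphism.
The function below calls the same three methods on whichever object it is given.
def func(obj):
    obj.capital()
    obj.language()
    obj.type()
obj_ind = India()
obj_usa = USA()
func(obj_ind)
func(obj_usa)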
class India():
    def capital(self):
        print("New Delhi is the capital of India.")
    def language(self):
        print("Hindi is the most widely spoken language of India.")
    def type(self):
        print("India is a developing country.")
class USA():
    def capital(self):
        print("Washington, D.C. is the capital of USA.")
    def language(self):
        print("English is the primary language of USA.")
    def type(self):
        print("USA is a developed country.")
def func(obj):
    obj.capital()
    obj.language()
    obj.type()
obj_ind = India()
obj_usa = USA()
func(obj_ind)
func(obj_usa)
Output:
New Delhi is the capital of India.
Hindi is the most widely spoken language of India.
India is a developing country.
Washington, D.C. is the capital of USA.
English is the primary language of USA.
USA is a developed country.
All objects share class (static) variables, whereas instance (non-static) variables are different
for different objects (every object has its own copy). For example, let a Computer Science
student be represented by class CSStudent. The class may have a static variable whose value is
“cse” for all objects, and it may also have non-static members like name and roll. In C++ and
Java, we can use the static keyword to make a variable a class variable; variables without a
preceding static keyword are instance variables.
The Python approach is simple; it doesn’t require a static keyword.
All variables which are assigned a value in the class declaration are class variables, and
variables that are assigned values inside methods (through self) are instance variables.
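class CSStudent:
    stream = 'cse'  # Class variable
    def __init__(self, name, roll):
        self.name = name  # Instance variable
        self.roll = roll  # Instance variable
a = CSStudent('Geek', 1)
b = CSStudent('Nerd', 2)
# Class variables can be accessed using the class
# name also
print(CSStudent.stream)
# Assigning through an instance creates an instance variable on a only
a.stream = 'ece'
# To change the stream for all instances of the class we can change it
# directly on the class
CSStudent.stream = 'mech'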
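The effect of the two assignments to stream can be made explicit by inspecting each object afterwards. A minimal sketch that repeats the CSStudent idea and records what each name sees:

```python
class CSStudent:
    stream = "cse"                    # class variable shared by all instances

    def __init__(self, name, roll):
        self.name = name              # instance variables, one copy per object
        self.roll = roll


a = CSStudent("Geek", 1)
b = CSStudent("Nerd", 2)

a.stream = "ece"                      # creates an instance variable on a only
CSStudent.stream = "mech"             # changes the shared class variable

seen = (a.stream, b.stream, CSStudent.stream)
a_has_own_stream = "stream" in vars(a)
b_has_own_stream = "stream" in vars(b)

print(seen)
print(a_has_own_stream, b_has_own_stream)
```

Object a keeps its own value because the instance variable shadows the class variable, while b still follows whatever the class currently holds.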
Class Method
The @classmethod decorator is a built-in function decorator. A decorator is an expression that
is evaluated after your function is defined; the result of that evaluation replaces your
function definition.
A class method receives the class as an implicit first argument, just as an instance method
receives the instance.
Syntax:
class C(object):
    @classmethod
    def fun(cls, arg1, arg2, ...):
        ....
fun: function that needs to be converted into a class method
returns: a class method for function.
• A class method is a method that is bound to the class and not the object of the class.
• They have access to the state of the class, as they take a class parameter that points
to the class and not the object instance.
• It can modify a class state that would apply across all the instances of the class. For
example, it can modify a class variable that will be applicable to all the instances.
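The last point above, a class method changing state that every instance then sees, can be sketched as follows. The class Registry and its members are hypothetical names used only for this illustration.

```python
class Registry:
    count = 0                         # class state shared by all instances

    @classmethod
    def increment(cls):
        cls.count += 1                # modifies the class, not one instance
        return cls.count


first = Registry()
second = Registry()

Registry.increment()                  # called on the class
first.increment()                     # called on an instance; cls is still Registry

counts = (first.count, second.count, Registry.count)
print(counts)
```

Whether the method is called on the class or on an instance, cls refers to the class, so the change is visible everywhere.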
Static Method
A static method does not receive an implicit first argument.
Syntax:
class C(object):
    @staticmethod
    def fun(arg1, arg2, ...):
        ...
returns: a static method for function fun.
• A static method is also a method that is bound to the class and not the object of the
class.
• A static method can’t access or modify the class state.
• It is present in a class because it makes sense for the method to be present in class.
Class method vs Static Method
• A class method takes cls as the first parameter while a static method needs no specific
parameters.
• A class method can access or modify the class state while a static method can’t access
or modify it.
• In general, static methods know nothing about the class state. They are utility-type
methods that take some parameters and work upon those parameters. On the other hand,
class methods must have the class as a parameter.
• We use @classmethod decorator in python to create a class method and we use
@staticmethod decorator to create a static method in python.
When to use what?
• We generally use class method to create factory methods. Factory methods return
class objects ( similar to a constructor ) for different use cases.
• We generally use static methods to create utility functions.
How to define a class method and a static method?
To define a class method in python, we use @classmethod decorator, and to define a static
method we use @staticmethod decorator.
Let us look at an example to understand the difference between the two. Let us say we
want to create a class Person. Python doesn’t support method overloading like C++ or
Java, so we use class methods to create factory methods. In the below example we use a class
method to create a Person object from a birth year.
As explained above, we use static methods to create utility functions. In the below example
we use a static method to check whether a person is an adult or not.
Implementation
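from datetime import date
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    # a class method to create a Person object by birth year
    @classmethod
    def fromBirthYear(cls, name, year):
        return cls(name, date.today().year - year)
    @staticmethod
    def isAdult(age):
        return age > 18
person1 = Person('mayank', 21)
# name and birth year are illustrative; the age printed depends on the current year
person2 = Person.fromBirthYear('mayank', 1996)
print (person1.age)
print (person2.age)
print (Person.isAdult(22))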
Output:
21
25
True
Regular Expression in Python
A Regular Expression (RegEx) is a special sequence of characters that defines a search pattern
used to find a string or set of strings. It can detect the presence or absence of text by
matching it against a particular pattern, and a pattern can itself be split into one or more
sub-patterns. Python provides the re module, which supports the use of regex in Python. Its
primary function is search, which takes a regular expression and a string and returns either
the first match or None.
Example:
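import re
s = 'GeeksforGeeks: A computer science portal for geeks'  # sample string (assumed)
match = re.search(r'portal', s)
print('Start Index:', match.start())
print('End Index:', match.end())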
Output
Start Index: 34
End Index: 40
The above code gives the starting index and the ending index of the string portal.
Note: Here the r character (r’portal’) stands for raw, not regex. A raw string differs slightly
from a regular string: it does not interpret the \ character as an escape character. This
matters because the regular expression engine uses the \ character for its own escaping purposes.
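The difference between raw and regular strings can be seen by comparing lengths and by matching a literal backslash. A minimal sketch; the sample path string is an assumption chosen for illustration.

```python
import re

plain = "\n"        # one character: a newline
raw = r"\n"         # two characters: a backslash and the letter n
lengths = (len(plain), len(raw))

path = r"C:\data\new"   # sample text containing literal backslashes

# To match one literal backslash, the regex engine needs \\ .
# As a raw string that is written r"\\"; as a regular string it is "\\\\".
with_raw = re.search(r"\\new", path)
without_raw = re.search("\\\\new", path)

print(lengths)
print(with_raw.span(), without_raw.span())
```

Both patterns describe the same regex; the raw string simply avoids doubling every backslash a second time.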
Before starting with the Python regex module let’s see how to actually write regex using
metacharacters or special sequences.
MetaCharacters
Metacharacters are characters with a special meaning in a regular expression. They are used
throughout the functions of the re module. Below is the list of metacharacters.
MetaCharacters   Description
\                Used to drop the special meaning of the character following it
[]               Represents a character class
^                Matches the beginning
$                Matches the end
.                Matches any character except newline
|                Means OR (matches any of the patterns separated by it)
?                Matches zero or one occurrence
*                Any number of occurrences (including 0 occurrences)
+                One or more occurrences
{}               Indicates the number of occurrences of a preceding regex to match
()               Encloses a group of regex
Let’s discuss each of these metacharacters in detail
\ – Backslash
The backslash (\) makes sure that the character following it is not treated in a special way.
This can be considered a way of escaping metacharacters. For example, if you want to search
for the dot (.) in a string, you will find that the dot is treated as a special character
because it is one of the metacharacters (as shown in the above table). In this case, we put a
backslash (\) just before the dot (.) so that it loses its special meaning. See the below
example for a better understanding.
Example:
import re
s = 'geeks.forgeeks'
# without using \
match = re.search(r'.', s)
print(match)
# using \
match = re.search(r'\.', s)
print(match)
Output
<_sre.SRE_Match object; span=(0, 1), match='g'>
<_sre.SRE_Match object; span=(5, 6), match='.'>
[] – Square Brackets
Square brackets ([]) represent a character class consisting of a set of characters that we wish
to match. For example, the character class [abc] will match any single a, b, or c.
We can also specify a range of characters using - inside the square brackets. For example,
• [0-3] is the same as [0123]
• [a-c] is the same as [abc]
^ – Caret
The caret (^) symbol matches the beginning of the string, i.e. checks whether the string starts
with the given character(s) or not. For example –
• ^g will check if the string starts with g such as geeks, globe, girl, g, etc.
• ^ge will check if the string starts with ge such as geeks, geeksforgeeks, etc.
$ – Dollar
The dollar ($) symbol matches the end of the string, i.e. checks whether the string ends with
the given character(s) or not. For example –
• s$ will check for the string that ends with s such as geeks, ends, s, etc.
• ks$ will check for the string that ends with ks such as geeks, geeksforgeeks, ks, etc.
. – Dot
Dot(.) symbol matches only a single character except for the newline character (\n). For
example –
• a.b will check for the string that contains any character at the place of the dot such as
acb, acbd, abbb, etc
• .. will check if the string contains at least 2 characters
| – Or
Or symbol works as the or operator meaning it checks whether the pattern before or after
the or symbol is present in the string or not. For example –
• a|b will match any string that contains a or b such as acd, bcd, abcd, etc.
? – Question Mark
Question mark (?) checks whether the regex before the question mark occurs exactly once or
not at all. For example –
• ab?c will be matched for the strings ac, acb, dabc but will not be matched for abbc
because there are two b’s. Similarly, it will not be matched for abdc because b is not
followed by c.
* – Star
Star (*) symbol matches zero or more occurrences of the regex preceding the * symbol. For
example –
• ab*c will be matched for the string ac, abc, abbbc, dabc, etc. but will not be matched
for abdc because b is not followed by c.
+ – Plus
Plus (+) symbol matches one or more occurrences of the regex preceding the + symbol. For
example –
• ab+c will be matched for the string abc, abbc, dabc, but will not be matched for ac,
abdc because there is no b in ac and b is not followed by c in abdc.
{m,n} – Braces
Braces match between m and n repetitions (both inclusive) of the preceding regex. Note that
the quantifier must be written without spaces inside the braces. For example –
• a{2,4} will be matched for the strings aaab, baaaac, gaad, but will not be matched for
strings like abc, bc because there is only one a or no a in those cases.
(<regex>) – Group
Group symbol is used to group sub-patterns. For example –
• (a|b)cd will match for strings like acd, abcd, gacd, etc.
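The example strings given for the metacharacters above can be checked with re.search(), which returns None when a pattern is not found anywhere in the string. A minimal sketch; the list of cases is taken from the bullets above, with the braces quantifier written without a space.

```python
import re

# (pattern, text) pairs taken from the examples in this section
cases = [
    (r"^ge", "geeks"),
    (r"ks$", "geeks"),
    (r"a.b", "acbd"),
    (r"ab?c", "ac"),
    (r"ab?c", "abbc"),
    (r"ab*c", "abbbc"),
    (r"ab+c", "ac"),
    (r"a{2,4}", "baaaac"),
    (r"a{2,4}", "abc"),
    (r"(a|b)cd", "gacd"),
]

# True when the pattern is found somewhere in the text
results = {case: re.search(case[0], case[1]) is not None for case in cases}

for (pattern, text), found in results.items():
    print(f"{pattern!r} in {text!r}: {found}")
```

Only ab?c on abbc, ab+c on ac, and a{2,4} on abc fail to match, in line with the explanations given for each metacharacter.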
Special Sequences
Special sequences do not match a specific literal character; instead, each one stands for a
class of characters or for a specific position in the search string where the match must
occur. They make it easier to write commonly used patterns.
List of special sequences
Special Sequence   Description and examples
\A   Matches if the string begins with the given characters.
     Example: \Afor matches in "for geeks", "for the world"
\b   Matches if the word begins or ends with the given characters. \b(string) will check for
     the beginning of the word and (string)\b will check for the ending of the word.
     Example: \bge matches in "geeks", "get"
\B   The opposite of \b, i.e. the word should not start or end with the given regex.
     Example: \Bge matches in "together", "forge"
\d   Matches any decimal digit; this is equivalent to the set class [0-9].
     Example: \d matches in "123", "gee1"
\D   Matches any non-digit character; this is equivalent to the set class [^0-9].
     Example: \D matches in "geeks", "geek1"
\s   Matches any whitespace character.
     Example: \s matches in "gee ks", "a bc a"
\S   Matches any non-whitespace character.
     Example: \S matches in "a bd", "abcd"
\w   Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
     Example: \w matches in "123", "geeKs4"
\W   Matches any non-alphanumeric character.
     Example: \W matches in ">$", "gee<>"
\Z   Matches if the string ends with the given regex.
     Example: ab\Z matches in "abcdab", "abababab"
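A few of the special sequences listed above can be tried out with re.findall() and re.search(). A minimal sketch; the sample strings combine the examples from the list and are otherwise arbitrary.

```python
import re

# \b: "ge" only at the beginning of a word
word_start = re.findall(r"\bge", "geeks get forge")

# \B: "ge" only when it is not at the beginning of a word
not_word_start = re.findall(r"\Bge", "geeks together forge")

# \d and \s: digits and whitespace characters
digits = re.findall(r"\d", "gee1 23")
spaces = re.findall(r"\s", "a bc a")

# \A and \Z: anchors for the start and end of the whole string
starts_with_for = re.search(r"\Afor", "for geeks") is not None
for_not_at_start = re.search(r"\Afor", "geeks for") is not None
ends_with_ab = re.search(r"ab\Z", "abcdab").span()

print(word_start, not_word_start)
print(digits, spaces)
print(starts_with_for, for_not_at_start, ends_with_ab)
```

In "geeks get forge", \bge finds the two word-initial occurrences and skips the one inside "forge", while \Bge does the reverse on its sample string.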
Python has a module named re that is used for regular expressions in Python. We can import
this module by using the import statement.
Example: Importing re module in Python
import re
Let’s see various functions provided by this module to work with regex in Python.
re.findall()
Return all non-overlapping matches of pattern in string, as a list of strings. The string is
scanned left-to-right, and matches are returned in the order found.
Example: Finding all occurrences of a pattern
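# A Python program to demonstrate the working of
# findall()
import re
# A sample text string (assumed) where the regular expression
# is searched.
string = "Hello my Number is 123456789 and my friend's number is 987654321"
# A sample regular expression to find digits.
regex = r'\d+'
match = re.findall(regex, string)
print(match)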
Output
['123456789', '987654321']
re.compile()
Regular expressions are compiled into pattern objects, which have methods for various
operations such as searching for pattern matches or performing string substitutions.
Example 1:
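import re
# compile() creates the regular expression character class [a-e],
# which is equivalent to [abcde].
p = re.compile('[a-e]')
# findall() returns a list of all matches found in the string
print(p.findall("Aye, said Mr. Gibenson Stark"))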
Output:
['e', 'a', 'd', 'b', 'e', 'a']
Understanding the Output:
• The first occurrence is ‘e’ in “Aye” and not ‘A’, because matching is case sensitive.
• The next occurrence is ‘a’ in “said”, then ‘d’ in “said”, followed by ‘b’ and ‘e’ in
“Gibenson”; the last ‘a’ matches in “Stark”.
• The metacharacter backslash ‘\’ has a very important role as it signals various sequences.
If the backslash is to be used without its special meaning as a metacharacter, use ‘\\’.
Example 2: Set class [\s,.] will match any whitespace character, ‘,’, or, ‘.’ .
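import re
# \d is equivalent to [0-9].
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))
# \d+ matches a group of one or more digits.
p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))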
Output:
['1', '1', '4', '1', '8', '8', '6']
['11', '4', '1886']
Example 3:
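import re
# \w is equivalent to [a-zA-Z0-9_].
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))
# \w+ matches a group of alphanumeric characters.
p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he said *** in some_language."))
# \W matches any non-alphanumeric character.
p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))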
Output:
['H', 'e', 's', 'a', 'i', 'd', 'i', 'n', 's', 'o', 'm', 'e', '_',
'l', 'a', 'n', 'g']
['I', 'went', 'to', 'him', 'at', '11', 'A', 'M', 'he', 'said', 'in',
'some_language']
[' ', ' ', '*', '*', '*', ' ', ' ', '.']
Example 4:
import re
# '*' matches zero or more occurrences
# of a character.
p = re.compile('ab*')
print(p.findall("ababbaabbb"))
Output:
['ab', 'abb', 'a', 'abbb']
Understanding the Output:
• Our RE is ab*, which matches ‘a’ followed by any number of ‘b’s, starting from 0.
• Output ‘ab’ is valid because a single ‘a’ is followed by a single ‘b’.
• Output ‘abb’ is valid because a single ‘a’ is followed by 2 ‘b’s.
• Output ‘a’ is valid because a single ‘a’ is followed by 0 ‘b’s.
• Output ‘abbb’ is valid because a single ‘a’ is followed by 3 ‘b’s.
re.split()
Split string by the occurrences of a character or a pattern, upon finding that pattern, the
remaining characters from the string are returned as part of the resulting list.
Syntax:
re.split(pattern, string, maxsplit=0, flags=0)
The first parameter, pattern, is the regular expression; string is the text in which the
pattern is searched for and at which splitting occurs. maxsplit defaults to 0, which means
no limit; if a nonzero value is given, at most that many splits occur. With maxsplit = 1 the
string is split only once, resulting in a list of length 2. flags is optional and can help to
shorten code; for example, with flags = re.IGNORECASE the pattern matches regardless of
lowercase or uppercase.
Example 1:
Output:
['Words', 'words', 'Words']
['Word', 's', 'words', 'Words']
['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
['On ', 'th Jan ', ', at ', ':', ' AM']
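A minimal sketch of calls that produce the four lists above. The input strings are assumptions, chosen only because they reproduce the output shown.

```python
import re

# Input strings below are assumed; they were picked to match the output above.
# \W+ splits on every run of non-alphanumeric characters.
words_1 = re.split(r'\W+', 'Words, words , Words')
words_2 = re.split(r'\W+', "Word's words Words")
date_1 = re.split(r'\W+', 'On 12th Jan 2016, at 11:02 AM')
# \d+ splits on every run of digits instead.
date_2 = re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM')

print(words_1)
print(words_2)
print(date_1)
print(date_2)
```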
Example 2:
import re
# flags = re.IGNORECASE
Output:
['On ', 'th Jan 2016, at 11:02 AM']
['', 'y, ', 'oy oh ', 'oy, ', 'om', ' h', 'r', '']
['A', 'y, Boy oh ', 'oy, ', 'om', ' h', 'r', '']
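The effect of maxsplit and flags can be sketched as follows. The input strings are assumptions chosen to reproduce the three lists above.

```python
import re

text = 'Aey, Boy oh boy, come here'  # assumed input string

# maxsplit=1: split only at the first run of digits.
once = re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM', maxsplit=1)
# With re.IGNORECASE, [a-f] also matches 'A' and 'B'.
ignore_case = re.split('[a-f]+', text, flags=re.IGNORECASE)
# Without the flag, uppercase letters are kept in the pieces.
case_sensitive = re.split('[a-f]+', text)

print(once)
print(ignore_case)
print(case_sensitive)
```

Note that a match at the very start or end of the string produces an empty string in the result.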
re.sub()
The ‘sub’ in the function name stands for substitution. The regular expression pattern is
searched for in the given string (3rd parameter), and each match found is replaced by
repl (2nd parameter). count limits the number of replacements; the default of 0 replaces
every match.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Example 1:
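import re
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             flags=re.IGNORECASE))
print(re.sub('ub', '~*', 'Subject has Uber booked already'))
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             count=1, flags=re.IGNORECASE))
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam',
             flags=re.IGNORECASE))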
Output
S~*ject has ~*er booked already
S~*ject has Uber booked already
S~*ject has Uber booked already
Baked Beans & Spam
re.subn()
subn() is similar to sub() in all ways, except in its way of providing output. It returns a tuple
(new_string, number_of_replacements) rather than just the string.
Syntax:
re.subn(pattern, repl, string, count=0, flags=0)
Example:
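import re
print(re.subn('ub', '~*', 'Subject has Uber booked already'))
t = re.subn('ub', '~*', 'Subject has Uber booked already',
            flags=re.IGNORECASE)
print(t)
print("Length of Tuple is:", len(t))
print(t[0])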
Output
('S~*ject has Uber booked already', 1)
('S~*ject has ~*er booked already', 2)
Length of Tuple is: 2
S~*ject has ~*er booked already
re.escape()
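Returns the string with special characters backslash-escaped; this is useful if you want to
match an arbitrary literal string that may have regular expression metacharacters in it.
Before Python 3.7 every non-alphanumeric character was escaped, as in the output below; from
3.7 on, only characters that can have a special meaning in a regular expression are escaped,
so ‘,’ for example is left as it is.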
Syntax:
re.escape(string)
Example:
import re
Output
This\ is\ Awesome\ even\ 1\ AM
I\ Asked\ what\ is\ this\ \[a\-9\]\,\ he\ said\ \ \ \^WoW
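A minimal sketch of why escaping matters when a literal string is used as a pattern. The sample strings and variable names are assumptions made up for illustration.

```python
import re

# Hypothetical literal containing the metacharacters ( ) . and +
literal = "price (USD): 3.50+tax"
text = "Total price (USD): 3.50+tax included"

# Used as-is, the parentheses form a group and '+' is a quantifier,
# so the pattern does not match the literal text.
raw_match = re.search(literal, text)

# After re.escape() every metacharacter is matched literally.
pattern = re.escape(literal)
escaped_match = re.search(pattern, text)

print(raw_match)              # None
print(escaped_match.group())  # price (USD): 3.50+tax
```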
re.search()
This method returns None if the pattern doesn’t match anywhere in the string; otherwise it
returns a match object (re.Match) that contains information about the matching part of the
string. It stops after the first match, so it is better suited for testing a regular
expression than for extracting all data.
Example: Searching an occurrence of the pattern
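import re
regex = r"([a-zA-Z]+) (\d+)"
match = re.search(regex, "I was born on June 24")
if match != None:
    # In particular: group(0) is the whole match, group(1) and group(2)
    # are the parts captured by the two parenthesised groups.
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % match.group(0))
    print("Month: %s" % match.group(1))
    print("Day: %s" % match.group(2))
else:
    print("The regex pattern does not match.")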
Output
Match at index 14, 21
Full match: June 24
Month: June
Day: 24
Match Object
A Match object contains all the information about the search and the result and if there is no
match found then None will be returned. Let’s see some of the commonly used methods and
attributes of the match object.
Getting the string and the regex
match.re attribute returns the regular expression passed and match.string attribute returns
the string passed.
Example: Getting the string and the regex of the matched object
import re
s = "Welcome to GeeksForGeeks"
# here res is the match object
res = re.search(r"\bG", s)
print(res.re)
print(res.string)
Output
re.compile('\\bG')
Welcome to GeeksForGeeks
Getting index of matched object
• start() method returns the starting index of the matched substring
• end() method returns the ending index of the matched substring
• span() method returns a tuple containing the starting and the ending index of the
matched substring
Example: Getting index of matched object
import re
s = "Welcome to GeeksForGeeks"
res = re.search(r"\bGee", s)
print(res.start())
print(res.end())
print(res.span())
Output
11
14
(11, 14)
Getting matched substring
group() method returns the part of the string for which the patterns match. See the below
example for a better understanding.
Example: Getting matched substring
import re
s = "Welcome to Python"
# \D{2} matches exactly two non-digit characters
res = re.search(r"\D{2} t", s)
print(res.group())
Output
me t
In the above example, the pattern looks for two non-digit characters followed by a space,
where that space is followed by a t.
The module re provides support for regular expressions in Python. Below are main methods
in this module.
Searching an occurrence of pattern
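re.search(): This method returns None if the pattern doesn’t match anywhere in the string;
otherwise it returns a match object (re.Match) that contains information about the matching
part of the string. It stops after the first match, so it is better suited for testing a
regular expression than for extracting all data.
import re
regex = r"([a-zA-Z]+) (\d+)"
match = re.search(regex, "I was born on June 24")
if match != None:
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % match.group(0))
    print("Month: %s" % match.group(1))
    print("Day: %s" % match.group(2))
else:
    print("The regex pattern does not match.")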
Output:
Match at index 14, 21
Full match: June 24
Month: June
Day: 24
Matching a Pattern with Text
re.match(): This function attempts to match the pattern at the beginning of the string only;
it does not require the whole string to match (re.fullmatch() does that). The re.match
function returns a match object on success, None on failure.
re.match(pattern, string, flags=0)
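# A Python program to demonstrate working
# of re.match().
import re
def findMonthAndDate(string):
    regex = r"([a-zA-Z]+) (\d+)"
    match = re.match(regex, string)
    if match == None:
        print("Not a valid date")
        return
    print("Given Data: %s" % match.group())
    print("Month: %s" % match.group(1))
    print("Day: %s" % match.group(2))
findMonthAndDate("Jun 24")
print("")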
Output:
Given Data: Jun 24
Month: Jun
Day: 24
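The difference between re.match(), re.search() and re.fullmatch() can be sketched with the same pattern. The sample strings here are assumptions chosen for illustration.

```python
import re

pattern = r"([a-zA-Z]+) (\d+)"

# match() only tries at the start of the string.
at_start = re.match(pattern, "I was born on June 24")          # None
# search() scans the whole string for the first match.
anywhere = re.search(pattern, "I was born on June 24")         # 'June 24'
# match() accepts trailing text after the matched prefix ...
prefix = re.match(pattern, "Jun 24 is the date")               # 'Jun 24'
# ... whereas fullmatch() requires the entire string to match.
whole = re.fullmatch(pattern, "Jun 24 is the date")            # None

print(at_start, anywhere.group(), prefix.group(), whole)
```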
re.findall()
Return all non-overlapping matches of pattern in string, as a list of strings. The string is
scanned left-to-right, and matches are returned in the order found.
Example:
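# findall()
import re
# A sample text string where the regular expression is searched.
string = """Hello my number is 123456789 and
my friend's number is 987654321"""
regex = r'\d+'
match = re.findall(regex, string)
print(match)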
Output:
['123456789', '987654321']
re.VERBOSE : This flag allows you to write regular expressions that look nicer and are more
readable by allowing you to visually separate logical sections of the pattern and add
comments.
Whitespace within the pattern is ignored, except when in a character class, or when preceded
by an unescaped backslash, or within tokens like *?, (?: or (?P. When a line contains a
# that is not in a character class and is not preceded by an unescaped backslash, all characters
from the leftmost such # through the end of the line are ignored.
# Using VERBOSE
import re
regex_email = re.compile(r"""
^([a-z0-9_\.-]+) # local Part
@ # single @ sign
([0-9a-z\.-]+) # Domain name
\. # single Dot .
([a-z]{2,6})$ # Top level Domain
""",re.VERBOSE | re.IGNORECASE)
Input: expectopatronum@gmail.com@
Output: Invalid
This is invalid because there is @ after the top level domain name.
Below is the Python implementation –
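def validate_email(email):
    # fullmatch() succeeds only if the whole string matches the pattern
    res = regex_email.fullmatch(email)
    if res:
        print("{} is Valid. Details are as follow:".format(email))
        print("Local:{}".format(res.group(1)))
        print("Domain:{}".format(res.group(2)))
        print("Top Level domain:{}".format(res.group(3)))
    else:
        # If match is not found, string is invalid
        print("{} is Invalid".format(email))
# Driver Code
validate_email("expectopatronum@gmail.com")
validate_email("expectopatronum@gmail.com@")
validate_email("[email protected]")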
Output:
expectopatronum@gmail.com is Valid. Details are as follow:
Local:expectopatronum
Domain:gmail
Top Level domain:com
expectopatronum@gmail.com@ is Invalid
[email protected] is Invalid
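def password_check(passwd):
    # Special characters accepted, matching the regex version of this check
    SpecialSym = '@$!%*#?&'
    val = True
    if len(passwd) < 6:
        print('Length should be at least 6')
        val = False
    if len(passwd) > 20:
        print('Length should not be greater than 20')
        val = False
    if not any(char.isdigit() for char in passwd):
        print('Password should have at least one numeral')
        val = False
    if not any(char.isupper() for char in passwd):
        print('Password should have at least one uppercase letter')
        val = False
    if not any(char.islower() for char in passwd):
        print('Password should have at least one lowercase letter')
        val = False
    if not any(char in SpecialSym for char in passwd):
        print('Password should have at least one special character')
        val = False
    return val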
# Main method
def main():
    passwd = 'Geek12@'
    if (password_check(passwd)):
        print("Password is valid")
    else:
        print("Invalid Password !!")
# Driver Code
if __name__ == '__main__':
    main()
Output:
Password is valid
This code used boolean functions to check if all the conditions were satisfied or not. We see
that though the complexity of the code is basic, the length is considerable.
# importing re library
import re
def main():
    passwd = 'Geek12@'
    reg = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!#%*?&]{6,20}$"
    # compiling regex
    pat = re.compile(reg)
    # searching regex
    mat = pat.search(passwd)
    # validating conditions
    if mat:
        print("Password is valid.")
    else:
        print("Password invalid !!")
# Driver Code
if __name__ == '__main__':
    main()
Output:
Password is valid.
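Each (?=...) lookahead in the pattern above checks one condition without consuming characters. A minimal sketch that runs the same regex against a few made-up passwords; the helper name is_valid and the candidate list are assumptions.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!#%*?&]{6,20}$"
)

def is_valid(passwd):
    # True only when every lookahead and the length check succeed
    return PASSWORD_RE.search(passwd) is not None

# Hypothetical candidates, each breaking at most one rule
candidates = {
    'Geek12@': 'meets every rule',
    'geek12@': 'no uppercase letter',
    'GEEK12@': 'no lowercase letter',
    'Geeks@@': 'no digit',
    'Geek12': 'no special character',
    'G1@a': 'shorter than 6 characters',
}
results = {p: is_valid(p) for p in candidates}
for passwd, note in candidates.items():
    print(passwd, results[passwd], '-', note)
```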
Python Collections
The collections module in Python provides different types of containers. A container is an
object that is used to store different objects and provides a way to access the contained
objects and iterate over them. Some of the built-in containers are tuple, list, dictionary,
etc. This section discusses the containers provided by the collections module, listed below.
• Counters
• OrderedDict
• DefaultDict
• ChainMap
• NamedTuple
• DeQue
• UserDict
• UserList
• UserString
Counters
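from collections import Counter
# Counter with a sequence of items
print(Counter(['B','B','A','B','C','A','B','B','A','C']))
# with dictionary
print(Counter({'A': 3, 'B': 5, 'C': 2}))
# An empty Counter filled with update()
coun = Counter()
coun.update([1, 2, 3, 1, 2, 1, 1, 2])
print(coun)
coun.update([1, 2, 4])
print(coun)
Output:
Counter({'B': 5, 'A': 3, 'C': 2})
Counter({'B': 5, 'A': 3, 'C': 2})
Counter({1: 4, 2: 3, 3: 1})
Counter({1: 5, 2: 4, 3: 1, 4: 1})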
• Data can be provided in any of the three ways as mentioned in initialization and the
counter’s data will be increased not replaced.
• Counts can be zero and negative also.
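from collections import Counter
c1 = Counter(A=4, B=3, C=10)
c2 = Counter(A=10, B=3, C=4)
# subtract() lowers the counts in place; results may be zero or negative
c1.subtract(c2)
print(c1)
Output :
Counter({'C': 6, 'B': 0, 'A': -6})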
from collections import Counter
# Create a list
z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
print(Counter(z))
Output:
Counter({'blue': 3, 'red': 2, 'yellow': 1})
OrderedDict in Python
An OrderedDict is a dictionary subclass that remembers the order in which keys were first
inserted. Before Python 3.7 this was the main difference from dict(): a regular dict did not
guarantee any order when iterated, whereas OrderedDict always returned items in insertion
order. Since Python 3.7 a regular dict also preserves insertion order; OrderedDict still
differs in that its equality test is order-sensitive and it offers move_to_end() and
popitem(last=False) for reordering.
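from collections import OrderedDict
print("This is a Dict:\n")
d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4
for key, value in d.items():
    print(key, value)
print("\nThis is an Ordered Dict:\n")
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
for key, value in od.items():
    print(key, value)
Output (Python 3.7 or later):
This is a Dict:
a 1
b 2
c 3
d 4
This is an Ordered Dict:
a 1
b 2
c 3
d 4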
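from collections import OrderedDict
print("Before:\n")
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
for key, value in od.items():
    print(key, value)
print("\nAfter:\n")
# Changing the value of an existing key keeps its position
od['c'] = 5
for key, value in od.items():
    print(key, value)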
Output:
Before:
a 1
b 2
c 3
d 4
After:
a 1
b 2
c 5
d 4
• Deletion and Re-Inserting: Deleting and re-inserting the same key pushes it to the back,
because OrderedDict maintains the order of insertion.
# re-insertion in OrderedDict
from collections import OrderedDict
print("Before deleting:\n")
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
for key, value in od.items():
    print(key, value)
print("\nAfter deleting:\n")
od.pop('c')
for key, value in od.items():
    print(key, value)
print("\nAfter re-inserting:\n")
od['c'] = 3
for key, value in od.items():
    print(key, value)
Output:
Before deleting:
a 1
b 2
c 3
d 4
After deleting:
a 1
b 2
d 4
After re-inserting:
a 1
b 2
d 4
c 3
Other Considerations:
• OrderedDict consumes more memory than a normal dict. This is due to the underlying
doubly linked list that keeps track of the order.
• Starting from Python 3.7, insertion order of regular Python dictionaries is guaranteed
as well.
• OrderedDict can be used as a stack with the help of the popitem() function, or as a
queue with popitem(last=False). Try implementing an LRU cache with OrderedDict.
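One possible sketch of the LRU cache exercise mentioned above, using move_to_end() and popitem(last=False). The class name LRUCache and its get/put methods are assumptions for illustration, not a standard API.

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, default=None):
        if key not in self.items:
            return default
        # Mark the key as most recently used
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            # Drop the least recently used entry (the oldest one)
            self.items.popitem(last=False)

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')       # 'a' becomes the most recently used key
cache.put('c', 3)    # capacity exceeded, so 'b' is evicted
print(list(cache.items))  # ['a', 'c']
```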
Defaultdict in Python
A dictionary in Python is a collection of key-value pairs that works like a map (it keeps
insertion order since Python 3.7). Unlike other data types that hold only a single value as
an element, a dictionary holds key-value pairs. Keys must be unique and hashable, which in
practice means immutable: a tuple can be a key, whereas a list cannot. A dictionary can be
created by placing key: value pairs within curly braces {}, separated by commas.
Example:
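# Creating a dictionary
Dict = {1: 'Geeks', 2: 'For', 3: 'Geeks'}
print("Dictionary:")
print(Dict)
print(Dict[1])
# Accessing a key that is not present raises KeyError
print(Dict[4])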
Output:
Dictionary:
{1: 'Geeks', 2: 'For', 3: 'Geeks'}
Geeks
Traceback (most recent call last):
File "/home/1ca83108cc81344dc7137900693ced08.py", line 11, in <module>
print(Dict[4])
KeyError: 4
Sometimes, when the KeyError is raised, it might become a problem. To overcome this Python
introduces another dictionary like container known as Defaultdict which is present inside the
collections module.
Defaultdict is a dictionary-like container in the collections module. It is a sub-class of
the dict class. Dictionaries and defaultdict behave almost identically, except that a
defaultdict with a default_factory does not raise a KeyError on a lookup with d[key]; it
creates a default value for the key that does not exist.
Syntax: defaultdict(default_factory)
Parameters:
• default_factory: A function returning the default value for the dictionary defined. If
this argument is absent then the dictionary raises a KeyError.
Example:
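from collections import defaultdict
# Function that returns the default value
# for keys that are not present
def def_value():
    return "Not Present"
d = defaultdict(def_value)
d["a"] = 1
d["b"] = 2
print(d["a"])
print(d["b"])
print(d["c"])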
Output:
1
2
Not Present
Inner Working of defaultdict
Defaultdict adds one writable instance variable and one method in addition to the standard
dictionary operations. The instance variable is the default_factory parameter and the method
provided is __missing__.
• Default_factory: It is a function returning the default value for the dictionary defined.
If this argument is absent then the dictionary raises a KeyError.
Example:
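# default_factory argument of
# defaultdict
from collections import defaultdict
# A lambda is passed as the default_factory
d = defaultdict(lambda: "Not Present")
d["a"] = 1
d["b"] = 2
print(d["a"])
print(d["b"])
print(d["c"])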
Output:
1
2
Not Present
• __missing__(key): This method provides the default value for a missing key. If
default_factory is None, a KeyError is raised; otherwise default_factory is called without
arguments, and its result is stored under the key and returned. This method is called by
the __getitem__() method of the dict class when the requested key is not found;
__getitem__() then returns or raises whatever __missing__() returns or raises.
Example:
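from collections import defaultdict
d = defaultdict(lambda: "Not Present")
d["a"] = 1
d["b"] = 2
# Calling __missing__() directly stores and returns the default,
# even for the existing key 'a'
print(d.__missing__('a'))
print(d.__missing__('d'))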
Output:
Not Present
Not Present
Using List as default_factory
When the list class is passed as the default_factory argument, then a defaultdict is created
with the values that are list.
Example:
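from collections import defaultdict
# Defining a dict whose missing keys default to an empty list
d = defaultdict(list)
for i in range(5):
    d[i].append(i)
print("Dictionary with values as list:")
print(d)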
Output:
Dictionary with values as list:
defaultdict(<class 'list'>, {0: [0], 1: [1], 2: [2], 3: [3], 4: [4]})
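A common use of a list default_factory is grouping items by a key. A minimal sketch, assuming a made-up word list:

```python
from collections import defaultdict

# Hypothetical input data
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

by_letter = defaultdict(list)
for word in words:
    # The first access to a new letter creates an empty list automatically
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

With a plain dict the loop would need an explicit check, or setdefault(), before each append.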
Using int as default_factory
When the int class is passed as the default_factory argument, then a defaultdict is created
with default value as zero.
Example:
from collections import defaultdict
d = defaultdict(int)
L = [1, 2, 3, 4, 2, 4, 1, 2]
for i in L:
    # The default value is 0,
    # so there is no need to
    # initialise the key first
    d[i] += 1
print(d)
Output:
defaultdict(<class 'int'>, {1: 2, 2: 3, 3: 1, 4: 2})
ChainMap in Python
Python contains a container called “ChainMap” which encapsulates many dictionaries into
one unit. ChainMap is member of module “collections“.
Example:
from collections import ChainMap
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}
d3 = {'e': 5, 'f': 6}
# Defining the ChainMap
c = ChainMap(d1, d2, d3)
print(c)
Output:
ChainMap({'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 5, 'f': 6})
Let’s see various Operations on ChainMap
Access Operations
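• keys() :- This function is used to display all the keys of all the dictionaries in
ChainMap.
• values() :- This function is used to display values of all the dictionaries in ChainMap.
• maps :- This attribute (not a function) holds the list of underlying dictionaries, so it
shows keys with corresponding values of all the dictionaries in ChainMap.
import collections
# initializing dictionaries
dic1 = {'a': 1, 'b': 2}
dic2 = {'b': 3, 'c': 4}
# initializing ChainMap
chain = collections.ChainMap(dic1, dic2)
print("All the ChainMap contents are : ")
print(chain.maps)
print("All keys of ChainMap are : ")
print(list(chain.keys()))
print("All values of ChainMap are : ")
print(list(chain.values()))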
Output:
All the ChainMap contents are :
[{'b': 2, 'a': 1}, {'c': 4, 'b': 3}]
All keys of ChainMap are :
['a', 'c', 'b']
All values of ChainMap are :
[1, 4, 2]
Note: The key named “b” exists in both dictionaries, but only the value from the first
dictionary is used for “b”. Dictionaries are searched in the order in which they are passed
to ChainMap.
Manipulating Operations
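• new_child() :- This function returns a new ChainMap with a new dictionary added at the
beginning.
• reversed() :- This built-in function, applied to the maps list, reverses the relative
ordering of dictionaries in the ChainMap.
import collections
# initializing dictionaries
dic1 = {'a': 1, 'b': 2}
dic2 = {'b': 3, 'c': 4}
dic3 = {'f': 5}
# initializing ChainMap
chain = collections.ChainMap(dic1, dic2)
print("All the ChainMap contents are : ")
print(chain.maps)
chain1 = chain.new_child(dic3)
print("Displaying new ChainMap : ")
print(chain1.maps)
print("Value associated with b before reversing is : ", end="")
print(chain1['b'])
# maps must stay a list, so wrap the reversed iterator in list()
chain1.maps = list(reversed(chain1.maps))
print("Value associated with b after reversing is : ", end="")
print(chain1['b'])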
Output:
All the ChainMap contents are :
[{'b': 2, 'a': 1}, {'b': 3, 'c': 4}]
Displaying new ChainMap :
[{'f': 5}, {'b': 2, 'a': 1}, {'b': 3, 'c': 4}]
Value associated with b before reversing is : 2
Value associated with b after reversing is : 3
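Lookups search the dictionaries in order, while writes go only to the first dictionary. A minimal sketch with made-up settings dictionaries (the names defaults and overrides are assumptions):

```python
from collections import ChainMap

# Hypothetical layered settings: overrides are searched before defaults
defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'admin'}
settings = ChainMap(overrides, defaults)

user = settings['user']     # found in overrides
color = settings['color']   # falls through to defaults

# Assignment only ever touches the first mapping
settings['color'] = 'blue'

print(user, color)          # admin red
print(overrides)            # {'user': 'admin', 'color': 'blue'}
print(defaults)             # {'color': 'red', 'user': 'guest'}
```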
Namedtuple in Python
Python supports a tuple-like container called “namedtuple()” in the module “collections“.
Like dictionaries, its values can be looked up by name (here a field name rather than a
hashed key). In addition, the values are ordered and can be accessed by index, which
dictionaries do not support.
Example:
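from collections import namedtuple
# Declaring namedtuple()
Student = namedtuple('Student', ['name', 'age', 'DOB'])
# Adding values
S = Student('Nandini', '19', '2541997')
print("The Student age using index is :", S[1])
print("The Student name using keyname is :", S.name)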
Output:
The Student age using index is : 19
The Student name using keyname is : Nandini
Let’s see various Operations on namedtuple()
Access Operations
• Access by index: The attribute values of namedtuple() are ordered and can be
accessed using the index number unlike dictionaries which are not accessible by index.
• Access by keyname: Access by keyname is also allowed as in dictionaries.
• using getattr(): This is yet another way to access the value by giving namedtuple and
key value as its argument.
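import collections
# Declaring namedtuple()
Student = collections.namedtuple('Student', ['name', 'age', 'DOB'])
# Adding values
S = Student('Nandini', '19', '2541997')
print("The Student age using index is : ", end="")
print(S[1])
print("The Student name using keyname is : ", end="")
print(S.name)
print("The Student DOB using getattr() is : ", end="")
print(getattr(S, 'DOB'))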
Output:
The Student age using index is : 19
The Student name using keyname is : Nandini
The Student DOB using getattr() is : 2541997
Conversion Operations
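• _make() :- This function is used to return a namedtuple() from the iterable passed
as argument.
• _asdict() :- This function returns a dictionary built from the field names and values of
the namedtuple(). It is a regular dict from Python 3.8 on and an OrderedDict in earlier
versions, as in the output below.
• using “**” (double star) operator :- This operator unpacks a dictionary into keyword
arguments, which converts the dictionary into a namedtuple().
import collections
# Declaring namedtuple()
Student = collections.namedtuple('Student', ['name', 'age', 'DOB'])
# Adding values
S = Student('Nandini', '19', '2541997')
# initializing iterable
li = ['Manjeet', '19', '411997']
# initializing dict
di = {'name': "Nikhil", 'age': 19, 'DOB': '1391997'}
print("The namedtuple instance using iterable is : ")
print(Student._make(li))
print("The OrderedDict instance using namedtuple is : ")
print(S._asdict())
print("The namedtuple instance from dict is : ")
print(Student(**di))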
Output:
The namedtuple instance using iterable is :
Student(name='Manjeet', age='19', DOB='411997')
The OrderedDict instance using namedtuple is :
OrderedDict([('name', 'Nandini'), ('age', '19'), ('DOB',
'2541997')])
The namedtuple instance from dict is :
Student(name='Nikhil', age=19, DOB='1391997')
Additional Operation
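• _fields: This attribute holds a tuple of all the field names of the namedtuple declared.
• _replace(): _replace() is like str.replace() but targets named fields; it returns a new
namedtuple and does not modify the original values.
import collections
# Declaring namedtuple()
Student = collections.namedtuple('Student', ['name', 'age', 'DOB'])
# Adding values
S = Student('Nandini', '19', '2541997')
print("All the fields of students are : ")
print(S._fields)
print("The modified namedtuple is : ")
print(S._replace(name='Manjeet'))
# original namedtuple is unchanged
print(S)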
Output:
All the fields of students are :
('name', 'age', 'DOB')
The modified namedtuple is :
Student(name='Manjeet', age='19', DOB='2541997')
Deque in Python
Deque (double-ended queue) in Python is implemented in the module “collections“. Deque is
preferred over a list when we need quick append and pop operations at both ends of the
container: deque provides O(1) time complexity for these operations at either end, whereas
a list is O(1) only at its right end and O(n) for inserting or removing at its left end.
Example:
from collections import deque
# Declaring deque
queue = deque(['name','age','DOB'])
print(queue)
Output:
deque(['name', 'age', 'DOB'])
Let’s see various Operations on deque:
• append():- This function is used to insert the value in its argument at the right end of
the deque.
• appendleft():- This function is used to insert the value in its argument at the left
end of the deque.
• pop():- This function is used to remove and return an element from the right end of the
deque.
• popleft():- This function is used to remove and return an element from the left end of
the deque.
import collections
# initializing deque
de = collections.deque([1,2,3])
de.append(4)
print (de)
de.appendleft(6)
print (de)
de.pop()
print (de)
de.popleft()
print (de)
Output:
The deque after appending at right is :
deque([1, 2, 3, 4])
The deque after appending at left is :
deque([6, 1, 2, 3, 4])
The deque after deleting from right is :
deque([6, 1, 2, 3])
The deque after deleting from left is :
deque([1, 2, 3])
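Combining append() with popleft() gives a first-in, first-out queue. A minimal sketch with made-up customer names:

```python
from collections import deque

queue = deque()
# Customers join at the right end ...
for customer in ["Ann", "Bob", "Cid"]:   # hypothetical names
    queue.append(customer)

served = []
# ... and are served from the left end, both in O(1) time
while queue:
    served.append(queue.popleft())

print(served)  # ['Ann', 'Bob', 'Cid']
```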
• index(ele, beg, end):- This function returns the first index of the value mentioned in
arguments, starting searching from beg till end index.
• insert(i, a) :- This function inserts the value mentioned in arguments(a) at
index(i) specified in arguments.
• remove():- This function removes the first occurrence of value mentioned in
arguments.
• count():- This function counts the number of occurrences of value mentioned in
arguments.
import collections
# initializing deque
de = collections.deque([1, 2, 3, 3, 4, 2, 4])
print (de.index(4,2,5))
de.insert(4,3)
print (de)
print (de.count(3))
# using remove() to remove the first occurrence of 3
de.remove(3)
print (de)
Output:
The number 4 first occurs at a position :
4
The deque after inserting 3 at 5th position is :
deque([1, 2, 3, 3, 3, 4, 2, 4])
The count of 3 in deque is :
3
The deque after deleting first occurrence of 3 is :
deque([1, 2, 3, 3, 4, 2, 4])
• extend(iterable):- This function is used to add multiple values at the right end of the
deque. The argument passed is iterable.
• extendleft(iterable):- This function is used to add multiple values at the left end of
the deque. The argument passed is iterable. Order is reversed as a result of left
appends.
• reverse():- This function is used to reverse the order of deque elements.
• rotate():- This function rotates the deque by the number specified in arguments. If
the number specified is negative, rotation occurs to the left. Else rotation is to
right.
import collections
# initializing deque
de = collections.deque([1, 2, 3,])
de.extend([4,5,6])
print (de)
de.extendleft([7,8,9])
print (de)
# rotates by 3 to left
de.rotate(-3)
print (de)
de.reverse()
# printing modified deque
print (de)
Output:
The deque after extending deque at end is :
deque([1, 2, 3, 4, 5, 6])
The deque after extending deque at beginning is :
deque([9, 8, 7, 1, 2, 3, 4, 5, 6])
The deque after rotating deque is :
deque([1, 2, 3, 4, 5, 6, 9, 8, 7])
The deque after reversing deque is :
deque([7, 8, 9, 6, 5, 4, 3, 2, 1])
Heap queue (or heapq) in Python
The heap data structure is mainly used to represent a priority queue. In Python, it is
available through the “heapq” module. This module implements a min heap: each time an
element is popped, it is the smallest element of the heap. Whenever elements are pushed or
popped, the heap structure is maintained. heap[0] always holds the smallest element.
Let’s see various Operations on heap:
• heapify(iterable) :- This function is used to convert the iterable into a heap data
structure. i.e. in heap order.
• heappush(heap, ele) :- This function is used to insert the element mentioned in its
arguments into heap. The order is adjusted, so as heap structure is maintained.
• heappop(heap) :- This function is used to remove and return the smallest
element from heap. The order is adjusted, so as heap structure is maintained.
import heapq
# initializing list
li = [5, 7, 9, 1, 3]
heapq.heapify(li)
print (list(li))
# pushes 4
heapq.heappush(li,4)
print (list(li))
print (heapq.heappop(li))
Output:
The created heap is : [1, 3, 9, 7, 5]
The modified heap after push is : [1, 3, 4, 7, 5, 9]
The popped and smallest element is : 1
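• heappushpop(heap, ele) :- This function combines push and pop in one call, which is more
efficient than calling heappush() followed by heappop(). The element is pushed first and
then the smallest element is popped, so the pushed element itself may be returned. Heap
order is maintained after this operation.
• heapreplace(heap, ele) :- This function also pops and inserts in one call, but in the
opposite order: the smallest element is popped first, then the new element is pushed. The
returned value can therefore be larger than the pushed value. heapreplace() returns the
smallest value originally in the heap regardless of the pushed element, as opposed to
heappushpop().
# Python code to demonstrate working of
# heappushpop() and heapreplace()
import heapq
# initializing list 1
li1 = [5, 7, 9, 4, 3]
# initializing list 2
li2 = [5, 7, 9, 4, 3]
heapq.heapify(li1)
heapq.heapify(li2)
# pushes 2 first, then pops the smallest element
# pops 2
print("The popped item using heappushpop() is : ", end="")
print(heapq.heappushpop(li1, 2))
# pops the smallest element first, then pushes 2
# pops 3
print("The popped item using heapreplace() is : ", end="")
print(heapq.heapreplace(li2, 2))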
Output:
The popped item using heappushpop() is : 2
The popped item using heapreplace() is : 3
• nlargest(k, iterable, key = fun) :- This function is used to return the k largest elements
from the iterable specified; if key is given, elements are compared by key(element).
• nsmallest(k, iterable, key = fun) :- This function is used to return the k smallest
elements from the iterable specified; if key is given, elements are compared by key(element).
import heapq
# initializing list
li1 = [6, 7, 9, 4, 3, 5, 8, 10, 1]
heapq.heapify(li1)
# prints 10, 9 and 8
print("The 3 largest numbers in list are : ", end="")
print(heapq.nlargest(3, li1))
# prints 1, 3 and 4
print("The 3 smallest numbers in list are : ", end="")
print(heapq.nsmallest(3, li1))
Output:
The 3 largest numbers in list are : [10, 9, 8]
The 3 smallest numbers in list are : [1, 3, 4]
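Because the heap is mainly used as a priority queue, a common pattern is to push (priority, item) tuples so that the lowest priority number is popped first. A minimal sketch with made-up tasks, plus a use of the key argument:

```python
import heapq

# Hypothetical tasks; a smaller number means a higher priority
tasks = []
heapq.heappush(tasks, (2, 'write report'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'refactor'))

# Tuples compare by their first element, so tasks pop in priority order
order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(order)  # ['fix bug', 'write report', 'refactor']

# key=len ranks the words by length instead of alphabetically
words = ['kiwi', 'banana', 'fig', 'strawberry']
longest = heapq.nlargest(2, words, key=len)
print(longest)  # ['strawberry', 'banana']
```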
Collections.UserDict in Python
A dictionary in Python is a collection of data values stored like a map. Unlike other data
types that hold only a single value as an element, a dictionary holds key:value pairs,
which allows a value to be looked up quickly by its key.
Collections.UserDict
Python supports a dictionary like a container called UserDict present in the collections
module. This class acts as a wrapper class around the dictionary objects. This class is useful
when one wants to create a dictionary of their own with some modified functionality or with
some new functionality. It can be considered as a way of adding new behaviors to the
dictionary. This class takes a dictionary instance as an argument and simulates a dictionary
that is kept in a regular dictionary. The dictionary is accessible by the data attribute of this
class.
Syntax: collections.UserDict([initialdata])
Example 1:
from collections import UserDict
d = {'a':1,
'b': 2,
'c': 3}
# Creating an UserDict
userD = UserDict(d)
print(userD.data)
userD = UserDict()
print(userD.data)
Output:
{'a': 1, 'b': 2, 'c': 3}
{}
Example 2: Let’s create a class inheriting from UserDict to implement a customized
dictionary.
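from collections import UserDict
# Creating a dictionary where
# deletion is not allowed
class MyDict(UserDict):
    # __del__ is called when the object is destroyed; raising here
    # produces the "Exception ignored" message at the end of the output
    def __del__(self):
        raise RuntimeError("Deletion not allowed")
    # Function to stop pop from
    # dictionary
    def pop(self, s=None):
        raise RuntimeError("Deletion not allowed")
    # Function to stop popitem
    # from Dictionary
    def popitem(self, s=None):
        raise RuntimeError("Deletion not allowed")
# Driver's code
d = MyDict({'a':1,
    'b': 2,
    'c': 3})
print("Original Dictionary")
print(d)
d.pop(1)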
Output:
Original Dictionary
{'a': 1, 'c': 3, 'b': 2}
Traceback (most recent call last):
File "/home/3ce2f334f5d25a3e24d10d567c705ce6.py", line 35, in <module>
d.pop(1)
File "/home/3ce2f334f5d25a3e24d10d567c705ce6.py", line 20, in pop
raise RuntimeError("Deletion not allowed")
RuntimeError: Deletion not allowed
Exception ignored in:
Traceback (most recent call last):
File "/home/3ce2f334f5d25a3e24d10d567c705ce6.py", line 15, in __del__
RuntimeError: Deletion not allowed
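To block the del d[key] statement itself, the method to override is __delitem__ rather than __del__. A minimal sketch under that assumption; the class name NoDeleteDict is hypothetical.

```python
from collections import UserDict

class NoDeleteDict(UserDict):
    """Hypothetical dictionary that refuses 'del d[key]'."""

    def __delitem__(self, key):
        raise RuntimeError("Deletion not allowed")

d = NoDeleteDict({'a': 1, 'b': 2})
d['c'] = 3               # insertion still works

message = None
try:
    del d['a']           # routed to __delitem__
except RuntimeError as err:
    message = str(err)

print(message)           # Deletion not allowed
print(d.data)            # {'a': 1, 'b': 2, 'c': 3}
```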
Collections.UserList in Python
Python lists are array-like data structures but, unlike arrays, they need not be homogeneous.
A single list may contain data types like integers and strings, as well as objects. Lists in
Python are ordered and have a definite count. The elements in a list are indexed according to
a definite sequence, with 0 being the first index.
Collections.UserList
Python supports a List like a container called UserList present in the collections module. This
class acts as a wrapper class around the List objects. This class is useful when one wants to
create a list of their own with some modified functionality or with some new functionality. It
can be considered as a way of adding new behaviors for the list. This class takes a list instance
as an argument and simulates a list that is kept in a regular list. The list is accessible by the
data attribute of this class.
Syntax: collections.UserList([list])
Example 1:
from collections import UserList
L = [1, 2, 3, 4]
# Creating a userlist
userL = UserList(L)
print(userL.data)
userL = UserList()
print(userL.data)
Output:
[1, 2, 3, 4]
[]
Example 2:
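from collections import UserList
# Creating a List where
# deletion is not allowed
class MyList(UserList):
    # Function to stop deletion
    # from List
    def remove(self, s=None):
        raise RuntimeError("Deletion not allowed")
    # Function to stop pop from
    # List
    def pop(self, s=None):
        raise RuntimeError("Deletion not allowed")
# Driver's code
L = MyList([1, 2, 3, 4])
print("Original List")
# Inserting to List
L.append(5)
print("After Insertion")
print(L)
# Attempting to delete from the list raises RuntimeError
L.remove()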
Output:
Original List
After Insertion
[1, 2, 3, 4, 5]
Collections.UserString in Python
In Python 3, strings are sequences of Unicode characters. Python does not have a separate
character data type; a character is simply a string of length one.
Example:
# Creating a String
# with single quotes
String1 = 'Welcome to the Geeks World'
print("String with the use of Single Quotes: ")
print(String1)
# Creating a String
print(String1)
Output:
String with the use of Single Quotes:
Welcome to the Geeks World
Collections.UserString
Python supports a String like a container called UserString present in the collections module.
This class acts as a wrapper class around the string objects. This class is useful when one wants
to create a string of their own with some modified functionality or with some new
functionality. It can be considered as a way of adding new behaviors for the string. This class
takes any argument that can be converted to string and simulates a string whose content is
kept in a regular string. The string is accessible by the data attribute of this class.
Syntax: collections.UserString(seq)
Example 1:
from collections import UserString
d = 12344
# Creating a UserString
userS = UserString(d)
print(userS.data)
# Creating an empty UserString
userS = UserString("")
print(userS.data)
Output:
12344
Example 2:
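from collections import UserString
# Creating a mutable string
class Mystring(UserString):
    # Function to append to
    # string
    def append(self, s):
        self.data += s
    # Function to remove from
    # string
    def remove(self, s):
        self.data = self.data.replace(s, "")
# Driver's code
s1 = Mystring("Geeks")
print("Original String:", s1.data)
# Appending to string
s1.append("s")
print("String After Appending:", s1.data)
# Removing from string
s1.remove("e")
print("String after Removing:", s1.data)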
Output:
Original String: Geeks
String After Appending: Geekss
String after Removing: Gkss
Interacting with the OS
The OS module in Python provides functions for interacting with the operating system. OS
comes under Python’s standard utility modules. This module provides a portable way of using
operating system-dependent functionality. The *os* and *os.path* modules include many
functions to interact with the file system.
# importing os module
import os
# Get the current working
# directory (CWD)
cwd = os.getcwd()
# Print the current working
# directory (CWD)
print("Current working directory:", cwd)
Output:
Current working directory: /home/nikhil/Desktop/gfg
Changing the Current working directory
To change the current working directory(CWD) os.chdir() method is used. This method
changes the CWD to a specified path. It only takes a single argument as a new directory path.
Note: The current working directory is the folder in which the Python script is operating.
Example:
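import os
# Function to get the current
# working directory
def current_path():
    print("Current working directory before")
    print(os.getcwd())
    print()
# Driver's code
# Printing CWD before
current_path()
# Changing the CWD to the parent directory
os.chdir('../')
# Printing CWD after
current_path()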
Output:
Current working directory before
C:\Users\Nikhil Aggarwal\Desktop\gfg
There are different methods available in the OS module for creating a directory. These are –
• os.mkdir()
• os.makedirs()
Using os.mkdir()
os.mkdir() method in Python is used to create a directory named path with the specified
numeric mode. This method raises FileExistsError if the directory to be created already exists.
Example:
# importing os module
import os
# Directory
directory = "GeeksforGeeks"
# Path
# 'GeeksForGeeks' in
os.mkdir(path)
# Directory
directory = "Geeks"
# Parent Directory path
# mode
mode = 0o666
# Path
# 'GeeksForGeeks' in
os.mkdir(path, mode)
Output:
Directory 'GeeksforGeeks' created
Directory 'Geeks' created
Using os.makedirs()
os.makedirs() method in Python is used to create a directory recursively. That means while
making leaf directory if any intermediate-level directory is missing, os.makedirs() method will
create them all.
Example:
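# importing os module
import os
# Leaf directory
directory = "Nikhil"
# Parent directories (placeholder names: the original path was lost)
parent_dir = os.path.join("GeeksForGeeks", "Authors")
# Path
path = os.path.join(parent_dir, directory)
# Create the directory 'Nikhil';
# the missing parent directories will
# be created too
os.makedirs(path)
print("Directory '%s' created" % directory)
# Leaf directory
directory = "c"
# Parent directories (placeholder names)
parent_dir = os.path.join("GeeksForGeeks", "a", "b")
# mode
mode = 0o666
path = os.path.join(parent_dir, directory)
# If any intermediate directory is missing,
# os.makedirs() will create them
os.makedirs(path, mode)
print("Directory '%s' created" % directory)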
Output:
Directory 'Nikhil' created
Directory 'c' created
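The difference between the two functions is easiest to see when a parent directory is missing or the target already exists. The sketch below runs inside a temporary directory (an assumption made so it is safe to run anywhere); the exist_ok argument of os.makedirs() is not covered in the text above and is shown only as an option.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    single = os.path.join(base, "Geeks")
    nested = os.path.join(base, "a", "b", "c")

    os.mkdir(single)                    # parent exists, so this succeeds
    try:
        os.mkdir(single)                # second call: directory already exists
        mkdir_error = None
    except FileExistsError as error:
        mkdir_error = type(error).__name__

    try:
        os.mkdir(nested)                # parents 'a' and 'b' are missing
        missing_parent_error = None
    except FileNotFoundError as error:
        missing_parent_error = type(error).__name__

    os.makedirs(nested)                 # creates 'a', 'b' and 'c'
    os.makedirs(nested, exist_ok=True)  # no error although it already exists
    created = os.path.isdir(nested)

print(mkdir_error)           # FileExistsError
print(missing_parent_error)  # FileNotFoundError
print(created)               # True
```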
Listing out Files and Directories with Python
os.listdir() method in Python is used to get the list of all files and directories in the specified
directory. If we don’t specify any directory, then the list of files and directories in the current
working directory will be returned.
Example:
# importing os module
import os
path = "/"
dir_list = os.listdir(path)
print("Files and directories in '", path, "' :")
print(dir_list)
Output:
Files and directories in ' / ' :
['sys', 'run', 'tmp', 'boot', 'mnt', 'dev', 'proc', 'var', 'bin',
'lib64', 'usr',
'lib', 'srv', 'home', 'etc', 'opt', 'sbin', 'media']
The OS module provides different methods for removing directories and files in Python. These are–
• Using os.remove()
• Using os.rmdir()
Using os.remove()
os.remove() method in Python is used to remove or delete a file path. This method can not
remove or delete a directory. If the specified path is a directory then OSError will be raised by
the method.
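Example: Suppose the folder contains a file named 'file1.txt'.
# importing os module
import os
# File name
file = 'file1.txt'
# File location (placeholder: the original path was lost,
# so the current working directory is assumed here)
location = os.getcwd()
# Path
path = os.path.join(location, file)
# Remove the file 'file1.txt'
os.remove(path)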
Using os.rmdir()
os.rmdir() method in Python is used to remove or delete an empty directory. OSError will be
raised if the specified path is not an empty directory.
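Example: Suppose the parent directory contains an empty directory named "Geeks".
# Python program to explain os.rmdir() method
# importing os module
import os
# Directory name
directory = "Geeks"
# Parent directory (placeholder: the original path was lost,
# so the current working directory is assumed here)
parent = os.getcwd()
# Path
path = os.path.join(parent, directory)
# Remove the empty directory "Geeks"
os.rmdir(path)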
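The rules described for os.remove() and os.rmdir() can be checked together. This is a small sketch under the assumption that everything happens inside a temporary directory; the file and folder names are reused from the examples only for familiarity.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, "Geeks")
    file_path = os.path.join(folder, "file1.txt")
    os.mkdir(folder)
    with open(file_path, "w") as f:
        f.write("demo")

    try:
        os.rmdir(folder)       # fails: the directory still holds a file
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    try:
        os.remove(folder)      # fails: os.remove() only deletes files
        remove_failed = False
    except OSError:
        remove_failed = True

    os.remove(file_path)       # delete the file first
    os.rmdir(folder)           # now the empty directory can be removed
    still_there = os.path.exists(folder)

print(rmdir_failed, remove_failed, still_there)  # True True False
```

Catching OSError covers the platform-specific subclasses that the operating system may raise in each case.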
os.name: This attribute gives the name of the operating system dependent module imported.
The following names are currently registered in Python 3: ‘posix’, ‘nt’ and ‘java’.
import os
print(os.name)
Output:
posix
Note: It may give different output on different interpreters, such as ‘posix’ when you run the
code here.
os.error: All functions in this module raise OSError in the case of invalid or inaccessible file
names and paths, or other arguments that have the correct type, but are not accepted by the
operating system. os.error is an alias for built-in OSError exception.
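import os
try:
    # If the file does not exist,
    # open() raises an OSError (IOError is an alias of OSError)
    filename = 'GFG.txt'
    f = open(filename, 'r')
    text = f.read()
    f.close()
except IOError:
    # This is printed if the file could not be opened or read
    print('Problem reading: ' + filename)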
Output:
Problem reading: GFG.txt
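os.popen(): This method opens a pipe to or from a shell command. The returned file-like object can
be read or written depending on whether the mode is ‘r’ or ‘w’.
Syntax: os.popen(cmd, mode='r', buffering=-1)
The mode and buffering parameters are optional; if mode is not provided, the default ‘r’ is used.
import os
fd = "GFG.txt"
# Writing to the file with the built-in open()
file = open(fd, 'w')
file.write("Hello")
file.close()
# Reading the file back
file = open(fd, 'r')
text = file.read()
print(text)
file.close()
# os.popen() expects a command, not a file name
pipe = os.popen('echo Hello', 'r')
pipe.close()
Output:
Hello
Note: Only the text read back with open() is printed. os.popen() runs a command, so the file
itself is not changed through the pipe.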
os.close(): Close file descriptor fd. os.close() works on integer file descriptors, such as
those returned by os.open(). A file object returned by open() or os.popen() must be closed with
its own close() method; if we try closing such a file object using os.close(), Python throws
TypeError.
import os
fd = "GFG.txt"
file = open(fd, 'r')
text = file.read()
print(text)
os.close(file)
Output:
Traceback (most recent call last):
  File "C:\Users\GFG\Desktop\GeeksForGeeksOSFile.py", line 6, in <module>
    os.close(file)
TypeError: an integer is required (got type _io.TextIOWrapper)
Note: The same error may not be thrown, due to the non-existent file or permission privilege.
os.rename(): A file old.txt can be renamed to new.txt, using the function os.rename(). The
name of the file changes only if, the file exists and the user has sufficient privilege permission
to change the file.
import os
fd = "GFG.txt"
os.rename(fd,'New.txt')
os.rename(fd,'New.txt')
Output:
Traceback (most recent call last):
  File "C:\Users\GFG\Desktop\ModuleOS\GeeksForGeeksOSFile.py", line 3, in <module>
    os.rename(fd,'New.txt')
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'GFG.txt' -> 'New.txt'
Understanding the Output: A file name “GFG.txt” exists, thus when os.rename() is used the
first time, the file gets renamed. Upon calling the function os.rename() second time, file
“New.txt” exists and not “GFG.txt” thus Python throws FileNotFoundError.
os.remove(): Using the os module we can remove a file in our system using the remove()
method. To remove a file we need to pass the name of the file as a parameter.
The os module provides a layer of abstraction between us and the operating system. The same
code can run on any operating system, but paths are written differently on each one, so always
specify the path in the form the target system expects, preferably as an absolute path. If you
try to remove a file that does not exist you will get FileNotFoundError.
os.path.exists(): This method checks whether a file exists or not; the name of the file is passed
as a parameter. The os module has a sub-module named os.path, which provides many more
path-related functions.
import os
# importing os module
result = os.path.exists("file_name")  # giving the name of the file as a parameter
print(result)
Output
False
As in the above code, the file does not exist it will give output False. If the file exists it will give
us output True.
os.path.getsize(): This method returns the size of the file in bytes. To use this
method we need to pass the name of the file as a parameter.
import os
# importing os module
size = os.path.getsize("filename")
print("Size of the file is", size, "bytes.")
Output:
Size of the file is 192 bytes.
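The behaviour of os.rename(), os.path.exists() and os.path.getsize() described above can be reproduced without touching real files. The following sketch assumes a temporary directory and reuses the names GFG.txt and New.txt from the earlier example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    old = os.path.join(base, "GFG.txt")
    new = os.path.join(base, "New.txt")
    with open(old, "w") as f:
        f.write("Hello")

    size = os.path.getsize(old)       # 5 bytes for the ASCII text "Hello"
    os.rename(old, new)
    old_exists = os.path.exists(old)  # False after the rename
    new_exists = os.path.exists(new)  # True

    try:
        os.rename(old, new)           # the source no longer exists
        second_rename = "renamed"
    except FileNotFoundError:
        second_rename = "FileNotFoundError"

print(size, old_exists, new_exists, second_rename)
# 5 False True FileNotFoundError
```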
Iterators in Python
An iterator in Python is an object that is used to iterate over iterable objects like lists, tuples,
dicts, and sets. The iterator object is initialized using the iter() method. It uses
the next() method for iteration.
• __iter__(): This method is called for the initialization of an iterator. It returns
an iterator object.
• __next__() (named next() in Python 2): This method returns the next value for the iterable.
When we use a for loop to traverse any iterable object, internally it uses the iter()
method to get an iterator object, which further uses the next() method to iterate over. This
method raises StopIteration to signal the end of the iteration.
How an iterator really works in python.
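# iter() returns an iterator over the characters of the string
iterable_obj = iter('Python')
while True:
    try:
        # next() fetches one item at a time
        print(next(iterable_obj))
    except StopIteration:
        # raised when the iterator is exhausted
        break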
Output :
P
y
t
h
o
n
Below is a simple Python custom iterator that creates iterator type that iterates from 10 to a
given limit. For example, if the limit is 15, then it prints 10 11 12 13 14 15. And if the limit is
5, then it prints nothing.
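# A simple Python program to demonstrate
# working of iterators using an example type
# that iterates from 10 to given value
class Test:
    # Constructor
    def __init__(self, limit):
        self.limit = limit
    # Called when iteration is initialized
    def __iter__(self):
        self.x = 10
        return self
    # Returns the next value until the limit is passed
    def __next__(self):
        if self.x > self.limit:
            raise StopIteration
        self.x += 1
        return self.x - 1
for i in Test(15):
    print(i)
# Prints nothing
for i in Test(5):
    print(i)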
Output :
10
11
12
13
14
15
In the following iterations, the for loop is internally(we can’t see it) using iterator object to
traverse over the iterables.
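The listing for this example is not shown here. The following is a sketch that would produce output of the shape given below; the variable names and sample values are assumptions chosen to match it.

```python
# Sample data chosen to match the output shown below
words_list = ["datascience", "with", "Python"]
words_tuple = ("Python", "in", "Datascience")
text = "Python"
codes = {"xyz": 123, "abc": 345}

# Iterating over a list
print("List Iteration")
for item in words_list:
    print(item)

# Iterating over a tuple (immutable)
print("Tuple Iteration")
for item in words_tuple:
    print(item)

# Iterating over a string, one character at a time
print("String Iteration")
for ch in text:
    print(ch)

# Iterating over a dictionary yields its keys
print("Dictionary Iteration")
for key in codes:
    print(key, codes[key])
```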
Output :
List Iteration
datascience
with
Python
Tuple Iteration
Python
in
Datascience
String Iteration
P
y
t
h
o
n
Dictionary Iteration
xyz 123
abc 345
Python also provides some interesting and useful iterator functions for efficient
looping and faster code. Many built-in iterators are available in the
module “itertools“.
This module implements a number of iterator building blocks.
Some useful Iterators:
• accumulate(iter, func) :- This iterator takes two arguments, iterable target and the
function which would be followed at each iteration of value in target. If no function is
passed, addition takes place by default.If the input iterable is empty, the output
iterable will also be empty.
• chain(iter1, iter2..) :- This function is used to print all the values in iterable targets one
after another mentioned in its arguments.
import itertools
import operator
# initializing list 1
li1 = [1, 4, 5, 7]
# initializing list 2
li2 = [1, 6, 5, 9]
# initializing list 3
li3 = [8, 10, 5, 4]
# using accumulate()
# prints the successive summation of elements
print ("The sum after each iteration is : ",end="")
print (list(itertools.accumulate(li1)))
# using accumulate()
# prints the successive multiplication of elements
print ("The product after each iteration is : ",end="")
print (list(itertools.accumulate(li1,operator.mul)))
# using chain() to print all elements of the lists
print ("All values in mentioned chain are : ",end="")
print (list(itertools.chain(li1, li2, li3)))
Output:
The sum after each iteration is : [1, 5, 10, 17]
The product after each iteration is : [1, 4, 20, 140]
All values in mentioned chain are : [1, 4, 5, 7, 1, 6, 5, 9, 8, 10,
5, 4]
• chain.from_iterable() :- This function is implemented similarly as chain() but the
argument here is a list of lists or any other iterable container.
• compress(iter, selector) :- This iterator selectively picks the values to print from the
passed container according to the boolean list value passed as other argument. The
arguments corresponding to boolean true are printed else all are skipped.
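import itertools
# initializing list 1
li1 = [1, 4, 5, 7]
# initializing list 2
li2 = [1, 6, 5, 9]
# initializing list 3
li3 = [8, 10, 5, 4]
# initializing list of list
li4 = [li1, li2, li3]
# using chain.from_iterable() to print all elements of the lists
print ("All values in mentioned chain are : ",end="")
print (list(itertools.chain.from_iterable(li4)))
# using compress() to print the selected values
print ("The compressed values in string are : ",end="")
print (list(itertools.compress('GEEKSFORGEEKS',[1,0,0,0,0,1,0,0,1,0,0,0,0])))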
Output:
All values in mentioned chain are : [1, 4, 5, 7, 1, 6, 5, 9, 8, 10,
5, 4]
The compressed values in string are : ['G', 'F', 'G']
• dropwhile(func, seq) :- This iterator starts printing the characters only after the func.
in argument returns false for the first time.
• filterfalse(func, seq) :- As the name suggests, this iterator prints only values that
return false for the passed function.
# initializing list
li = [2, 4, 5, 7, 8]
Output:
The values after condition returns false : [5, 7, 8]
The values that return false to function are : [5, 7]
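The listing above shows only the input list. The sketch below would reproduce the two output lines, assuming the function passed to both iterators tests for even numbers (the original predicate is not visible here, so is_even is a hypothetical stand-in).

```python
import itertools

li = [2, 4, 5, 7, 8]

def is_even(x):
    """Assumed predicate: True for even numbers."""
    return x % 2 == 0

# dropwhile() skips items while the predicate is true, then yields the rest
after_first_false = list(itertools.dropwhile(is_even, li))

# filterfalse() keeps only the items for which the predicate is false
rejected = list(itertools.filterfalse(is_even, li))

print("The values after condition returns false : ", end="")
print(after_first_false)   # [5, 7, 8]
print("The values that return false to function are : ", end="")
print(rejected)            # [5, 7]
```

Note that dropwhile() stops testing after the first false result, which is why the even value 8 still appears in its output.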
Python __iter__() and __next__() | Converting an object into an iterator
At many instances, we get a need to access an object like an iterator. One way is to form a
generator loop but that extends the task and time taken by the programmer. Python eases
this task by providing a built-in method __iter__() for this task.
The __iter__() function returns an iterator for the given object (array, set, tuple, etc. or
custom objects). It creates an object that can be accessed one element at a time
using __next__() function, which generally comes in handy when dealing with loops.
Syntax:
iter(object)
iter(callable, sentinel)
• Object: The object whose iterator has to be created. It can be a collection object like
list or tuple or a user-defined object (using OOPS).
• Callable, Sentinel: Callable represents a callable object, and sentinel is the value at
which the iteration is needed to be terminated, sentinel value represents the end of
sequence being iterated.
Exception:
If we call next() on the iterator after all the elements have been iterated, then StopIteration is
raised.
The __iter__() function returns an iterator object that goes through each element of the given
object. The next element can be accessed through __next__() function. In the case of callable
object and sentinel value, the iteration is done until the value is found or the end of elements
reached. In any case, the original object is not modified.
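The two-argument form iter(callable, sentinel) is not demonstrated in the examples that follow, so here is a minimal sketch. The in-memory stream and the "STOP" marker are assumptions made for illustration.

```python
import io

stream = io.StringIO("alpha\nbeta\nSTOP\ngamma\n")

# iter(callable, sentinel): call stream.readline() repeatedly until it
# returns exactly the sentinel value "STOP\n"
lines = [line.strip() for line in iter(stream.readline, "STOP\n")]
print(lines)   # ['alpha', 'beta']

# Iteration stopped at the sentinel; the stream can still be read further
rest = stream.readline().strip()
print(rest)    # gamma
```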
Code #1:
listA = ['a','e','i','o','u']
iter_listA = iter(listA)
try:
    print( next(iter_listA))
    print( next(iter_listA))
    print( next(iter_listA))
    print( next(iter_listA))
    print( next(iter_listA))
except StopIteration:
    pass
Output:
a
e
i
o
u
Code #2:
lst = [11, 22, 33, 44, 55]
iter_lst = iter(lst)
while True:
    try:
        print(iter_lst.__next__())
    except StopIteration:
        break
Output:
11
22
33
44
55
Code #3:
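listB = ['Cat', 'Bat', 'Sat', 'Mat']
iter_listB = listB.__iter__()
try:
    print(iter_listB.__next__())
    print(iter_listB.__next__())
    print(iter_listB.__next__())
    print(iter_listB.__next__())
    print(iter_listB.__next__())  # fifth call raises StopIteration
except StopIteration:
    print("Throwing 'StopIterationError'", "I cannot count more.")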
Output:
Cat
Bat
Sat
Mat
Throwing 'StopIterationError' I cannot count more.
Code #4: User-defined objects (using OOPS)
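class Counter:
    def __init__(self, start, end):
        self.num = start
        self.end = end
    def __iter__(self):
        return self
    def __next__(self):
        if self.num > self.end:
            raise StopIteration
        else:
            self.num += 1
            return self.num - 1
# Driver code
if __name__ == '__main__' :
    a, b = 2, 5
    c1 = Counter(a, b)
    c2 = Counter(a, b)
    # Way 1: the for loop calls iter() and next() implicitly
    print("Print the range without iter()")
    for i in c1:
        print("Eating more Pizzas, counting", i)
    # Way 2: calling iter() and next() explicitly
    obj = iter(c2)
    try:
        while True:  # loop until StopIteration is raised
            print("Eating more Pizzas, counting", next(obj))
    except StopIteration:
        # when StopIteration raised, Print custom message
        # (the original message was lost; this text is a placeholder)
        print("Nothing left to count")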
Output:
Print the range without iter()
Eating more Pizzas, counting 2
Eating more Pizzas, counting 3
Eating more Pizzas, counting 4
Eating more Pizzas, counting 5
Iterable is an object, that one can iterate over. It generates an Iterator when passed to iter()
method. An iterator is an object, which is used to iterate over an iterable object using the
__next__() method. Iterators have the __next__() method, which returns the next item of the
object. Note that every iterator is also an iterable, but not every iterable is an iterator. For
example, a list is iterable but a list is not an iterator. An iterator can be created from an
iterable by using the function iter(). To make this possible, the class of an object needs either
a method __iter__, which returns an iterator, or a __getitem__ method with sequential
indexes starting with 0.
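Both routes to iterability, and the difference between an iterable and an iterator, can be checked directly. In this sketch the class name Squares and its limit of four items are made up for illustration; the abstract base classes come from the standard collections.abc module.

```python
from collections.abc import Iterable, Iterator

class Squares:
    """Iterable through __getitem__ alone, with indexes starting at 0."""
    def __getitem__(self, index):
        if index >= 4:
            raise IndexError      # signals the end of the sequence
        return index * index

numbers = [1, 2, 3]
list_is_iterable = isinstance(numbers, Iterable)         # True
list_is_iterator = isinstance(numbers, Iterator)         # False
iter_is_iterator = isinstance(iter(numbers), Iterator)   # True

# iter() falls back to calling __getitem__(0), __getitem__(1), ...
squares = list(Squares())
print(squares)   # [0, 1, 4, 9]
```

One caveat: isinstance(obj, Iterable) only detects __iter__, so calling iter(obj) is the more reliable test for objects like Squares.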
Code #1
# code
next("APTECH")
Output:
Traceback (most recent call last):
File "/home/1c9622166e9c268c0d67cd9ba2177142.py", line 2, in <module>
next("APTECH")
TypeError: 'str' object is not an iterator
We know that str is iterable but it is not an iterator. A for loop can still print the
string, because when the for loop executes it first converts the string into an iterator and
then fetches the characters from it.
# code
s="APTECH"
s=iter(s)
next(s)
Here iter() converts s, which is a string (iterable), into an iterator, and next(s) returns 'A'
the first time. We can call next(s) repeatedly to iterate over the string.
When a for loop is executed, for statement calls iter() on the object, which it is supposed to
loop over. If this call is successful, the iter call will return an iterator object that defines the
method __next__(), which accesses elements of the object one at a time. The __next__()
method will raise a StopIteration exception if there are no further elements available. The for
loop will terminate as soon as it catches a StopIteration exception. Let’s call the __next__()
method using the next() built-in function.
Code #2: Function ‘iterable’ will return True if the object ‘obj’ is an iterable and False
otherwise.
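The listing for this function is not shown here. A minimal sketch of such a function follows, assuming it relies on iter() raising TypeError for objects that cannot be iterated; the sample values are illustrative.

```python
def iterable(obj):
    """Return True if `obj` can be passed to iter(), False otherwise."""
    try:
        iter(obj)
        return True
    except TypeError:
        return False

samples = [34, [4, 5], (4, 5), {"a": 4}, "dfsdf", 4.5]
results = [iterable(sample) for sample in samples]
print(results)   # [False, True, True, True, True, False]
```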
# list of cities
cities = ["Bengaluru", "New Delhi", "Mumbai"]
# initialize the iterator object
iterator_obj = iter(cities)
print(next(iterator_obj))
print(next(iterator_obj))
print(next(iterator_obj))
Output:
Bengaluru
New Delhi
Mumbai
Note: If ‘next(iterator_obj)’ is called one more time, it would raise ‘StopIteration’.
Python Debugger – Python pdb
Debugging in Python is facilitated by pdb module(python debugger) which comes built-in to
the Python standard library. It is actually defined as the class Pdb which internally makes use
of bdb(basic debugger functions) and cmd(support for line-oriented command interpreters)
modules. The major advantage of pdb is it runs purely in the command line thereby making it
great for debugging code on remote servers when we don’t have the privilege of a GUI-based
debugger.
pdb supports-
• Setting breakpoints
• Stepping through code
• Source code listing
• Viewing stack traces
import pdb
def addition(a, b):
    answer = a + b
    return answer
pdb.set_trace()
x = input("Enter first number : ")
y = input("Enter second number : ")
sum = addition(x, y)
print(sum)
Output:
[Screenshot: pdb prompt after the set_trace() call]
In the output on the first line after the angle bracket, we have the directory path of our
file, line number where our breakpoint is located, and <module>. It’s basically saying that we
have a breakpoint in exppdb.py on line number 10 at the module level. If you introduce the
breakpoint inside the function then its name will appear inside <>. The next line is showing
the code line where our execution is stopped. That line is not executed yet. Then we have
the pdb prompt. Now to navigate the code we can use the following commands:
Command    Function
help       To display all commands
where      Display the stack trace and line number of the current line
next       Execute the current line and move to the next line ignoring function calls
step       Step into functions called at the current line
Now to check the type of a variable, just write whatis followed by the variable name. In the
example given below, the type of x is reported as <class 'str'>. Thus typecasting the string to
int in our program will resolve the error.
Example 2:
• From the Command Line: It is the easiest way of using a debugger. You just have to
run the following command in terminal
python -m pdb exppdb.py (put your file name instead of exppdb.py)
This statement loads your source code and stops execution on the first line of code.
Example 3:
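def addition(a, b):
    answer = a + b
    return answer
x, y = input("Enter first number : "), input("Enter second number : ")
sum = addition(x, y)
print(sum)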
Output:
[Screenshot: pdb session started from the command line]
• Post-mortem debugging means entering debug mode after the program is finished
with the execution process (failure has already occurred). pdb supports post-mortem
debugging through the pm() and post_mortem() functions. These functions look for
active trace back and start the debugger at the line in the call stack where the
exception occurred. In the output of the given example you can notice pdb appear
when exception is encountered in the program.
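One way to use post-mortem debugging from inside a program is to call pdb.post_mortem() with the traceback of the exception that was just caught. The sketch below is hedged: the run() wrapper and its debug flag are hypothetical names introduced here so that the code runs unattended when debug is False.

```python
import pdb
import sys

def multiply(a, b):
    return a * b

def run(x, y, debug=False):
    """Call multiply(); on failure optionally open pdb at the failing frame."""
    try:
        return multiply(x, y)
    except TypeError:
        if debug:
            # Opens an interactive (Pdb) prompt where the exception occurred
            pdb.post_mortem(sys.exc_info()[2])
        return None

print(run(3, 4))       # 12
print(run("3", "4"))   # None, because str * str raises TypeError
```

Calling run("3", "4", debug=True) in a terminal would drop into the debugger at the line inside multiply().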
Example 4:
def multiply(a, b):
    answer = a * b
    return answer
x = input("Enter first number : ")
y = input("Enter second number : ")
result = multiply(x, y)
print(result)
Output:
Checking variables on the Stack
All the variables including variables local to the function being executed in the program as
well as global are maintained on the stack. We can use args (or use a) to print all the
arguments of function which is currently active. p command evaluates an expression given as
an argument and prints the result.
Here, example 4 of this article is executed in debugging mode to show you how to check for
variables:
[Screenshot: checking variable values with args and p]
Python pdb Breakpoint
While working with large programs we often want to add a number of breakpoints where we
know errors might occur. To do this you just have to use the break command. When you
insert a breakpoint, the debugger assigns a number to it starting from 1. Use break without
arguments to display all the breakpoints in the program.
[Screenshot: adding breakpoints with the break command]
Managing Breakpoints
After adding breakpoints, we can manage them, using the numbers assigned to them, with the
enable, disable and clear commands. disable tells the debugger not to stop when that breakpoint
is reached, enable turns a disabled breakpoint back on, and clear removes the breakpoint.
Given below is the implementation to manage breakpoints using Example 4.
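[Screenshot: managing breakpoints in pdb]
class Square:
    def __init__(self, side):
        """Create a square with the given side."""
        self.side = side
    def area(self):
        """Return the area of the square."""
        return self.side**2
    def perimeter(self):
        """Return the perimeter of the square."""
        return 4 * self.side
    def __repr__(self):
        """Describe how a Square object is printed."""
        # The original string was lost; this format is an assumption
        s = 'Square with side = ' + str(self.side)
        return s
if __name__ == '__main__':
    # read the side length from the user
    side = int(input('enter the side length to create a Square: '))
    # create a square with the provided side
    square = Square(side)
    # print the created square
    print(square)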
Now that we have our software ready, let’s have a look at the directory structure of our
project folder and after that, we’ll start testing our software.
---Software_Testing
|--- __init__.py (to initialize the directory as python package)
|--- app.py (our software)
|--- tests (folder to keep all test files)
|--- __init__.py
One of the major problems with manual testing is that it requires time and effort. In manual
testing, we test the application over some input, if it fails, either we note it down or we debug
the application for that particular test input, and then we repeat the process.
With unittest, all the test inputs can be provided at once and then you can test your
application. In the end, you get a detailed report with all the failed test cases clearly specified,
if any.
The unittest module has both a built-in testing framework and a test runner. A testing
framework is a set of rules which must be followed while writing test cases, while a test runner
is a tool which executes these tests with a bunch of settings, and collects the results.
Installation: unittest is part of the Python standard library, so it does not need to be
installed separately.
Use: We write the tests in a Python module (.py). To run our tests, we simply execute the test
module using any IDE or terminal.
Now, let’s write some tests for our small software discussed above using
the unittest module.
self.assertEqual(sq.area(), 4)
if __name__ == '__main__':
    unittest.main()
• Because of these lines, as soon as you execute the script “tests.py”, the
function unittest.main() is called and all the tests are executed.
Finally, the “tests.py” module should resemble the code given below.
import unittest
import app  # module under test; assumes app.py is importable from here
class TestSum(unittest.TestCase):
    def test_area(self):
        sq = app.Square(2)
        self.assertEqual(sq.area(), 4)
if __name__ == '__main__':
    unittest.main()
Having written our test cases let us now test our application for any bugs. To test your
application you simply need to execute the test file “tests.py” using the command prompt or
any IDE of your choice. The output should be something like this.
.
-------------------------------------------------------------------
Ran 1 test in 0.000s
OK
In the first line, a .(dot) represents a successful test while an ‘F’ would represent a failed test
case. The OK message, in the end, tells us that all the tests were passed successfully.
Let’s add a few more tests in “tests.py” and retest our application.
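import unittest
import app  # module under test; assumes app.py is importable from here
class TestSum(unittest.TestCase):
    def test_area(self):
        sq = app.Square(2)
        self.assertEqual(sq.area(), 4)
    def test_area_negative(self):
        sq = app.Square(-3)
        self.assertEqual(sq.area(), -1,
            f'Area is shown {sq.area()} rather than -1 for negative side length')
    def test_perimeter(self):
        sq = app.Square(5)
        self.assertEqual(sq.perimeter(), 20)
    def test_perimeter_negative(self):
        sq = app.Square(-6)
        self.assertEqual(sq.perimeter(), -1,
            f'Perimeter is {sq.perimeter()} rather than -1 for negative side length')
if __name__ == '__main__':
    unittest.main()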
.F.F
===================================================================
FAIL: test_area_negative (__main__.TestSum)
-------------------------------------------------------------------
Traceback (most recent call last):
File "tests_unittest.py", line 11, in test_area_negative
self.assertEqual(sq.area(), -1, f'Area is shown {sq.area()} rather than
-1 for negative side length')
AssertionError: 9 != -1 : Area is shown 9 rather than -1 for negative side
length
======================================================================
FAIL: test_perimeter_negative (__main__.TestSum)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests_unittest.py", line 19, in test_perimeter_negative
self.assertEqual(sq.perimeter(), -1, f'Perimeter is {sq.perimeter()}
rather than -1 for negative side length')
AssertionError: -24 != -1 : Perimeter is -24 rather than -1 for negative
side length
----------------------------------------------------------------------
Ran 4 tests in 0.001s
FAILED (failures=2)
A few things to note in the above test report are –
• The first line represents that test 1 and test 3 executed successfully while test 2 and
test 4 failed
• Each failed test case is described in the report, the first line of the description contains
the name of the failed test case and the last line contains the error message we
defined for that test case.
• At the end of the report you can see the number of failed tests, if no test fails the
report will end with OK
The purpose of nose2 is to extend unittest to make testing easier. nose2 is compatible with
tests written using the unittest testing framework and can be used as a replacement of
the unittest test runner.
Installation: nose2 can be installed from PyPI using the command,
pip install nose2
Use: nose2 does not have a testing framework of its own and is merely a test runner which is
compatible with the unittest testing framework. Therefore we’ll run the same tests we wrote
above (for unittest) using nose2. To run the tests we use the following command in the
project source directory (“Software_Testing” in our case),
nose2
In nose2 terminology all the python modules (.py) with name starting from “test” (i.e.
test_file.py, test_1.py) are considered as test files. On execution, nose2 will look for all test
files in all the sub-directories which lie under one or more of the following categories,
pytest is the most popular testing framework for python. Using pytest you can test
anything from basic python scripts to databases, APIs and UIs. Though pytest is mainly used
for API testing, in this article we’ll cover only the basics of pytest.
Installation: You can install pytest from PyPI using the command,
pip install pytest
Use: The pytest test runner is called using the following command in project source,
py.test
Unlike nose2, pytest looks for test files in all the locations inside the project directory. Any
file with name starting with “test_” or ending with “_test” is considered a test file in
the pytest terminology. Let’s create a file “test_file1.py” in the folder “tests” as our test file.
Creating test methods:
pytest supports the test methods written in the unittest framework, but
the pytest framework provides easier syntax to write tests. See the code below to
understand the test method syntax of the pytest framework.
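import app  # module under test; assumes app.py is importable from here
def test_file1_area():
    sq = app.Square(2)
    assert sq.area() == 4
def test_file1_perimeter():
    sq = app.Square(-1)
    # expected to fail, as in the report below: perimeter() returns -4
    assert sq.perimeter() == -1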
Note: similar to unittest, pytest requires all test names to start with “test”.
Unlike unittest, pytest uses the default python assert statements which make it
further easier to use.
Note that, now the “tests” folder contains two files namely, “tests.py” (written
in unittest framework) and “test_file1.py” (written in pytest framework). Now let’s run
the pytest test runner.
py.test
You’ll get a similar report as obtained by using unittest.
tests/test_file1.py .F                                        [ 33%]
tests/test_file2.py .F.F                                      [100%]
===================================
FAILURES
===================================
The percentages on the right side of the report show the percentage of tests that have been
completed at that moment, i.e. 2 out of the 6 test cases were completed at the end of the
“test_file1.py”.
Here are a few more basic customisations that come with pytest.
• Running specific test files: To run only a specific test file, use the command,
py.test <filename>
• Substring matching: Suppose we want to test only the area() method of
our Square class, we can do this using substring matching as follows,
py.test -k "area"
With this command pytest will execute only those tests which have the string “area”
in their names, i.e. “test_file1_area()”, “test_area()” etc.
• Marking: As a substitute to substring matching, marking is another method using
which we can run a specific set of tests. In this method we put a mark on the tests we
want to run. Observe the code example given below,
import pytest
import app  # module under test; assumes app.py is importable from here
# @pytest.mark.<tag_name>
@pytest.mark.area
def test_file1_area():
    sq = app.Square(2)
    assert sq.area() == 4
• In the above code example test_file1_area() is marked with tag “area”. All the
test methods which have been marked with some tag can be executed by using the
command,
py.test -m <tag_name>
• Parallel Processing: If you have a large number of tests then pytest can be
customised to run these test methods in parallel. For that you need to install pytest-
xdist which can be installed using the command,
pip install pytest-xdist
• Now you can use the following command to execute your tests faster using
multiprocessing,
py.test -n 4
• With this command pytest assigns 4 workers to perform the tests in parallel, you
can change this number as per your needs.
• If your tests are thread-safe, you can also use multithreading to speed up the testing
process. For that you need to install pytest-parallel (using pip). To run your tests in
multithreading use the command,
pytest --workers 4
Unit Testing in Python – Unittest
What is Unit Testing?
Unit Testing is the first level of software testing where the smallest testable parts of a
software are tested. This is used to validate that each unit of the software performs as
designed. The unittest test framework is python’s xUnit style framework.
Method:
White Box Testing method is used for Unit testing.
OOP concepts supported by unittest framework:
• test fixture:
A test fixture is used as a baseline for running tests to ensure that there is a fixed
environment in which tests are run so that results are repeatable.
Examples:
• creating temporary databases.
• starting a server process.
• test case:
A test case is a set of conditions which is used to determine whether a system under
test works correctly.
• test suite:
Test suite is a collection of testcases that are used to test a software program to
show that it has some specified set of behaviours by executing the aggregated tests
together.
• test runner:
A test runner is a component which sets up the execution of tests and provides the
outcome to the user.
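The four concepts above can be seen working together in a few lines. This is a minimal sketch, not taken from the original text: the FileTest class and its temporary file are assumptions, and the runner writes its report to an in-memory stream so the results can be inspected.

```python
import io
import os
import tempfile
import unittest

class FileTest(unittest.TestCase):          # test case
    def setUp(self):
        # Fixture: every test starts with a fresh temporary file
        handle, self.path = tempfile.mkstemp()
        os.close(handle)
        with open(self.path, "w") as f:
            f.write("Hello")

    def tearDown(self):
        # Fixture clean-up runs after each test, pass or fail
        os.remove(self.path)

    def test_size(self):
        self.assertEqual(os.path.getsize(self.path), 5)

    def test_exists(self):
        self.assertTrue(os.path.exists(self.path))

# Test suite: an explicit collection of test cases
suite = unittest.TestSuite()
suite.addTest(FileTest("test_size"))
suite.addTest(FileTest("test_exists"))

# Test runner: executes the suite and reports the outcome
stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream).run(suite)
print(result.testsRun, result.wasSuccessful())   # 2 True
```

setUp() and tearDown() run around every single test, which is what keeps the results repeatable.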
Basic Test Structure: unittest defines tests by the following two ways:
import unittest
class SimpleTest(unittest.TestCase):
    def test(self):
        self.assertTrue(True)
if __name__ == '__main__':
    unittest.main()
This is the basic test code using unittest framework, which is having a single test. This test()
method will fail if TRUE is ever FALSE.
Running Tests
if __name__ == '__main__':
    unittest.main()
The last block helps to run the test by running the file through the command line.
.
-------------------------------------------------------------------
Ran 1 test in 0.000s
OK
Here, in the output the “.” on the first line of output means that a test passed.
“-v” option is added in the command line while running the tests to obtain more detailed test
results.
test (__main__.SimpleTest) ... ok
-------------------------------------------------------------------
Ran 1 test in 0.000s
OK
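Outcomes Possible:
There are three types of possible test outcomes:
• OK: all the tests passed.
• FAIL: a test did not pass and an AssertionError was raised.
• ERROR: a test raised an exception other than AssertionError.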
import unittest
class TestStringMethods(unittest.TestCase):
    def setUp(self):
        pass
    # Returns True if the string contains 4 a.
    def test_strings_a(self):
        self.assertEqual('a' * 4, 'aaaa')
    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')
    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())
    def test_strip(self):
        s = 'geeksforgeeks'
        self.assertEqual(s.strip('geek'), 'sforgeeks')
    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        with self.assertRaises(TypeError):
            s.split(2)
if __name__ == '__main__':
    unittest.main()
The above code is a short script to test 5 string methods. unittest.TestCase is used to
create test cases by subclassing it. The last block of the code at the bottom allows us to run
all the tests just by running the file.
Basic terms used in the code:
• assertEqual() – This statement is used to check if the result obtained is equal to the
expected result.
• assertTrue() / assertFalse() – This statement is used to verify if a given statement is
true or false.
• assertRaises() – This statement is used to verify that a specific exception is raised.
Description of tests:
• test_strings_a
This test checks the string property that a character, say ‘a’, multiplied by a number,
say ‘x’, gives ‘a’ repeated x times. The assertEqual() check passes if the result matches
the expected output.
• test_upper
This test checks that the given string is converted to uppercase. The assertEqual()
check passes if the returned string is in uppercase.
• test_isupper
This test checks the string method which returns True if the string is in
uppercase and False otherwise. The assertTrue() / assertFalse() statements are used for this
verification.
• test_strip
This test checks that all the characters passed to the function have been stripped from
both ends of the string. The assertEqual() check passes if the stripped string
matches the expected output.
• test_split
This test checks the split function of the string, which splits the string on
the separator passed to the function and returns the result as a list. The assertEqual()
check passes if the result matches the expected output, and assertRaises() verifies that
a non-string separator raises TypeError.
unittest.main() provides a command-line interface to the test script. On running the above
script from the command line, the following output is produced:
.....
-------------------------------------------------------------------
Ran 5 tests in 0.000s
OK