18CS55 ADP Notes Module 4 and 5

Application Development using Python [18CS55]
Department of Computer Science & Engineering
Academic Year 2021-22
Study Material
Module 4 and 5
Course Name : Application Development using Python

Course Code : 18CS55
Course Coordinator: Mr. Ramesh Babu N
Assoc. Prof.,
Dept. of CSE, AIEMS
Contents
Module 4
Chapter 15: Classes and objects
15.1 User-defined types
15.2 Attributes
15.3 Rectangles
15.4 Instances as return values
15.5 Objects are mutable
15.6 Copying
15.7 Debugging
15.8 Glossary
Chapter 16: Classes and functions
16.1 Time
16.2 Pure functions
16.3 Modifiers
16.4 Prototyping versus planning
16.5 Debugging
16.6 Glossary
Chapter 17: Classes and methods
17.1 Object-oriented features
17.2 Printing objects
17.3 Another example
17.4 A more complicated example
17.5 The init method
17.6 The __str__ method
17.7 Operator overloading
17.8 Type-based dispatch
17.9 Polymorphism
17.10 Debugging
17.11 Interface and implementation
17.12 Glossary
isinstance(object, classinfo)
Chapter 18: Inheritance
18.1 Card objects
18.2 Class attributes
18.3 Comparing cards
18.4 Decks
18.5 Printing the deck
18.6 Add, remove, shuffle and sort
18.7 Inheritance
18.8 Class diagrams
Module 5
Textbook 1: Chapters 11 – 14
Chapter 11 – Web Scraping
Project: mapIt.py with the webbrowser Module
Checking for Errors
Resources for Learning HTML
A Quick Refresher
Viewing the Source HTML of a Web Page
Opening Your Browser’s Developer Tools
Using the Developer Tools to Find HTML Elements
Creating a BeautifulSoup Object from HTML
Finding an Element with the select() Method
Getting Data from an Element’s Attributes
Step 1: Design the Program
Step 2: Download the Web Page
Step 3: Find and Download the Comic Image
Step 4: Save the Image and Find the Previous Comic
Starting a Selenium-Controlled Browser
Finding Elements on the Page
Clicking the Page
Filling Out and Submitting Forms
Sending Special Keys
Clicking Browser Buttons
Chapter 12 – Working with Excel Spreadsheets
Excel Documents
Installing the openpyxl Module
Install openpyxl using pip
Reading Excel Documents
File: example.xlsx
Opening Excel Documents with OpenPyXL
Getting Sheets from the Workbook
Converting Between Column Letters and Numbers
Getting Rows and Columns from the Sheets
Workbooks, Sheets, Cells
Project: Reading Data from a Spreadsheet
Step 1: Read the Spreadsheet Data
Step 2: Populate the Data Structure
Step 3: Write the Results to a File
Writing Excel Documents
Creating and Saving Excel Documents
Creating and Removing Sheets
Parameters:
Note: Deprecated: Use wb.remove(worksheet) or del wb[sheetname]
Writing Values to Cells
Project: Updating a Spreadsheet
Step 1: Set Up a Data Structure with the Update Information: use dictionary
Step 2: Check All Rows and Update Incorrect Prices
Setting the Font Style of Cells
Font Objects
Formulas
Adjusting Rows and Columns
Setting Row Height and Column Width
Merging and Unmerging Cells
Freeze Panes
Charts
Chapter 13 – Working with PDF and Word Documents
PDF Documents
Extracting Text from PDFs
Decrypting PDFs
Creating PDFs
Project: Combining Select Pages from Many PDFs
Step 1: Find All PDF Files
Step 2: Open Each PDF
Step 3: Add Each Page
Step 4: Save the Results

Module 4
Chapter 15: Classes and objects
15.1 User-defined types
We have used many of Python’s built-in types; now we are going to define a new type. As an
example, we will create a type called Point that represents a point in two-dimensional space.
In mathematical notation, points are often written in parentheses with a comma separating the
coordinates. For example, (0,0) represents the origin, and (x,y) represents the point x units to
the right and y units up from the origin.
There are several ways we might represent points in Python:

• We could store the coordinates separately in two variables, x and y.
• We could store the coordinates as elements in a list or tuple.
• We could create a new type to represent points as objects.
A user-defined type is also called a class. A class definition looks like this:
class Point(object):
    """Represents a point in 2-D space."""
This header indicates that the new class is a Point, which is a kind of object, which is a built-in
type.
The body is a docstring that explains what the class is for. You can define variables and
functions inside a class definition.
Defining a class named Point creates a class object.

>>> print(Point)
<class '__main__.Point'>
Because Point is defined at the top level, its “full name” is __main__.Point. To create a Point object,
you call Point as if it were a function.

>>> blank = Point()
>>> print(blank)
<__main__.Point object at 0xb7e9d3ac>
The return value is a reference to a Point object, which we assign to blank. Creating a new
object is called instantiation, and the object is an instance of the class.
When you print an instance, Python tells you what class it belongs to and where it is stored in
memory (the prefix 0x means that the following number is in hexadecimal).
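The steps above can be tried out with the following minimal sketch. It assumes Python 3, where writing class Point: is equivalent to class Point(object):. The memory address printed for the instance will differ from run to run.

```python
class Point:
    """Represents a point in 2-D space."""


# Calling the class like a function creates a new instance (instantiation).
blank = Point()

print(Point)                  # the class object
print(blank)                  # class name and memory address of the instance
print(type(blank) is Point)   # True: blank is an instance of Point
```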

15.2 Attributes
You can assign values to an instance using dot notation:
>>> blank.x = 3.0
>>> blank.y = 4.0
These elements are called attributes.
The following diagram shows the result of these assignments. A state diagram that shows an
object and its attributes is called an object diagram; see Figure 15.1.
Figure 15.1: Object diagram.
The variable blank refers to a Point object, which contains two attributes. Each attribute refers
to a floating-point number.
You can read the value of an attribute using the same syntax:
>>> x = blank.x
>>> print(x)
3.0
The expression blank.x means, “Go to the object blank refers to and get the value of x.” There
is no conflict between the variable x and the attribute x.
You can use dot notation as part of any expression. For example:
>>> import math
>>> print('(%g, %g)' % (blank.x, blank.y))
(3, 4)
>>> distance = math.sqrt(blank.x**2 + blank.y**2)
>>> print(distance)
5.0
You can pass an instance as an argument in the usual way. For example:
def print_point(p):
    print('(%g, %g)' % (p.x, p.y))
print_point takes a point as an argument and displays it in mathematical notation. To invoke it,
you can pass blank as an argument:
>>> print_point(blank)
(3, 4)
Inside the function, p is an alias for blank, so if the function modifies p, blank changes.
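The following sketch illustrates both points: reading attributes inside an expression, and the aliasing of the argument inside a function. The helper names distance_from_origin and reset_point are hypothetical and are used only for this illustration.

```python
import math


class Point:
    """Represents a point in 2-D space."""


def distance_from_origin(p):
    """Return the distance from the origin to Point p (hypothetical helper)."""
    return math.sqrt(p.x**2 + p.y**2)


def reset_point(p):
    """Move the Point to the origin; p is an alias for the caller's object."""
    p.x = 0.0
    p.y = 0.0


blank = Point()
blank.x = 3.0
blank.y = 4.0

d = distance_from_origin(blank)
print(d)                  # 5.0

reset_point(blank)        # modifies the object that blank refers to
print(blank.x, blank.y)   # 0.0 0.0
```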

15.3 Rectangles
Sometimes it is obvious what the attributes of an object should be, but other times you have to
make decisions.
For example, imagine you are designing a class to represent rectangles. What attributes would
you use to specify the location and size of a rectangle?
You could specify one corner of the rectangle (or the center), the width, and the height. Here is
the class definition:
class Rectangle(object):
    """Represents a rectangle.
    attributes: width, height, corner.
    """
The docstring lists the attributes: width and height are numbers; corner is a Point object that
specifies the lower-left corner.
To represent a rectangle, you have to instantiate a Rectangle object and assign values to the
attributes:
box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
The expression box.corner.x means, “Go to the object box refers to and select the attribute
named corner; then go to that object and select the attribute named x.”
Figure 15.2 shows the state of this object. An object that is an attribute of another object is
embedded.
15.4 Instances as return values

Functions can return instances. For example, find_center takes a Rectangle as an argument and
returns a Point that contains the coordinates of the center of the Rectangle:
def find_center(rect):
    p = Point()
    p.x = rect.corner.x + rect.width/2.0
    p.y = rect.corner.y + rect.height/2.0
    return p

Here is an example that passes box as an argument and assigns the resulting Point to center:
>>> center = find_center(box)
>>> print_point(center)
(50, 100)
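Put together, the Rectangle example from sections 15.3 and 15.4 can be run as one self-contained script. This is a sketch assuming Python 3; the class and function names are the ones used in the text.

```python
class Point:
    """Represents a point in 2-D space."""


class Rectangle:
    """Represents a rectangle.

    attributes: width, height, corner.
    """


def find_center(rect):
    """Return a new Point at the center of the Rectangle rect."""
    p = Point()
    p.x = rect.corner.x + rect.width / 2.0
    p.y = rect.corner.y + rect.height / 2.0
    return p


box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()      # embedded object: the lower-left corner
box.corner.x = 0.0
box.corner.y = 0.0

center = find_center(box)
print('(%g, %g)' % (center.x, center.y))   # (50, 100)
```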
15.5 Objects are mutable

You can change the state of an object by making an assignment to one of its attributes.
box.width = box.width + 50
box.height = box.width + 100
You can also write functions that modify objects. For example, grow_rectangle takes a Rectangle
object and two numbers, dwidth and dheight, and adds the numbers to the width and height of
the rectangle:
def grow_rectangle(rect, dwidth, dheight):
    rect.width += dwidth
    rect.height += dheight
Here is an example that demonstrates the effect:

>>> print(box.width)
100.0
>>> print(box.height)
200.0
>>> grow_rectangle(box, 50, 100)
>>> print(box.width)
150.0
>>> print(box.height)
300.0
Inside the function, rect is an alias for box, so if the function modifies rect, box changes.
15.6 Copying
The copy module contains a function called copy that can duplicate any object:
>>> p1 = Point()
>>> p1.x = 3.0
>>> p1.y = 4.0
>>> import copy

>>> p2 = copy.copy(p1)
p1 and p2 contain the same data, but they are not the same Point.
>>> print_point(p1)
(3, 4)
>>> print_point(p2)
(3, 4)
>>> p1 is p2
False
>>> p1 == p2
False
The is operator indicates that p1 and p2 are not the same object, which is what we expected. But
you might have expected == to yield True because these points contain the same data. In that
case, you will be disappointed to learn that for instances, the default behavior of the == operator
is the same as the is operator; it checks object identity, not object equivalence. This behavior can
be changed—we’ll see how later.

If you use copy.copy to duplicate a Rectangle, you will find that it copies the Rectangle object but
not the embedded Point.
>>> box2 = copy.copy(box)
>>> box2 is box
False
>>> box2.corner is box.corner
True
Figure 15.3 shows what the object diagram looks like. This operation is called a shallow copy
because it copies the object and any references it contains, but not the embedded objects.
Fortunately, the copy module contains a function named deepcopy that copies not only the
object but also the objects it refers to, and the objects they refer to, and so on. You will not be
surprised to learn that this operation is called a deep copy.
>>> box3 = copy.deepcopy(box)
>>> box3 is box
False
>>> box3.corner is box.corner
False
box3 and box are completely separate objects.
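The practical consequence of the difference shows up when an embedded object is modified. In the sketch below (Python 3 assumed; the variable names shallow and deep are chosen only for this illustration), changing the corner through the shallow copy also changes the original, while changing it through the deep copy does not.

```python
import copy


class Point:
    """Represents a point in 2-D space."""


class Rectangle:
    """Represents a rectangle. attributes: width, height, corner."""


box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

shallow = copy.copy(box)       # new Rectangle, same embedded Point
deep = copy.deepcopy(box)      # new Rectangle and new embedded Point

shallow.corner.x = 10.0        # also visible through box.corner
deep.corner.y = 99.0           # box.corner is unaffected

print(box.corner.x)            # 10.0
print(box.corner.y)            # 0.0
```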
15.7 Debugging
When you start working with objects, you are likely to encounter some new exceptions. If you
try to access an attribute that doesn’t exist, you get an AttributeError:
>>> p = Point()
>>> print(p.z)
AttributeError: 'Point' object has no attribute 'z'
If you are not sure what type an object is, you can ask:
>>> type(p)
<class '__main__.Point'>
If you are not sure whether an object has a particular attribute, you can use the built-in
function hasattr:
>>> hasattr(p, 'x')
True
>>> hasattr(p, 'z')
False
The first argument can be any object; the second argument is a string that contains the name of
the attribute.
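Both debugging techniques can be combined in a short script. The sketch below (Python 3 assumed) first checks for an attribute with hasattr and then shows the alternative of attempting the access and handling the AttributeError.

```python
class Point:
    """Represents a point in 2-D space."""


p = Point()
p.x = 3.0
p.y = 4.0

# Option 1: check before reading an attribute that may not exist.
if hasattr(p, 'z'):
    z = p.z
else:
    z = None

# Option 2: attempt the access and handle the exception.
try:
    print(p.z)
except AttributeError as err:
    message = str(err)
    print('caught:', message)
```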

15.8 Glossary
class:
A user-defined type. A class definition creates a new class object.
class object:
An object that contains information about a user-defined type. The class object can be used to
create instances of the type.
instance:
An object that belongs to a class.
attribute:
One of the named values associated with an object.
embedded (object):
An object that is stored as an attribute of another object.
shallow copy:
To copy the contents of an object, including any references to embedded objects; implemented
by the copy function in the copy module.
deep copy:
To copy the contents of an object as well as any embedded objects, and any objects embedded
in them, and so on; implemented by the deepcopy function in the copy module.
object diagram:
A diagram that shows objects, their attributes, and the values of the attributes.
Instead of using the normal statements to access attributes, you can use the following functions:
getattr(obj, name[, default]) − accesses an attribute of the object.
hasattr(obj, name) − checks whether an attribute exists.
setattr(obj, name, value) − sets an attribute. If the attribute does not exist, it is created.
delattr(obj, name) − deletes an attribute.
Examples:
hasattr(emp1, 'salary')        # Returns True if the 'salary' attribute exists
getattr(emp1, 'salary')        # Returns the value of the 'salary' attribute
setattr(emp1, 'salary', 7000)  # Sets the 'salary' attribute to 7000
delattr(emp1, 'salary')        # Deletes the 'salary' attribute
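A runnable version of these calls needs an object to operate on. The sketch below assumes a hypothetical Employee class (it is not defined anywhere in these notes) so that emp1 exists, and it sets the attribute first so that the later calls succeed.

```python
class Employee:
    """Hypothetical class used only for this illustration."""


emp1 = Employee()

setattr(emp1, 'salary', 7000)            # creates the attribute
had_salary = hasattr(emp1, 'salary')     # True
salary = getattr(emp1, 'salary')         # 7000

delattr(emp1, 'salary')                  # removes the attribute again
missing = getattr(emp1, 'salary', None)  # default is returned, no exception

print(had_salary, salary, missing)       # True 7000 None
```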

Chapter 16: Classes and functions

16.1 Time
As another example of a user-defined type, we’ll define a class called Time that records the
time of day. The class definition looks like this:
class Time(object):
    """Represents the time of day.
    attributes: hour, minute, second
    """
We can create a new Time object and assign attributes for hours, minutes, and seconds:
time = Time()
time.hour = 11
time.minute = 59
time.second = 30
The state diagram for the Time object looks like Figure 16.1.
16.2 Pure functions

We’ll write two functions that add time values. They demonstrate two kinds of functions: pure
functions and modifiers. They also demonstrate a development plan called prototype and patch,
which is a way of tackling a complex problem by starting with a simple prototype and
incrementally dealing with the complications.
Here is a simple prototype of add_time:

def add_time(t1, t2):
    sum = Time()
    sum.hour = t1.hour + t2.hour
    sum.minute = t1.minute + t2.minute
    sum.second = t1.second + t2.second
    return sum
The function creates a new Time object, initializes its attributes, and returns a reference to the
new object. This is called a pure function because it does not modify any of the objects passed
to it as arguments and it has no effect, like displaying a value or getting user input, other than
returning a value.
To test this function, we’ll create two Time objects: start contains the start time of a movie,
like Monty Python and the Holy Grail, and duration contains the run time of the movie, which is
one hour 35 minutes. add_time figures out when the movie will be done.

>>> start = Time()
>>> start.hour = 9
>>> start.minute = 45
>>> start.second = 0
>>> duration = Time()
>>> duration.hour = 1
>>> duration.minute = 35
>>> duration.second = 0
>>> done = add_time(start, duration)
>>> print_time(done)
10:80:00
The result, 10:80:00, might not be what you were hoping for. The problem is that this function
does not deal with cases where the number of seconds or minutes adds up to more than sixty.
Here’s an improved version:
def add_time(t1, t2):
    sum = Time()
    sum.hour = t1.hour + t2.hour
    sum.minute = t1.minute + t2.minute
    sum.second = t1.second + t2.second
    if sum.second >= 60:
        sum.second -= 60
        sum.minute += 1
    if sum.minute >= 60:
        sum.minute -= 60
        sum.hour += 1
    return sum
16.3 Modifiers
Modifiers are functions that modify the objects they receive as parameters; the changes are
visible to the caller.
increment, which adds a given number of seconds to a Time object, can be written naturally as
a modifier. Here is a rough draft:
def increment(time, seconds):
    time.second += seconds
    if time.second >= 60:
        time.second -= 60
        time.minute += 1
    if time.minute >= 60:
        time.minute -= 60
        time.hour += 1

Is this function correct? What happens if the parameter seconds is much greater than sixty?
In that case, it is not enough to carry once; we have to keep doing it until time.second is less than
sixty. One solution is to replace the if statements with while statements. That would make the
function correct, but not very efficient.
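One possible way to avoid both the single-carry bug and the repeated looping is to compute the carry directly with divmod. The following is a sketch of that idea, not necessarily the version the course intends; it assumes seconds is an integer.

```python
class Time:
    """Represents the time of day. attributes: hour, minute, second"""


def increment(time, seconds):
    """Add seconds to time in place (a modifier), carrying without loops."""
    time.second += seconds
    # divmod returns (number of whole minutes to carry, remaining seconds)
    carry_minutes, time.second = divmod(time.second, 60)
    time.minute += carry_minutes
    # same step one column up: whole hours to carry, remaining minutes
    carry_hours, time.minute = divmod(time.minute, 60)
    time.hour += carry_hours


t = Time()
t.hour = 11
t.minute = 59
t.second = 30

increment(t, 500)   # 500 seconds is 8 minutes 20 seconds
print('%.2d:%.2d:%.2d' % (t.hour, t.minute, t.second))   # 12:07:50
```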
Anything that can be done with modifiers can also be done with pure functions. In fact, some
programming languages only allow pure functions. There is some evidence that programs that
use pure functions are faster to develop and less error-prone than programs that use modifiers.
But modifiers are convenient at times, and functional programs tend to be less efficient.
In general, I recommend that you write pure functions whenever it is reasonable and resort to
modifiers only if there is a compelling advantage. This approach might be called a functional
programming style.
16.4 Prototyping versus planning

The development plan I am demonstrating is called “prototype and patch.” For each function, I
wrote a prototype that performed the basic calculation and then tested it, patching errors along
the way.
This approach can be effective, especially if you don’t yet have a deep understanding of the
problem. But incremental corrections can generate code that is unnecessarily complicated—
since it deals with many special cases—and unreliable—since it is hard to know if you have found
all the errors.
An alternative is planned development, in which high-level insight into the problem can make
the programming much easier. In this case, the insight is that a Time object is really a three-digit
number in base 60 (see http://en.wikipedia.org/wiki/Sexagesimal)! The second attribute is the
“ones column,” the minute attribute is the “sixties column,” and the hour attribute is the
“thirty-six hundreds column.”
When we wrote add_time and increment, we were effectively doing addition in base 60, which
is why we had to carry from one column to the next.
This observation suggests another approach to the whole problem—we can convert Time
objects to integers and take advantage of the fact that the computer knows how to do integer
arithmetic.
Here is a function that converts Times to integers:
def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds

And here is the function that converts integers to Times (recall that divmod divides the first
argument by the second and returns the quotient and remainder as a tuple).
def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time
One way to test them is to check that time_to_int(int_to_time(x)) == x for many values
of x. This is an example of a consistency check.
Once you are convinced they are correct, you can use them to rewrite add_time:
def add_time(t1, t2):
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
This version is shorter than the original, and easier to verify.
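The conversion functions, the consistency check, and the rewritten add_time can be exercised together. The sketch below assumes Python 3; the range and step used for the consistency check are arbitrary choices made for this illustration.

```python
class Time:
    """Represents the time of day. attributes: hour, minute, second"""


def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds


def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time


def add_time(t1, t2):
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)


# Consistency check: converting to a Time and back should return x.
for x in range(0, 24 * 60 * 60, 7):
    assert time_to_int(int_to_time(x)) == x

start = Time()
start.hour, start.minute, start.second = 9, 45, 0
duration = Time()
duration.hour, duration.minute, duration.second = 1, 35, 0

done = add_time(start, duration)
print('%.2d:%.2d:%.2d' % (done.hour, done.minute, done.second))   # 11:20:00
```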
16.5 Debugging
A Time object is well-formed if the values of minute and second are between 0 and 60 (including
0 but not 60) and if hour is non-negative. hour and minute should be integral values, but we might
allow second to have a fraction part.
Requirements like these are called invariants because they should always be true. To put it a
different way, if they are not true, then something has gone wrong.
Writing code to check your invariants can help you detect errors and find their causes. For
example, you might have a function like valid_time that takes a Time object and returns False if
it violates an invariant:
def valid_time(time):
    if time.hour < 0 or time.minute < 0 or time.second < 0:
        return False
    if time.minute >= 60 or time.second >= 60:
        return False
    return True
Then at the beginning of each function you could check the arguments to make sure they are
valid:
if not valid_time(t1) or not valid_time(t2):
    raise ValueError('invalid Time object in add_time')


Or you could use an assert statement, which checks a given invariant and raises an exception if
it fails:
assert valid_time(t1) and valid_time(t2)

assert statements are useful because they distinguish code that deals with normal conditions
from code that checks for errors.
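As an illustration, the sketch below places an assert at the top of time_to_int and shows what happens when a malformed Time object is passed in. The error message text is an assumption made for this example. Note that Python skips assert statements when it is run with the -O option, so they should not be the only defence against bad input.

```python
class Time:
    """Represents the time of day. attributes: hour, minute, second"""


def valid_time(time):
    if time.hour < 0 or time.minute < 0 or time.second < 0:
        return False
    if time.minute >= 60 or time.second >= 60:
        return False
    return True


def time_to_int(time):
    # Check the invariant before doing any arithmetic.
    assert valid_time(time), 'invalid Time object in time_to_int'
    minutes = time.hour * 60 + time.minute
    return minutes * 60 + time.second


good = Time()
good.hour, good.minute, good.second = 9, 45, 0
total = time_to_int(good)
print(total)                      # 35100

bad = Time()
bad.hour, bad.minute, bad.second = 1, 75, 0   # minute violates the invariant
caught = None
try:
    time_to_int(bad)
except AssertionError as err:
    caught = str(err)
print(caught)                     # invalid Time object in time_to_int
```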
16.6 Glossary
prototype and patch:
A development plan that involves writing a rough draft of a program, testing, and correcting
errors as they are found.
planned development:
A development plan that involves high-level insight into the problem and more planning than
incremental development or prototype development.
pure function:
A function that does not modify any of the objects it receives as arguments. Most pure
functions are fruitful.
modifier:
A function that changes one or more of the objects it receives as arguments. Most modifiers are
fruitless.
functional programming style:

A style of program design in which the majority of functions are pure.
invariant:
A condition that should always be true during the execution of a program.
divmod(x, y)
The divmod() function takes two numbers and returns a pair of numbers (a tuple) consisting of
their quotient and remainder.
divmod() takes two parameters:

• x - a non-complex number (numerator)
• y - a non-complex number (denominator)
Return value from divmod():

• (q, r) - a pair of numbers (a tuple) consisting of the quotient q and the remainder r

Examples:
print('divmod(8, 3) =', divmod(8, 3))
print('divmod(3, 8) =', divmod(3, 8))
print('divmod(5, 5) =', divmod(5, 5))
Answers:
divmod(8, 3) = (2, 2)
divmod(3, 8) = (0, 3)
divmod(5, 5) = (1, 0)
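A typical use of divmod in this chapter's setting is splitting a number of seconds into hours, minutes, and seconds. The sketch below assumes a helper named seconds_to_hms, which is a name chosen for this illustration only.

```python
def seconds_to_hms(total_seconds):
    """Split a non-negative number of seconds into (hours, minutes, seconds)."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds


print(seconds_to_hms(36437))   # (10, 7, 17)
print(divmod(-7, 2))           # (-4, 1): the quotient is rounded down
```

For integers, divmod(x, y) gives the same result as (x // y, x % y), so the remainder takes the sign of the denominator.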

Chapter 17: Classes and methods

17.1 Object-oriented features
Python is an object-oriented programming language, which means that it provides features
that support object-oriented programming.
A method is a function that is associated with a particular class. We will define methods for
user-defined types.
Methods are semantically the same as functions, but there are two syntactic differences:
• Methods are defined inside a class definition in order to make the relationship between the
class and the method explicit.
• The syntax for invoking a method is different from the syntax for calling a function.
17.2 Printing objects

In Chapter 16, we defined a class named Time and in Exercise 1, you wrote a function named
print_time:
class Time(object):
    """Represents the time of day."""

def print_time(time):
    print('%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second))
To call this function, you have to pass a Time object as an argument:

>>> start = Time()
>>> start.hour = 9
>>> start.minute = 45
>>> start.second = 00
>>> print_time(start)
09:45:00
To make print_time a method, all we have to do is move the function definition inside the class
definition. Notice the change in indentation.
class Time(object):
    def print_time(time):
        print('%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second))
There are now two ways to call print_time.

The first (and less common) way is to use function syntax:
>>> Time.print_time(start)
09:45:00
In this use of dot notation, Time is the name of the class, and print_time is the name of the
method. start is passed as a parameter.

The second (and more concise) way is to use method syntax:

>>> start.print_time()
09:45:00
In this use of dot notation, print_time is the name of the method (again), and start is the object
the method is invoked on, which is called the subject.
Inside the method, the subject is assigned to the first parameter, so in this case start is assigned
to time.
By convention, the first parameter of a method is called self, so it would be more common to
write print_time like this:
class Time(object):
    def print_time(self):
        print('%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second))
17.3 Another example

Here’s a version of increment (from Section 16.3) rewritten as a method:
# inside class Time:
    def increment(self, seconds):
        seconds += self.time_to_int()
        return int_to_time(seconds)
This version assumes that time_to_int is written as a method, as in Exercise 1. Here’s how you
would invoke increment:
>>> start.print_time()
09:45:00
>>> end = start.increment(1337)
>>> end.print_time()
10:07:17
The subject, start, gets assigned to the first parameter, self. The argument, 1337, gets assigned to
the second parameter, seconds.
17.4 A more complicated example

is_after (from Exercise 2) is slightly more complicated because it takes two Time objects as
parameters. In this case it is conventional to name the first parameter self and the second
parameter other:
# inside class Time:
    def is_after(self, other):
        return self.time_to_int() > other.time_to_int()
To use this method, you have to invoke it on one object and pass the other as an argument:
>>> end.is_after(start)
True

17.5 The init method

The init method (short for “initialization”) is a special method that gets invoked when an object
is instantiated. Its full name is __init__ (two underscore characters, followed by init, and then two
more underscores).
Example:
# inside class Time:
    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second
It is common for the parameters of __init__ to have the same names as the attributes.
The parameters are optional, so if you call Time with no arguments, you get the default values.
Examples:
>>> time = Time()
>>> time.print_time()
00:00:00
If you provide one argument, it overrides hour:

>>> time = Time(9)
If you provide two arguments, they override hour and minute.

>>> time = Time(9, 45)
And if you provide three arguments, they override all three default values.
17.6 The __str__ method

__str__ is a special method that is supposed to return a string representation of an object. For
example, here is a __str__ method for Time objects:
# inside class Time:
    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)
When you print an object, Python invokes the __str__ method:

>>> time = Time(9, 45)
>>> print(time)
09:45:00
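Putting the two special methods together, a minimal sketch of the Time class with __init__ and __str__ might look like the following; the keyword-argument call at the end is an extra illustration that is not part of the notes.

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)


print(Time())            # 00:00:00
print(Time(9))           # 09:00:00
print(Time(9, 45))       # 09:45:00
print(Time(second=30))   # 00:00:30
```

Because every parameter has a default value, a keyword argument can be used to set one attribute while leaving the others at zero.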

17.7 Operator overloading

By defining other special methods, you can specify the behavior of operators on user-defined
types. For example, if you define a method named __add__ for the Time class, you can use the
+ operator on Time objects.
Here is what the definition might look like:
# inside class Time:
    def __add__(self, other):
        seconds = self.time_to_int() + other.time_to_int()
        return int_to_time(seconds)
And here is how you could use it:

>>> start = Time(9, 45)
>>> duration = Time(1, 35)
>>> print(start + duration)
11:20:00
When you apply the + operator to Time objects, Python invokes __add__. When you print the result,
Python invokes __str__.
Changing the behavior of an operator so that it works with user-defined types is called operator
overloading. For more details see
http://docs.python.org/2/reference/datamodel.html#specialnames.
17.8 Type-based dispatch

You might want to add an integer to a Time object. The following is a version of __add__ that checks
the type of other and invokes either add_time or increment:
# inside class Time:
    def __add__(self, other):
        if isinstance(other, Time):
            return self.add_time(other)
        else:
            return self.increment(other)

    def add_time(self, other):
        seconds = self.time_to_int() + other.time_to_int()
        return int_to_time(seconds)

    def increment(self, seconds):
        seconds += self.time_to_int()
        return int_to_time(seconds)
The built-in function isinstance takes a value and a class object, and returns True if the value is an
instance of the class.
If other is a Time object, __add__ invokes add_time. Otherwise it assumes that the parameter is a
number and invokes increment. This operation is called a type-based dispatch because it

dispatches the computation to different methods based on the type of the arguments.
Here are examples that use the + operator with different types:
>>> start = Time(9, 45)
>>> duration = Time(1, 35)
>>> print(start + duration)
11:20:00
>>> print(start + 1337)
10:07:17
Unfortunately, this implementation of addition is not commutative. If the integer is the first
operand, you get
>>> print(1337 + start)
TypeError: unsupported operand type(s) for +: 'int' and 'Time'
The problem is, instead of asking the Time object to add an integer, Python is asking an integer
to add a Time object, and it doesn’t know how to do that.
But there is a clever solution for this problem: the special method __radd__, which stands for
“right-side add.” This method is invoked when a Time object appears on the right side of the +
operator. Here’s the definition:
# inside class Time:
    def __radd__(self, other):
        return self.__add__(other)
And here’s how it’s used:

>>> print(1337 + start)
10:07:17
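The pieces from Sections 17.5 to 17.8 can be assembled into one runnable sketch. It assumes a static method named from_int, a hypothetical stand-in for the int_to_time function used in the notes, so that the example is self-contained.

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)

    def time_to_int(self):
        return (self.hour * 60 + self.minute) * 60 + self.second

    @staticmethod
    def from_int(seconds):
        # Hypothetical name; plays the role of int_to_time.
        minutes, second = divmod(seconds, 60)
        hour, minute = divmod(minutes, 60)
        return Time(hour, minute, second)

    def __add__(self, other):
        # Type-based dispatch: Time + Time or Time + number of seconds.
        if isinstance(other, Time):
            return self.add_time(other)
        return self.increment(other)

    def __radd__(self, other):
        # Invoked for number + Time.
        return self.__add__(other)

    def add_time(self, other):
        return Time.from_int(self.time_to_int() + other.time_to_int())

    def increment(self, seconds):
        return Time.from_int(seconds + self.time_to_int())


start = Time(9, 45)
duration = Time(1, 35)
print(start + duration)                      # 11:20:00
print(start + 1337)                          # 10:07:17
print(1337 + start)                          # 10:07:17
print(sum([Time(0, 30), Time(0, 45)]))       # 01:15:00
```

The last line works because the built-in sum starts from the integer 0, so the first addition is 0 + Time, which Python resolves through __radd__.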
17.9 Polymorphism
The word polymorphism means having many forms. In programming, polymorphism means that the same
function name (possibly with different signatures) is used for different types.
Many of the functions we wrote for strings will actually work for any kind of sequence. For
example, we used histogram to count the number of times each letter appears in a word.
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d
This function also works for lists, tuples, and even dictionaries, as long as the elements of s are
hashable, so they can be used as keys in d.
>>> t = ['spam', 'egg', 'spam', 'spam', 'bacon', 'spam']
>>> histogram(t)
{'bacon': 1, 'egg': 1, 'spam': 4}
Functions that can work with several types are called polymorphic.
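To see the polymorphism in action, the sketch below calls the same histogram function on a string, a tuple, and a dictionary. The sample values are chosen for this illustration only.

```python
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d


print(histogram('banana'))              # counts letters of a string
print(histogram((1, 2, 2, 3, 3, 3)))    # counts elements of a tuple
print(histogram({'a': 1, 'b': 2}))      # iterating a dict yields its keys

# Elements must be hashable: a list of lists raises TypeError.
try:
    histogram([[1], [2]])
except TypeError as exc:
    print('not hashable:', exc)
```

The function never checks the type of s; it only relies on s being iterable and on its elements being usable as dictionary keys.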

17.10 Debugging
It is legal to add attributes to objects at any point in the execution of a program. It is usually a
good idea to initialize all of an object’s attributes in the __init__ method.
If you are not sure whether an object has a particular attribute, you can use the built-in
function hasattr.
Another way to access the attributes of an object is through the special attribute __dict__, which
is a dictionary that maps attribute names (as strings) to values:
>>> p = Point(3, 4)
>>> print(p.__dict__)
{'x': 3, 'y': 4}
For purposes of debugging, you might find it useful to keep this function handy:
def print_attributes(obj):
    for attr in obj.__dict__:
        print(attr, getattr(obj, attr))
print_attributes traverses the items in the object’s dictionary and prints each attribute name and
its corresponding value.
The built-in function getattr takes an object and an attribute name (as a string) and returns the
attribute’s value.
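The following sketch shows hasattr, getattr, and the attribute dictionary side by side. It assumes a small Point class whose __init__ sets x and y; the class body is an assumption made for this illustration.

```python
class Point:
    """Assumed minimal Point class for this sketch."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


def print_attributes(obj):
    # vars(obj) returns the same dictionary as obj.__dict__.
    for attr in vars(obj):
        print(attr, getattr(obj, attr))


p = Point(3, 4)
print(hasattr(p, 'x'))          # True
print(hasattr(p, 'z'))          # False
print(getattr(p, 'z', None))    # None: the default is returned instead of an error

p.z = 5                         # attributes can be added at any time
print_attributes(p)
```

Passing a third argument to getattr avoids the AttributeError that a plain p.z would raise when the attribute does not exist.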
17.11 Interface and implementation

A design principle is to keep interfaces separate from implementations. For objects, that means
that the methods a class provides should not depend on how the attributes are represented.
For example, in this chapter we developed a class that represents a time of day. Methods
provided by this class include time_to_int, is_after, and add_time.
We could implement those methods in several ways. The details of the implementation depend
on how we represent time. In this chapter, the attributes of a Time object are hour, minute, and
second.
As an alternative, we could replace these attributes with a single integer representing the number
of seconds since midnight. This implementation would make some methods, like is_after, easier
to write, but it makes some methods harder.
After you deploy a new class, you might discover a better implementation. If other parts of the
program are using your class, it might be time-consuming and error-prone to change the
interface.
But if you designed the interface carefully, you can change the implementation without changing
the interface, which means that other parts of the program don’t have to change.
Keeping the interface separate from the implementation means that you have to hide the
attributes. Code in other parts of the program (outside the class definition) should use methods
to read and modify the state of the object. They should not access the attributes directly. This
principle is called information hiding; see http://en.wikipedia.org/wiki/Information_hiding.
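As a sketch of the alternative representation described above, the class below keeps the same methods (time_to_int, is_after, add_time) but stores a single integer. The attribute name _seconds is an assumption made for this illustration; the leading underscore is the usual Python convention for an attribute that outside code should not touch.

```python
class Time:
    """Same interface as before; the state is seconds since midnight."""

    def __init__(self, hour=0, minute=0, second=0):
        self._seconds = (hour * 60 + minute) * 60 + second

    def time_to_int(self):
        return self._seconds

    def is_after(self, other):
        return self.time_to_int() > other.time_to_int()

    def add_time(self, other):
        result = Time()
        result._seconds = self.time_to_int() + other.time_to_int()
        return result

    def __str__(self):
        minutes, second = divmod(self._seconds, 60)
        hour, minute = divmod(minutes, 60)
        return '%.2d:%.2d:%.2d' % (hour, minute, second)


start = Time(9, 45)
end = start.add_time(Time(1, 35))
print(end)                    # 11:20:00
print(end.is_after(start))    # True
```

Code that only calls the methods keeps working unchanged after the switch; code that reads start.hour directly would break, which is the cost that information hiding is meant to avoid.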
17.12 Glossary
object-oriented language:
A language that provides features, such as user-defined classes and method syntax, that
facilitate object-oriented programming.
object-oriented programming:
A style of programming in which data and the operations that manipulate it are organized into
classes and methods.
method:
A function that is defined inside a class definition and is invoked on instances of that class.
subject:
The object a method is invoked on.
operator overloading:
Changing the behavior of an operator like + so it works with a user-defined type.
type-based dispatch:
A programming pattern that checks the type of an operand and invokes different functions for
different types.
polymorphic:
Pertaining to a function that can work with more than one type.
information hiding:
The principle that the interface provided by an object should not depend on its
implementation, in particular the representation of its attributes.
Emulating numeric types

The following methods can be defined to emulate numeric objects.
object.__add__(self, other)
object.__sub__(self, other)
object.__mul__(self, other)
object.__floordiv__(self, other)
object.__mod__(self, other)
object.__divmod__(self, other)
object.__pow__(self, other[, modulo])
object.__lshift__(self, other)
object.__rshift__(self, other)
object.__and__(self, other)
object.__xor__(self, other)
object.__or__(self, other)
For instance, to evaluate the expression x + y, where x is an instance of a class that has an
__add__() method, x.__add__(y) is called.
isinstance(object, classinfo)
The isinstance() function checks whether the object (first argument) is an instance of
classinfo (second argument) or of a subclass of it.
isinstance() takes two parameters:

• object - object to be checked
• classinfo - class, type, or tuple of classes and types

Return Value
isinstance() returns:
• True if the object is an instance of the class (or of a subclass of it), or of any element of the tuple
• False otherwise
Example:
class Foo:
    a = 5

fooInstance = Foo()
print(isinstance(fooInstance, Foo))                  # True
print(isinstance(fooInstance, (list, tuple)))        # False
print(isinstance(fooInstance, (list, tuple, Foo)))   # True

Chapter 18: Inheritance

Inheritance is the ability to define a new class that is a modified version of an existing class.
18.1 Card objects

There are fifty-two cards in a deck, each of which belongs to one of four suits and one of thirteen ranks.
The suits are Spades, Hearts, Diamonds, and Clubs (in descending order in bridge). The ranks are Ace, 2,
3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, and King.
To represent a playing card, create a class Card with two attributes, rank and suit. The attributes could
be strings or integers.
If they are strings, it is difficult to compare cards; if they are integers, comparison is easy.
Encoding: this table shows the suits and the corresponding integer codes:
Spades → 3
Hearts → 2
Diamonds → 1
Clubs → 0
The mapping for ranks: Ace maps to 1, each numeric rank maps to itself, and the face cards map as follows:

Jack → 11
Queen → 12
King → 13
The class definition for Card looks like this:

class Card:
    """Represents a standard playing card."""

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank
As usual, the __init__ method takes an optional parameter for each attribute. The default card
is the 2 of Clubs.
To create a Card, you call Card with the suit and rank of the card you want.
queen_of_diamonds = Card(1, 12)
18.2 Class attributes

To print Card objects in a way that people can easily read, we need a mapping from the integer codes to
the corresponding ranks and suits.
# inside class Card:

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'Jack', 'Queen', 'King']

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank], Card.suit_names[self.suit])

Variables like suit_names and rank_names are called class attributes because they are associated
with the class object Card.
Variables like suit and rank are called instance attributes because they are associated with a particular instance.
Both kinds of attribute are accessed using dot notation. Every card has its own suit and rank, but there
is only one copy of suit_names and rank_names.
The first element of rank_names is None because there is no card with rank zero.
With the methods we have so far, we can create and print cards:
>>> card1 = Card(2, 11)
>>> print(card1)
Jack of Hearts
18.3 Comparing cards

For built-in types, there are relational operators (<, >, ==, etc.) that compare values. For programmer-
defined types, we can override the behavior of the built-in operators by providing a method named
__lt__, which stands for “less than”.
__lt__ takes two parameters, self and other, and returns True if self is strictly less than
other.
Assume suit is more important, so all of the Spades outrank all of the Diamonds, and so on.
With that decided, we can write __lt__:

# inside class Card:
    def __lt__(self, other):
        # check the suits
        if self.suit < other.suit: return True
        if self.suit > other.suit: return False

        # suits are the same... check ranks
        return self.rank < other.rank
You can write this more concisely using tuple comparison:

# inside class Card:
    def __lt__(self, other):
        t1 = self.suit, self.rank
        t2 = other.suit, other.rank
        return t1 < t2
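Once __lt__ is defined, built-in tools that rely on comparison, such as sorted, work on Card objects. The following is a minimal sketch that repeats the Card class so it can run on its own; the three sample cards are chosen for this illustration.

```python
class Card:
    """Represents a standard playing card."""

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])

    def __lt__(self, other):
        # Tuples compare element by element: suit first, then rank.
        return (self.suit, self.rank) < (other.suit, other.rank)


cards = [Card(3, 1), Card(0, 13), Card(1, 12)]
for card in sorted(cards):
    print(card)
# King of Clubs
# Queen of Diamonds
# Ace of Spades
```

Because suit is compared first, the King of Clubs sorts below the Ace of Spades even though its rank is higher.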
18.4 Decks
Since a deck is made up of cards, it is natural for each Deck to contain a list of cards as an attribute.
The following is a class definition for Deck. The __init__ method creates the attribute cards and
generates the standard set of fifty-two cards:
class Deck:
    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                card = Card(suit, rank)
                self.cards.append(card)
Each iteration creates a new Card with the current suit and rank, and appends it to self.cards.
18.5 Printing the deck

Here is a __str__ method for Deck:
# inside class Deck:
    def __str__(self):
        res = []
        for card in self.cards:
            res.append(str(card))
        return '\n'.join(res)
This method demonstrates an efficient way to accumulate a large string: building a list of strings and
then using the string method join.
Since we invoke join on a newline character, the cards are separated by newlines. Here’s what the result
looks like:
>>> deck = Deck()
>>> print(deck)
Ace of Clubs
2 of Clubs
3 of Clubs
…

18.6 Add, remove, shuffle and sort

To add a card, we can use the list method append():
# inside class Deck:
    def add_card(self, card):
        self.cards.append(card)
To remove a card, we can use the list method pop(), which removes and returns the last element in the list:
# inside class Deck:
    def pop_card(self):
        return self.cards.pop()
To shuffle the deck, we can use the shuffle() function from the random module (remember to import random):

# inside class Deck:
    def shuffle(self):
        random.shuffle(self.cards)
To sort cards, we can use the list method sort(), which sorts the list in place using __lt__ and returns None:

# inside class Deck:
    def sort(self):
        self.cards.sort()
A method that uses another method without doing much work is sometimes called a veneer.
18.7 Inheritance
Inheritance is the ability to define a new class that is a modified version of an existing class.
To define a new class that inherits from an existing class, you put the name of the existing
class in parentheses:
class Hand(Deck):
    """Represents a hand of playing cards."""

    def __init__(self, label=''):
        self.cards = []      # initial condition: no cards
        self.label = label
class Hand represents the cards held by one player.
This definition indicates that Hand inherits from Deck; that means we can use methods like
pop_card and add_card for Hands as well as Decks.
When a new class inherits from an existing one, the existing one is called the parent and
the new class is called the child.
If we provide an __init__ method in the Hand class, it overrides the one in the Deck class:
# inside class Hand:
    def __init__(self, label=''):
        self.cards = []
        self.label = label

When you create a Hand, Python invokes this __init__ method, not the one in Deck.
>>> hand = Hand('new hand')
>>> hand.cards
[]
>>> hand.label
'new hand'
The other methods are inherited from Deck, so we can use pop_card and add_card to deal
a card:
>>> deck = Deck()
>>> card = deck.pop_card()
>>> hand.add_card(card)
>>> print(hand)
King of Spades
A natural next step is to encapsulate this code in a method called move_cards:

# inside class Deck:
    def move_cards(self, hand, num):
        for i in range(num):
            hand.add_card(self.pop_card())
move_cards takes two arguments, a Hand object and the number of cards to deal. It modifies
both self and hand, and returns None.
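The classes from this chapter can be combined into one small program that shuffles a deck and deals a hand. This is a sketch under the definitions given above; the label 'player 1' and the hand size of five are arbitrary choices for the illustration.

```python
import random


class Card:
    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])


class Deck:
    def __init__(self):
        self.cards = [Card(suit, rank)
                      for suit in range(4) for rank in range(1, 14)]

    def __str__(self):
        return '\n'.join(str(card) for card in self.cards)

    def add_card(self, card):
        self.cards.append(card)

    def pop_card(self):
        return self.cards.pop()

    def shuffle(self):
        random.shuffle(self.cards)

    def move_cards(self, hand, num):
        for _ in range(num):
            hand.add_card(self.pop_card())


class Hand(Deck):
    """Inherits add_card, pop_card, __str__, etc. from Deck."""

    def __init__(self, label=''):
        self.cards = []
        self.label = label


deck = Deck()
deck.shuffle()
hand = Hand('player 1')
deck.move_cards(hand, 5)
print(len(deck.cards), len(hand.cards))   # 47 5
print(isinstance(hand, Deck))             # True
print(hand)                               # five cards, one per line
```

Since move_cards only calls add_card and pop_card, it works equally well for moving cards from one Hand to another or from a Hand back to the Deck.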
Advantages:
• Programs that would be repetitive without inheritance can be written more elegantly with it.
• Inheritance facilitates code reuse.
• The inheritance structure reflects the natural structure of the problem, which makes the
design easier to understand.
Disadvantages:
• Inheritance can make programs difficult to read.
How to play Poker: https://www.youtube.com/watch?v=CpSewSHZhmo
18.8 Class diagrams

A class diagram is a more abstract representation of the structure of a program. A class diagram is a
graphical representation of has-a, is-a and dependency relationships.
There are several kinds of relationship between classes:

• Objects in one class might contain references to objects in another class. For example,
each Rectangle contains a reference to a Point. This kind of relationship is called HAS-A, as in,
“a Rectangle has a Point.”
• One class might inherit from another. This relationship is called IS-A, as in, “a Hand
is a kind of a Deck.”
• One class might depend on another in the sense that objects in one class take objects
in the second class as parameters. This kind of relationship is called a dependency.
The arrow with a hollow triangle head represents an IS-A relationship; in this case it indicates
that Hand inherits from Deck.
The standard arrow head represents a HAS-A relationship; in this case a Deck has references
to Card objects.
The star (*) near the arrow head is a multiplicity; it indicates how many Cards a Deck has.
A multiplicity can be a simple number, like 52, a range, like 5..7 or a star, which indicates
that a Deck can have any number of Cards.

Module 5:
[You will learn:
Web Scraping, Project: MAPIT.PY with the webbrowser Module, Downloading Files from the Web with
the requests Module, Saving Downloaded Files to the Hard Drive, HTML, Parsing HTML with the
BeautifulSoup Module, Project: “I’m Feeling Lucky” Google Search,Project: Downloading All XKCD
Comics, Controlling the Browser with the selenium Module,
Working with Excel Spreadsheets, Excel Documents, Installing the openpyxl Module, Reading Excel
Documents, Project: Reading Data from a Spreadsheet, Writing Excel Documents, Project: Updating a
Spreadsheet, Setting the Font Style of Cells, Font Objects, Formulas, Adjusting Rows and Columns,
Charts,
Working with PDF and Word Documents, PDF Documents, Project: Combining Select Pages from Many
PDFs, Word Documents,
Working with CSV files and JSON data, The csv Module, Project: Removing the Header from CSV Files,
JSON and APIs, The json Module, Project: Fetching Current Weather Data]
Textbook 1: Chapters 11 – 14
Chapter 11 – Web Scraping

Web scraping is the term for using a program to download and process content from the Web.
For example, Google runs many web scraping programs to index web pages for its search engine.
There are modules that make it easy to scrape web pages in Python:
• webbrowser: Comes with Python and opens a browser to a specific page.
• Requests: Downloads files and web pages from the Internet.
• Beautiful Soup: Parses HTML, the format that web pages are written in.
• Selenium: Launches and controls a web browser. Selenium is able to fill in forms and simulate mouse
clicks in this browser.
Project: mapIt.py with the webbrowser Module

The webbrowser module’s open() function can launch a new browser to a specified URL. Enter the following
into the interactive shell:
>>> import webbrowser

>>> webbrowser.open('http://inventwithpython.com/')
A web browser tab will open to the URL http://inventwithpython.com/.
Use case: automatically launch a map in your browser using an address taken from the command line
arguments or, if none are given, from the contents of the clipboard.
Steps:
Step 1: Figure Out the URL
Step 2: Handle the Command Line Arguments
Step 3: Handle the Clipboard Content and Launch the Browser

Program:
#! python3
# mapIt.py - Launches a map in the browser using an address
# from the command line or clipboard.
import webbrowser, sys, pyperclip

if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
else:
    # Get address from clipboard.
    address = pyperclip.paste()

webbrowser.open('https://www.google.com/maps/place/' + address)
Execution 1: From command line

C:\> mapit Abbanakuppe
C:\> mapit 870 Valencia St, San Francisco, CA 94110
Google Map will be opened in browser
Execution 2: Using clipboard

Copy the address to clipboard
Execute the program
Google Map will be opened in browser
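The mapIt.py program above concatenates the raw address onto the URL. Browsers usually cope with spaces in a URL, but percent-encoding the address first is a safer habit. The sketch below uses urllib.parse.quote from the standard library; the function name build_map_url is an assumption made for this illustration.

```python
from urllib.parse import quote


def build_map_url(address):
    """Return a Google Maps URL with the address percent-encoded."""
    # quote() replaces spaces, commas and other reserved characters
    # with %XX escapes so the address is safe to place in a URL path.
    return 'https://www.google.com/maps/place/' + quote(address)


print(build_map_url('870 Valencia St, San Francisco, CA 94110'))
```

The returned string can be passed to webbrowser.open() in place of the concatenated URL.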
Downloading Files from the Web with the requests Module

The requests module lets you easily download files from the Web. To install requests module, from the
command line, run pip install requests.
Downloading a Web Page with the requests.get() Function

The requests.get() function takes a string of a URL to download. The get() method sends a GET request
to the specified url. The get() method returns a requests.Response object.
Example:
>>> import requests
>>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
>>> type(res)
<class 'requests.models.Response'>

>>> res.status_code == requests.codes.ok

True
>>> len(res.text)
178981
>>> print(res.text[:250])
The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare. This eBook is for the use of
anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or
re-use it under the terms of the Projec
The URL goes to a text web page containing the entire play of Romeo and Juliet. If res.status_code
equals requests.codes.ok, the page was downloaded successfully; a failed request reports an error
code instead (for example, 404 for “Not Found”).
Checking for Errors

A simpler way to check for success is to call the raise_for_status() method on the Response object. This will
raise an exception if there was an error downloading the file and will do nothing if the download
succeeded.
Example:
import requests
res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
try:
    res.raise_for_status()
except Exception as exc:
    print('There was a problem: %s' % (exc)) # There was a problem: 404 Client Error: Not Found
Saving Downloaded Files to the Hard Drive

The complete process for downloading and saving a file:
1. Call requests.get() to download the file.
2. Call open() with 'wb' to create a new file in write binary mode. (to write binary data instead of text data
in order to maintain the Unicode encoding of the text.)
3. Loop over the Response object’s iter_content() method.
The iter_content() method returns “chunks” of the content(bytes) on each iteration through the
loop.
4. Call write() on each iteration to write the content to the file.
5. Call close() to close the file.
Example:
>>> import requests
>>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
>>> res.raise_for_status()
>>> playFile = open('RomeoAndJuliet.txt', 'wb')
>>> for chunk in res.iter_content(100000): # 100000 bytes
...     playFile.write(chunk)
...
100000
78981
>>> playFile.close()
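The same steps can be wrapped in a small function that uses a with statement, so the file is closed even if writing fails. This is a sketch: the name save_response is an assumption for this illustration, and the function only relies on the object having an iter_content() method, as a requests Response does.

```python
def save_response(res, filename, chunk_size=100000):
    """Write the body of a Response-like object to filename.

    Returns the total number of bytes written.
    """
    written = 0
    # 'wb' opens the file in write-binary mode; the with block
    # closes it automatically, so no explicit close() is needed.
    with open(filename, 'wb') as out_file:
        for chunk in res.iter_content(chunk_size):
            written += out_file.write(chunk)
    return written


# Typical use (needs the requests module and network access):
#   import requests
#   res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
#   res.raise_for_status()
#   save_response(res, 'RomeoAndJuliet.txt')
```

Reading in chunks keeps memory use bounded by chunk_size, which matters for downloads much larger than this play.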

HTML
Hypertext Markup Language (HTML) is the format that web pages are written in.
Resources for Learning HTML

Beginner tutorials:
• http://htmldog.com/guides/html/beginner/
• http://www.codecademy.com/tracks/web/
• https://developer.mozilla.org/en-US/learn/html/
A Quick Refresher
An HTML file is a plaintext file with the .html file extension.
The text in these files is surrounded by tags, which are words enclosed in angle brackets. The tags tell the browser
how to format the web page.
A starting tag and closing tag can enclose some text to form an element.
The text (or inner HTML) is the content between the starting and closing tags.
Example: To display Hello world! in the browser, with Hello in bold:

<strong>Hello</strong> world!
Viewing the Source HTML of a Web Page

To look at the HTML source of the web pages: Mouse right-click (or ctrl-click on OS X) any web page in
your web browser, and select View Source or View page source to see the HTML text of the page.

Opening Your Browser’s Developer Tools

You can look through a page’s HTML using your browser’s developer tools. In Chrome and Internet
Explorer for Windows, press F12 to make them appear. Pressing F12 again will make the developer tools
disappear.
In Chrome, you can also bring up the developer tools by selecting View → Developer → Developer Tools.
Using the Developer Tools to Find HTML Elements

Once your program has downloaded a web page using the requests module, you will have the page’s
HTML content as a single string value.
To figure out which part of the HTML corresponds to the information on the web page you’re interested
in:
Right-click where it is on the page (or control-click on OS X) and select Inspect Element from the context
menu that appears. This will bring up the Developer Tools window, which shows you the HTML that
produces this particular part of the web page.
Example:
Program to pull weather forecast data from http://weather.gov/. Visit the site and search for the 94105
ZIP code, the site will take you to a page showing the forecast for that area.
The HTML responsible for the temperature part of the web page is 59°F.

Parsing HTML with the BeautifulSoup Module

Beautiful Soup is a module for extracting information from an HTML page. To install it, run pip
install beautifulsoup4 from the command line.
While beautifulsoup4 is the name used for installation, to import Beautiful Soup you run import bs4.
Creating a BeautifulSoup Object from HTML

The bs4.BeautifulSoup(html_str) function takes html_str, a string containing the HTML to parse,
and returns a BeautifulSoup object.
File: example.html

<html>
<head><title>The Website Title</title></head>
<body>
<p>Download my Python book from
<a href="http://inventwithpython.com">my website</a>.</p>
<p>Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body>
</html>
Example 1: Downloading a page from internet

>>> import requests, bs4
>>> res = requests.get('http://nostarch.com')
>>> res.raise_for_status()
>>> noStarchSoup = bs4.BeautifulSoup(res.text)
>>> type(noStarchSoup)
<class 'bs4.BeautifulSoup'>
Example 2: load an HTML file from hard drive

>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile)
>>> type(exampleSoup)
<class 'bs4.BeautifulSoup'>
Once you have a BeautifulSoup object, you can use its methods to locate specific parts of an HTML
document.
Finding an Element with the select() Method

select(css_str)
Used to retrieve web page elements from a BeautifulSoup object.
css_str: string of a CSS selector for the elements you are looking for.
Returns a list of Tag objects, one for every match.

Example:
>>> import bs4
>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile.read())
>>> elems = exampleSoup.select('#author') #returns list of tag objects
>>> type(elems)
<class 'list'>
>>> len(elems)
1
>>> type(elems[0])
<class 'bs4.element.Tag'>
>>> str(elems[0])
'<span id="author">Al Sweigart</span>'
>>> elems[0].getText()
'Al Sweigart'
>>> elems[0].attrs
{'id': 'author'}
getText() on the element returns the element’s text, or inner HTML.

str(tag_object) returns a string with the starting and closing tags and the element’s text.
attrs gives us a dictionary with the element’s attribute, 'id', and the value of the id attribute, 'author'.
Example 2: pull all the <p> elements

>>> pElems = exampleSoup.select('p')
>>> str(pElems[0])
'<p>Download my Python book from <a href="http://inventwithpython.com">my website</a>.</p>'
>>> pElems[0].getText()
'Download my Python book from my website.'
>>> str(pElems[1])
'<p>Learn Python the easy way!</p>'
>>> pElems[1].getText()
'Learn Python the easy way!'
>>> str(pElems[2])
'<p>By <span id="author">Al Sweigart</span></p>'
>>> pElems[2].getText()
'By Al Sweigart'
Using str() on pElems[0], pElems[1], and pElems[2] shows you each element as a string, and using
getText() on each element shows you its text.
Getting Data from an Element’s Attributes

get(attribute_name) method for Tag objects
returns that attribute’s value.
Example:
>>> import bs4
>>> soup = bs4.BeautifulSoup(open('example.html'))
>>> spanElem = soup.select('span')[0]
>>> str(spanElem)
'<span id="author">Al Sweigart</span>'
>>> spanElem.get('id')
'author'
>>> spanElem.get('some_nonexistent_addr') == None
True
>>> spanElem.attrs
{'id': 'author'}
Here we use select() to find any <span> elements and then store the first matched element in spanElem.
Passing the attribute name 'id' to get() returns the attribute’s value, 'author'.
Project: “I’m Feeling Lucky” Google Search

When you run a Google search, the result page has a URL like
https://www.google.com/search?q=SEARCH_TERM_HERE.
The user will specify the search terms using command line arguments when they launch the program.
In the HTML of the results page, if you look up a little from the <a> element of a result link, there is
an element like this: <h3 class="r">. Looking through the rest of the HTML source, it looks like the r
class is used only for search result links. Use the selector '.r a' to find all <a> elements that are
within an element that has the r CSS class.
The soup.select() call returns a list of all the elements that matched your '.r a' selector, so the number of
tabs you want to open is either 5 or the length of this list (whichever is smaller).
The program opens the first five search results in new tabs using the webbrowser module.
Step 1: Get the Command Line Arguments and Request the Search Page
Step 2: Find All the Results
Step 3: Open Web Browsers for Each Result

#! python3
# t1_ch11_google_search.py - Opens several Google search results.
import requests, sys, webbrowser, bs4, pyperclip

print('Googling...') # display text while downloading the Google page

# Use the command line arguments if given, else the clipboard contents.
if len(sys.argv) > 1:
    search_key = ' '.join(sys.argv[1:])
else:
    search_key = pyperclip.paste()
res = requests.get('http://google.com/search?q=' + search_key)
res.raise_for_status()

# Retrieve top search result links.
soup = bs4.BeautifulSoup(res.text, 'html.parser')
linkElems = soup.select('.r a') # <a> elements inside an element with class r

# Open a browser tab for each result.
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))
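Two details of this project can be tried without touching the network: building the search URL from the command line words, and capping the number of tabs. The sketch below is only an illustration; build_search_url() and tabs_to_open() are hypothetical helper names, and the URL-encoding step with quote_plus() is an addition that the program above does not perform.

```python
from urllib.parse import quote_plus

def build_search_url(args):
    """Join command line words into one query and URL-encode it."""
    search_key = ' '.join(args)
    return 'https://www.google.com/search?q=' + quote_plus(search_key)

def tabs_to_open(link_elems, limit=5):
    """Open at most `limit` tabs, or fewer if there are fewer results."""
    return min(limit, len(link_elems))

url = build_search_url(['python', 'web scraping'])
print(url)                            # https://www.google.com/search?q=python+web+scraping
print(tabs_to_open(['a', 'b', 'c']))  # 3
print(tabs_to_open(list(range(12))))  # 5
```

Encoding the query keeps characters such as & or # in the search terms from being read as part of the URL syntax.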
Project: Downloading All XKCD Comics

This program downloads the comics from a website and saves them, so that you can read them when you’re not online.
XKCD is a popular geek webcomic with a website. The front page at http://xkcd.com/ has a Prev button
that guides the user back through prior comics. Downloading each comic by hand would take forever.
Automate it.
Here’s what your program does:

• Loads the XKCD home page.
• Saves the comic image on that page.
• Follows the Previous Comic link.
• Repeats until it reaches the first comic.

Step 1: Design the Program

If you open the browser’s developer tools and inspect the elements on the page, you’ll find the
following:
• The URL of the comic’s image file is given by the src attribute of an <img> element.
• The <img> element is inside a <div id="comic"> element.
• The Prev button has a rel HTML attribute with the value prev.
• The first comic’s Prev button links to the http://xkcd.com/# URL, indicating that there are no more previous pages.
You’ll have a url variable that starts with the value 'http://xkcd.com' and repeatedly update it (in a while loop)
with the URL of the current page’s Prev link.
At every step in the loop, you’ll download the comic at url. You’ll know to end the loop when url ends
with '#'.
You will download the image files to a folder in the current working directory named xkcd. The call
os.makedirs() ensures that this folder exists, and the exist_ok=True keyword argument prevents the function
from throwing an exception if this folder already exists.
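The loop design described above can be checked offline. In this sketch the dictionary FAKE_PREV_LINKS is a hypothetical stand-in for the website (each page maps to the href of its Prev button), and walk_back() is an illustrative helper name; only the stopping rule and the os.makedirs() call come from the notes.

```python
import os
import tempfile

# Hypothetical stand-in for the site: each page URL maps to its Prev link.
FAKE_PREV_LINKS = {
    'http://xkcd.com': '/2/',
    'http://xkcd.com/2/': '/1/',
    'http://xkcd.com/1/': '#',
}

def walk_back(start_url, prev_links):
    """Follow Prev links until the URL ends with '#'; return pages visited."""
    visited = []
    url = start_url
    while not url.endswith('#'):
        visited.append(url)
        url = 'http://xkcd.com' + prev_links[url]
    return visited

pages = walk_back('http://xkcd.com', FAKE_PREV_LINKS)
print(pages)

# makedirs() with exist_ok=True can be called repeatedly without error.
target = os.path.join(tempfile.mkdtemp(), 'xkcd')
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)  # second call does nothing
print(os.path.isdir(target))        # True
```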
Step 2: Download the Web Page
Step 3: Find and Download the Comic Image

From inspecting the XKCD home page with your developer tools, you know that the <img> element for
the comic image is inside a <div> element with the id attribute set to comic, so the selector '#comic img' will
get you the correct <img> element from the BeautifulSoup object.
You can get the src attribute from this <img> element and pass it to requests.get() to download the comic’s
image file.

Step 4: Save the Image and Find the Previous Comic

At this point, the image file of the comic is stored in the res variable. You need to write this image data to
a file on the hard drive.
You’ll need a filename for the local image file to pass to open(). The comicUrl will have a value like
'http://imgs.xkcd.com/comics/heartbleed_explanation.png', which you might have noticed looks a lot like a file path.
And in fact, you can call os.path.basename() with comicUrl, and it will return just the last part of the URL,
'heartbleed_explanation.png'. You can use this as the filename when saving the image to your hard drive.
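A small offline sketch of this step, under stated assumptions: fake_image is a hypothetical stand-in for the downloaded image data, and iter_chunks() is an illustrative helper that imitates reading a response in chunks of 100000 bytes. Only the os.path.basename() call and the chunk size come from the notes.

```python
import io
import os
import tempfile

comic_url = 'http://imgs.xkcd.com/comics/heartbleed_explanation.png'
filename = os.path.basename(comic_url)
print(filename)  # heartbleed_explanation.png

# Hypothetical stand-in for the downloaded image: 250,000 zero bytes.
fake_image = io.BytesIO(b'\x00' * 250000)

def iter_chunks(stream, chunk_size):
    """Yield successive chunks of at most chunk_size bytes from stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

out_path = os.path.join(tempfile.mkdtemp(), filename)
chunks_written = 0
with open(out_path, 'wb') as image_file:   # 'wb' = write binary
    for chunk in iter_chunks(fake_image, 100000):
        image_file.write(chunk)
        chunks_written += 1

print(chunks_written)             # 3
print(os.path.getsize(out_path))  # 250000
```

Writing in chunks keeps memory use small even when the file being saved is large.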
Program:
#! python3
# t1_ch11_downloadXkcd.py - Downloads every single XKCD comic.
import requests, os, bs4

url = 'http://xkcd.com' # starting url
os.makedirs('xkcd', exist_ok=True) # store comics in ./xkcd
while not url.endswith('#'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # Find the URL of the comic image.
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        # The src attribute is protocol-relative (it starts with '//').
        comicUrl = 'http:' + comicElem[0].get('src')
        # Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()

        # Save the image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev button's url.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')
print('Done.')

Output :
Downloading page http://xkcd.com...
Downloading image http://imgs.xkcd.com/comics/phone_alarm.png...
Downloading page http://xkcd.com/1358/...
….
Controlling the Browser with the selenium Module

The selenium module lets Python directly control the browser by programmatically clicking links and filling
in login information, almost as though there is a human user interacting with the page.
Starting a Selenium-Controlled Browser

For the examples below, use the Firefox web browser.
Example:
>>> from selenium import webdriver #import selenium
>>> browser = webdriver.Firefox() #opens Firefox browser
>>> type(browser)
<class 'selenium.webdriver.firefox.webdriver.WebDriver'>
>>> browser.get('http://inventwithpython.com') #opens the url in Firefox browser
Finding Elements on the Page

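WebDriver objects have quite a few methods for finding elements on a page. They are divided into the
find_element_* and find_elements_* methods.
The find_element_* methods return a single WebElement object, representing the first element on the page
that matches your query.
The find_elements_* methods return a list of WebElement objects, one for every matching element on the page.
Note: these are the Selenium 3 method names. Selenium 4 replaces them with find_element() and find_elements(),
which take a By locator such as By.ID or By.CLASS_NAME from selenium.webdriver.common.by, so the examples
below need an older Selenium release or the newer calls.

Except for the *_by_tag_name() methods, the arguments to all the methods are case sensitive. If no
element on the page matches what a find_element_* method is looking for, the selenium module raises a
NoSuchElementException; the find_elements_* methods return an empty list instead.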
Once you have the WebElement object, you can find out more about it by reading the attributes or calling
the methods.
Example:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.Firefox()
browser.get('http://inventwithpython.com')
try:
    elem = browser.find_element_by_class_name('bookcover')
    print('Found <%s> element with that class name!' % (elem.tag_name))
except NoSuchElementException:
    print('Was not able to find an element with that name.')
Output:
Found <img> element with that class name!
Clicking the Page

WebElement objects returned from the find_element_* and find_elements_* methods have a click() method that
simulates a mouse click on that element.
This method can be used to follow a link, make a selection on a radio button, click a Submit button, or
trigger whatever else might happen when the element is clicked by the mouse.
Example:
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.get('http://inventwithpython.com')
>>> linkElem = browser.find_element_by_link_text('Read It Online')
>>> type(linkElem)
<class 'selenium.webdriver.remote.webelement.WebElement'>
>>> linkElem.click() # follows the "Read It Online" link
This opens Firefox to http://inventwithpython.com/, gets the WebElement object for the <a> element with
the text Read It Online, and then simulates clicking that <a> element.
Filling Out and Submitting Forms

Sending keystrokes to text fields on a web page is a matter of finding the <input> or <textarea> element for
that text field and then calling the send_keys() method.
Example:
>>> browser.get('https://mail.yahoo.com')
>>> emailElem = browser.find_element_by_id('login-username')
>>> emailElem.send_keys('not_my_real_email')
>>> passwordElem = browser.find_element_by_id('login-passwd')
>>> passwordElem.send_keys('12345')
>>> passwordElem.submit()
Calling the submit() method on any element will have the same result as clicking the Submit button for the
form that element is in.
Sending Special Keys

Selenium has a module for keyboard keys that are impossible to type into a string value. These values
are stored in attributes in the selenium.webdriver.common.keys module.

For example, if the cursor is not currently in a text field, pressing the home and end keys will scroll the
browser to the top and bottom of the page, respectively.
Example:
>>> from selenium.webdriver.common.keys import Keys
>>> browser.get('http://nostarch.com')
>>> htmlElem = browser.find_element_by_tag_name('html')
>>> htmlElem.send_keys(Keys.END) # scrolls to bottom
>>> htmlElem.send_keys(Keys.HOME) # scrolls to top
Clicking Browser Buttons

Selenium can simulate clicks on various browser buttons as well through the following methods:
browser.back() Clicks the Back button.
browser.forward() Clicks the Forward button.
browser.refresh() Clicks the Refresh/Reload button.
browser.quit() Clicks the Close Window button.

Chapter 12 – Working with Excel Spreadsheets

Excel is a popular and powerful spreadsheet application for Windows. The openpyxl module allows your
Python programs to read and modify Excel spreadsheet files.
LibreOffice Calc and OpenOffice Calc are free alternatives that run on Windows, OS X, and Linux. Both work
with Excel’s .xlsx file format for spreadsheets, which means the openpyxl module can work on
spreadsheets from these applications as well.
Excel Documents
An Excel spreadsheet document is called a workbook.
A single workbook is saved in a file with the .xlsx extension.
Each workbook can contain multiple sheets (also called worksheets).
The sheet the user is currently viewing (or last viewed before closing Excel) is called the active sheet.
Each sheet has columns (addressed by letters starting at A) and rows (addressed by numbers starting at
1).
A box at a particular column and row is called a cell.
Each cell can contain a number or text value. The grid of cells with data makes up a sheet.
Installing the openpyxl Module

Install openpyxl using pip.
$ pip install openpyxl
To test the installation, run import openpyxl in the interactive shell. If the module was correctly installed,
this produces no error messages; otherwise you’ll get a ModuleNotFoundError: No module named 'openpyxl' error.
Reading Excel Documents

File: example.xlsx
Sheet 1 in the example file should look like

Opening Excel Documents with OpenPyXL

The openpyxl.load_workbook() function takes in the filename and returns a Workbook object
representing the Excel file.
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> type(wb)
<class 'openpyxl.workbook.workbook.Workbook'>
Getting Sheets from the Workbook

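get_sheet_names() workbook method: returns a list of all the sheet names in the workbook.
get_sheet_by_name(sheet_name) workbook method: returns the Worksheet object for the requested sheet.
get_active_sheet() workbook method: returns the workbook’s active sheet as a Worksheet object.
title attribute of a Worksheet object: holds the name of the sheet.
Note: recent openpyxl releases deprecate these three methods in favour of wb.sheetnames, wb[sheet_name],
and wb.active; the old names may still work but emit a DeprecationWarning.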
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> wb.get_sheet_names()
['Sheet1', 'Sheet2', 'Sheet3']
>>> sheet = wb.get_sheet_by_name('Sheet3')

>>> sheet
<Worksheet "Sheet3">
>>> type(sheet)
<class 'openpyxl.worksheet.worksheet.Worksheet'>
>>> sheet.title
'Sheet3'
>>> anotherSheet = wb.get_active_sheet()

>>> anotherSheet
Getting Cells from the Sheets

Once you have a Worksheet object, you can access a Cell object by its name.
Cell Attribute    Description
value             Get or set the value held in the cell.
row               Row number of this cell (1-based)
column            Column number of this cell (1-based)
column_letter     Column letter of this cell (starting at 'A')
coordinate        This cell’s coordinate (e.g. 'A5')
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_sheet_by_name('Sheet1')
>>> sheet['A1']
<Cell Sheet1.A1>
>>> sheet['A1'].value
datetime.datetime(2015, 4, 5, 13, 34, 2)
>>> c = sheet['B1']
>>> c.value
'Apples'
>>> 'Row ' + str(c.row) + ', Column ' + c.column_letter + ' is ' + c.value
'Row 1, Column B is Apples'
>>> 'Cell ' + c.coordinate + ' is ' + c.value

'Cell B1 is Apples'
>>> sheet['C1'].value
73
OpenPyXL will automatically interpret the dates in column A and return them as datetime values rather
than strings.
cell(row, column, value=None) #Worksheet object method

To get a cell using the sheet’s cell() method, pass integers for its row and column keyword arguments.
Example:
>>> sheet.cell(row=1, column=2)
<Cell Sheet1.B1>
>>> sheet.cell(row=1, column=2).value
'Apples'
>>> for i in range(1, 8, 2):
...     print(i, sheet.cell(row=i, column=2).value)
...
1 Apples
3 Pears
5 Apples
7 Strawberries
The size of the sheet is given by the max_row and max_column attributes of the Worksheet object. (These
replace the get_highest_row() and get_highest_column() methods of older openpyxl versions.)
max_column The maximum column index containing data (1-based)
max_row The maximum row index containing data (1-based)
Example:
>>> import openpyxl
>>> sheet.max_row
7
>>> sheet.max_column
3
Converting Between Column Letters and Numbers

openpyxl.utils.cell.column_index_from_string(str_col)
Convert a column name into a numerical index (‘A’ ->1)
openpyxl.utils.cell.get_column_letter(idx)
Convert a column index into a column letter (3 -> ‘C’)
Example:
""" t1_ch12_convert_colnames_numbers.py """
import openpyxl
from openpyxl.utils.cell import get_column_letter, column_index_from_string
get_column_letter(1) #'A'
get_column_letter(2) #'B'
get_column_letter(27) #'AA'
get_column_letter(900) #'AHP'
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
#get_column_letter(sheet.get_highest_column()) #'C'
get_column_letter(sheet.max_column) #'C'
column_index_from_string('A') #1
column_index_from_string('AA') #27
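Column letters work like a base-26 number system without a zero digit ('Z' is followed by 'AA'). The sketch below shows the arithmetic in plain Python. column_letter() and column_index() are hypothetical names written for illustration; they are not the openpyxl implementation, though they give the same results for the values in the example above.

```python
def column_letter(index):
    """Convert a 1-based column number to letters (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError('column numbers start at 1')
    letters = ''
    while index > 0:
        # Subtract 1 first because there is no zero digit: 'A' means 1.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters

def column_index(letters):
    """Convert column letters to a 1-based number ('A' -> 1, 'AA' -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index

for n in (1, 2, 26, 27, 900):
    print(n, column_letter(n))   # A, B, Z, AA, AHP

print(column_index('A'), column_index('AA'), column_index('AHP'))  # 1 27 900
```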

Getting Rows and Columns from the Sheets

You can slice Worksheet objects to get all the Cell objects in a row, column, or rectangular area of the
spreadsheet. Then you can loop over all the cells in the slice.
Example:
""" t1_ch12_getting_rows_columns.py """
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
tuple(sheet['A1':'C3'])
"""
((<Cell Sheet1.A1>, <Cell Sheet1.B1>, <Cell Sheet1.C1>),
 (<Cell Sheet1.A2>, <Cell Sheet1.B2>, <Cell Sheet1.C2>),
 (<Cell Sheet1.A3>, <Cell Sheet1.B3>, <Cell Sheet1.C3>))
"""
for rowOfCellObjects in sheet['A1':'C3']:
    for cellObj in rowOfCellObjects:
        print(cellObj.coordinate, cellObj.value)
    print('--- END OF ROW ---')
Output:
A1 2015-04-05 13:34:02
B1 Apples
C1 73
--- END OF ROW ---
A2 2015-04-05 03:41:23
B2 Cherries
C2 85
--- END OF ROW ---
A3 2015-04-06 12:46:51
B3 Pears
C3 14
--- END OF ROW ---
sheet['A1':'C3'] returns the Cell objects in that area: a generator in older openpyxl versions, a tuple of
tuples in recent ones.
Calling tuple() on the result displays its Cell objects in a tuple in either case. This tuple contains three
tuples: one for each row, from the top of the desired area to the bottom.
Worksheet attributes: rows and columns

rows
Produces all cells in the worksheet, by row.
columns
Produces all cells in the worksheet, by column
Can be used to access values of cells in a particular row or column.

Example:
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_active_sheet()
data = sheet.columns # generator yielding one tuple of Cell objects per column
for col in data:
    for cell_obj in col:
        print(cell_obj.value)
    print("--END OF COLUMN--")

# To access a row
sheet[1] # returns the first row's Cell objects
"""
(<Cell 'Sheet1'.A1>, <Cell 'Sheet1'.B1>, <Cell 'Sheet1'.C1>)
"""
# To access a column
sheet['B'] # returns column B's Cell objects
"""
(<Cell 'Sheet1'.B1>, <Cell 'Sheet1'.B2>, <Cell 'Sheet1'.B3>, <Cell 'Sheet1'.B4>, <Cell 'Sheet1'.B5>,
<Cell 'Sheet1'.B6>, <Cell 'Sheet1'.B7>)
"""
# Accessing column B data
for cellObj in sheet['B']:
    print(cellObj.value)
"""
Apples
Cherries
Pears
Oranges
Apples
Bananas
Strawberries
"""
Workbooks, Sheets, Cells
As a quick review, here’s a rundown of all the functions, methods, and data types involved in reading a
cell out of a spreadsheet file:
1. Import the openpyxl module.
2. Call the openpyxl.load_workbook() function.
3. Get a Workbook object.
4. Call the get_active_sheet() or get_sheet_by_name() workbook method.
5. Get a Worksheet object.
6. Use indexing or the cell() sheet method with row and column keyword arguments.
7. Get a Cell object.
8. Read the Cell object’s value attribute.
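The same eight steps can be sketched with the non-deprecated openpyxl names (wb.active, wb.sheetnames, wb[...]). The example assumes openpyxl is installed and builds a small workbook in memory instead of loading example.xlsx; the sheet title 'Fruit' and the variable names are illustrative choices.

```python
import openpyxl

# Build a small workbook in memory so no .xlsx file is needed.
wb = openpyxl.Workbook()
sheet = wb.active                 # replaces get_active_sheet()
sheet.title = 'Fruit'
sheet['A1'] = 'Apples'
sheet['B1'] = 73
sheet['A2'] = 'Cherries'
sheet['B2'] = 85

print(wb.sheetnames)              # replaces get_sheet_names()
same_sheet = wb['Fruit']          # replaces get_sheet_by_name('Fruit')

# Read cells by coordinate and by row/column number.
first_name = same_sheet['A1'].value
second_count = same_sheet.cell(row=2, column=2).value
print(first_name, second_count)

# Size of the used area of the sheet.
print(same_sheet.max_row, same_sheet.max_column)
```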

Project: Reading Data from a Spreadsheet

Say you have censuspopdata.xlsx, a spreadsheet of data from the 2010 US Census, and you have the boring task of
going through its thousands of rows to count both the total population and the number of census tracts for each county.
Each row represents a single census tract.
In this project, you’ll write a script that can read from the census spreadsheet file and calculate statistics
for each county in a matter of seconds.
This is what your program does:

• Reads the data from the Excel spreadsheet.
• Counts the number of census tracts in each county.
• Counts the total population of each county.
• Prints the results.
This means your code will need to do the following:
• Open and read the cells of an Excel document with the openpyxl module.
• Calculate all the tract and population data and store it in a data structure.
• Write the data structure to a text file with the .py extension using the pprint module.
Step 1: Read the Spreadsheet Data

There is just one sheet in the censuspopdata.xlsx spreadsheet, named 'Population by Census Tract', and each
row holds the data for a single census tract. The columns are the tract number (A), the state
abbreviation (B), the county name (C), and the population of the tract (D).
Step 2: Populate the Data Structure

The data structure stored in countyData will be a dictionary with state abbreviations as its keys. Each state
abbreviation will map to another dictionary, whose keys are strings of the county names in that state.
Each county name will in turn map to a dictionary with just two keys, 'tracts' and 'pop'. These keys map to
the number of census tracts and population for the county.
{'AK': {'Aleutians East': {'pop': 3141, 'tracts': 1},
        'Aleutians West': {'pop': 5561, 'tracts': 2},
        'Anchorage': {'pop': 291826, 'tracts': 55},
        'Bethel': {'pop': 17013, 'tracts': 3},
        'Bristol Bay': {'pop': 997, 'tracts': 1},
        --snip--
If the previous dictionary were stored in countyData, the following expressions would evaluate like this:
>>> countyData['AK']['Anchorage']['pop']
291826
>>> countyData['AK']['Anchorage']['tracts']
55
More generally, the countyData dictionary’s keys will look like this:
countyData[state abbrev][county]['tracts']
countyData[state abbrev][county]['pop']
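One way to build this nested structure is with the dictionary method setdefault(), which creates a key only if it does not exist yet. The rows in sample_rows below are made-up values for illustration, not census figures.

```python
import pprint

# Illustrative rows only (state, county, tract population); not census figures.
sample_rows = [
    ('AK', 'Anchorage', 4000),
    ('AK', 'Anchorage', 3500),
    ('AK', 'Bethel', 1200),
    ('AL', 'Autauga', 1900),
]

countyData = {}
for state, county, pop in sample_rows:
    # Create the nested dictionaries the first time a state or county appears.
    countyData.setdefault(state, {})
    countyData[state].setdefault(county, {'tracts': 0, 'pop': 0})
    countyData[state][county]['tracts'] += 1   # one row = one census tract
    countyData[state][county]['pop'] += pop

pprint.pprint(countyData)
print(countyData['AK']['Anchorage']['tracts'])  # 2
print(countyData['AK']['Anchorage']['pop'])     # 7500
```

Without setdefault(), the first += on a new state or county would raise a KeyError.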
Step 3: Write the Results to a File

After the for loop has finished, the countyData dictionary will contain all of the population and tract
information keyed by county and state.
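Because pprint.pformat() returns a string that is valid Python code, the output file can later be loaded like a module. The sketch below writes a one-county sample to a temporary file and loads it back. The file name census_sample.py and the use of importlib here are assumptions made for the demonstration; with the real census2010.py in the working directory, a plain import census2010 would do the same job.

```python
import importlib.util
import os
import pprint
import tempfile

countyData = {'AK': {'Anchorage': {'pop': 291826, 'tracts': 55}}}

# Write the dictionary as Python source: allData = {...}
path = os.path.join(tempfile.mkdtemp(), 'census_sample.py')
with open(path, 'w') as result_file:
    result_file.write('allData = ' + pprint.pformat(countyData))

# Load the generated file as a module, as "import census_sample" would.
spec = importlib.util.spec_from_file_location('census_sample', path)
census_sample = importlib.util.module_from_spec(spec)
spec.loader.exec_module(census_sample)

print(census_sample.allData['AK']['Anchorage']['pop'])     # 291826
print(census_sample.allData['AK']['Anchorage']['tracts'])  # 55
```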
Program:
#! python3
# readCensusExcel.py - Tabulates population and number of census tracts for each county.
import openpyxl, pprint

print('Opening workbook...')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb.get_sheet_by_name('Population by Census Tract')
countyData = {}

# Fill in countyData with each county's population and tracts.
print('Reading rows...')
for row in range(2, sheet.max_row + 1):
    # Each row in the spreadsheet has data for one census tract.
    state = sheet['B' + str(row)].value
    county = sheet['C' + str(row)].value
    pop = sheet['D' + str(row)].value

    # Make sure the key for this state exists.
    countyData.setdefault(state, {})
    # Make sure the key for this county in this state exists.
    countyData[state].setdefault(county, {'tracts': 0, 'pop': 0})

    # Each row represents one census tract, so increment by one.
    countyData[state][county]['tracts'] += 1
    # Increase the county pop by the pop in this census tract.
    countyData[state][county]['pop'] += int(pop)

# Open a new text file and write the contents of countyData to it.
print('Writing results...')
resultFile = open('census2010.py', 'w')
resultFile.write('allData = ' + pprint.pformat(countyData))
resultFile.close()
print('Done.')
Writing Excel Documents

OpenPyXL also provides ways of writing data, meaning that your programs can create and edit
spreadsheet files.
Creating and Saving Excel Documents

Call the openpyxl.Workbook() function to create a new, blank Workbook object.
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> wb.get_sheet_names()
['Sheet']
>>> sheet = wb.get_active_sheet()
>>> sheet.title
'Sheet'
>>> sheet.title = 'Spam Bacon Eggs Sheet'
>>> wb.get_sheet_names()
['Spam Bacon Eggs Sheet']
The workbook will start off with a single sheet named Sheet. You can change the name of the sheet by
storing a new string in its title attribute.
save() workbook method: saves the Workbook object to the given filename. Changes are not written to disk until save() is called.

Passing a different filename than the original, such as 'example_copy.xlsx', saves the changes to a copy of
the spreadsheet.
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_active_sheet()
>>> sheet.title = 'Spam Spam Spam'
>>> wb.save('example_copy.xlsx')
Creating and Removing Sheets

create_sheet(title=None, index=None)
Create a worksheet (at an optional index).
Parameters:
title (str) – optional title of the sheet
index (int) – optional position at which the sheet will be inserted (start at 0)
Returns
new Worksheet object named SheetX, which by default is set to be the last sheet in the
workbook.

remove_sheet(worksheet)
Remove worksheet from this workbook.
Parameters:
worksheet : a Worksheet object, not a string of the sheet name, as its argument.
Note: Deprecated: Use wb.remove(worksheet) or del wb[sheetname]
Example:
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> wb.get_sheet_names()
['Sheet']
>>> wb.create_sheet()
<Worksheet "Sheet1">
>>> wb.get_sheet_names()
['Sheet', 'Sheet1']
>>> wb.create_sheet(index=0, title='First Sheet')
<Worksheet "First Sheet">
>>> wb.get_sheet_names()
['First Sheet', 'Sheet', 'Sheet1']
>>> wb.create_sheet(index=2, title='Middle Sheet')
<Worksheet "Middle Sheet">
>>> wb.get_sheet_names()
['First Sheet', 'Sheet', 'Middle Sheet', 'Sheet1']
>>> wb.remove_sheet(wb.get_sheet_by_name('Middle Sheet'))
>>> wb.remove_sheet(wb.get_sheet_by_name('Sheet1'))
>>> wb.get_sheet_names()
['First Sheet', 'Sheet']
Writing Values to Cells

Writing values to cells is much like writing values to keys in a dictionary: use the cell’s coordinate as a
string key on the Worksheet object.
Example:
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_sheet_by_name('Sheet')
>>> sheet['A1'] = 'Hello world!'
>>> sheet['A1'].value
'Hello world!'

Project: Updating a Spreadsheet

Your program will look through the spreadsheet, find specific kinds of produce, and update their prices.
Each row represents an individual sale. The columns are the type of produce sold (A), the cost per pound
of that produce (B), the number of pounds sold (C), and the total revenue from the sale (D). The TOTAL
column is set to the Excel formula =ROUND(B3*C3, 2), which multiplies the cost per pound by the
number of pounds sold and rounds the result to the nearest cent. With this formula, the cells in the
TOTAL column will automatically update themselves if there is a change in column B or C.
The prices of garlic, celery, and lemons were entered incorrectly, so the task is to update the cost per pound in every garlic, celery, and lemon row.
This means your code will need to do the following:

• Open the spreadsheet file.
• For each row, check whether the value in column A is Celery, Garlic, or Lemon.
• If it is, update the price in column B.
• Save the spreadsheet to a new file (so that you don’t lose the old spreadsheet, just in case).
Step 1: Set Up a Data Structure with the Update Information: use dictionary
Step 2: Check All Rows and Update Incorrect Prices
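The dictionary-driven update in Steps 1 and 2 can be tried without a spreadsheet. In this sketch the list rows is a hypothetical stand-in for the worksheet, and the prices for Potatoes and Okra are made up; only PRICE_UPDATES comes from the notes. It also shows why the loop must run to max_row + 1: range() excludes its stop value.

```python
# Hypothetical stand-in for the worksheet: row 1 is the header row.
rows = [
    ['PRODUCE', 'COST PER POUND'],
    ['Potatoes', 0.86],
    ['Garlic', 1.00],
    ['Okra', 2.26],
    ['Lemon', 1.00],
]

PRICE_UPDATES = {'Garlic': 3.07, 'Celery': 1.19, 'Lemon': 1.27}

max_row = len(rows)  # plays the role of sheet.max_row (1-based)

# Rows are 1-based like a spreadsheet, so row n lives at rows[n - 1].
# range() excludes its stop value, hence max_row + 1 to reach the last row.
for row_num in range(2, max_row + 1):
    name = rows[row_num - 1][0]
    if name in PRICE_UPDATES:
        rows[row_num - 1][1] = PRICE_UPDATES[name]

print(rows[2])   # ['Garlic', 3.07]
print(rows[-1])  # ['Lemon', 1.27]
```

Keeping the prices in one dictionary means a new correction needs only a new dictionary entry, not another if statement.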
Program:
#! python3
# updateProduce.py - Corrects costs in produce sales spreadsheet.
import openpyxl

wb = openpyxl.load_workbook('produceSales.xlsx')
sheet = wb.get_sheet_by_name('Sheet')

# The produce types and their updated prices
PRICE_UPDATES = {'Garlic': 3.07,
                 'Celery': 1.19,
                 'Lemon': 1.27}

# Loop through the rows and update the prices.
for rowNum in range(2, sheet.max_row + 1): # skip the first row, include the last
    produceName = sheet.cell(row=rowNum, column=1).value
    if produceName in PRICE_UPDATES:
        sheet.cell(row=rowNum, column=2).value = PRICE_UPDATES[produceName]

wb.save('updatedProduceSales.xlsx')
Setting the Font Style of Cells

Styling certain cells, rows, or columns can help you emphasize important areas in your spreadsheet.
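To customize font styles in cells, import the Font() and Style() functions from the openpyxl.styles
module.
from openpyxl.styles import Font, Style
A cell’s style can be set by assigning the Style object to the style attribute.
Note: Style exists only in older openpyxl versions. In recent versions, assign the Font object directly to
the cell’s font attribute, as the t1_ch12_setting_font.py example further down does.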
Example:
>>> import openpyxl
>>> from openpyxl.styles import Font, Style
>>> sheet = wb.get_sheet_by_name('Sheet')
>>> italic24Font = Font(size=24, italic=True)
>>> styleObj = Style(font=italic24Font)
>>> sheet['A1'].style = styleObj
>>> sheet['A1'] = 'Hello world!'
>>> wb.save('styles.xlsx')
Font(size=24, italic=True) returns a Font object, which is stored in italic24Font. The keyword arguments
to Font(), size and italic, configure the Font object’s style attributes. This Font object is then passed into
the Style(font=italic24Font) call, which returns the value you stored in styleObj. And when styleObj is
assigned to the cell’s style attribute, all that font styling information gets applied to cell A1.

Font Objects
To set font style attributes, you pass keyword arguments to Font().
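Steps for styling cells (older openpyxl versions):

a. Call Font() to create a Font object and store that Font object in a variable.
b. Pass that to Style() and store the resulting Style object in a variable.
c. Assign that variable to a Cell object’s style attribute.
In recent openpyxl versions, skip step b and assign the Font object to the Cell object’s font attribute instead.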
Example:
""" t1_ch12_setting_font.py """
import openpyxl
from openpyxl.styles import Font
wb = openpyxl.Workbook()
sheet = wb['Sheet']
fontObj1 = Font(name='Times New Roman', bold=True)

#styleObj1 = Style(font=fontObj1)
sheet['A1'].font = fontObj1
sheet['A1'] = 'Bold Times New Roman'
fontObj2 = Font(size=24, italic=True)
sheet['B3'].font = fontObj2
sheet['B3'] = '24 pt Italic'
wb.save('styles.xlsx')

Formulas
Formulas, which begin with an equal sign, can configure cells to contain values calculated from other
cells.
Example:
>>> sheet['B9'] = '=SUM(B1:B8)'

This will store =SUM(B1:B8) as the value in cell B9.
A formula is set just like any other text value in a cell.

>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_active_sheet()
>>> sheet['A1'] = 200
>>> sheet['A2'] = 300
>>> sheet['A3'] = '=SUM(A1:A2)' #500
>>> wb.save('writeFormula.xlsx')
You can also read the formula in a cell just as you would any value. However, if you want to see the
result of the calculation for the formula instead of the literal formula, you must pass True for the data_only
keyword argument to load_workbook().
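Example:
>>> import openpyxl
>>> wbFormulas = openpyxl.load_workbook('writeFormula.xlsx')
>>> sheet = wbFormulas.get_active_sheet()
>>> sheet['A3'].value
'=SUM(A1:A2)'
>>> wbDataOnly = openpyxl.load_workbook('writeFormula.xlsx', data_only=True)
>>> sheet = wbDataOnly.get_active_sheet()
>>> sheet['A3'].value
500
Note: openpyxl does not evaluate formulas itself. With data_only=True it returns the result that Excel cached
when it last saved the file, so a workbook written only by openpyxl gives None here until it has been opened
and saved in Excel.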

Adjusting Rows and Columns

In Excel, adjusting the sizes of rows and columns is as easy as clicking and dragging the edges of a row or
column header. But if you need to set sizes based on cell contents, or in a large number of spreadsheet
files, it will be much quicker to write a Python program to do it.
Setting Row Height and Column Width

Worksheet objects have row_dimensions and column_dimensions attributes that control row heights and column
widths.
Example:
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_active_sheet()
>>> sheet['A1'] = 'Tall row'
>>> sheet['B2'] = 'Wide column'
>>> sheet.row_dimensions[1].height = 70
>>> sheet.column_dimensions['B'].width = 20
>>> wb.save('dimensions.xlsx')
A sheet’s row_dimensions and column_dimensions are dictionary-like values;

row_dimensions contains RowDimension objects and
column_dimensions contains ColumnDimension objects.
In row_dimensions, you can access one of the objects using the number of the row (in this case, 1 or 2).
In column_dimensions, you can access one of the objects using the letter of the column (in this case, A
or B).
Row height can be set to an integer or float value between 0 and 409. This value is the height measured in
points, where one point equals 1/72 of an inch. The default row height is 12.75.
The column width can be set to an integer or float value between 0 and 255. This value represents the
number of characters at the default font size (11 point). The default column width is 8.43 characters.
Merging and Unmerging Cells

A rectangular area of cells can be merged into a single cell with the merge_cells() sheet method.
Example:
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_active_sheet()
>>> sheet.merge_cells('A1:D3')
>>> sheet['A1'] = 'Twelve cells merged together.'
>>> sheet.merge_cells('C5:D5')
>>> sheet['C5'] = 'Two merged cells.'
>>> wb.save('merged.xlsx')
The argument to merge_cells() is a single string of the top-left and bottom-right cells of the rectangular
area to be merged: 'A1:D3' merges 12 cells into a single cell.
To set the value of these merged cells, simply set the value of the top-left cell of the merged group.
To unmerge cells, call the unmerge_cells() sheet method.

>>> import openpyxl
>>> wb = openpyxl.load_workbook('merged.xlsx')
>>> sheet = wb.get_active_sheet()
>>> sheet.unmerge_cells('A1:D3')
>>> sheet.unmerge_cells('C5:D5')
>>> wb.save('merged.xlsx')
Freeze Panes
For spreadsheets too large to be displayed all at once, it’s helpful to “freeze” a few of the top rows or
leftmost columns onscreen. Frozen column or row headers, are always visible to the user even as they
scroll. These are known as freeze panes.
In OpenPyXL, each Worksheet object has a freeze_panes attribute that can be set to a Cell object or a string of
a cell’s coordinates. Note that all rows above and all columns to the left of this cell will be frozen.
To unfreeze all panes, set freeze_panes to None or 'A1'.
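The rule "all rows above and all columns to the left of this cell" can be worked out by hand for any coordinate. The helper below, frozen_panes(), is a hypothetical function written for this illustration (it is not part of openpyxl); it returns how many rows and columns a given freeze_panes setting would freeze.

```python
import re

def frozen_panes(coordinate):
    """Return (frozen_rows, frozen_columns) for a freeze_panes setting."""
    if coordinate is None:
        return (0, 0)
    match = re.fullmatch(r'([A-Z]+)(\d+)', coordinate)
    if match is None:
        raise ValueError('not a cell coordinate: %r' % coordinate)
    letters, row = match.groups()
    column = 0
    for ch in letters:          # column letters are base-26 digits, 'A' = 1
        column = column * 26 + (ord(ch) - ord('A') + 1)
    # Everything above the cell and everything left of it is frozen.
    return (int(row) - 1, column - 1)

for setting in ['A2', 'B1', 'C2', 'A1', None]:
    print(setting, frozen_panes(setting))
```

So 'A2' freezes row 1 only, 'B1' freezes column A only, 'C2' freezes row 1 together with columns A and B, and 'A1' or None freezes nothing.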

Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('produceSales.xlsx')
>>> sheet = wb.get_active_sheet()
>>> sheet.freeze_panes = 'A2'
>>> wb.save('freezeExample.xlsx')
Charts
OpenPyXL supports creating bar, line, scatter, and pie charts using the data in a sheet’s cells.
To make a chart, you need to do the following:

1. Create a Reference object from a rectangular selection of cells.
2. Create a Series object by passing in the Reference object.
3. Create a Chart object.
4. Append the Series object to the Chart object.
5. Optionally, set the size of the Chart object (its width and height attributes in recent openpyxl versions; older versions used drawing.top, drawing.left, drawing.width, and drawing.height).
6. Add the Chart object to the Worksheet object.
Reference objects are created by calling the Reference() function from openpyxl.chart (openpyxl.charts in older versions) and passing:
1. The Worksheet object containing your chart data.
2. The top-left cell of the rectangular selection of cells containing your chart data, given as the min_row
and min_col keyword arguments. Note that 1 is the first row, not 0.
3. The bottom-right cell of the rectangular selection, given as the max_row and max_col keyword arguments.
Older openpyxl versions took these two cells as (row, column) tuples instead.

Example:
""" t1_ch12_chart.py """
from openpyxl import Workbook
from openpyxl.chart import Reference, Series, BarChart
wb = Workbook()
sheet = wb.active
for i in range(1, 11): # create some data in column A
    sheet['A' + str(i)] = i
refObj = Reference(sheet, min_row=1, min_col=1, max_row=10, max_col=1)
seriesObj = Series(refObj, title='First series')

chartObj = BarChart()
chartObj.append(seriesObj)
sheet.add_chart(chartObj)
wb.save('sampleChart.xlsx')

Chapter 13 – Working with PDF and Word Documents
PDF Documents
PDF and Word documents are binary files, which makes them much more complex than plaintext files.
In addition to text, they store lots of font, color, and layout information.
PDF stands for Portable Document Format and uses the .pdf file extension.
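The module you’ll use to work with PDFs is PyPDF2. To install it, run pip install PyPDF2 from the command
line. The examples in this chapter use the original PyPDF2 names (PdfFileReader, getPage(), extractText());
PyPDF2 3.0 and later renamed them (PdfReader, pages, extract_text()), so install an earlier release to run
the code as written.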
Extracting Text from PDFs
Open meetingminutes.pdf in read binary mode and store it in pdfFileObj.

To get a PdfFileReader object that represents this PDF, call PyPDF2.PdfFileReader(stream) and pass it pdfFileObj.
PdfFileReader Methods and attributes

getNumPages()
Calculates the number of pages in this PDF file.
Returns: number of pages
Return type: int
getPage(pageNumber)
Retrieves a page by number from this PDF file.
Parameters: pageNumber (int) – The page number to retrieve (pages begin at zero)
Returns: a PageObject instance.
Return type: PageObject

numPages
Read-only property that accesses the getNumPages() function.
Page object method

extractText()
extract the Page object text and returns as a string.
Example:
""" t1_ch13_read_pdf.py """
import PyPDF2
# Open the file in binary mode
pdfFileObj = open('meetingminutes.pdf', 'rb')
# Create PdfFileReader object to read from pdf file

pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
# Get no. of pages

print("No. of Pages: ",pdfReader.numPages) #19
# Get first page

pageObj = pdfReader.getPage(0)
# Extract text from page object

print("PDF Text: ",pageObj.extractText())
"""
OOFFFFIICCIIAALL BBOOAARRDD MMIINNUUTTEESS Meeting of
March 7, 2014
The Board of Elementary and Secondary Education shall provide leadership and ...
"""
Decrypting PDFs
Some PDF documents are password protected.
All PdfFileReader objects have an isEncrypted attribute that is True if the PDF is encrypted and False if it isn’t.
To read an encrypted PDF, call the reader's decrypt() method and pass the password as a string.
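In the pre-3.0 PyPDF2 API used in these notes, decrypt() returns an integer: 0 if the password failed, 1 if it matched the user password, and 2 if it matched the owner password. A minimal sketch of checking that result before reading any pages follows. The helper name unlock is hypothetical, and reader is assumed to be a PdfFileReader (or any object exposing isEncrypted and decrypt()).

```python
def unlock(reader, password):
    """Decrypt `reader` if needed; raise ValueError on a wrong password.

    Returns None if the PDF was not encrypted, otherwise the nonzero
    code returned by decrypt() (1 = user password, 2 = owner password).
    """
    if not reader.isEncrypted:
        return None  # nothing to decrypt
    result = reader.decrypt(password)
    if result == 0:
        # decrypt() signals failure through its return value, not an exception.
        raise ValueError('wrong password; the PDF is still encrypted')
    return result
```

Checking the return value this way reports a wrong password immediately, rather than through a later PdfReadError from getPage().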
Example:
""" t1_ch13_decrypt_pdf.py """
import PyPDF2
pdfReader = PyPDF2.PdfFileReader(open('encrypted.pdf', 'rb'))
print(pdfReader.isEncrypted) # True
#pdfReader.getPage(0)
# PdfReadError: file has not been decrypted
# Decrypt PDF with password

pdfReader.decrypt('rosebud')
pageObj = pdfReader.getPage(0)
# Extract text from page object

print("PDF Text: ",pageObj.extractText())
"""
PDF Text: OOFFFFIICCIIAALL BBOOAARRDD MMIINNUUTTEESS Meeting of
March 7, 2014...
"""
Creating PDFs
PyPDF2’s PDF-writing capabilities are limited to copying pages from other PDFs, rotating pages,
overlaying pages, and encrypting files.
PyPDF2 doesn’t allow you to directly edit a PDF. Instead, you have to create a new PDF and then copy
content over from an existing document.
The examples in this section will follow this general approach:

1. Open one or more existing PDFs (the source PDFs) into PdfFileReader objects.
2. Create a new PdfFileWriter object.
3. Copy pages from the PdfFileReader objects into the PdfFileWriter object.
4. Finally, use the PdfFileWriter object to write the output PDF.
PdfFileWriter API
class PyPDF2.PdfFileWriter
This class supports writing PDF files.
write(stream)
Writes the collection of pages added to this object out as a PDF file.
Parameters: stream – An object to write the file to.
addPage(page)
Adds a page to this PDF file (at end). The page is usually acquired from a PdfFileReader instance.
Parameters: page (PageObject) – The page to add to the document.
Page object method

rotateClockwise(angle)
Rotates a page clockwise by increments of 90 degrees.
Parameters: angle (int) – Angle to rotate the page. Must be an increment of 90 deg.
rotateCounterClockwise(angle)
Rotates a page counter-clockwise by increments of 90 degrees.
Parameters: angle (int) – Angle to rotate the page. Must be an increment of 90 deg.
mergePage(page2)
Merges the content streams of two pages into one.
Parameters: page2 (PageObject) – The page to be merged into this one.
Copying Pages
You can use PyPDF2 to copy pages from one PDF document to another.
This allows you to combine multiple PDF files, cut unwanted pages, or reorder pages.
Example:
""" t1_ch13_copying_pdf.py """
import PyPDF2
# Open the pdf files

pdf1File = open('meetingminutes.pdf', 'rb')
pdf2File = open('meetingminutes2.pdf', 'rb')
# Create PDF Reader objects

pdf1Reader = PyPDF2.PdfFileReader(pdf1File)
pdf2Reader = PyPDF2.PdfFileReader(pdf2File)
# Create PDF Writer object

pdfWriter = PyPDF2.PdfFileWriter()
# Copy pages from pdf1 to pdfwriter

for pageNum in range(pdf1Reader.numPages):
    pageObj = pdf1Reader.getPage(pageNum)
    pdfWriter.addPage(pageObj)
# Copy pages from pdf2 to pdfwriter
for pageNum in range(pdf2Reader.numPages):
    pageObj = pdf2Reader.getPage(pageNum)
    pdfWriter.addPage(pageObj)
# Create the output file

pdfOutputFile = open('combinedminutes.pdf', 'wb')
pdfWriter.write(pdfOutputFile)
# Close the files

pdfOutputFile.close()
pdf1File.close()
pdf2File.close()
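The same loop generalizes to cutting or reordering pages: only the sequence of page numbers changes. The sketch below is one possible way to express that. The helper name copy_pages is hypothetical, and it assumes reader and writer behave like the PdfFileReader and PdfFileWriter objects described above (numPages, getPage(), addPage()).

```python
def copy_pages(reader, writer, page_numbers):
    """Append the pages at the given zero-based indices to `writer`, in order."""
    for page_number in page_numbers:
        # Fail early with a clear message instead of deep inside the library.
        if not 0 <= page_number < reader.numPages:
            raise IndexError('page %d is out of range' % page_number)
        writer.addPage(reader.getPage(page_number))

# Possible calls, assuming the readers and the writer are already open:
# copy_pages(pdf1Reader, pdfWriter, [2, 0, 1])                      # reorder pages
# copy_pages(pdf2Reader, pdfWriter, range(1, pdf2Reader.numPages))  # skip first page
```

Passing the page numbers as a separate argument keeps the copying logic unchanged whether pages are combined, dropped, or rearranged.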
Rotating Pages
The pages of a PDF can also be rotated in 90-degree increments with the rotateClockwise() and
rotateCounterClockwise() methods. Pass one of the integers 90, 180, or 270 to these methods.
Example:
""" t1_ch13_rotating_pdf.py """
import PyPDF2
# Open file and create pdfreader object

minutesFile = open('meetingminutes.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(minutesFile)
page = pdfReader.getPage(0) #get first page
# Rotate the page
page.rotateClockwise(90)
# Create a writer object and add the rotated page
pdfWriter = PyPDF2.PdfFileWriter()
pdfWriter.addPage(page)
resultPdfFile = open('rotatedPage.pdf', 'wb')

pdfWriter.write(resultPdfFile)
# Close the files

resultPdfFile.close()
minutesFile.close()
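Because rotations are restricted to multiples of 90 degrees, a counter-clockwise rotation can always be expressed as a clockwise one (90 degrees counter-clockwise equals 270 degrees clockwise). The helper below is a hypothetical, standard-library-only sketch of validating an angle and converting it. It is not part of PyPDF2.

```python
def to_clockwise(angle, counter_clockwise=False):
    """Return the equivalent clockwise rotation, one of 0, 90, 180 or 270."""
    if angle % 90 != 0:
        raise ValueError('angle must be a multiple of 90 degrees')
    if counter_clockwise:
        angle = -angle
    # Python's % always returns a non-negative result for a positive modulus.
    return angle % 360
```

The result could then be passed to rotateClockwise(), so that only one of the two rotation methods is needed.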
Overlaying Pages
PyPDF2 can also overlay the contents of one page over another, which is useful for adding a logo,
timestamp, or watermark to a page.
Example:
""" t1_ch13_merge_pdf.py """
import PyPDF2
# Open the file and create pdfreader object

minutesFile = open('meetingminutes.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(minutesFile)
# Get the first page

minutesFirstPage = pdfReader.getPage(0)
# create pdfreader object for watermark pdf

pdfWatermarkReader = PyPDF2.PdfFileReader(open('watermark.pdf', 'rb'))
# Overlay watermark on pdf first page

minutesFirstPage.mergePage(pdfWatermarkReader.getPage(0))
# Create a writer object and add the watermarked first page
pdfWriter = PyPDF2.PdfFileWriter()
pdfWriter.addPage(minutesFirstPage)
# Add the remaining pages (2 - n)
for pageNum in range(1, pdfReader.numPages):
    pageObj = pdfReader.getPage(pageNum)
    pdfWriter.addPage(pageObj)
# Save PDF
resultPdfFile = open('watermarkedCover.pdf', 'wb')
pdfWriter.write(resultPdfFile)
# Close files
minutesFile.close()
resultPdfFile.close()
Encrypting PDFs
A PdfFileWriter object can also add encryption to a PDF document.
PdfFileWriter method
encrypt(user_pwd, owner_pwd=None)
Encrypt this PDF file with the PDF Standard encryption handler.
Parameters:
user_pwd (str) – The “user password”, which allows for opening and reading the PDF file with
the restrictions provided.
owner_pwd (str) – The “owner password”, which allows for opening the PDF files without any
restrictions. By default, the owner password is the same as the user password.

Example:
""" t1_ch13_encrypt_pdf.py """
import PyPDF2
# Open file and create reader object

pdfFile = open('meetingminutes.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFile)
# Create pdf writer object
pdfWriter = PyPDF2.PdfFileWriter()
# Copy pages from reader to writer object
for pageNum in range(pdfReader.numPages):
    pdfWriter.addPage(pdfReader.getPage(pageNum))
# Encrypt the file

pdfWriter.encrypt('swordfish')
# Create PDF file

resultPdf = open('encryptedminutes.pdf', 'wb')
pdfWriter.write(resultPdf)
# Close the files
resultPdf.close()
pdfFile.close()
Project: Combining Select Pages from Many PDFs

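Say you have the boring job of merging several dozen PDF documents into a single PDF file. Each of them
has a cover sheet as its first page, and you don't want the cover sheet repeated in the combined document.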
At a high level, here’s what the program will do:

• Find all PDF files in the current working directory.
• Sort the filenames so the PDFs are added in order.
• Write each page, excluding the first page, of each PDF to the output file.
In terms of implementation, your code will need to do the following:

• Call os.listdir() to find all the files in the working directory and remove any non-PDF files.
• Call Python’s sort() list method to alphabetize the filenames.
• Create a PdfFileWriter object for the output PDF.
• Loop over each PDF file, creating a PdfFileReader object for it.
• Loop over each page (except the first) in each PDF file.
• Add the pages to the output PDF.
• Write the output PDF to a file named allminutes.pdf.
For this project, open a new file editor window and save it as t1_ch13_combined.pdfs.py.
Step 1: Find All PDF Files
Step 2: Open Each PDF
Step 3: Add Each Page
Step 4: Save the Results
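Step 1 can be tried on its own before any PDF is opened. A minimal sketch follows. The function name find_pdf_filenames is hypothetical, and the case-insensitive matching and sorting are assumptions that go slightly beyond the plain endswith('.pdf') and sort() calls described above.

```python
import os

def find_pdf_filenames(directory='.'):
    """Return the sorted names of the .pdf files directly inside `directory`."""
    pdf_files = [
        name for name in os.listdir(directory)
        if name.lower().endswith('.pdf')   # also accepts names ending in .PDF
    ]
    pdf_files.sort(key=str.lower)          # case-insensitive alphabetical order
    return pdf_files
```

Keeping this step in its own function makes it easy to print and inspect the list of filenames before any pages are copied.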

Program:
#! python3
# combinePdfs.py - Combines all the PDFs in the current working directory into a single PDF.
import PyPDF2, os
os.chdir("E:\\Notes\\Python18CS55\\temp\\ch13_combine_pdf")
# Get all the PDF filenames.
pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort()
# Create the writer object for the output PDF.
pdfWriter = PyPDF2.PdfFileWriter()
# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    # Loop through all the pages (except the first) and add them.
    for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)
# Save the resulting PDF to a file.
pdfOutput = open('allminutes.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
