18CS55 ADP Notes Module 4 and 5
Study Material
Module 4 and 5
Contents
Module 4
Chapter 15: Classes and objects
15.1 User-defined types
15.2 Attributes
15.3 Rectangles
15.4 Instances as return values
15.5 Objects are mutable
15.6 Copying
15.7 Debugging
15.8 Glossary
Chapter 16: Classes and functions
16.1 Time
16.3 Modifiers
16.4 Prototyping versus planning
16.5 Debugging
16.6 Glossary
Chapter 17: Classes and methods
17.1 Object-oriented features
17.2 Printing objects
17.3 Another example
17.4 A more complicated example
17.5 The __init__ method
17.6 The __str__ method
17.7 Operator overloading
17.8 Type-based dispatch
17.9 Polymorphism
17.10 Debugging
17.11 Interface and implementation
17.12 Glossary
isinstance(object, classinfo)
Module 4
Chapter 15: Classes and objects
15.1 User-defined types
We have used many of Python’s built-in types; now we are going to define a new type. As an
example, we will create a type called Point that represents a point in two-dimensional space.
In mathematical notation, points are often written in parentheses with a comma separating the
coordinates. For example, (0,0) represents the origin, and (x,y) represents the point x units to
the right and y units up from the origin.
A user-defined type is also called a class. A class definition looks like this:
class Point(object):
    """Represents a point in 2-D space."""
This header indicates that the new class is a Point, which is a kind of object, which is a built-in
type.
The body is a docstring that explains what the class is for. You can define variables and
functions inside a class definition.
Because Point is defined at the top level, its “full name” is __main__.Point. To create a Point object,
you call Point as if it were a function, for example blank = Point().
The return value is a reference to a Point object, which we assign to blank. Creating a new
object is called instantiation, and the object is an instance of the class.
When you print an instance, Python tells you what class it belongs to and where it is stored in
memory (the prefix 0x means that the following number is in hexadecimal).
15.2 Attributes
You can assign values to an instance using dot notation:
>>> blank.x = 3.0
>>> blank.y = 4.0
The following diagram shows the result of these assignments. A state diagram that shows an
object and its attributes is called an object diagram; see Figure 15.1.
The variable blank refers to a Point object, which contains two attributes. Each attribute refers
to a floating-point number.
You can read the value of an attribute using the same syntax:
>>> x = blank.x
>>> print(x)
3.0
The expression blank.x means, “Go to the object blank refers to and get the value of x.” There
is no conflict between the variable x and the attribute x.
You can use dot notation as part of any expression. For example:
>>> import math
>>> print('(%g, %g)' % (blank.x, blank.y))
(3, 4)
>>> distance = math.sqrt(blank.x**2 + blank.y**2)
>>> print(distance)
5.0
You can pass an instance as an argument in the usual way. For example:
def print_point(p):
    print('(%g, %g)' % (p.x, p.y))
print_point takes a point as an argument and displays it in mathematical notation. To invoke it,
you can pass blank as an argument:
>>> print_point(blank)
(3, 4)
Inside the function, p is an alias for blank, so if the function modifies p, blank changes.
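The aliasing described above can be seen in a small sketch. The function move_right is a hypothetical helper introduced only for this illustration; it is not part of the original notes.

```python
class Point(object):
    """Represents a point in 2-D space."""


def move_right(p, dx):
    """Hypothetical modifier: shifts the point p to the right by dx."""
    p.x += dx


blank = Point()
blank.x = 3.0
blank.y = 4.0

# Inside move_right, p refers to the same object as blank,
# so the change is visible here after the call.
move_right(blank, 2.0)
print(blank.x)  # 5.0
```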
15.3 Rectangles
Sometimes it is obvious what the attributes of an object should be, but other times you have to
make decisions.
For example, imagine you are designing a class to represent rectangles. What attributes would
you use to specify the location and size of a rectangle?
You could specify one corner of the rectangle (or the center), the width, and the height. Here is
the class definition:
class Rectangle(object):
    """Represents a rectangle.

    attributes: width, height, corner.
    """
The docstring lists the attributes: width and height are numbers; corner is a Point object that
specifies the lower-left corner.
To represent a rectangle, you have to instantiate a Rectangle object and assign values to the
attributes:
box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
The expression box.corner.x means, “Go to the object box refers to and select the attribute
named corner; then go to that object and select the attribute named x.”
Figure 15.2 shows the state of this object. An object that is an attribute of another object is
embedded.
Here is an example that passes box as an argument and assigns the resulting Point to center:
>>> center = find_center(box)
>>> print_point(center)
(50, 100)
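The definition of find_center does not appear in these notes. A minimal sketch that is consistent with the example above, assuming the corner attribute is the lower-left corner as stated in the Rectangle docstring, might look like this:

```python
class Point(object):
    """Represents a point in 2-D space."""


class Rectangle(object):
    """Represents a rectangle. attributes: width, height, corner."""


def find_center(rect):
    """Returns a new Point at the center of rect; rect is not modified."""
    p = Point()
    p.x = rect.corner.x + rect.width / 2
    p.y = rect.corner.y + rect.height / 2
    return p


box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

center = find_center(box)
print(center.x, center.y)  # 50.0 100.0
```

Because find_center builds and returns a new Point, the caller can keep the result without affecting box.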
You can also write functions that modify objects. For example, grow_rectangle takes a Rectangle
object and two numbers, dwidth and dheight, and adds the numbers to the width and height of
the rectangle:
def grow_rectangle(rect, dwidth, dheight):
    rect.width += dwidth
    rect.height += dheight
15.6 Copying
The copy module contains a function called copy that can duplicate any object:
>>> import copy
>>> p1 = Point()
>>> p1.x = 3.0
>>> p1.y = 4.0
>>> p2 = copy.copy(p1)
p1 and p2 contain the same data, but they are not the same Point.
>>> print_point(p1)
(3, 4)
>>> print_point(p2)
(3, 4)
>>> p1 is p2
False
>>> p1 == p2
False
The is operator indicates that p1 and p2 are not the same object, which is what we expected. But
you might have expected == to yield True because these points contain the same data. In that
case, you will be disappointed to learn that for instances, the default behavior of the == operator
is the same as the is operator; it checks object identity, not object equivalence. This behavior can
be changed—we’ll see how later.
If you use copy.copy to duplicate a Rectangle, you will find that it copies the Rectangle object but
not the embedded Point.
>>> box2 = copy.copy(box)
>>> box2 is box
False
>>> box2.corner is box.corner
True
Figure 15.3 shows what the object diagram looks like. This operation is called a shallow copy
because it copies the object and any references it contains, but not the embedded objects.
Fortunately, the copy module contains a function named deepcopy that copies not only the
object but also the objects it refers to, and the objects they refer to, and so on. You will not be
surprised to learn that this operation is called a deep copy.
>>> box3 = copy.deepcopy(box)
>>> box3 is box
False
>>> box3.corner is box.corner
False
box3 and box are completely separate objects.
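The practical difference between the two kinds of copy shows up when the original is modified afterwards. The following sketch reuses the Point and Rectangle classes from this chapter; the specific values are only for illustration.

```python
import copy


class Point(object):
    """Represents a point in 2-D space."""


class Rectangle(object):
    """Represents a rectangle. attributes: width, height, corner."""


box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

box2 = copy.copy(box)      # shallow: box2.corner is the same Point as box.corner
box3 = copy.deepcopy(box)  # deep: box3.corner is a separate Point

box.corner.x = 10.0        # move the corner of the original

print(box2.corner.x)  # 10.0, because the shallow copy shares the corner
print(box3.corner.x)  # 0.0, because the deep copy has its own corner
```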
15.7 Debugging
When you start working with objects, you are likely to encounter some new exceptions. If you
try to access an attribute that doesn’t exist, you get an AttributeError:
>>> p = Point()
>>> print(p.z)
AttributeError: 'Point' object has no attribute 'z'
If you are not sure what type an object is, you can ask:
>>> type(p)
<class '__main__.Point'>
If you are not sure whether an object has a particular attribute, you can use the built-in
function hasattr:
>>> hasattr(p, 'x')
True
>>> hasattr(p, 'z')
False
The first argument can be any object; the second argument is a string that contains the name of
the attribute.
15.8 Glossary
class:
A user-defined type. A class definition creates a new class object.
class object:
An object that contains information about a user-defined type. The class object can be used to
create instances of the type.
instance:
An object that belongs to a class.
attribute:
One of the named values associated with an object.
embedded (object):
An object that is stored as an attribute of another object.
shallow copy:
To copy the contents of an object, including any references to embedded objects; implemented
by the copy function in the copy module.
deep copy:
To copy the contents of an object as well as any embedded objects, and any objects embedded
in them, and so on; implemented by the deepcopy function in the copy module.
object diagram:
A diagram that shows objects, their attributes, and the values of the attributes.
Instead of using the normal statements to access attributes, you can use the following built-in functions:
getattr(obj, name[, default]): returns the value of the named attribute of obj.
hasattr(obj, name): checks whether the attribute exists.
setattr(obj, name, value): sets the attribute; if the attribute does not exist, it is created.
delattr(obj, name): deletes the attribute.
Examples:
hasattr(emp1, 'salary')        # Returns True if the 'salary' attribute exists
getattr(emp1, 'salary')        # Returns the value of the 'salary' attribute
setattr(emp1, 'salary', 7000)  # Sets the 'salary' attribute to 7000
delattr(emp1, 'salary')        # Deletes the 'salary' attribute
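The notes do not define emp1, so the runnable sketch below assumes a hypothetical, empty Employee class purely to have an object to work with.

```python
class Employee(object):
    """Hypothetical class used only for this illustration."""


emp1 = Employee()

setattr(emp1, 'salary', 7000)     # creates the attribute
print(hasattr(emp1, 'salary'))    # True
print(getattr(emp1, 'salary'))    # 7000
print(getattr(emp1, 'bonus', 0))  # 0, the default, because 'bonus' is missing

delattr(emp1, 'salary')           # removes the attribute
print(hasattr(emp1, 'salary'))    # False
```

Calling getattr without a default on a missing attribute raises AttributeError, which is why the three-argument form is often convenient.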
Chapter 16: Classes and functions
16.1 Time
We can create a new Time object and assign attributes for hours, minutes, and seconds:
time = Time()
time.hour = 11
time.minute = 59
time.second = 30
The state diagram for the Time object looks like Figure 16.1.
The function creates a new Time object, initializes its attributes, and returns a reference to the
new object. This is called a pure function because it does not modify any of the objects passed
to it as arguments and it has no effect, like displaying a value or getting user input, other than
returning a value.
The result, 10:80:00 might not be what you were hoping for. The problem is that this function
does not deal with cases where the number of seconds or minutes adds up to more than sixty.
Here’s an improved version:
def add_time(t1, t2):
    sum = Time()
    sum.hour = t1.hour + t2.hour
    sum.minute = t1.minute + t2.minute
    sum.second = t1.second + t2.second
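The listing above stops before the carrying steps and the return statement. A sketch of how the complete function might look, assuming one carry per column is enough (which holds when both inputs are well-formed times), is given here. The local variable is named total rather than sum only to avoid shadowing the built-in sum function.

```python
class Time(object):
    """Represents the time of day. attributes: hour, minute, second."""


def add_time(t1, t2):
    """Returns a new Time that is the sum of t1 and t2 (a pure function)."""
    total = Time()
    total.hour = t1.hour + t2.hour
    total.minute = t1.minute + t2.minute
    total.second = t1.second + t2.second

    # Carry extra seconds into the minute column.
    if total.second >= 60:
        total.second -= 60
        total.minute += 1

    # Carry extra minutes into the hour column.
    if total.minute >= 60:
        total.minute -= 60
        total.hour += 1

    return total


start = Time()
start.hour, start.minute, start.second = 9, 45, 0

duration = Time()
duration.hour, duration.minute, duration.second = 1, 35, 0

done = add_time(start, duration)
print('%.2d:%.2d:%.2d' % (done.hour, done.minute, done.second))  # 11:20:00
```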
16.3 Modifiers
Modifiers are functions that modify the objects they receive as parameters; the changes are
visible to the caller.
increment, which adds a given number of seconds to a Time object, can be written naturally as
a modifier. Here is a rough draft:
def increment(time, seconds):
    time.second += seconds
Is this function correct? What happens if the parameter seconds is much greater than sixty?
In that case, it is not enough to carry once; we have to keep doing it until time.second is less than
sixty. One solution is to replace the if statements with while statements. That would make the
function correct, but not very efficient.
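The rough draft shown above omits the carrying steps. A sketch of the while-based variant that the paragraph describes could look like the following; it is one possible reading of the text, not necessarily the author's exact listing.

```python
class Time(object):
    """Represents the time of day. attributes: hour, minute, second."""


def increment(time, seconds):
    """Modifier: adds the given number of seconds to time in place."""
    time.second += seconds

    # Keep carrying until the seconds column is below sixty.
    while time.second >= 60:
        time.second -= 60
        time.minute += 1

    # Keep carrying until the minutes column is below sixty.
    while time.minute >= 60:
        time.minute -= 60
        time.hour += 1


start = Time()
start.hour, start.minute, start.second = 9, 45, 0

increment(start, 1337)
print('%.2d:%.2d:%.2d' % (start.hour, start.minute, start.second))  # 10:07:17
```

The loops run once per carried minute or hour, which is why this version is correct but slow for large values of seconds.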
Anything that can be done with modifiers can also be done with pure functions. In fact, some
programming languages only allow pure functions. There is some evidence that programs that
use pure functions are faster to develop and less error-prone than programs that use modifiers.
But modifiers are convenient at times, and functional programs tend to be less efficient.
In general, I recommend that you write pure functions whenever it is reasonable and resort to
modifiers only if there is a compelling advantage. This approach might be called a functional
programming style.
16.4 Prototyping versus planning
The “prototype and patch” approach used so far can be effective, especially if you don’t yet have a deep understanding of the
problem. But incremental corrections can generate code that is unnecessarily complicated—
since it deals with many special cases—and unreliable—since it is hard to know if you have found
all the errors.
An alternative is planned development, in which high-level insight into the problem can make
the programming much easier. In this case, the insight is that a Time object is really a three-digit
number in base 60 (see http://en.wikipedia.org/wiki/Sexagesimal). The second attribute is the
“ones column,” the minute attribute is the “sixties column,” and the hour attribute is the “thirty-
six hundreds column.”
When we wrote add_time and increment, we were effectively doing addition in base 60, which
is why we had to carry from one column to the next.
This observation suggests another approach to the whole problem—we can convert Time
objects to integers and take advantage of the fact that the computer knows how to do integer
arithmetic.
Here is a function that converts Times to integers:
def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds
And here is the function that converts integers to Times (recall that divmod divides the first
argument by the second and returns the quotient and remainder as a tuple).
def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time
One way to test them is to check that time_to_int(int_to_time(x)) == x for many values
of x. This is an example of a consistency check.
Once you are convinced they are correct, you can use them to rewrite add_time:
def add_time(t1, t2):
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
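The consistency check mentioned above can be run as a short script. The step size 997 in the loop is an arbitrary choice made for this sketch so that the check samples many values across a day without testing every second.

```python
class Time(object):
    """Represents the time of day. attributes: hour, minute, second."""


def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds


def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time


def add_time(t1, t2):
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)


# Consistency check: converting to a Time and back should give the same integer.
for x in range(0, 24 * 60 * 60, 997):
    assert time_to_int(int_to_time(x)) == x

t = int_to_time(36437)
print(t.hour, t.minute, t.second)  # 10 7 17
```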
16.5 Debugging
A Time object is well-formed if the values of minute and second are between 0 and 60 (including
0 but not 60) and if hour is positive. hour and minute should be integral values, but we might
allow second to have a fraction part.
Requirements like these are called invariants because they should always be true. To put it a
different way, if they are not true, then something has gone wrong.
Writing code to check your invariants can help you detect errors and find their causes. For
example, you might have a function like valid_time that takes a Time object and returns False if
it violates an invariant:
def valid_time(time):
    if time.hour < 0 or time.minute < 0 or time.second < 0:
        return False
    if time.minute >= 60 or time.second >= 60:
        return False
    return True
Then at the beginning of each function you could check the arguments to make sure they are
valid:
def add_time(t1, t2):
    if not valid_time(t1) or not valid_time(t2):
        raise ValueError('invalid Time object in add_time')
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
Or you could use an assert statement, which checks a given invariant and raises an exception if
it fails:
def add_time(t1, t2):
    assert valid_time(t1) and valid_time(t2)
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
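To see an invariant check catch a malformed object, the sketch below builds one valid and one invalid Time. The helper make_time is hypothetical and exists only to shorten the example; the assertion message is likewise an illustrative choice.

```python
class Time(object):
    """Represents the time of day. attributes: hour, minute, second."""


def make_time(hour, minute, second):
    """Hypothetical helper: builds a Time from three numbers."""
    t = Time()
    t.hour, t.minute, t.second = hour, minute, second
    return t


def valid_time(time):
    """Returns False if time violates an invariant, True otherwise."""
    if time.hour < 0 or time.minute < 0 or time.second < 0:
        return False
    if time.minute >= 60 or time.second >= 60:
        return False
    return True


def time_to_int(time):
    # The assert documents the invariant and fails loudly if it is broken.
    assert valid_time(time), 'invalid Time object'
    minutes = time.hour * 60 + time.minute
    return minutes * 60 + time.second


good = make_time(9, 45, 0)
bad = make_time(9, 75, 0)   # 75 minutes violates the invariant

print(valid_time(good))  # True
print(valid_time(bad))   # False

try:
    time_to_int(bad)
except AssertionError as err:
    print('caught:', err)  # caught: invalid Time object
```

Note that Python skips assert statements when it is run with the -O option, so asserts suit debugging checks rather than validation of user input.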
16.6 Glossary
prototype and patch:
A development plan that involves writing a rough draft of a program, testing, and correcting
errors as they are found.
planned development:
A development plan that involves high-level insight into the problem and more planning than
incremental development or prototype development.
pure function:
A function that does not modify any of the objects it receives as arguments. Most pure
functions are fruitful.
modifier:
A function that changes one or more of the objects it receives as arguments. Most modifiers are
fruitless.
invariant:
A condition that should always be true during the execution of a program.
divmod(x, y)
The divmod() function takes two numbers and returns a pair of numbers (a tuple) consisting of
their quotient and remainder.
Examples:
print('divmod(8, 3) = ', divmod(8, 3))
print('divmod(3, 8) = ', divmod(3, 8))
print('divmod(5, 5) = ', divmod(5, 5))
Answers:
divmod(8, 3) = (2, 2)
divmod(3, 8) = (0, 3)
divmod(5, 5) = (1, 0)
Chapter 17: Classes and methods
17.1 Object-oriented features
A method is a function that is associated with a particular class. We will define methods for
user-defined types.
Methods are semantically the same as functions, but there are two syntactic differences:
Methods are defined inside a class definition in order to make the relationship between the
class and the method explicit.
The syntax for invoking a method is different from the syntax for calling a function.
class Time(object):
    """Represents the time of day."""

def print_time(time):
    print('%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second))
To make print_time a method, all we have to do is move the function definition inside the class
definition. Notice the change in indentation.
class Time(object):
    def print_time(time):
        print('%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second))
Inside the method, the subject is assigned to the first parameter, so in this case start is assigned
to time.
By convention, the first parameter of a method is called self, so it would be more common to
write print_time like this:
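The listing that belongs here is missing from these notes. A sketch of print_time written with self, together with plausible versions of the increment and is_after methods that the following paragraphs invoke, might look like this. The method bodies are assumptions based on the functions time_to_int and int_to_time shown earlier.

```python
class Time(object):
    """Represents the time of day. attributes: hour, minute, second."""

    def print_time(self):
        print('%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second))

    def time_to_int(self):
        minutes = self.hour * 60 + self.minute
        return minutes * 60 + self.second

    def increment(self, seconds):
        """Returns a new Time that is `seconds` later; self is unchanged."""
        return int_to_time(self.time_to_int() + seconds)

    def is_after(self, other):
        """Returns True if self is later in the day than other."""
        return self.time_to_int() > other.time_to_int()


def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time


start = Time()
start.hour, start.minute, start.second = 9, 45, 0

start.print_time()           # 09:45:00
end = start.increment(1337)
end.print_time()             # 10:07:17
print(end.is_after(start))   # True
```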
This version assumes that time_to_int is written as a method, as in Exercise 1. Here’s how you
would invoke increment:
>>> start.print_time()
09:45:00
>>> end = start.increment(1337)
>>> end.print_time()
10:07:17
The subject, start, gets assigned to the first parameter, self. The argument, 1337, gets assigned to
the second parameter, seconds.
To use this method, you have to invoke it on one object and pass the other as an argument:
>>> end.is_after(start)
True
It is common for the parameters of __init__ to have the same names as the attributes.
The parameters are optional, so if you call Time with no arguments, you get the default values.
Examples:
>>> time = Time()
>>> time.print_time()
00:00:00
When you apply the + operator to Time objects, Python invokes __add__. When you print the result,
Python invokes __str__.
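The __init__ method itself is not shown in these notes. A minimal sketch with optional parameters that default to zero, plus a __str__ method so that instances can be printed directly, could be written as follows.

```python
class Time(object):
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        # Parameter names match the attribute names, as is common.
        self.hour = hour
        self.minute = minute
        self.second = second

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)


print(Time())        # 00:00:00
print(Time(9))       # 09:00:00
print(Time(9, 45))   # 09:45:00
```

Arguments are matched to parameters from left to right, so a single argument overrides hour and two arguments override hour and minute.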
Changing the behavior of an operator so that it works with user-defined types is called operator
overloading. For more details see
http://docs.python.org/2/reference/datamodel.html#specialnames.
The built-in function isinstance takes a value and a class object, and returns True if the value is an
instance of the class.
If other is a Time object, __add__ invokes add_time. Otherwise it assumes that the parameter is a
number and invokes increment. This operation is called a type-based dispatch because it
dispatches the computation to different methods based on the type of the arguments.
Here are examples that use the + operator with different types:
>>> start = Time(9, 45)
>>> duration = Time(1, 35)
>>> print(start + duration)
11:20:00
>>> print(start + 1337)
10:07:17
Unfortunately, this implementation of addition is not commutative. If the integer is the first
operand, you get
>>> print(1337 + start)
TypeError: unsupported operand type(s) for +: 'int' and 'Time'
The problem is, instead of asking the Time object to add an integer, Python is asking an integer
to add a Time object, and it doesn’t know how to do that.
But there is a clever solution for this problem: the special method __radd__, which stands for
“right-side add.” This method is invoked when a Time object appears on the right side of the +
operator. Here’s the definition:
# inside class Time:
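The method definitions are missing from these notes at this point. The sketch below shows one way __add__ with type-based dispatch and __radd__ could fit together; the bodies of add_time, increment, and int_to_time are assumptions modeled on the functions from Chapter 16.

```python
class Time(object):
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)

    def time_to_int(self):
        return (self.hour * 60 + self.minute) * 60 + self.second

    def add_time(self, other):
        return int_to_time(self.time_to_int() + other.time_to_int())

    def increment(self, seconds):
        return int_to_time(self.time_to_int() + seconds)

    def __add__(self, other):
        # Type-based dispatch: choose the method from the operand's type.
        if isinstance(other, Time):
            return self.add_time(other)
        return self.increment(other)

    def __radd__(self, other):
        # Invoked for `other + self` when other does not know how to add a Time.
        return self.__add__(other)


def int_to_time(seconds):
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return Time(hour, minute, second)


start = Time(9, 45)
duration = Time(1, 35)
print(start + duration)  # 11:20:00
print(start + 1337)      # 10:07:17
print(1337 + start)      # 10:07:17
```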
17.9 Polymorphism
The word polymorphism means having many forms. In programming, polymorphism means the same
function name (possibly with different signatures) being used for different types.
Many of the functions we wrote for strings will actually work for any kind of sequence. For
example, we used histogram to count the number of times each letter appears in a word.
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d
This function also works for lists, tuples, and even dictionaries, as long as the elements of s are
hashable, so they can be used as keys in d.
>>> t = ['spam', 'egg', 'spam', 'spam', 'bacon', 'spam']
>>> histogram(t)
{'spam': 4, 'egg': 1, 'bacon': 1}
Functions that can work with several types are called polymorphic.
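As a quick check of this claim, the same histogram function can be applied to a string, a tuple, and a dictionary. The sample inputs are arbitrary; iterating over a dictionary visits its keys, so each key is counted once.

```python
def histogram(s):
    """Counts how many times each element of the sequence s appears."""
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d


print(histogram('banana'))              # {'b': 1, 'a': 3, 'n': 2}
print(histogram((1, 2, 2, 3)))          # {1: 1, 2: 2, 3: 1}
print(histogram({'x': 10, 'y': 20}))    # {'x': 1, 'y': 1}
```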
17.10 Debugging
It is legal to add attributes to objects at any point in the execution of a program. It is usually a
good idea to initialize all of an object’s attributes in the __init__ method.
If you are not sure whether an object has a particular attribute, you can use the built-in
function hasattr.
Another way to access the attributes of an object is through the special attribute __dict__, which
is a dictionary that maps attribute names (as strings) to values:
>>> p = Point(3, 4)
>>> print(p.__dict__)
{'x': 3, 'y': 4}
For purposes of debugging, you might find it useful to keep this function handy:
def print_attributes(obj):
    for attr in obj.__dict__:
        print(attr, getattr(obj, attr))
print_attributes traverses the items in the object’s dictionary and prints each attribute name and
its corresponding value.
The built-in function getattr takes an object and an attribute name (as a string) and returns the
attribute’s value.
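A self-contained version of this debugging helper is sketched below. The Point class with an __init__ method taking x and y is an assumption, since the notes call Point(3, 4) without showing that definition.

```python
class Point(object):
    """Represents a point in 2-D space."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


def print_attributes(obj):
    """Prints each instance attribute name and its value."""
    for attr in obj.__dict__:
        print(attr, getattr(obj, attr))


p = Point(3, 4)
print(p.__dict__)    # {'x': 3, 'y': 4}
print_attributes(p)  # prints "x 3" and then "y 4"
```

The built-in function vars(obj) returns the same dictionary as obj.__dict__ and can be used in its place.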
For example, in this chapter we developed a class that represents a time of day. Methods
provided by this class include time_to_int, is_after, and add_time.
We could implement those methods in several ways. The details of the implementation depend
on how we represent time. In this chapter, the attributes of a Time object are hour, minute, and
second.
As an alternative, we could replace these attributes with a single integer representing the number
of seconds since midnight. This implementation would make some methods, like is_after, easier
to write, but it makes some methods harder.
After you deploy a new class, you might discover a better implementation. If other parts of the
program are using your class, it might be time-consuming and error-prone to change the
interface.
But if you designed the interface carefully, you can change the implementation without changing
the interface, which means that other parts of the program don’t have to change.
Keeping the interface separate from the implementation means that you have to hide the
attributes. Code in other parts of the program (outside the class definition) should use methods
to read and modify the state of the object. They should not access the attributes directly. This
principle is called information hiding.
Ramesh Babu N, Assc. Prof.,
Dept. of CSE, AIEMS Page 23 of 72
Application Development using Python [18CS55]
17.12 Glossary
object-oriented language:
A language that provides features, such as user-defined classes and method syntax, that
facilitate object-oriented programming.
object-oriented programming:
A style of programming in which data and the operations that manipulate it are organized into
classes and methods.
method:
A function that is defined inside a class definition and is invoked on instances of that class.
subject:
The object a method is invoked on.
operator overloading:
Changing the behavior of an operator like + so it works with a user-defined type.
type-based dispatch:
A programming pattern that checks the type of an operand and invokes different functions for
different types.
polymorphic:
Pertaining to a function that can work with more than one type.
information hiding:
The principle that the interface provided by an object should not depend on its
implementation, in particular the representation of its attributes.
For instance, to evaluate the expression x + y, where x is an instance of a class that has an
__add__() method, x.__add__(y) is called.
isinstance(object, classinfo)
The isinstance() function checks whether the object (first argument) is an instance of the
classinfo class (second argument) or of one of its subclasses.
Return Value
The isinstance() function returns:
True if the object is an instance of the class (or of a subclass), or of any class in the tuple when classinfo is a tuple
False otherwise
Example:
class Foo:
    a = 5

fooInstance = Foo()
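The example above ends before isinstance is actually called. A possible continuation is sketched here; the subclass Bar is hypothetical and is included only to show that instances of a subclass also count.

```python
class Foo:
    a = 5


class Bar(Foo):
    """Hypothetical subclass used only for this illustration."""


fooInstance = Foo()
barInstance = Bar()

print(isinstance(fooInstance, Foo))            # True
print(isinstance(barInstance, Foo))            # True, Bar is a subclass of Foo
print(isinstance(fooInstance, Bar))            # False
print(isinstance(fooInstance, (list, tuple)))  # False
print(isinstance(fooInstance, (list, Foo)))    # True, matches one tuple element
```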
To represent a playing card, create a class Card with two attributes: rank and suit. The attributes can be stored as strings
or integers.
Encoding: this table shows the suits and the corresponding integer codes:
Spades 3
Hearts 2
Diamonds 1
Clubs 0
As usual, the __init__ method takes an optional parameter for each attribute. The default card
is the 2 of Clubs.
To create a Card, you call Card with the suit and rank of the card you want.
queen_of_diamonds = Card(1, 12)
def __str__(self):
    return '%s of %s' % (Card.rank_names[self.rank], Card.suit_names[self.suit])
Variables like suit_names and rank_names are called class attributes because they are associated
with the class object Card.
suit and rank are called instance attributes because they are associated with a particular instance.
Both kinds of attribute are accessed using dot notation. Every card has its own suit and rank, but there
is only one copy of suit_names and rank_names.
The first element of rank_names is None because there is no card with rank zero.
With the methods we have so far, we can create and print cards:
>>> card1 = Card(2, 11)
>>> print(card1)
Jack of Hearts
__lt__ takes two parameters, self and other, and returns True if self is strictly less than
other.
Assume suit is more important, so all of the Spades outrank all of the Diamonds, and so on.
# inside class Card:
def __lt__(self, other):
    t1 = self.suit, self.rank
    t2 = other.suit, other.rank
    return t1 < t2
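Putting the pieces of this section together, a complete Card class might look like the sketch below. The contents of suit_names and rank_names are reconstructed from the encoding table and the printed examples above (Clubs is 0, Spades is 3, rank 11 is the Jack), so treat the exact lists as an assumption.

```python
class Card:
    """Represents a standard playing card."""

    # Class attributes: one copy shared by every Card.
    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        # Instance attributes: each Card has its own suit and rank.
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])

    def __lt__(self, other):
        # Tuple comparison checks suit first, then rank.
        t1 = self.suit, self.rank
        t2 = other.suit, other.rank
        return t1 < t2


card1 = Card(2, 11)
queen_of_diamonds = Card(1, 12)

print(Card())                     # 2 of Clubs
print(card1)                      # Jack of Hearts
print(queen_of_diamonds < card1)  # True, Diamonds rank below Hearts
```

Because __lt__ is defined, the built-in sorted function and the list method sort can order a list of cards.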
18.4 Decks
Since a deck is made up of cards, it is natural for each Deck to contain a list of cards as an attribute.
The following is a class definition for Deck. The __init__ method creates the attribute cards and
generates the standard set of fifty-two cards:
class Deck:
    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))
Each iteration creates a new Card with the current suit and rank, and appends it to self.cards.
This method demonstrates an efficient way to accumulate a large string: building a list of strings and
then using the string method join.
Since we invoke join on a newline character, the cards are separated by newlines. Here’s what the result
looks like:
>>> deck = Deck()
>>> print(deck)
Ace of Clubs
2 of Clubs
3 of Clubs
…
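The __str__ method for Deck that the text describes is not shown in these notes. A sketch of the list-and-join technique, with a minimal Card class included so the example runs on its own, could be:

```python
class Card:
    """Minimal Card, repeated here so the example is self-contained."""

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])


class Deck:
    """Represents a deck of cards."""

    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))

    def __str__(self):
        # Build a list of strings, then join them with newlines.
        res = []
        for card in self.cards:
            res.append(str(card))
        return '\n'.join(res)


deck = Deck()
lines = str(deck).split('\n')
print(len(lines))  # 52
print(lines[0])    # Ace of Clubs
print(lines[-1])   # King of Spades
```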
To remove a card, we can use the list method pop(), which removes and returns the last element in the list.
# inside class Deck:
def pop_card(self):
    return self.cards.pop()
A method that uses another method without doing much work is sometimes called a veneer.
18.7 Inheritance
Inheritance is the ability to define a new class that is a modified version of an existing class.
To define a new class that inherits from an existing class, you put the name of the existing
class in parentheses:
class Hand(Deck):
    """Represents a hand of playing cards."""
This definition indicates that Hand inherits from Deck; that means we can use methods like
pop_card and add_card for Hands as well as Decks.
When a new class inherits from an existing one, the existing one is called the parent and
the new class is called the child.
If we provide an __init__ method in the Hand class, it overrides the one in the Deck class:
# inside class Hand:
def __init__(self, label=''):
    self.cards = []
    self.label = label
When you create a Hand, Python invokes this __init__ method, not the one in Deck.
>>> hand = Hand('new hand')
>>> hand.cards
[]
>>> hand.label
'new hand'
The other methods are inherited from Deck, so we can use pop_card and add_card to deal
a card:
>>> deck = Deck()
>>> card = deck.pop_card()
>>> hand.add_card(card)
>>> print(hand)
King of Spades
move_cards takes two arguments, a Hand object and the number of cards to deal. It modifies
both self and hand, and returns None.
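The definitions of add_card and move_cards are not included in these notes. The sketch below shows one plausible version of each, together with compact Card, Deck, and Hand classes so the example runs on its own; the method bodies are assumptions that match the behavior described in the text.

```python
class Card:
    """Minimal Card, repeated here so the example is self-contained."""

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])


class Deck:
    """Represents a deck of cards."""

    def __init__(self):
        self.cards = [Card(suit, rank)
                      for suit in range(4) for rank in range(1, 14)]

    def add_card(self, card):
        self.cards.append(card)

    def pop_card(self):
        return self.cards.pop()

    def move_cards(self, hand, num):
        """Moves num cards from self to hand; modifies both, returns None."""
        for _ in range(num):
            hand.add_card(self.pop_card())


class Hand(Deck):
    """Represents a hand of playing cards."""

    def __init__(self, label=''):
        self.cards = []
        self.label = label


deck = Deck()
hand = Hand('new hand')
deck.move_cards(hand, 5)

print(len(deck.cards), len(hand.cards))  # 47 5
print(hand.cards[0])                     # King of Spades
```

Since Hand inherits from Deck, the same move_cards method can also move cards from one Hand to another or from a Hand back to a Deck.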
Advantages:
Programs that would be repetitive without inheritance can be written more elegantly with it.
Facilitates code reuse
Inheritance structure reflects the natural structure of the problem, which makes the
design easier to understand.
Disadvantages:
Inheritance can make programs difficult to read.
One class might inherit from another. This relationship is called IS-A, as in, “a Hand
is a kind of a Deck.”
One class might depend on another in the sense that objects in one class take objects
in the second class as parameters. This kind of relationship is called a dependency.
The arrow with a hollow triangle head represents an IS-A relationship; in this case it indicates
that Hand inherits from Deck.
The standard arrow head represents a HAS-A relationship; in this case a Deck has references
to Card objects.
The star (*) near the arrow head is a multiplicity; it indicates how many Cards a Deck has.
A multiplicity can be a simple number, like 52, a range, like 5..7 or a star, which indicates
that a Deck can have any number of Cards.
Module 5:
[You will learn:
Web Scraping, Project: MAPIT.PY with the webbrowser Module, Downloading Files from the Web with
the requests Module, Saving Downloaded Files to the Hard Drive, HTML, Parsing HTML with the
BeautifulSoup Module, Project: “I’m Feeling Lucky” Google Search, Project: Downloading All XKCD
Comics, Controlling the Browser with the selenium Module,
Working with Excel Spreadsheets, Excel Documents, Installing the openpyxl Module, Reading Excel
Documents, Project: Reading Data from a Spreadsheet, Writing Excel Documents, Project: Updating a
Spreadsheet, Setting the Font Style of Cells, Font Objects, Formulas, Adjusting Rows and Columns,
Charts,
Working with PDF and Word Documents, PDF Documents, Project: Combining Select Pages from Many
PDFs, Word Documents,
Working with CSV files and JSON data, The csv Module, Project: Removing the Header from CSV Files,
JSON and APIs, The json Module, Project: Fetching Current Weather Data]
Textbook 1: Chapters 11 – 14
There are modules that make it easy to scrape web pages in Python.
webbrowser: Comes with Python and opens a browser to a specific page.
requests: Downloads files and web pages from the Internet.
Beautiful Soup: Parses HTML, the format that web pages are written in.
Selenium: Launches and controls a web browser. Selenium is able to fill in forms and simulate mouse clicks in this browser.
Use case: To automatically launch a map in your browser, using an address passed as command line
arguments or, if none is given, the contents of your clipboard.
Steps:
Step 1: Figure Out the URL
Step 2: Handle the Command Line Arguments
Step 3: Handle the Clipboard Content and Launch the Browser
Program:
#! python3
# mapIt.py - Launches a map in the browser using an address
# from the command line or clipboard.
import sys
import webbrowser

import pyperclip  # third-party module: pip install pyperclip

if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
else:
    # Get address from clipboard.
    address = pyperclip.paste()

webbrowser.open('https://www.google.com/maps/place/' + address)
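The address-selection logic can be isolated in a small function so that it can be tried without opening a browser. A minimal sketch: build_map_url is a hypothetical helper name, and percent-encoding the address with urllib.parse.quote is an added precaution that the program above does not take.

```python
from urllib.parse import quote

BASE_URL = 'https://www.google.com/maps/place/'


def build_map_url(argv, clipboard_text):
    """Return the map URL for the address in argv, else for clipboard_text.

    argv mimics sys.argv: argv[0] is the script name, the rest is the address.
    """
    if len(argv) > 1:
        address = ' '.join(argv[1:])
    else:
        address = clipboard_text
    # quote() percent-encodes spaces and other characters unsafe in a URL path.
    return BASE_URL + quote(address.strip())


url = build_map_url(['mapIt.py', '870', 'Valencia', 'St'], '')
```

Passing the argument list and clipboard text in as parameters keeps the function free of side effects, so it can be checked from the interactive shell.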
Example:
>>> import requests
>>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
>>> type(res)
<class 'requests.models.Response'>
The URL goes to a plain-text web page containing the entire play of Romeo and Juliet. If the response's status code, res.status_code, equals requests.codes.ok, the page was downloaded successfully; a failed request carries an error code instead, such as 404 for Not Found.
Example:
import requests
res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
try:
    res.raise_for_status()
except Exception as exc:
    print('There was a problem: %s' % (exc)) # There was a problem: 404 Client Error: Not Found
Example:
>>> import requests
>>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
>>> res.raise_for_status()
>>> playFile = open('RomeoAndJuliet.txt', 'wb')
>>> for chunk in res.iter_content(100000): # 100000 bytes
...     playFile.write(chunk)
...
100000
78981
>>> playFile.close()
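The same chunked-write pattern can be wrapped in a with statement so that the file is closed even if a write fails. A minimal sketch, assuming chunks is any iterable of bytes objects (res.iter_content(100000) in the example above); save_chunks is a hypothetical helper, and the demo feeds it made-up in-memory data instead of a real download.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of bytes objects to path; return total bytes written."""
    total = 0
    # 'wb' (write binary) stores the bytes exactly as they were received.
    with open(path, 'wb') as out_file:
        for chunk in chunks:
            total += out_file.write(chunk)
    return total


# Stand-in for res.iter_content(): two chunks of made-up data.
demo_chunks = [b'x' * 100000, b'y' * 78981]
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
written = save_chunks(demo_chunks, demo_path)
```

Each write() call returns the number of bytes written, which is why the interactive example prints 100000 and 78981.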
HTML
Hypertext Markup Language (HTML) is the format that web pages are written in.
A Quick Refresher
An HTML file is a plaintext file with the .html file extension.
The text in these files is surrounded by tags, which are words enclosed in angle brackets. The tags tell the browser
how to format the web page.
A starting tag and closing tag can enclose some text to form an element.
The text (or inner HTML) is the content between the starting and closing tags.
In Chrome, you can also bring up the developer tools by selecting View > Developer > Developer Tools.
To figure out which part of the HTML corresponds to the information on the web page you’re interested
in:
Right-click where it is on the page (or control-click on OS X) and select Inspect Element from the context
menu that appears. This will bring up the Developer Tools window, which shows you the HTML that
produces this particular part of the web page.
Example:
The following example pulls weather forecast data from http://weather.gov/. Visit the site and search for the 94105 ZIP code; the site will take you to a page showing the forecast for that area.
The HTML responsible for the temperature part of the web page is <p class="myforecast-current-lrg">59°F</p>.
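To make the idea concrete, the sketch below pulls the temperature out of such an element using only the standard library's html.parser module rather than Beautiful Soup. The class name TemperatureParser is hypothetical, and the HTML string is just the single element quoted above, not the live weather.gov page.

```python
from html.parser import HTMLParser


class TemperatureParser(HTMLParser):
    """Collect the text of <p> elements with class 'myforecast-current-lrg'."""

    def __init__(self):
        super().__init__()
        self.in_target = False
        self.temperatures = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == 'p' and dict(attrs).get('class') == 'myforecast-current-lrg':
            self.in_target = True

    def handle_endtag(self, tag):
        if tag == 'p':
            self.in_target = False

    def handle_data(self, data):
        if self.in_target:
            self.temperatures.append(data.strip())


parser = TemperatureParser()
parser.feed('<p class="myforecast-current-lrg">59°F</p>')
parser.close()
```

Beautiful Soup reduces this to a single select() call with a CSS selector, which is why it is the usual choice for scraping.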
While beautifulsoup4 is the name used for installation, to import Beautiful Soup you run import bs4.
File: example.html
<!-- This is the example.html example file. -->
<html>
<head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from
<a href="http://inventwithpython.com">my website</a>.
</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
</body>
</html>
Once you have a BeautifulSoup object, you can use its methods to locate specific parts of an HTML
document.
Example:
>>> import bs4
>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')
>>> elems = exampleSoup.select('#author') #returns list of tag objects
>>> type(elems)
<class 'list'>
>>> len(elems)
1
>>> type(elems[0])
<class 'bs4.element.Tag'>
>>> str(elems[0])
'<span id="author">Al Sweigart</span>'
>>> elems[0].getText()
'Al Sweigart'
>>> elems[0].attrs
{'id': 'author'}
If the list returned by exampleSoup.select('p') is stored in pElems, using str() on pElems[0], pElems[1], and pElems[2] shows you each element as a string, and using getText() on each element shows you its text.
Example:
>>> import bs4
>>> soup = bs4.BeautifulSoup(open('example.html'), 'html.parser')
>>> spanElem = soup.select('span')[0]
>>> str(spanElem)
'<span id="author">Al Sweigart</span>'
>>> spanElem.get('id')
'author'
>>> spanElem.get('some_nonexistent_addr') is None
True
>>> spanElem.attrs
{'id': 'author'}
Here we use select() to find any <span> elements and then store the first matched element in spanElem.
Passing the attribute name 'id' to get() returns the attribute’s value, 'author'.
If you look up a little from the <a> element, though, there is an element like this: <h3 class="r">. Looking
through the rest of the HTML source, it looks like the r class is used only for search result links. Use the
selector '.r a' to find all <a> elements that are within an element that has the r CSS class.
The soup.select() call returns a list of all the elements that matched your '.r a' selector, so the number of tabs you want to open is either 5 or the length of this list, whichever is smaller.
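The "either 5 or the length of this list" rule amounts to a single min() call. A small sketch, with made-up href strings standing in for the Tag objects that soup.select('.r a') would return; the 'https://google.com' prefix is likewise an assumption for illustration.

```python
# Hypothetical stand-ins for the href values of the matched <a> elements.
link_hrefs = ['/url?q=a', '/url?q=b', '/url?q=c']

# Open at most five tabs, but never more than the number of results found.
num_open = min(5, len(link_hrefs))

# Build the full URLs that would be handed to webbrowser.open().
to_open = ['https://google.com' + href for href in link_hrefs[:num_open]]
```

Using min() avoids an IndexError when a search returns fewer than five results.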
Open the first five search results in new tabs using the webbrowser module.
Step 1: Get the Command Line Arguments and Request the Search Page
Step 2: Find All the Results
Step 3: Open Web Browsers for Each Result
#! python3
# t1_ch11_google_search.py - Opens several Google search results.
res.raise_for_status()
You’ll have a url variable that starts with the value 'http://xkcd.com' and repeatedly update it (in a for loop)
with the URL of the current page’s Prev link.
At every step in the loop, you’ll download the comic at url. You’ll know to end the loop when url ends
with '#'.
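The stopping rule can be tried without any network access. In this sketch the PREV_LINKS dictionary is a made-up stand-in for following each page's Prev link, and the loop is written as a while loop because the number of pages is not known in advance.

```python
# Hypothetical site map: each page URL maps to the URL of its Prev link.
PREV_LINKS = {
    'http://xkcd.com': 'http://xkcd.com/2/',
    'http://xkcd.com/2/': 'http://xkcd.com/1/',
    'http://xkcd.com/1/': 'http://xkcd.com/#',  # first comic: Prev is '#'
}

url = 'http://xkcd.com'
visited = []
while not url.endswith('#'):
    visited.append(url)    # the real script would download the comic here
    url = PREV_LINKS[url]  # move on to the previous (older) comic
```

After the loop, visited lists the pages from newest to oldest, and url holds the '#' address that ended the loop.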
You will download the image files to a folder in the current working directory named xkcd. The call
os.makedirs() ensures that this folder exists, and the exist_ok=True keyword argument prevents the function
from throwing an exception if this folder already exists.
You can get the src attribute from this <img> element and pass it to requests.get() to download the comic’s
image file.
You’ll need a filename for the local image file to pass to open(). The comicUrl will have a value like 'http://imgs.xkcd.com/comics/heartbleed_explanation.png', which you might have noticed looks a lot like a file path. And in fact, you can call os.path.basename() with comicUrl, and it will return just the last part of the URL, 'heartbleed_explanation.png'. You can use this as the filename when saving the image to your hard drive.
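The folder and filename handling described above can be checked in isolation. A minimal sketch that works inside a temporary directory (an assumption made so the demo leaves no xkcd folder behind) and downloads nothing:

```python
import os
import tempfile

# Work inside a temporary directory instead of the current working directory.
work_dir = tempfile.mkdtemp()
comic_dir = os.path.join(work_dir, 'xkcd')

os.makedirs(comic_dir, exist_ok=True)
os.makedirs(comic_dir, exist_ok=True)  # second call is a no-op, not an error

comic_url = 'http://imgs.xkcd.com/comics/heartbleed_explanation.png'
# basename() treats the URL like a path and keeps what follows the last '/'.
file_name = os.path.basename(comic_url)
local_path = os.path.join(comic_dir, file_name)
```

Here local_path is the value the download script would pass to open() in 'wb' mode.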
Program:
#! python3
# t1_ch11_downloadXkcd.py - Downloads every single XKCD comic.
soup = bs4.BeautifulSoup(res.text)
print('Done.')
Output :
Downloading page http://xkcd.com...
Downloading image http://imgs.xkcd.com/comics/phone_alarm.png...
Downloading page http://xkcd.com/1358/...
….
Example:
>>> from selenium import webdriver #import selenium
>>> browser = webdriver.Firefox() #opens Firefox browser
>>> type(browser)
<class 'selenium.webdriver.firefox.webdriver.WebDriver'>
>>> browser.get('http://inventwithpython.com') #opens the url in Firefox browser
The find_element_* methods return a single WebElement object, representing the first element on the page
that matches your query.
The find_elements_* methods return a list of WebElement objects, one for every matching element on the page.
Except for the *_by_tag_name() methods, the arguments to all the methods are case sensitive. If no element on the page matches what a find_element_* method is looking for, the selenium module raises a NoSuchElementException.
Once you have the WebElement object, you can find out more about it by reading the attributes or calling
the methods.
Example:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://inventwithpython.com')
try:
    elem = browser.find_element_by_class_name('bookcover')
    print('Found <%s> element with that class name!' % (elem.tag_name))
except Exception:
    print('Was not able to find an element with that name.')
Output:
Found <img> element with that class name!
The click() method of a WebElement can be used to follow a link, make a selection on a radio button, click a Submit button, or trigger whatever else might happen when the element is clicked by the mouse.
Example:
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.get('http://inventwithpython.com')
>>> linkElem = browser.find_element_by_link_text('Read It Online')
>>> type(linkElem)
<class 'selenium.webdriver.remote.webelement.WebElement'>
>>> linkElem.click() # follows the "Read It Online" link
This opens Firefox to http://inventwithpython.com/, gets the WebElement object for the <a> element with
the text Read It Online, and then simulates clicking that <a> element.
Example:
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.get('https://mail.yahoo.com')
>>> emailElem = browser.find_element_by_id('login-username')
>>> emailElem.send_keys('not_my_real_email')
>>> passwordElem = browser.find_element_by_id('login-passwd')
>>> passwordElem.send_keys('12345')
>>> passwordElem.submit()
Calling the submit() method on any element will have the same result as clicking the Submit button for the
form that element is in.
For example, if the cursor is not currently in a text field, pressing the home and end keys will scroll the
browser to the top and bottom of the page, respectively.
Example:
>>> from selenium import webdriver
>>> from selenium.webdriver.common.keys import Keys
>>> browser = webdriver.Firefox()
>>> browser.get('http://nostarch.com')
>>> htmlElem = browser.find_element_by_tag_name('html')
>>> htmlElem.send_keys(Keys.END) # scrolls to bottom
>>> htmlElem.send_keys(Keys.HOME) # scrolls to top
LibreOffice Calc and OpenOffice Calc are free alternatives to Excel that run on Windows, OS X, and Linux. Both work with Excel’s .xlsx file format for spreadsheets, which means the openpyxl module can work on spreadsheets from these applications as well.
Excel Documents
An Excel spreadsheet document is called a workbook.
A single workbook is saved in a file with the .xlsx extension.
Each workbook can contain multiple sheets (also called worksheets).
The sheet the user is currently viewing (or last viewed before closing Excel) is called the active sheet.
Each sheet has columns (addressed by letters starting at A) and rows (addressed by numbers starting at
1).
A box at a particular column and row is called a cell.
Each cell can contain a number or text value. The grid of cells with data makes up a sheet.
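To test the installation, run import openpyxl in the interactive shell. If the module was correctly installed, this produces no error messages. Remember to import openpyxl before running the examples that follow, or you’ll get a NameError: name 'openpyxl' is not defined error.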
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> type(wb)
<class 'openpyxl.workbook.workbook.Workbook'>
The get_active_sheet() method of a Workbook object returns the workbook’s active sheet as a Worksheet object. In newer openpyxl versions this method is deprecated in favor of the wb.active property.
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> wb.get_sheet_names()
['Sheet1', 'Sheet2', 'Sheet3']
>>> sheet = wb.get_sheet_by_name('Sheet3')
>>> sheet.title
'Sheet3'
>>> anotherSheet = wb.get_active_sheet()
>>> anotherSheet
<Worksheet "Sheet1">
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_sheet_by_name('Sheet1')
>>> sheet['A1']
<Cell Sheet1.A1>
>>> sheet['A1'].value
datetime.datetime(2015, 4, 5, 13, 34, 2)
>>> c = sheet['B1']
>>> c.value
'Apples'
>>> 'Row ' + str(c.row) + ', Column ' + c.column_letter + ' is ' + c.value
'Row 1, Column B is Apples'
>>> sheet['C1'].value
73
OpenPyXL will automatically interpret the dates in column A and return them as datetime values rather
than strings.
Example:
>>> sheet.cell(row=1, column=2)
<Cell Sheet1.B1>
>>> sheet.cell(row=1, column=2).value
'Apples'
>>> for i in range(1, 8, 2):
...     print(i, sheet.cell(row=i, column=2).value)
...
Ramesh Babu N, Assc. Prof.,
Dept. of CSE, AIEMS Page 49 of 72
1 Apples
3 Pears
5 Apples
7 Strawberries
The get_highest_row() and get_highest_column() methods of a Worksheet object determine the size of the sheet. In newer openpyxl versions they have been replaced by the max_row and max_column attributes.
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_sheet_by_name('Sheet1')
>>> sheet.max_row
7
>>> sheet.max_column
3
openpyxl.utils.cell.get_column_letter(idx)
Convert a column index into a column letter (3 -> ‘C’)
Example:
""" t1_ch12_convert_colnames_numbers.py """
import openpyxl
from openpyxl.utils.cell import get_column_letter, column_index_from_string
get_column_letter(1) #'A'
get_column_letter(2) #'B'
get_column_letter(27) #'AA'
get_column_letter(900) #'AHP'
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
#get_column_letter(sheet.get_highest_column()) #'C'
get_column_letter(sheet.max_column) #'C'
column_index_from_string('A') #1
column_index_from_string('AA') #27
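The conversion behind these two functions is a base-26 scheme without a zero digit. The sketch below is a hand-rolled illustration of that arithmetic, not openpyxl's own implementation; column_letter and column_index are hypothetical names.

```python
def column_letter(index):
    """Convert a 1-based column index to letters (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError('column index must be 1 or greater')
    letters = ''
    while index > 0:
        # Shift to 0-based so that 26 maps to 'Z' rather than rolling over.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


def column_index(letters):
    """Convert column letters back to a 1-based index ('AA' -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index
```

In practice the openpyxl functions should be preferred; the sketch only shows why column 27 becomes 'AA' rather than 'BA'.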
Example:
""" t1_ch12_getting_rows_columns.py """
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
tuple(sheet['A1':'C3'])
"""
((<Cell Sheet1.A1>, <Cell Sheet1.B1>, <Cell Sheet1.C1>), (<Cell Sheet1.A2>,<Cell Sheet1.B2>, <Cell
Sheet1.C2>), (<Cell Sheet1.A3>, <Cell Sheet1.B3>, <Cell Sheet1.C3>))"""
for rowOfCellObjects in sheet['A1':'C3']:
    for cellObj in rowOfCellObjects:
        print(cellObj.coordinate, cellObj.value)
    print('--- END OF ROW ---')
Output:
A1 2015-04-05 13:34:02
B1 Apples
C1 73
--- END OF ROW ---
A2 2015-04-05 03:41:23
B2 Cherries
C2 85
--- END OF ROW ---
A3 2015-04-06 12:46:51
B3 Pears
C3 14
--- END OF ROW ---
sheet['A1':'C3'] returns a Generator object containing the Cell objects in that area.
We can use tuple() on it to display its Cell objects in a tuple. This tuple contains three tuples: one for
each row, from the top of the desired area to the bottom.
columns
Produces all cells in the worksheet, by column
Example:
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_active_sheet()
for cellObj in list(sheet.columns)[1]:  # index 1 is the second column (B)
    print(cellObj.value)
"""
Apples
Cherries
Pears
Oranges
Apples
Bananas
Strawberries
"""
Workbooks, Sheets, Cells
As a quick review, here’s a rundown of all the functions, methods, and data types involved in reading a
cell out of a spreadsheet file:
1. Import the openpyxl module.
2. Call the openpyxl.load_workbook() function.
3. Get a Workbook object.
4. Call the get_active_sheet() or get_sheet_by_name() workbook method.
5. Get a Worksheet object.
6. Use indexing or the cell() sheet method with row and column keyword arguments.
7. Get a Cell object.
8. Read the Cell object’s value attribute.
In this project, you’ll write a script that can read from the census spreadsheet file and calculate statistics
for each county in a matter of seconds.
--snip--
If the previous dictionary were stored in countyData, the following expressions would evaluate like this:
>>> countyData['AK']['Anchorage']['pop']
291826
>>> countyData['AK']['Anchorage']['tracts']
55
More generally, the countyData dictionary’s keys will look like this:
countyData[state abbrev][county]['tracts']
countyData[state abbrev][county]['pop']
Program:
#! python3
# readCensusExcel.py - Tabulates population and number of census tracts for each county.
countyData = {}
# Make sure the key for this county in this state exists.
countyData[state].setdefault(county, {'tracts': 0, 'pop': 0})
# Open a new text file and write the contents of countyData to it.
print('Writing results...')
resultFile = open('census2010.py', 'w')
resultFile.write('allData = ' + pprint.pformat(countyData))
resultFile.close()
print('Done.')
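The way setdefault() builds the nested countyData structure can be tried without the spreadsheet. In this sketch the rows list is made-up data standing in for the census sheet, with one tuple per census tract, and the variable names are illustrative.

```python
import pprint

# Hypothetical (state, county, population) rows, one per census tract.
rows = [
    ('AK', 'Anchorage', 3000),
    ('AK', 'Anchorage', 4500),
    ('AL', 'Autauga', 1900),
]

county_data = {}
for state, county, pop in rows:
    # Make sure the key for this state exists.
    county_data.setdefault(state, {})
    # Make sure the key for this county in this state exists.
    county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})
    # Each row represents one census tract, so increment by one.
    county_data[state][county]['tracts'] += 1
    county_data[state][county]['pop'] += pop

# pformat() returns the dictionary as text that is valid Python source.
as_source = 'allData = ' + pprint.pformat(county_data)
```

Because setdefault() does nothing when the key already exists, it is safe to call on every row.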
The workbook will start off with a single sheet named Sheet. You can change the name of the sheet by
storing a new string in its title attribute.
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_active_sheet()
>>> sheet.title = 'Spam Spam Spam'
>>> wb.save('example_copy.xlsx')
remove_sheet(worksheet)
Remove worksheet from this workbook.
Parameters:
worksheet : a Worksheet object, not a string of the sheet name, as its argument.
Note: Deprecated: Use wb.remove(worksheet) or del wb[sheetname]
Example:
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> wb.get_sheet_names()
['Sheet']
>>> wb.create_sheet()
<Worksheet "Sheet1">
>>> wb.get_sheet_names()
['Sheet', 'Sheet1']
>>> wb.create_sheet(index=0, title='First Sheet')
<Worksheet "First Sheet">
>>> wb.get_sheet_names()
['First Sheet', 'Sheet', 'Sheet1']
>>> wb.create_sheet(index=2, title='Middle Sheet')
<Worksheet "Middle Sheet">
>>> wb.get_sheet_names()
['First Sheet', 'Sheet', 'Middle Sheet', 'Sheet1']
>>> wb.remove_sheet(wb.get_sheet_by_name('Middle Sheet'))
>>> wb.remove_sheet(wb.get_sheet_by_name('Sheet1'))
>>> wb.get_sheet_names()
['First Sheet', 'Sheet']
Example:
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_sheet_by_name('Sheet')
>>> sheet['A1'] = 'Hello world!'
>>> sheet['A1'].value
'Hello world!'
Each row represents an individual sale. The columns are the type of produce sold (A), the cost per pound
of that produce (B), the number of pounds sold (C), and the total revenue from the sale (D). The TOTAL
column is set to the Excel formula =ROUND(B3*C3, 2), which multiplies the cost per pound by the
number of pounds sold and rounds the result to the nearest cent. With this formula, the cells in the
TOTAL column will automatically update themselves if there is a change in column B or C.
The task is to update the cost per pound for every garlic, celery, and lemon row.
Step 1: Set Up a Data Structure with the Update Information (use a dictionary)
Step 2: Check All Rows and Update Incorrect Prices
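The two steps can be sketched on plain Python lists before involving openpyxl. The rows list here is a made-up stand-in for the sheet, and the prices in PRICE_UPDATES are illustrative values.

```python
# Step 1: the corrected prices, keyed by produce name (illustrative values).
PRICE_UPDATES = {'Garlic': 3.07, 'Celery': 1.19, 'Lemon': 1.27}

# Hypothetical stand-in for the sheet: [produce, cost per pound, pounds sold].
rows = [
    ['Potatoes', 0.86, 21.6],
    ['Garlic', 3.00, 5.0],
    ['Lemon', 1.20, 10.0],
]

# Step 2: check every row and overwrite the price where an update exists.
for row in rows:
    produce_name = row[0]
    if produce_name in PRICE_UPDATES:
        row[1] = PRICE_UPDATES[produce_name]

# Equivalent of the sheet's =ROUND(B*C, 2) formula, recomputed for each row.
totals = [round(cost * pounds, 2) for _, cost, pounds in rows]
```

Keeping the corrections in a dictionary means a new price change only requires editing PRICE_UPDATES, not the loop.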
Program:
#! python3
# updateProduce.py - Corrects costs in produce sales spreadsheet.
import openpyxl
wb = openpyxl.load_workbook('produceSales.xlsx')
sheet = wb.get_sheet_by_name('Sheet')
wb.save('updatedProduceSales.xlsx')
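To customize font styles in cells, import the Font() and Style() functions from the openpyxl.styles module.
A cell’s style can be set by assigning the Style object to the style attribute. Note that Style() is available only in older openpyxl releases; with newer ones, assign the Font object directly to the cell’s font attribute, as in the Font Objects example that follows.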
Example:
>>> import openpyxl
>>> from openpyxl.styles import Font, Style
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_sheet_by_name('Sheet')
>>> italic24Font = Font(size=24, italic=True)
>>> styleObj = Style(font=italic24Font)
>>> sheet['A1'].style = styleObj
>>> sheet['A1'] = 'Hello world!'
>>> wb.save('styles.xlsx')
Font(size=24, italic=True) returns a Font object, which is stored in italic24Font. The keyword arguments
to Font(), size and italic, configure the Font object’s style attributes. This Font object is then passed into
the Style(font=italic24Font) call, which returns the value you stored in styleObj. And when styleObj is
assigned to the cell’s style attribute, all that font styling information gets applied to cell A1.
Font Objects
To set font style attributes, you pass keyword arguments to Font().
Example:
""" t1_ch12_setting_font.py """
import openpyxl
from openpyxl.styles import Font
wb = openpyxl.Workbook()
sheet = wb['Sheet']
fontObj1 = Font(name='Times New Roman', bold=True)
sheet['A1'].font = fontObj1
sheet['A1'] = 'Bold Times New Roman'
fontObj2 = Font(size=24, italic=True)
sheet['B3'].font = fontObj2
sheet['B3'] = '24 pt Italic'
wb.save('styles.xlsx')
Formulas
Formulas, which begin with an equal sign, can configure cells to contain values calculated from other
cells.
Example:
You can also read the formula in a cell just as you would any value. However, if you want to see the
result of the calculation for the formula instead of the literal formula, you must pass True for the data_only
keyword argument to load_workbook().
Example:
>>> import openpyxl
>>> wbFormulas = openpyxl.load_workbook('writeFormula.xlsx')
>>> sheet = wbFormulas.get_active_sheet()
>>> sheet['A3'].value
'=SUM(A1:A2)'
>>> wbDataOnly = openpyxl.load_workbook('writeFormula.xlsx', data_only=True)
>>> sheet = wbDataOnly.get_active_sheet()
>>> sheet['A3'].value
500
In row_dimensions, you can access one of the objects using the number of the row (in this case, 1 or 2).
In column_dimensions, you can access one of the objects using the letter of the column (in this case, A
or B).
Row height can be set to an integer or float value between 0 and 409. This value represents the height measured in points, where one point equals 1/72 of an inch. The default row height is 12.75.
The column width can be set to an integer or float value between 0 and 255. This value represents the
number of characters at the default font size (11 point). The default column width is 8.43 characters.
The argument to merge_cells() is a single string of the top-left and bottom-right cells of the rectangular
area to be merged: 'A1:D3' merges 12 cells into a single cell.
To set the value of these merged cells, simply set the value of the top-left cell of the merged group.
Freeze Panes
For spreadsheets too large to be displayed all at once, it’s helpful to “freeze” a few of the top rows or leftmost columns onscreen. Frozen column or row headers are always visible to the user even as they scroll through the spreadsheet. These are known as freeze panes.
In OpenPyXL, each Worksheet object has a freeze_panes attribute that can be set to a Cell object or a string of
a cell’s coordinates. Note that all rows above and all columns to the left of this cell will be frozen.
Example:
>>> import openpyxl
>>> wb = openpyxl.load_workbook('produceSales.xlsx')
>>> sheet = wb.get_active_sheet()
>>> sheet.freeze_panes = 'A2'
>>> wb.save('freezeExample.xlsx')
Charts
OpenPyXL supports creating bar, line, scatter, and pie charts using the data in a sheet’s cells.
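Reference objects are created by calling the openpyxl.charts.Reference() function (in newer versions, openpyxl.chart.Reference(), which takes min_col, min_row, max_col, and max_row arguments instead of the tuples described here) and passing three arguments: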
1. The Worksheet object containing your chart data.
2. A tuple of two integers, representing the top-left cell of the rectangular selection of cells containing
your chart data: The first integer in the tuple is the row, and the second is the column. Note that 1 is
the first row, not 0.
3. A tuple of two integers, representing the bottom-right cell of the rectangular selection of cells
containing your chart data: The first integer in the tuple is the row, and the second is the column.
Example:
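""" t1_ch12_chart.py """
from openpyxl import Workbook
from openpyxl.chart import Reference, Series, BarChart
wb = Workbook()
sheet = wb.active
for i in range(1, 11): # create some data in column A
    sheet['A' + str(i)] = i
# Select cells A1:A10 as the chart data.
refObj = Reference(sheet, min_col=1, min_row=1, max_col=1, max_row=10)
seriesObj = Series(refObj, title='First series')
chartObj = BarChart()
chartObj.title = 'My Chart'
chartObj.append(seriesObj)
sheet.add_chart(chartObj, 'C5') # top-left corner of the chart goes at C5
wb.save('sampleChart.xlsx')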
PDF Documents
PDF and Word documents are binary files, which makes them much more complex than plaintext files.
In addition to text, they store lots of font, color, and layout information.
PDF stands for Portable Document Format and uses the .pdf file extension.
The module you’ll use to work with PDFs is PyPDF2. To install it, run pip install PyPDF2 from the command
line.
getPage(pageNumber)
Retrieves a page by number from this PDF file.
Parameters: pageNumber (int) – The page number to retrieve (pages begin at zero)
Returns: a PageObject instance.
Return type: PageObject
numPages
Read-only property that accesses the getNumPages() function.
Example:
""" t1_ch13_read_pdf.py """
import PyPDF2
# Open the file in binary mode
pdfFileObj = open('meetingminutes.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(0)  # pages begin at zero
print(pageObj.extractText())
"""
OOFFFFIICCIIAALL BBOOAARRDD MMIINNUUTTEESS Meeting of
March 7, 2014
The Board of Elementary and Secondary Education shall provide leadership and ...
"""
Decrypting PDFs
Some PDF documents are password protected.
All PdfFileReader objects have an isEncrypted attribute that is True if the PDF is encrypted and False if it isn’t.
To read an encrypted PDF, call the decrypt() function and pass the password as a string.
Example:
""" t1_ch13_decrypt_pdf.py """
import PyPDF2
pdfReader = PyPDF2.PdfFileReader(open('encrypted.pdf', 'rb'))
print(pdfReader.isEncrypted) # True
#pdfReader.getPage(0)
# PdfReadError: file has not been decrypted
pageObj = pdfReader.getPage(0)
"""
PDF Text: OOFFFFIICCIIAALL BBOOAARRDD MMIINNUUTTEESS Meeting of
March 7
, 2014...
"""
Creating PDFs
PyPDF2’s PDF-writing capabilities are limited to copying pages from other PDFs, rotating pages,
overlaying pages, and encrypting files.
PyPDF2 doesn’t allow you to directly edit a PDF. Instead, you have to create a new PDF and then copy
content over from an existing document.
PdfFileWriter API
class PyPDF2.PdfFileWriter
This class supports writing PDF files.
write(stream)
Writes the collection of pages added to this object out as a PDF file.
Parameters: stream – An object to write the file to.
addPage(page)
Adds a page to this PDF file (at end). The page is usually acquired from a PdfFileReader instance.
Parameters: page (PageObject) – The page to add to the document.
rotateCounterClockwise(angle)
Rotates a page counter-clockwise by increments of 90 degrees.
Parameters: angle (int) – Angle to rotate the page. Must be an increment of 90 deg.
mergePage(page2)
Merges the content streams of two pages into one.
Parameters: page2 (PageObject) – The page to be merged into this one.
Copying Pages
You can use PyPDF2 to copy pages from one PDF document to another.
This allows you to combine multiple PDF files, cut unwanted pages, or reorder pages.
Example:
""" t1_ch13_copying_pdf.py """
import PyPDF2
Rotating Pages
The pages of a PDF can also be rotated in 90-degree increments with the rotateClockwise() and
rotateCounterClockwise() methods. Pass one of the integers 90, 180, or 270 to these methods.
Example:
""" t1_ch13_rotating_pdf.py """
import PyPDF2
Overlaying Pages
PyPDF2 can also overlay the contents of one page over another, which is useful for adding a logo,
timestamp, or watermark to a page.
Example:
""" t1_ch13_merge_pdf.py """
import PyPDF2
# Save PDF
resultPdfFile = open('watermarkedCover.pdf', 'wb')
pdfWriter.write(resultPdfFile)
# Close files
minutesFile.close()
resultPdfFile.close()
Encrypting PDFs
A PdfFileWriter object can also add encryption to a PDF document.
PdfFileWriter method
encrypt(user_pwd, owner_pwd=None)
Encrypt this PDF file with the PDF Standard encryption handler.
Parameters:
user_pwd (str) – The “user password”, which allows for opening and reading the PDF file with
the restrictions provided.
owner_pwd (str) – The “owner password”, which allows for opening the PDF files without any
restrictions. By default, the owner password is the same as the user password.
Example:
""" t1_ch13_encrypt_pdf.py """
import PyPDF2
For this project, open a new file editor window and save it as t1_ch13_combined.pdfs.py.
Step 1: Find All PDF Files
Step 2: Open Each PDF
Step 3: Add Each Page
Step 4: Save the Results
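Step 1 needs nothing beyond the os module, so it can be tested on its own. A minimal sketch that creates a temporary folder with made-up file names (an assumption for the demo; a real run would list the folder that holds the PDFs):

```python
import os
import tempfile

# Demo folder containing two PDFs and one unrelated file.
folder = tempfile.mkdtemp()
for name in ['b_notes.pdf', 'a_intro.PDF', 'readme.txt']:
    open(os.path.join(folder, name), 'wb').close()

# Step 1: keep only names ending in .pdf, ignoring upper/lower case.
pdf_files = [name for name in os.listdir(folder)
             if name.lower().endswith('.pdf')]
# Sort case-insensitively so the PDFs are combined in alphabetical order.
pdf_files.sort(key=str.lower)
```

Sorting matters because os.listdir() returns names in arbitrary order, and the combined PDF should have a predictable page order.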
Program:
#! python3
# combinePdfs.py - Combines all the PDFs in the current working directory into a single PDF.
import PyPDF2, os
os.chdir("E:\\Notes\\Python18CS55\\temp\\ch13_combine_pdf")
pdfWriter = PyPDF2.PdfFileWriter()
# Loop through all the pages and add them.
for pageNum in range(0, pdfReader.numPages):
    pageObj = pdfReader.getPage(pageNum)
    pdfWriter.addPage(pageObj)
References:
1. Al Sweigart, “Automate the Boring Stuff with Python”, 1st Edition, No Starch Press, 2015.
(Available under CC-BY-NC-SA license at https://automatetheboringstuff.com/)
2. Allen B. Downey, “Think Python: How to Think Like a Computer Scientist”, 2nd Edition, Green Tea
Press, 2015. (Available under CC-BY-NC license at
http://greenteapress.com/thinkpython2/thinkpython2.pdf)
3. https://openpyxl.readthedocs.io/en/stable/api/openpyxl.cell.cell.html
4. https://openpyxl.readthedocs.io/en/stable/api/openpyxl.utils.cell.html
5. https://pythonhosted.org/PyPDF2/PdfFileWriter.html
6. https://python-docx.readthedocs.org/