The Python Master

© 2013 - 2019 Robert Smallshire and Austin Bingham
ISBN for EPUB version: 978-82-93483-08-3
You can read The Python Master either as a standalone Python tutorial, or as the
companion volume to the corresponding Advanced Python video course,
depending on which style of learning suits you best. In either case we assume
that you’re up to speed with the material covered in the preceding books or
courses.
Some of our examples show code saved in files, and others are from interactive
Python sessions. In such interactive cases, we
include the prompts from the Python session such as the triple-arrow (>>>) and
triple-dot (...) prompts. You don’t need to type these arrows or dots. Similarly,
for operating system shell-commands we will use a dollar prompt ($) for Linux,
macOS and other Unixes, or where the particular operating system is
unimportant for the task at hand:
$ python3 words.py
For code blocks which need to be placed in a file, rather than entered
interactively, we show code without any leading prompts:
from itertools import islice

def write_sequence(filename, num):
    """Write Recaman's sequence to a text file."""
    # Assumes a sequence() generator yielding Recaman's sequence is in scope.
    with open(filename, mode='wt', encoding='utf-8') as f:
        f.writelines("{0}\n".format(r)
                     for r in islice(sequence(), num + 1))
We’ve worked hard to make sure that our lines of code are short enough so that
each single logical line of code corresponds to a single physical line of code in
your book. However, the vagaries of publishing e-books to different devices and
the very genuine need for occasional long lines of code mean we can’t guarantee
that lines don’t wrap. What we can guarantee, however, is that where a line does
wrap, the publisher has inserted a backslash character \ in the final column. You
need to use your judgement to determine whether this character is a legitimate
part of the code or has been added by the e-book platform.
>>> print("This is a single line of code which is very long. Too long, in fact, to fit on\
a single physical line of code in the book.")
If you see a backslash at the end of the line within the above quoted string, it is
not part of the code, and should not be entered.
Occasionally, we’ll number lines of code so we can refer to them easily from the
surrounding narrative. These line numbers should not be entered as part of the code.
Numbered code blocks look like this:
 1  def write_grayscale(filename, pixels):
 2      height = len(pixels)
 3      width = len(pixels[0])
 4
 5      with open(filename, 'wb') as bmp:
 6          # BMP Header
 7          bmp.write(b'BM')
 8
 9          # The next four bytes hold the filesize as a 32-bit
10          # little-endian integer. Zero placeholder for now.
11          size_bookmark = bmp.tell()
12          bmp.write(b'\x00\x00\x00\x00')
Sometimes we need to present code snippets which are incomplete. Usually this
is for brevity where we are adding code to an existing block, and where we want
to be clear about the block structure without repeating all existing contents of the
block. In such cases we use a Python comment containing three dots # ... to
indicate the elided code:
class Flight:
    # ...

    def make_boarding_cards(self, card_printer):
        # ...
Here it is implied that some other code already exists within the Flight class
block before the make_boarding_cards() function.
Finally, within the text of the book, when we are referring to an identifier which
is also a function we will use the identifier with empty parentheses, just as we
did with make_boarding_cards() in the preceding paragraph.
Welcome!
Welcome to The Python Master. This is the third in Sixty North’s trilogy of books
which cover the core Python language, and it builds directly on the knowledge
we impart in the first two, The Python Apprentice and The Python Journeyman.
Our books follow a thoughtfully designed spiral curriculum. We visit the same,
or closely related, topics several times in increasing depth, sometimes multiple
times in the same book. For example, in The Python Apprentice we cover single
class inheritance. Then in The Python Journeyman we cover multiple class
inheritance. In this book we’ll cover metaclasses to give you the ultimate power
over class construction.
The Python Master covers some aspects of Python which you may use relatively
infrequently. Mastery of the Python language calls for facility with these features
— at least to the extent that you can identify their use in existing code and
appreciate their intent. We’ll go all the way in this book to show you how to use
the most powerful of Python’s language features to greatest effect.
Knowing how to use advanced features is what we will teach in this book.
Knowing when to use advanced features — demonstrating a level of skilfulness
that can only be achieved through time and experience — is the mark of the true
master.
That’s a lot to cover, but over the course of this book you’ll begin to see how
many of these pieces fit together.
Prerequisites
At a minimum, you need to be able to run a Python 3 REPL. You can, of course,
use an IDE if you wish, but we won’t require anything beyond what comes with
the standard Python distribution.
If you don’t have much Python experience yet, by all means read on. But be
aware that some concepts (e.g. metaclasses) often don’t make complete sense
until you’ve personally encountered a situation where they could be helpful.
Take what you can from this book for now, continue programming, and refer
back to here when you encounter such a situation.
The Road Goes On Forever
When Alexander saw the breadth of his domain, he wept for there were no
more worlds to conquer.
This book completes our trilogy on core Python, and after reading it you will
know a great deal about the language. Unlike poor Alexander the Great, though,
you can save your tears: there is still a lot to Python that we haven’t covered, so
you can keep learning for years to come!
The material in The Python Master can be tricky, and much of it, by its nature,
requires experience to use well. So take your time, and try to correlate what you
learn with your past and ongoing Python programming experience. And once
you think you understand an idea, take the ultimate test: teach it to others. Give
conference talks, write blog posts (or books!), speak at user groups, or just show
what you’ve learned to a colleague.
If writing these books has taught us one thing, it’s that explaining things to
others is the best way to find out what you don’t understand. This book can help
you on the way to Python mastery, but going the full distance is up to you. Enjoy
the journey!
Chapter 1 - Advanced Flow Control
In this chapter we’ll be looking at some more advanced, and possibly even
obscure, flow-control techniques used in Python programs with which you
should be familiar at the advanced level.
while..else
We’ll start out by saying that Guido van Rossum, inventor and benevolent
dictator for life of Python, has admitted that he would not include this feature if
he developed Python again. Back in our timeline though, the feature is there,
and we need to understand it.
Apparently, the original motivation for using the else keyword this way in
conjunction with while loops comes from Donald Knuth in early efforts to rid
structured programming languages of goto. Although the choice of keyword is at
first perplexing, it’s possible to rationalise the use of else in this way through
comparison with the if...else construct:
if condition:
    execute_condition_is_true()
else:
    execute_condition_is_false()
In the if...else structure, the code in the if clause is executed if the condition
evaluates to True when converted to bool — in other words, if the condition is
‘truthy’. On the other hand, if the condition is ‘falsey’, the code in the else
clause is executed instead.
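The while..else structure we’re discussing has the same shape, with a loop body
in place of the if clause:

while condition:
    execute_condition_is_true()
else:
    execute_condition_is_false()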
We can, perhaps, glimpse the logic behind choosing the else keyword. The else
clause will be executed when, and only when, the condition evaluates to False.
The condition may already be False when execution first reaches the while
statement, so it may branch immediately into the else clause. Or there may be
any number of cycles of the while-loop before the condition becomes False and
execution transfers to the else block.
Fair enough, you say, but isn’t this equivalent to putting the else code after the
loop, rather than in a special else block, like this?
while condition:
    execute_condition_is_true()
execute_condition_is_false()
You would be right in this simple case. But if we place a break statement within
the loop body it becomes possible to exit the loop without the loop conditional
ever becoming false. In that case the execute_condition_is_false() call
happens even though condition is not False:
while condition:
    flag = execute_condition_is_true()
    if flag:
        break
execute_condition_is_false()
To fix this, we could use a second test with an if statement to provide the
desired behaviour:
while condition:
    flag = execute_condition_is_true()
    if flag:
        break

if not condition:
    execute_condition_is_false()
The drawback with this approach is that the test is duplicated, which violates the
Don’t Repeat Yourself (DRY) guideline and hampers maintainability.
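The while..else construct lets us express the same logic with the condition
written only once:

while condition:
    flag = execute_condition_is_true()
    if flag:
        break
else:
    execute_condition_is_false()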
Now, the else block only executes when the main loop condition evaluates to
False. If we jump out of the loop another way, such as with the break statement,
execution jumps over the else clause. There’s no doubt that, however you
rationalise it, the choice of keyword here is confusing, and it would have been
better all round if a nobreak keyword had been used to introduce this block. In
lieu of such a keyword, we heartily recommend that, if you are tempted to use
this obscure and little-used language feature, you include a nobreak comment,
like this:
while condition:
    flag = execute_condition_is_true()
    if flag:
        break
else:  # nobreak
    execute_condition_is_false()
We must admit that neither of the authors of this book has used while..else in
practice. Almost every example we’ve seen could be implemented better by
another, more easily understood, construct, which we’ll look at later. That said,
let’s look at an example in evaluator.py:
def is_comment(item):
    return isinstance(item, str) and item.startswith('#')


def execute(program):
    """Execute a stack program.

    Args:
        program: Any stack-like collection where each item in the stack
            is a callable operator or non-callable operand. The top-most
            items on the stack may be strings beginning with '#' for
            the purposes of documentation. Stack-like means support for
            pop() and append().
    """
    # ... (we develop the body of this function step by step below)
    print("Finished")


if __name__ == '__main__':
    import operator

    program = list(reversed((
        "# A short stack program to add",
        "# and multiply some constants",
        9,
        13,
        operator.mul,
        2,
        operator.add)))

    execute(program)
This code evaluates simple ‘stack programs’. Such programs are specified as a
stack of items where each item is either a callable function (for these we use any
regular Python function) or an argument to that function. So to evaluate 5 + 2,
we would set up the stack like this:
5
2
+
When the plus operator is evaluated, its result is pushed onto the stack. This
allows us to perform more complex operations such as (5 + 2) * 3 :
5
2
+
3
*
As the stack contains the expression in reverse Polish notation, the parentheses
we needed in the infix version aren’t required. In reality, the stack will be a
Python list. The operators will be callables from the Python standard library
operator module, which provides named-function equivalents of every Python
infix operator. Finally, when we use Python lists as stacks the top of the stack is
at the end of the list, so to get everything in the right order we’ll need to reverse
our list:
program = list(reversed([5, 2, operator.add, 3, operator.mul]))
For added interest, our little stack language also supports comments as strings
beginning with a hash symbol, just like Python. However, such comments are
only allowed at the beginning of the program, which is at the top of the stack:
import operator
program = list(reversed((
    "# A short stack program to add",
    "# and multiply some constants",
    5,
    2,
    operator.add,
    3,
    operator.mul)))
We’d like to run our little stack program by passing it to a function execute(),
like this:
execute(program)
Let’s see what such a function might look like and how it can use the
while..else construct to good effect. The first thing our execute() function
needs to do is pop all the comment strings from the top of the stack and discard
them. To help with this, we’ll define a simple predicate which identifies stack
items as comments:
def is_comment(item):
    return isinstance(item, str) and item.startswith('#')
Notice that this function relies on an important Python feature called boolean short-circuiting.
If item is not a string then a call to startswith() would raise an AttributeError. However, when
evaluating the boolean operators and and or, Python will only evaluate the second operand if it
is necessary for computing the result. When item is not a string (meaning the first operand
evaluates to False) then the result of the boolean and must also be False; in this case there’s
no need to evaluate the second operand.
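We can see short-circuiting at work in the REPL; for the non-string argument,
startswith() is never looked up:

>>> is_comment("# A comment")
True
>>> is_comment(42)
False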
Given this useful predicate, we’ll now use a while-loop to clear comment items
from the top of the stack:
while program:
    item = program.pop()
    if not is_comment(item):
        program.append(item)
        break
else:  # nobreak
    print("Empty program!")
    return
The conditional expression for the while statement is the program stack object
itself. Remember that using a collection in a boolean context like this evaluates
to True if the collection is non-empty or False if it is empty. Or put another way,
empty collections are ‘falsey’. So this statement reads as “While there are items
remaining in the program.”
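A quick check at the REPL illustrates this behaviour for lists:

>>> bool([])
False
>>> bool([1, 2, 3])
True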
The while-loop has an associated else clause where execution will jump if the
while condition should ever evaluate to False. This happens when there are no
more items remaining in the program. In this clause we print a warning that the
program was found to be logically empty, then return early from the
execute() function.
Within the while block, we pop() an item from the stack — recall that regular
Python lists have this method which removes and returns the last item from a
list. We use logical negation of our is_comment() predicate to determine if the
just-popped item is not a comment. If the loop has reached a non-comment item,
we push it back onto the stack using a call to append(), which leaves the stack
with the first non-comment item on top, and then break from the loop.
Remember that the while-loop else clause is best thought of as the “no break”
clause, so when we break from the loop execution skips the else block and
proceeds with the first statement after.
This loop executes the else block in the case of search failure — in this example
if we fail to locate the first non-comment item because there isn’t one. Search
failure handling is probably the most widespread use of loop else clauses.
Now we know that all remaining items on the stack comprise the actual program.
We’ll use another while-loop to evaluate it:
pending = []
while program:
    item = program.pop()
    if callable(item):
        try:
            result = item(*pending)
        except Exception as e:
            print("Error: ", e)
            break
        program.append(result)
        pending.clear()
    else:
        pending.append(item)
else:  # nobreak
    print("Program successful.")
    print("Result: ", pending)
Before this loop we set up an empty list called pending. This will be used to
accumulate arguments to functions in the stack, which we’ll look at shortly.
As before, the condition on the while-loop is the program stack itself, so this
loop will complete, and control will be transferred to the while-loop else-clause,
when the program stack is empty. This will happen when program execution is
complete.
Within the while-loop we pop the top item from the stack and inspect it with the
built-in callable() predicate to decide if it is a function. For clarity, we’ll look
at the else clause first. That’s the else clause associated with the if, not the
else clause associated with the while!
If the popped item is not callable, we append it to the pending list, and go
around the loop again if the program is not yet empty.
If the item is callable, we try to call it, passing any pending arguments to the
function using the star-args extended call syntax. Should the function call fail,
we catch the exception, print an error message, and break from the while-loop.
Remember this will bypass the loop else clause. Should the function call
succeed, we assign the return value to result, push this value back onto the
program stack, and clear the list of pending arguments.
When the program stack is empty the else block associated with the while-loop
is entered. This prints “Program successful” followed by any contents of the
pending list. This way a program can “return” a result by leaving non-callable
values at the bottom of the stack; these will be swept up into the pending list and
displayed at the end.
for-else loops
Now we understand the while..else construct we can look at the analogous
for..else construct. The for..else construct may seem even more odd than
while..else, given the absence of an explicit condition in the for statement, but
you need to remember that the else clause is really the no-break clause. In the
case of the for-loop, that’s exactly when it is called — when the loop is exited
without breaking. This includes the case when the iterable series over which the
loop is iterating is empty.
for item in iterable:
    if match(item):
        result = item
        break
else:  # nobreak
    # No match found
    result = None
The typical pattern of use is like this: We use a for-loop to examine each item of
an iterable series, and test each item. If the item matches, we break from the
loop. If we fail to find a match the code in the else block is executed, which
handles the ‘no match found’ case.
For example, here is a code fragment which ensures that a list of integers
contains at least one integer divisible by a specified value. If the supplied list
does not contain a multiple of the divisor, the divisor itself is appended to the list
to establish the invariant:
items = [2, 25, 9, 37, 28, 14]
divisor = 12
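Here is the loop itself, which the next paragraph walks through:

for item in items:
    if item % divisor == 0:
        found = item
        break
else:  # nobreak
    items.append(divisor)

print(items)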
We set up a list of numeric items and a divisor, which will be 12 in this case. Our
for-loop iterates through the items, testing each in turn for divisibility by the
divisor. If a multiple of the divisor is located, the variable found is set to the
current item, and we break from the loop — skipping over the loop-else clause
— and print the list of items. Should the for-loop complete without encountering
a multiple of 12, the loop-else clause will be entered, which appends the divisor
itself to the list, thereby ensuring that the list contains an item divisible by the
divisor.
For-else clauses are more common than while-else clauses, although we must
emphasise that neither are common, and both are widely misunderstood. So
although we want you to understand them, we can’t really recommend using
them unless you’re sure that everyone who needs to read your code is familiar
with their use.
Almost any time you see a loop else clause you can refactor it by extracting the
loop into a named function, and instead of break-ing from the loop, prefer to
return directly from the function. The search failure part of the code, which was
in the else clause, can then be dedented a level and placed after the loop body.
Doing so, our new ensure_has_divisible() function would look like this:
def ensure_has_divisible(items, divisor):
    for item in items:
        if item % divisor == 0:
            return item
    items.append(divisor)
    return divisor
This is easier to understand, because it doesn’t use any obscure and advanced
Python flow-control techniques. It’s easier to test because it is extracted into a
standalone function. It’s reusable because it’s not mixed in with other code, and
we can give it a useful and meaningful name, rather than having to put a
comment in our code to explain the block.
try..else
The try statement also supports an else clause, which executes only if the try
block completes without raising an exception. Although rarely seen in the wild,
it is useful, particularly when you have a series of operations which may raise
the same exception type, but where you only want to handle exceptions from the
first such operation, as commonly happens when working with files:
try:
    f = open(filename, 'r')
except OSError:  # OSError replaces IOError from Python 3.3 onwards
    print("File could not be opened for read")
else:
    # Now we're sure the file is open
    print("Number of lines", sum(1 for line in f))
    f.close()
In this example, both opening the file and iterating over the file can raise an
OSError, but we’re only interested in handling the exception from the call to
open().
It’s possible to have both an else clause and a finally clause. The else block
will only be executed if there was no exception, whereas the finally clause will
always be executed.
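Here’s a minimal sketch combining all four clauses, following the pattern of the
example above:

try:
    f = open(filename, 'r')
except OSError:
    print("File could not be opened for read")
else:
    print("Number of lines", sum(1 for line in f))
    f.close()
finally:
    print("Done attempting to read", filename)  # always runs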
Emulating switch
Most imperative programming languages include a switch or case statement
which implements a multiway branch based on the value of an expression.
Here’s an example for the C programming language, where different functions
are called depending on the value of a menu_option variable. There’s also
handling for the case of ‘no such option’:
switch (menu_option) {
    case 1: single_player(); break;
    case 2: multi_player(); break;
    case 3: load_game(); break;
    case 4: save_game(); break;
    case 5: reset_high_score(); break;
    default:
        printf("No such option!");
        break;
}
Although switch can be emulated in Python by a chain of if..elif..else
blocks, this can be tedious to write, and it’s error prone because the condition
must be repeated multiple times.
We’ll look at a simple adventure game you cannot win in kafka.py, which we’ll
refactor from using if..elif..else to using dictionaries of callables. Along the
way, we’ll also use try..else:
"""Kafka - the adventure game you cannot win."""
def play():
position = (0, 0)
alive = True
while position:
command = input()
i, j = position
if command == "N":
position = (i, j + 1)
elif command == "E":
position = (i + 1, j)
elif command == "S":
position = (i, j - 1)
elif command == "W":
position = (i - 1, j)
elif command == "L":
pass
elif command == "Q":
position = None
else:
print("I don't understand")
print("Game over")
if __name__ == '__main__':
play()
The game-loop uses two if..elif..else chains. The first prints information
dependent on the player’s current position. Then, after accepting a command
from the user, the second if..elif..else chain takes action based upon the
command.
Let’s refactor this code to avoid those long if..elif..else chains, both of
which feature repeated comparisons of the same variable against different
values.
The first chain describes our current location. Fortunately in Python 3, although
not in Python 2, print() is a function, and can therefore be used in an
expression. We’ll leverage this to build a mapping of position to callables called
locations:
locations = {
    (0, 0): lambda: print("You are in a maze of twisty passages, all alike."),
    (1, 0): lambda: print("You are on a road in a dark forest. To the north you can see a tower."),
    (1, 1): lambda: print("There is a tall tower here, with no obvious door. A path leads east.")
}
We’ll look up a callable using our position in locations as a key, and call the
resulting callable in a try block:
try:
    locations[position]()
except KeyError:
    print("There is nothing here.")
In fact, we don’t really intend to be catching KeyError from the callable, only
from the dictionary lookup, so this also gives us opportunity to narrow the scope
of the try block using the try..else construct we learned about earlier. Here’s
the improved code:
try:
    location_action = locations[position]
except KeyError:
    print("There is nothing here.")
else:
    location_action()
We separate the lookup and the call into separate statements, and move the call
into the else block.
Similarly, we can refactor the if..elif..else chain which handles user input
into a dictionary lookup for a callable. This time, though, we use named
functions rather than lambdas to avoid the restriction that lambdas can only
contain expressions and not statements. Here’s the branching construct:
actions = {
    'N': go_north,
    'E': go_east,
    'S': go_south,
    'W': go_west,
    'L': look,
    'Q': quit,
}

try:
    command_action = actions[command]
except KeyError:
    print("I don't understand")
else:
    position = command_action(position)
Again we split the lookup of the command action from the call to the command
action. Here are the named command functions:

def go_north(position):
    i, j = position
    new_position = (i, j + 1)
    return new_position


def go_east(position):
    i, j = position
    new_position = (i + 1, j)
    return new_position


def go_south(position):
    i, j = position
    new_position = (i, j - 1)
    return new_position


def go_west(position):
    i, j = position
    new_position = (i - 1, j)
    return new_position


def look(position):
    return position


def quit(position):
    return None
Notice that using this technique forces us into a more functional style of
programming. Not only is our code broken down into many more functions, but
the bodies of those functions can’t modify the state of the position variable.
Instead, we pass in this value explicitly and return the new value. In the new
version mutation of this variable only happens in one place, not five.
Although the new version is larger overall, we’d claim it’s much more
maintainable. For example, if a new piece of game state, such as the player’s
inventory, were to be added, all command actions would be required to accept
and return this value. This makes it much harder to forget to update the state than
it would be in chained if..elif..else blocks.
Let’s add a new “rabbit hole” location which, when the user unwittingly moves
into it, leads back to the starting position of the game. To make such a change,
we need to change all of our callables in the location mapping to accept and
return a position and a liveness status. Although this may seem onerous, we
think it’s a good thing. Anyone maintaining the code for a particular location can
now see what state needs to be maintained. Here are the location functions:
def labyrinth(position, alive):
    print("You are in a maze of twisty passages, all alike.")
    return position, alive

# ... the other location functions follow the same pattern ...

try:
    location_action = locations[position]
except KeyError:
    print("There is nothing here.")
else:
    position, alive = location_action(position, alive)
We must also update the call to location_action() to pass the current state and
receive the modified state.
Now let’s make the game a little more morbid by adding a deadly lava pit
location which returns False for the alive status. Here’s the function for the
lava pit location:
def lava_pit(position, alive):
    print("You fall into a lava pit.")
    return position, False
We’ll also add an extra conditional block after we visit the location to deal with
deadly situations:
if not alive:
    print("You're dead!")
    break
Now when we die, we break out of the while loop which is the main game loop.
This gives us an opportunity to use a while..else clause to handle non-lethal
game loop exits, such as choosing to exit the game. Exits like this set the
position variable to None, which is ‘falsey’:
while position:
    # ...
else:  # nobreak
    print("You have chosen to leave the game.")
Now when we quit deliberately, setting position to None and causing the while-
loop to terminate, we see the message from the else block associated with the
while-loop:
You are in a maze of twisty passages, all alike.
E
You are on a road in a dark forest. To the north you can see a tower.
N
There is a tall tower here, with no obvious door. A path leads east.
Q
You have chosen to leave the game.
Game over
But when we die by falling into the lava alive gets set to False. This causes
execution to break from the loop, but we don’t see the “You have chosen to
leave” message as the else block is skipped:
You are in a maze of twisty passages, all alike.
E
You are on a road in a dark forest. To the north you can see a tower.
N
There is a tall tower here, with no obvious door. A path leads east.
N
You fall into a lava pit.
You're dead!
Game over
Dispatching on Type
To “dispatch” on type means that the code which will be executed depends in
some way on the type of an object or objects. Python dispatches on type
whenever we call a method on an object; there may be several implementations
of that method in different classes, and the one that is selected depends on the
type of the self object.
Ordinarily, we can’t use this sort of polymorphism with regular functions. One
solution is to resort to switch-emulation to route calls to the appropriate
implementation by using type objects as dictionary keys. This is ungainly, and
it’s tricky to make it respect inheritance relationships as well as exact type
matches.
singledispatch
The singledispatch decorator, which we’ll introduce in this section, provides a
more elegant solution to this problem. First, though, let’s look at a conventional
object-oriented design in which each shape class knows how to draw itself:
class Shape:
    def __init__(self, solid):
        self.solid = solid


class Circle(Shape):
    def __init__(self, center, radius, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.center = center
        self.radius = radius

    def draw(self):
        print("\u25CF" if self.solid else "\u25A1")


class Parallelogram(Shape):
    def __init__(self, pa, pb, pc, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.pa, self.pb, self.pc = pa, pb, pc

    def draw(self):
        print("\u25B0" if self.solid else "\u25B1")


class Triangle(Shape):
    def __init__(self, pa, pb, pc, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.pa, self.pb, self.pc = pa, pb, pc

    def draw(self):
        print("\u25B2" if self.solid else "\u25B3")


def main():
    shapes = [Circle(center=(0, 0), radius=5, solid=False),
              Parallelogram(pa=(0, 0), pb=(2, 0), pc=(1, 1), solid=False),
              Triangle(pa=(0, 0), pb=(1, 2), pc=(2, 0), solid=True)]

    for shape in shapes:
        shape.draw()


if __name__ == '__main__':
    main()
Each class has an initializer and a draw() method. The initializers store any
geometric information peculiar to that type of shape. They pass any further
arguments up to the Shape base class, which stores a flag indicating whether the
shape is solid.
When we say shape.draw(), Python selects which implementation of draw() to
run based on the class of shape; this is dispatch on the type of a single
argument, the self object.
This is all very well and is the way much object-oriented software is constructed,
but this can lead to poor class design because it violates the single responsibility
principle. Drawing isn’t a behaviour inherent to shapes, still less drawing to a
particular type of device. In other words, shape classes should be all about
shape-ness, not about things you can do with shapes, such as drawing, serialising
or clipping.
What we’d like to do is move the responsibilities which aren’t intrinsic to shapes
out of the shape classes. In our case, our shapes don’t do anything else, so they
become containers of data with no behaviour, like this:
class Circle(Shape):
    # initializer as before, but no draw() method
    # ...

class Parallelogram(Shape):
    # ...

class Triangle(Shape):
    # ...
With the drawing code removed there are several ways to implement the drawing
responsibility outside of the classes. We could use a chain of if..elif..else
tests using isinstance():
def draw(shape):
    if isinstance(shape, Circle):
        draw_circle(shape)
    elif isinstance(shape, Parallelogram):
        draw_parallelogram(shape)
    elif isinstance(shape, Triangle):
        draw_triangle(shape)
    else:
        raise TypeError("Can't draw shape")
In this version of draw() we test shape using up to three calls to isinstance()
against Circle, Parallelogram and Triangle. If the shape object doesn’t match
any of those classes, we raise a TypeError. This is awkward to maintain and is
rightly considered to be very poor programming style.
An alternative is to look up a drawing function in a dictionary keyed on the
type of the shape:

drawers = {
    Circle: draw_circle,
    Parallelogram: draw_parallelogram,
    Triangle: draw_triangle,
}

try:
    drawer = drawers[type(shape)]
except KeyError as e:
    raise TypeError("Can't draw shape") from e
else:
    drawer(shape)
Here we look up a drawer function using the type of shape in a try block,
translating the KeyError to a TypeError if the lookup fails. If we’re on the
happy-path of no exceptions, we invoke the drawer with the shape in the else
clause.
This looks better, but is actually much more fragile because we’re doing exact
type comparisons when we do the key lookup. This means that a subclass of,
say, Circle wouldn’t result in a call to draw_circle().
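Instead, we can use the singledispatch decorator from the functools module.
We apply it to a default implementation of draw(), which is called when no
more specific overload matches:

from functools import singledispatch

@singledispatch
def draw(shape):
    raise TypeError("Don't know how to draw {!r}".format(shape))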
Recall that decorators wrap the function to which they are applied and bind the
resulting wrapper to the name of the original function. So in this case, the
wrapper returned by the decorator is bound to the name draw. The draw wrapper
has an attribute called register which is also a decorator; register() can be
used to provide extra versions of the original function which work on different
types. This is function overloading.
Since our overloads will all be associated with the name of the original function,
draw, it doesn’t matter what we call the overloads themselves. By convention we
call them _, although this is by no means required. Here’s an overload for
Circle, another for Parallelogram, and a third for Triangle:
@draw.register(Circle)
def _(shape):
    print("\u25CF" if shape.solid else "\u25A1")

@draw.register(Parallelogram)
def _(shape):
    print("\u25B0" if shape.solid else "\u25B1")

@draw.register(Triangle)
def _(shape):
    # Draw a triangle
    print("\u25B2" if shape.solid else "\u25B3")
It’s tempting to try the same approach with methods. Suppose we add an
intersects() method to Circle, intending to dispatch on the type of the other
shape:

class Circle(Shape):
    # ...

    @singledispatch
    def intersects(self, shape):
        raise TypeError("Don't know how to compute intersection with {!r}".format(shape))

    @intersects.register(Circle)
    def _(self, shape):
        return circle_intersects_circle(self, shape)

    @intersects.register(Parallelogram)
    def _(self, shape):
        return circle_intersects_parallelogram(self, shape)

    @intersects.register(Triangle)
    def _(self, shape):
        return circle_intersects_triangle(self, shape)
At first sight, this looks like a reasonable approach, but there are a couple of
problems here.
The first problem is that we can’t register the type of the class currently
being defined with the intersects generic function, because we have not yet
finished defining it.
The second problem is more fundamental: Recall that singledispatch
dispatches based only on the type of the first argument:
do_intersect = my_circle.intersects(my_parallelogram)
When we’re calling our new method like this it’s easy to forget that
my_parallelogram is actually the second argument to Circle.intersects.
my_circle is the first argument, and it’s what gets bound to the self parameter.
Because self will always be a Circle in this case our intersect() call will
always dispatch to the first overload, irrespective of the type of the second
argument.
This behaviour prevents the use of singledispatch with methods. All is not lost
however. The solution is to move the generic function out of the class, and
invoke it from a regular method which swaps the arguments. Let’s take a look:
class Circle(Shape):
    # ...

    def intersects(self, shape):
        return intersects_with_circle(shape, self)


@singledispatch
def intersects_with_circle(shape, circle):
    raise TypeError("Don't know how to compute intersection of {!r} with {!r}"
                    .format(circle, shape))

@intersects_with_circle.register(Circle)
def _(shape, circle):
    return circle_intersects_circle(circle, shape)

@intersects_with_circle.register(Parallelogram)
def _(shape, circle):
    return circle_intersects_parallelogram(circle, shape)

@intersects_with_circle.register(Triangle)
def _(shape, circle):
    return circle_intersects_triangle(circle, shape)
We move the generic function intersects() out to the global scope and rename
it to intersects_with_circle(). The replacement intersects() method of
Circle, which accepts the formal arguments self and shape, now delegates to
intersects_with_circle() with the actual arguments swapped to shape and
self.
To complete this example, we would need to implement two other generic
functions, intersects_with_parallelogram() and
intersects_with_triangle(), although we will leave that as an exercise.
This way, the function is selected based on the types of both arguments,
without the shape classes themselves having any knowledge of
each other, keeping coupling in the system manageable.
Summary
That just about wraps up this chapter on advanced flow control in Python 3.
Let’s summarise what we’ve covered: the while..else and for..else constructs,
whose else clause is best read as ‘no break’; the try..else clause for
narrowing the scope of exception handlers; emulating switch statements with
dictionaries of callables; and dispatching on the type of a function’s first
argument with the singledispatch decorator.
Bitwise operators
Let’s start right at the bottom with the bitwise operators. These will seem
straightforward enough, but our exploration of them will lead us into some
murky corners of Python’s integer implementation.
Operator    Description
&           Bitwise AND
|           Bitwise OR
^           Bitwise XOR
~           Bitwise NOT
<<          Left shift
>>          Right shift
We’ll demonstrate how each of these works, but this will necessitate an
interesting detour into Python’s integer representation.
Recall that we can specify binary literals using the 0b prefix and we can display integers in
binary using the built-in bin() function:
>>> 0b11110000
240
>>> bin(240)
'0b11110000'
Exclusive-or
The exclusive-OR operator behaves exactly as you would expect, setting bits in
the output value if exactly one of the corresponding operand bits is set:
>>> bin(0b11100100 ^ 0b00100111)
'0b11000011'
Two’s-complement
Here’s how the two’s-complement scheme represents the decimal value -58 as an
eight-bit binary number:
1. We start with the signed decimal -58, the value we want to represent.
2. We take the absolute value, which is 58, and represent this in 8-bit binary,
which is 00111010. The same binary pattern can be interpreted as the
unsigned decimal 58.
3. Because -58 is negative, we must apply additional steps 4 and 5, below.
Positive numbers have the same representation in signed and unsigned
representations, so don’t require these additional steps.
4. Now we flip all the bits — a bitwise-not operation — to give 11000101.
This is where the “two” in “two’s complement” comes from: the
complement operation is done in base two (or binary). This gives a bit-
pattern that would be interpreted as 197 as an unsigned integer, although
that’s not too important here.
5. Finally, we add one, which gives the bit pattern 11000110 which, although
it could be interpreted as the unsigned integer 198, is outside the range -128
to +127 allowed for signed 8-bit integers. This is the two’s complement 8-
bit representation for -58.
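We can verify this at the REPL by masking with 0xFF, which reveals the low
eight bits of the two’s-complement pattern:

>>> bin(-58 & 0xFF)
'0b11000110'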
Recall that Python 3 uses arbitrary precision integers. That is, Python integers
are not restricted to one, two, four or eight bytes, but can use as many bytes as
are necessary to store integers of any magnitude.
This means we don’t easily get to see the internal bit representation of negative
numbers using bin() — in fact, we can’t even determine what internal
representation scheme is used! This can make it tricky to inspect the behaviour
of code which uses the bitwise operators.
When our use of the bitwise operators — such as the bitwise-not operator —
results in a bit pattern that would normally represent a negative integer in two’s
complement format, Python displays that value in sign-magnitude format,
obscuring the result we wanted. To get at the actual bit pattern, we need to do a
little extra work. Let’s return to our earlier example:
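>>> ~0b11110000
-241
>>> bin(~0b11110000)
'-0b11110001'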
So much for the explanation, but how do we see the flipped bits arising from our
application of bitwise-not in a more intuitive way? One approach is to manually
take the two’s-complement of the magnitude, or absolute value, of the number:

magnitude  11110001
flip       00001110
add 1      00001111
Unfortunately, this also uses the bitwise-not operator, and we end up chasing our
tail trying to escape the cleverness of Python’s arbitrary precision integers. An
alternative approach avoids bitwise-not altogether:
1. Take the signed interpretation of the two’s complement binary value, in this
case -241.
2. Add to it 2 raised to the power of the number of bits used in the
representation, excluding the leading ones. In this case that’s 2⁸, or 256, and
-241 + 256 is 15.
3. 15 has the 00001111 bit pattern we’re looking for — the binary value we
expected to get when we applied bitwise-not to 11110000.
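More directly, we can mask the result with eight set bits to extract just the byte
we’re interested in:

>>> bin(~0b11110000 & 0b11111111)
'0b1111'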
giving what we wanted. See how asking for a nine-bit result reveals the leading 1
of the two’s complement representation:
>>> bin(~0b11110000 & 0b111111111)
'0b100001111'
In fact, since Python 3.2, we can ask Python how many bits are required to
represent the integer value using the bit_length() method of the integer type,
although notice that this excludes the sign:
>>> int(32).bit_length()
6
>>> int(240).bit_length()
8
>>> int(-241).bit_length()
8
>>> int(256).bit_length()
9
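We can also obtain the bytes which make up a Python integer by calling its
to_bytes() method, supplying the number of bytes to be used for the result and
the byte order:

>>> int(0xcafebabe).to_bytes(length=4, byteorder='big')
b'\xca\xfe\xba\xbe'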
If you want to use the native byte order of your machine, you can retrieve the
sys.byteorder value:
>>> import sys
>>> sys.byteorder
'little'
>>> little_cafebabe = int(0xcafebabe).to_bytes(length=4, byteorder=sys.byteorder)
>>> little_cafebabe
b'\xbe\xba\xfe\xca'
Of course, given some bytes we can also turn them back into an integer, using
the complementary class method, from_bytes():
>>> int.from_bytes(little_cafebabe, byteorder=sys.byteorder)
3405691582
>>> hex(_)
'0xcafebabe'
However, if we set the optional signed argument to True, rather than its default
value of False, we can get a two’s-complement representation back:
>>> int(-241).to_bytes(2, byteorder='little', signed=True)
b'\x0f\xff'
This indicates another way to answer the question that started this quest, by
indexing into the result bytes object to retrieve the least significant byte, and
converting that to a binary representation:
>>> bin((~0b11110000).to_bytes(2, byteorder='little', signed=True)[0])
'0b1111'
bytes literals
At this point you should be comfortable with the bytes literal, which uses the b
prefix. The default Python source code encoding is UTF-8, so the characters
used in a literal byte string are restricted to printable 7-bit ASCII characters —
that is, those with codes from 0 to 127 inclusive which aren’t control codes:
>>> b"This is OK because it's 7-bit ASCII"
b"This is OK because it's 7-bit ASCII"
Seven-bit control codes or characters which can’t be encoded in 7-bit ASCII
result in a SyntaxError:
>>> b"Norwegian characters like Å and Ø are not 7-bit ASCII"
File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
To represent other bytes with values equivalent to ASCII control codes and byte
values from 128 to 255 inclusive we must use escape sequences:
>>> b"Norwegian characters like \xc5 and \xd8 are not 7-bit ASCII"
b'Norwegian characters like \xc5 and \xd8 are not 7-bit ASCII'
Notice that Python echoes these back to us as escape sequences too. This is just a
sequence of bytes, not a sequence of characters. If we want a sequence of
Unicode code points we must decode the bytes into a text sequence of the str
type, and for this we need to know the encoding. In this case we’ll use Latin-1:
>>> norsk = b"Norwegian characters like \xc5 and \xd8 are not 7 bit ASCII"
>>> norsk.decode('latin1')
'Norwegian characters like Å and Ø are not 7 bit ASCII'
We can also construct bytes objects by calling the bytes constructor. Called
with a single integer argument, it produces a zero-filled bytes object of the
given length:
>>> bytes(5)
b'\x00\x00\x00\x00\x00'
The constructor will also accept an iterable series of integers. It’s up to you to
ensure that the values are non-negative and less than 256 to prevent a
ValueError being raised:
>>> bytes([63, 127, 255, 511])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: bytes must be in range(0, 256)
One option if you need to construct a bytes object by encoding a Unicode str
object is to use the two-argument form of the bytes constructor, which accepts a
str as the first argument and an encoding as the second:
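>>> bytes('Æ Ø Å', 'utf-8')
b'\xc3\x86 \xc3\x98 \xc3\x85'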
Finally, there is a class method fromhex() which is a factory method for creating
a bytes object from a string consisting of concatenated two-digit hexadecimal
numbers:
>>> bytes.fromhex('54686520717569636b2062726f776e20666f78')
b'The quick brown fox'
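The mutable counterpart of bytes is the bytearray type, which supports the same
operations as bytes together with the mutable sequence operations. We’ll
experiment with this pangram (note the deliberate misspelling of ‘dog’):

>>> pangram = bytearray(b'The quick brown fox jumps over the lazy god')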
Being mutable, a bytearray can be modified in place using any of the mutable
sequence operations. It also supports the familiar bytes methods, such as
upper(), split() and join(), which return new bytearray objects:
>>> pangram.upper()
bytearray(b'THE QUICK BROWN FOX JUMPS OVER THE LAZY GOD')
>>>
>>> words = pangram.split()
>>> words
[bytearray(b'The'), bytearray(b'quick'), bytearray(b'brown'), bytearray(b'fox'),
 bytearray(b'jumps'), bytearray(b'over'), bytearray(b'the'), bytearray(b'lazy'),
 bytearray(b'god')]
>>>
>>> bytearray(b' ').join(words)
bytearray(b'The quick brown fox jumps over the lazy god')
#include <stdio.h>

struct Vector {
    float x;
    float y;
    float z;
};

struct Color {
    unsigned short int red;
    unsigned short int green;
    unsigned short int blue;
};

struct Vertex {
    struct Vector position;
    struct Color color;
};

int main(void) {
    struct Vertex vertices[4] = {
        /* ... four vertices elided ... */
    };

    FILE *file = fopen("colors.bin", "wb");
    if (file == NULL) {
        return -1;
    }

    fwrite(vertices, sizeof(struct Vertex), 4, file);
    fclose(file);
    return 0;
}
As you can see, we declare a Vector to be comprised of three float values. It’s
important to realise that a C float is actually a single-precision floating point
value represented with 32 bits, which is different from a Python float, which is
a double-precision floating point value represented with 64 bits.
In the main() function, the program creates an array of four Vertex structures,
and writes them to a file called colors.bin before exiting.
We’ll now compile this C program into an executable. The details of how you do
this are heavily system dependent, and require that you at least have access to a
C99 compiler. On our macOS system with the XCode development tools
installed, we can simply use make from the command line:
$ make colorpoints
cc colorpoints.c -o colorpoints
import struct

def main():
    with open('colors.bin', 'rb') as f:
        buffer = f.read()

    items = struct.unpack_from('@fffHHH', buffer)
    print(repr(items))

if __name__ == '__main__':
    main()
In the main function, we open the file for read, being careful to remember to do
so in binary mode. We then use the read() method of file objects to read the
entire contents of the file into a bytes object. The struct.unpack_from() call
then interprets the leading bytes of the buffer according to the '@fffHHH'
format string.
The leading @ character specifies that native byte order and alignment are to be
used. Other characters can be used to force particular byte orderings, such as <
for little-endian, and > for big-endian. It’s also possible to choose between native
and no alignment, a topic we’ll be revisiting shortly. If no byte-order character is
specified, then @ is assumed.
There are also code letters corresponding to all of the common C data types,
which are mostly variations on different precisions of signed and unsigned
integers, together with 32- and 64-bit floating point numbers, byte arrays,
pointers and fixed-length strings.
In our example, each of the three ‘f’ characters tells struct to expect a single-
precision C float, and each of the ‘H’ characters tells struct to expect an
unsigned short int, which is a 16-bit type.
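Running the program prints the unpacked tuple for the first structure in the
file; these are the values we see for the first vertex in the results later on:

$ python3 reader.py
(3323.176025390625, 6562.23095703125, 9351.2314453125, 3040, 34423, 54321)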
We can see that struct.unpack_from() returns a tuple. The reason our values
don’t look exactly the same as they did in the source code of our C program is
that we specified values in decimal in the source, and the values we chose
are not exactly representable in binary. There has also been a conversion from
the single-precision C float to the double-precision Python float, which is why
the values we get back have so many more digits. Of course, the 16 bit unsigned
short int values from C can be represented exactly as Python int objects.
Tuple unpacking
One obvious improvement to our program, given that unpack_from() returns a
tuple, is to use tuple unpacking to place the values into named variables:
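x, y, z, red, green, blue = struct.unpack_from('@fffHHH', buffer)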
We can also shorten our format string slightly by using repeat counts. For
example 3f means the same as fff:
x, y, z, red, green, blue = struct.unpack_from('@3f3H', buffer)
That’s not a big win in this case, but it can be very useful for larger data series.
Reading all of the vertices
Finally, of course, we’d like to read all four Vertex structures from our file.
We’ll also make the example a bit more realistic by reading the data into Python
classes which are equivalents of the Vector, Color, and Vertex structs we had
in C. Here they are:
class Vector:
def __repr__(self):
return 'Vector({}, {}, {})'.format(self.x, self.y, self.z)
class Color:
def __repr__(self):
return 'Color({}, {}, {})'.format(self.red, self.green, self.blue)
class Vertex:
def __repr__(self):
return 'Vertex({!r}, {!r})'.format(self.vector, self.color)
We’ll also make a factory function to construct a type Vertex, which aggregates a
Vector and a Color, being careful to use an argument order that is compatible
with what we get back from the unpack function:
def make_colored_vertex(x, y, z, red, green, blue):
    return Vertex(Vector(x, y, z),
                  Color(red, green, blue))
To read every structure in the buffer we can use struct.iter_unpack(), which
returns an iterator yielding a tuple for each complete structure:

vertices = []
for x, y, z, red, green, blue in struct.iter_unpack('@3f3H', buffer):
    vertex = make_colored_vertex(x, y, z, red, green, blue)
    vertices.append(vertex)

pp(vertices)
In fact, we can unwind one of our earlier refactorings, and simply unpack the tuple
directly into the arguments of make_colored_vertex() using extended call
syntax:
def main():
    with open('colors.bin', 'rb') as f:
        buffer = f.read()

    vertices = []
    for fields in struct.iter_unpack('@3f3H', buffer):
        vertex = make_colored_vertex(*fields)
        vertices.append(vertex)

    pp(vertices)
In this code, fields will be the tuple of three float and three int values
returned for each structure. Rather than unpacking into named variables, we use
extended call syntax to unpack the fields tuple directly into the arguments of
make_colored_vertex(). When we’ve accumulated all vertices into a list, we
pretty-print the resulting data structure. Let’s try it:
$ python3 reader.py
Traceback (most recent call last):
File "examples/reader.py", line 59, in <module>
main()
File "examples/reader.py", line 52, in main
for fields in struct.iter_unpack('@3f3H', buffer):
struct.error: iterative unpacking requires a bytes length multiple of 18
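The error tells us the buffer length isn’t a multiple of the 18 bytes described
by our format string. To investigate, let’s add some diagnostic prints just
after reading the file, showing how many bytes we have and what they look like:

print("buffer: {} bytes".format(len(buffer)))
print(buffer)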
When we run now, we get this output before the stack trace:
$ python3 reader.py
buffer: 80 bytes
b"\xd1\xb2OE\xd9\x11\xcdE\xed\x1c\x12F\xe0\x0bw\x861\xd4\x00\x00\xdb?\xeeE\xb2\xe3\x1eE||\
\x19F\xe0\x7f\xde\x14\x11\t\x00\x00\xe5N\xd2Ed\xb3\x12E'\xd5UE\xcf\xb0\xc3\x8d]\x8f\x00\x\
00\xf8\x80\xc6E\xc7\x81VE\xee\x8c\x18F\x9e\xdb\x04\x8f\xce+\x00\x00"
Traceback (most recent call last):
File "example/reader.py", line 56, in <module>
main()
File "example/reader.py", line 48, in main
for fields in struct.iter_unpack('@3f3H', buffer):
struct.error: iterative unpacking requires a bytes length multiple of 18
Our diagnostic print statements above are a good start, but it’s really awkward to
read the standard bytes representation, especially when it contains a mix of
ASCII characters and escape sequences. We can’t directly convert a binary
sequence into a readable hex string, but Python 3 has some tools in the standard
library to help, in the form of the binascii module, which contains the oddly
named hexlify() function. Let’s import it:
from binascii import hexlify
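We can then convert the buffer to space-separated hex pairs, using the same
code which appears in the complete listing later:

hex_buffer = hexlify(buffer).decode('ascii')
hex_pairs = ' '.join(hex_buffer[i:i+2] for i in range(0, len(hex_buffer), 2))
print(hex_pairs)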
In this code, we hexlify the buffer and then decode the result to an ASCII string.
We then join() the successive two-digit slices with spaces, using a range() with
a step of 2, and print the hex pairs:
d1 b2 4f 45 d9 11 cd 45 ed 1c 12 46 e0 0b 77 86 31 d4 00 00 db 3f ee 45 b2 e3 1e 45 7c 7c\
19 46 e0 7f de 14 11 09 00 00 e5 4e d2 45 64 b3 12 45 27 d5 55 45 cf b0 c3 8d 5d 8f 00 0\
0 f8 80 c6 45 c7 81 56 45 ee 8c 18 46 9e db 04 8f ce 2b 00 00
This is a big improvement, but still leaves us counting bytes on the display. Let’s
precede our line of data with an integer count:
indexes = ' '.join(str(n).zfill(2) for n in range(len(buffer)))
print(indexes)
We generate integers using a range, convert them to strings, and pad each number
with leading zeros to a width of two; good enough for the first hundred bytes of
our data. Here’s how the indexes line up with the data:
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
d1 b2 4f 45 d9 11 cd 45 ed 1c 12 46 e0 0b 77 86 31 d4 00 00
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
db 3f ee 45 b2 e3 1e 45 7c 7c 19 46 e0 7f de 14 11 09 00 00
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
e5 4e d2 45 64 b3 12 45 27 d5 55 45 cf b0 c3 8d 5d 8f 00 00
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
f8 80 c6 45 c7 81 56 45 ee 8c 18 46 9e db 04 8f ce 2b 00 00
Now we’ve got a useful way of viewing our bytes object, let’s get back to
diagnosing our problem of why we have 80 bytes rather than 72. Looking
carefully, we can see that the first bytes at indices 0 to 17 inclusive contain
legitimate data — and we know this to be the case because we decoded it earlier.
Looking at bytes 18 and 19 though, we see two zero bytes. From bytes 20 to 37,
we have another run of what looks like legitimate data, again followed by
another two zero bytes at indices 38 and 39.
This pattern continues to the end of the file. What we’re seeing is ‘padding’
added by the C compiler to align structures on four-byte boundaries. Our 18 byte
structure needs to be padded with two bytes to take it to 20 bytes which is
divisible by four. In order to skip this padding we can use ‘x’ which is the format
code for pad bytes. In this case there are two pad bytes per structure, so we can
add ‘xx’ to our format string:
vertices = []
for fields in struct.iter_unpack('@3f3Hxx', buffer):
    vertex = make_colored_vertex(*fields)
    vertices.append(vertex)
With this change in place, we can successfully read our structures from C into
Python!:
buffer: 80 bytes
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
d1 b2 4f 45 d9 11 cd 45 ed 1c 12 46 e0 0b 77 86 31 d4 00 00
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
db 3f ee 45 b2 e3 1e 45 7c 7c 19 46 e0 7f de 14 11 09 00 00
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
e5 4e d2 45 64 b3 12 45 27 d5 55 45 cf b0 c3 8d 5d 8f 00 00
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
f8 80 c6 45 c7 81 56 45 ee 8c 18 46 9e db 04 8f ce 2b 00 00
[Vertex(Vector(3323.176025390625, 6562.23095703125, 9351.2314453125),
Color(3040, 34423, 54321)),
Vertex(Vector(7623.98193359375, 2542.23095703125, 9823.12109375),
Color(32736, 5342, 2321)),
Vertex(Vector(6729.86181640625, 2347.2119140625, 3421.322021484375),
Color(45263, 36291, 36701)),
Vertex(Vector(6352.12109375, 3432.111083984375, 9763.232421875),
Color(56222, 36612, 11214))]
For reference, here’s the complete reader program at this point:

from binascii import hexlify
from pprint import pprint as pp
import struct

class Vector:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __repr__(self):
        return 'Vector({}, {}, {})'.format(self.x, self.y, self.z)

class Color:
    def __init__(self, red, green, blue):
        self.red, self.green, self.blue = red, green, blue

    def __repr__(self):
        return 'Color({}, {}, {})'.format(self.red, self.green, self.blue)

class Vertex:
    def __init__(self, vector, color):
        self.vector, self.color = vector, color

    def __repr__(self):
        return 'Vertex({!r}, {!r})'.format(self.vector, self.color)

def make_colored_vertex(x, y, z, red, green, blue):
    return Vertex(Vector(x, y, z),
                  Color(red, green, blue))

def main():
    with open('colors.bin', 'rb') as f:
        buffer = f.read()

    print("buffer: {} bytes".format(len(buffer)))

    indexes = ' '.join(str(n).zfill(2) for n in range(len(buffer)))
    print(indexes)

    hex_buffer = hexlify(buffer).decode('ascii')
    hex_pairs = ' '.join(hex_buffer[i:i+2] for i in range(0, len(hex_buffer), 2))
    print(hex_pairs)

    vertices = []
    for fields in struct.iter_unpack('@3f3Hxx', buffer):
        vertex = make_colored_vertex(*fields)
        vertices.append(vertex)

    pp(vertices)

if __name__ == '__main__':
    main()
From this point onwards, we’ll no longer need the diagnostic print statements which helped us
get this far, so consider them removed from this point onwards.
Incidentally, you may find that when you run these programs on your own
system you see a different amount of structure padding from what we show here,
and wonder what’s going on. So did we!
Eventually, we traced this mismatch to the fact that our Python interpreter —
which is itself implemented in C — and our little C program for writing vertices
to a file, were compiled using different C compilers with different structure
padding conventions. This just goes to show that when dealing with binary data
you need to be very careful if you want your programs to be portable between
systems and compilers.
Memory Views
Python has a built-in type called memoryview which wraps any existing,
underlying collection of bytes and which supports something called the buffer
protocol. The buffer protocol is implemented at the C level inside the Python
interpreter, and isn’t a protocol in the same sense that we use the word when
talking about the Python-level sequence and mapping protocols. In fact, the
memoryview type implements the Python-level sequence protocol, allowing us to
view the underlying byte buffer as a sequence of Python objects.
Our previous example required that we read the data from the file into a byte
array, and translate it with struct.unpack() into a tuple of numeric objects,
effectively duplicating data. We’re going to change that example now to use
memoryviews, avoiding the duplication.
We can construct memoryview instances by passing any object that supports the
buffer protocol C API to the constructor. The only built-in types which support
the buffer protocol are bytes and bytearray. We’ll construct a memory view
from the buffer just after our diagnostic print statements, with this line of code:
mem = memoryview(buffer)

To experiment with the memory view interactively, we’ll temporarily drop into a
REPL from within the running program using the standard library code module,
inserting these lines before the unpacking loop:

import code
code.interact(local=locals())

vertices = []
for fields in struct.iter_unpack('@3f3Hxx', buffer):
    vertex = make_colored_vertex(*fields)
    vertices.append(vertex)

pp(vertices)
We use a call to the locals() built-in function to get a reference to the current
namespace. Now, when we run our program, we get a REPL prompt at which we
can access our new mem object:
$ python3 reader.py
>>> mem
<memory at 0x10214e5c0>
The memoryview object supports indexing, so retrieving the byte at index 21 and
converting it to hexadecimal gives us 3f, as we might expect:
>>> hex(mem[21])
'0x3f'
We know that the bytes in the [12:18] slice represent three unsigned short
int values, so by passing ‘H’ to the cast() method we can interpret the values
that way:
>>> mem[12:18].cast('H')
<memory at 0x10214e688>
Notice that this also returns a memoryview, but this time one that knows the type
of its elements:
>>> mem[12:18].cast('H')[0]
3040
>>> mem[12:18].cast('H')[1]
34423
>>> mem[12:18].cast('H')[2]
54321
When you’ve finished with the interactive session, you can send an end-of-file
character just as you normally would to terminate a REPL session with Ctrl-D
on Unix or Ctrl-Z on Windows; your program will then continue executing from
the first statement after code.interact().
Before moving on, remove the code.interact() line, so our program runs uninterrupted
again.
Here’s the reworked Vector class, which now wraps a typed memoryview instead
of storing separate attribute values:

class Vector:
    def __init__(self, mem_vector):
        self._mem = mem_vector

    @property
    def x(self):
        return self._mem[0]

    @property
    def y(self):
        return self._mem[1]

    @property
    def z(self):
        return self._mem[2]

    def __repr__(self):
        return 'Vector({}, {}, {})'.format(self.x, self.y, self.z)
Our old instance attributes are replaced by properties which perform the
appropriate lookups in the memoryview. Since we can use properties to replace
attributes, our __repr__() implementation can remain unmodified.
The modified Color class works in exactly the same way, except that now we
check that we're wrapping unsigned integer values:

class Color:

    def __init__(self, mem_color):
        assert mem_color.format == 'H'  # check we were handed unsigned shorts
        self._mem = mem_color

    @property
    def red(self):
        return self._mem[0]

    @property
    def green(self):
        return self._mem[1]

    @property
    def blue(self):
        return self._mem[2]

    def __repr__(self):
        return 'Color({}, {}, {})'.format(self.red, self.green, self.blue)
Our Vertex class, which simply combines a Vector and Color, can remain as
before, although our make_colored_vertex() factory function needs to be
changed to accept a memoryview — specifically one that is aligned with the
beginning of a Vertex structure:
def make_colored_vertex(mem_vertex):
    mem_vector = mem_vertex[0:12].cast('f')
    mem_color = mem_vertex[12:18].cast('H')
    return Vertex(Vector(mem_vector),
                  Color(mem_color))
The function now slices the vertex memoryview into two parts, for the vector and
color respectively, and casts each to a typed memoryview. These are used to
construct the Vector and Color objects, which are then passed on to the Vertex
constructor.
Back in our main() function, after the creation of the mem instance, we'll
need to rework our main loop. We'll start by declaring a couple of constants
describing the size of a Vertex structure and the stride between successive
Vertex structures. This allows us to take account of the two padding bytes
between structures:
VERTEX_SIZE = 18
VERTEX_STRIDE = VERTEX_SIZE + 2
This time, rather than an explicit for-loop to build the list of vertices, we’ll use a
list comprehension to pass each vertex memory view in turn to
make_colored_vertex():
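    # one 18-byte view every 20 bytes, skipping the two padding bytes
    vertex_mems = (mem[i:i + VERTEX_SIZE]
                   for i in range(0, len(mem), VERTEX_STRIDE))

    vertices = [make_colored_vertex(vertex_mem)
                for vertex_mem in vertex_mems]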
Running this program, we can see we get exactly the same results as before,
except that now our Vector and Color objects are backed by the binary data we
loaded from the file, with much reduced copying.
Here’s the complete program as it now stands:
from pprint import pprint as pp


class Vector:

    def __init__(self, mem_vector):
        self._mem = mem_vector

    @property
    def x(self):
        return self._mem[0]

    @property
    def y(self):
        return self._mem[1]

    @property
    def z(self):
        return self._mem[2]

    def __repr__(self):
        return 'Vector({}, {}, {})'.format(self.x, self.y, self.z)


class Color:

    def __init__(self, mem_color):
        assert mem_color.format == 'H'  # check we were handed unsigned shorts
        self._mem = mem_color

    @property
    def red(self):
        return self._mem[0]

    @property
    def green(self):
        return self._mem[1]

    @property
    def blue(self):
        return self._mem[2]

    def __repr__(self):
        return 'Color({}, {}, {})'.format(self.red, self.green, self.blue)


class Vertex:

    def __init__(self, vector, color):
        self.vector = vector
        self.color = color

    def __repr__(self):
        return 'Vertex({!r}, {!r})'.format(self.vector, self.color)


def make_colored_vertex(mem_vertex):
    mem_vector = mem_vertex[0:12].cast('f')
    mem_color = mem_vertex[12:18].cast('H')
    return Vertex(Vector(mem_vector),
                  Color(mem_color))


VERTEX_SIZE = 18
VERTEX_STRIDE = VERTEX_SIZE + 2


def main():
    with open('colors.bin', 'rb') as f:
        buffer = f.read()

    mem = memoryview(buffer)

    vertex_mems = (mem[i:i + VERTEX_SIZE]
                   for i in range(0, len(mem), VERTEX_STRIDE))

    vertices = [make_colored_vertex(vertex_mem)
                for vertex_mem in vertex_mems]

    pp(vertices)


if __name__ == '__main__':
    main()
Memory-mapped files
There’s still one copy happening though: the transfer of bytes from the file into
our buffer bytes object. This is not a problem for our trifling 80 bytes, but for
very large files this could be prohibitive.
By using an operating system feature called memory-mapped files we can use the
virtual memory system to make large files appear as if they are in memory.
Behind the scenes the operating system will load, discard and sync pages of data
from the file. The details are operating system dependent, but the pages are
typically only 4 kilobytes in size, so this can be memory efficient if you need
access to relatively small parts of large files.
We need to import the standard library mmap module, and then modify our main()
function to retrieve the file handle or descriptor, passing it to the
mmap.mmap() constructor. Like other file-like objects, mmaps must be closed
when we're done with them. We can either call the close() method explicitly
or, more conveniently, use the mmap object as a context manager. We'll go with
the latter. Here's the resulting main() function:
def main():
    with open('colors.bin', 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buffer:

            mem = memoryview(buffer)

            vertex_mems = (mem[i:i + VERTEX_SIZE]
                           for i in range(0, len(mem), VERTEX_STRIDE))

            vertices = [make_colored_vertex(vertex_mem)
                        for vertex_mem in vertex_mems]

            pp(vertices)
The only difference here is that buffer is now a memory-mapped file rather
than a bytes object as it was previously. We've avoided reading the file into
memory twice (once into the operating system file cache and once more into our
own collection) by working directly on the operating system's view of the
file.
If we run this version, though, the program fails with an exception when the
mmap context manager tries to close the memory map. The cause is that at the
point the mmap object is closed we still have a chain of extant memoryview
objects which ultimately depend on the mmap. A reference counting mechanism in
the buffer protocol has tracked this, and knows that the mmap still has
memoryview instances pointing to it.
There are a couple of approaches we could take here. We could arrange for the
memoryview.release() method to be called on the memoryview objects inside our
Vector and Color instances. This method deregisters the memoryview from any
underlying buffers and invalidates the memoryview, so any further operations
on it raise a ValueError. This would just move the problem, though: now we'd
have zombie Vector and Color instances containing invalid memoryviews.
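To see what release() does, here's a quick illustration at the REPL,
independent of our program:

>>> mv = memoryview(b'hello')
>>> mv.release()
>>> mv[0]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: operation forbidden on released memoryview object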
Better, we think, to respect the constraint in our design that the lifetime of our
memory-mapped file-backed objects must be shorter than the lifetime of our
memory mapping.
Thinking about our live object-graph, there are two local variables which
ultimately hold references to the memory-mapped file: mem which is our lowest
level memoryview, and vertices which is the list of Vertex objects.
By explicitly removing these name bindings, using two invocations of the del
statement, we can clean up the memoryviews, so the memory map can be torn
down safely:
del mem
del vertices
With this change in place, the main function looks like this:
def main():
    with open('colors.bin', 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buffer:

            mem = memoryview(buffer)

            vertex_mems = (mem[i:i + VERTEX_SIZE]
                           for i in range(0, len(mem), VERTEX_STRIDE))

            vertices = [make_colored_vertex(vertex_mem)
                        for vertex_mem in vertex_mems]

            pp(vertices)

            del vertices
            del mem
And our program runs flawlessly, with minimal memory usage, even for huge
sets of vertex objects:
[Vertex(Vector(3323.176025390625, 6562.23095703125, 9351.2314453125),
Color(3040, 34423, 54321)),
Vertex(Vector(7623.98193359375, 2542.23095703125, 9823.12109375),
Color(32736, 5342, 2321)),
Vertex(Vector(6729.86181640625, 2347.2119140625, 3421.322021484375),
Color(45263, 36291, 36701)),
Vertex(Vector(6352.12109375, 3432.111083984375, 9763.232421875),
Color(56222, 36612, 11214))]
Summary
Let's summarize what we've covered in this chapter:

The struct module can pack and unpack binary data, but structure padding and alignment conventions can differ between compilers and platforms, so portability requires care.
The memoryview type wraps any object supporting the buffer protocol, giving sequence-like access to the underlying bytes without copying, and its cast() method produces typed views onto slices of the data.
Memory-mapped files, via the mmap module, let the operating system's virtual memory system page large files in and out on demand, avoiding reading whole files into memory.
Objects backed by a memory map must not outlive the mapping; deleting the referring names with del allows the map to be torn down cleanly.
Consider this simple Vector class with two instance attributes, x and y:

class Vector:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "{}({}, {})".format(
            type(self).__name__, self.x, self.y)

Let's create an instance and inspect its attributes:

>>> v = Vector(5, 3)
>>> dir(v)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__gt__',
'__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'x', 'y']
In the list returned by dir() we see the two named attributes x and y along with
many of Python’s special attributes, quite a few of which we’ve explained
previously in The Python Apprentice and The Python Journeyman. One attribute
in particular is of interest to us today, and that is __dict__. Let’s see what it is:
>>> v.__dict__
{'x': 5, 'y': 3}
As its name indicates, __dict__ is indeed a dictionary, one which contains the
names of our object’s attributes as keys, and the values of our object’s attributes
as, well, values. Here’s further proof, if any were needed, that __dict__ is a
Python dictionary:
>>> type(v.__dict__)
<class 'dict'>
We can even modify attribute values directly through __dict__:
>>> v.__dict__['x'] = 17
>>> v.x
17
Although all of these direct queries and manipulations of __dict__ are possible,
for the most part you should prefer to use the built-in functions getattr(),
hasattr(), delattr() and setattr():
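>>> getattr(v, 'x')
17
>>> hasattr(v, 'y')
True
>>> setattr(v, 'y', 13)
>>> v.y
13
>>> delattr(v, 'x')
>>> hasattr(v, 'x')
False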
Direct access to __dict__ does have legitimate uses though, so it’s essential to
be aware of its existence, and how and when to use it for advanced Python
programming.
Our Vector class, like most vector classes, has hardwired attributes called x
and y to store the two components of the vector. Suppose instead we want a
vector whose component names are chosen by the caller. We can accept arbitrary
keyword arguments and install them directly into __dict__:

class Vector:

    def __init__(self, **coords):
        self.__dict__.update(coords)

    def __repr__(self):
        return "{}({})".format(
            type(self).__name__,
            ', '.join("{k}={v}".format(k=k, v=self.__dict__[k])
                      for k in sorted(self.__dict__.keys())))
In this code, we accept arbitrary keyword arguments, which are received into
the coords dictionary, the contents of which we use to update the entries in
__dict__. Remember that dictionaries are unordered, so there's no way to
ensure that the coordinates are stored in the order they are specified. For
convenience, our __repr__() implementation iterates over the dictionary sorted
by key.
To discourage meddling with the coordinates, we can store them under "private"
names, prefixing each key with an underscore, and strip the underscore off
again for display in __repr__():

class Vector:

    def __init__(self, **coords):
        private_coords = {'_' + k: v for k, v in coords.items()}
        self.__dict__.update(private_coords)

    def __repr__(self):
        return "{}({})".format(
            type(self).__name__,
            ', '.join("{k}={v}".format(k=k[1:], v=self.__dict__[k])
                      for k in sorted(self.__dict__.keys())))
But now the attributes are stored in “private” attributes called _p and _q:
>>> dir(v)
['__class__', '__delattr__', '__dict__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_p', '_q']
Here’s our Vector definition with __getattr__() added. We’ve just added a
simple stub that prints the attribute name:
class Vector:
def __repr__(self):
return "{}({})".format(
type(self).__name__,
', '.join("{k}={v}".format(k=k[1:], v=self.__dict__[k])
for k in sorted(self.__dict__.keys())))
But when we request an attribute that exists, we simply get the attribute value,
indicating that __getattr__ isn’t being called:
>>> v._q
9
Now let's replace the stub with an implementation which delegates to the
corresponding private attribute:

class Vector:

    # ...

    def __getattr__(self, name):
        private_name = '_' + name
        return getattr(self, private_name)

    def __repr__(self):
        return "{}({})".format(
            type(self).__name__,
            ', '.join("{k}={v}".format(k=k[1:], v=self.__dict__[k])
                      for k in sorted(self.__dict__.keys())))

This works for our p and q coordinates, but there are some serious problems
lurking here. The first is that we can still assign to p and q:
>>> v.p = 13
Remember, there wasn’t really an attribute called p, but Python has no qualms
about creating it for us on demand. Worse, because we have unwittingly brought
p into existence, __getattr__() is not longer invoked for requests of p, even
though our hidden attribute _p is still there behind the scenes, with a different
value:
>>> v.p
13
>>> v._p
5
Look what happens when we try to access an attribute for which there is no
faked support:
>>> v.x
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "examples/vector/vector.py", line 13, in __getattr__
return getattr(self, private_name)
File "examples/vector/vector.py", line 13, in __getattr__
return getattr(self, private_name)
File "examples/vector/vector.py", line 13, in __getattr__
return getattr(self, private_name)
...
File "examples/vector/vector.py", line 13, in __getattr__
return getattr(self, private_name)
RuntimeError: maximum recursion depth exceeded while calling a Python object
This happens because our request for attribute x causes __getattr__() to look
for an attribute _x, which doesn't exist, which invokes __getattr__() again to
look up attribute __x, which doesn't exist. And so on, recursively, until the
Python interpreter exceeds its maximum recursion depth and raises a
RuntimeError.
To prevent this happening, you might be tempted to check for the existence of
the private attribute using hasattr(), like this:

class Vector:

    # ...

    def __getattr__(self, name):
        private_name = '_' + name
        if not hasattr(self, private_name):
            raise AttributeError('{!r} object has no attribute {!r}'.format(
                type(self).__name__, name))
        return getattr(self, private_name)

    def __repr__(self):
        return "{}({})".format(
            type(self).__name__,
            ', '.join("{k}={v}".format(k=k[1:], v=self.__dict__[k])
                      for k in sorted(self.__dict__.keys())))
Unfortunately, this doesn’t work either, since it turns out that hasattr() also
ultimately calls __getattr__() in search of the attribute! What we need to do, is
directly check for the presence of our attribute in __dict__:
class Vector:
def __repr__(self):
return "{}({})".format(
type(self).__name__,
', '.join("{k}={v}".format(k=k[1:], v=self.__dict__[k])
for k in sorted(self.__dict__.keys())))
Equivalently, we can use an EAFP style, attempting the lookup and translating
KeyError into AttributeError:

class Vector:

    # ...

    def __getattr__(self, name):
        private_name = '_' + name
        try:
            return self.__dict__[private_name]
        except KeyError:
            raise AttributeError('{!r} object has no attribute {!r}'.format(
                type(self).__name__, name))

    def __repr__(self):
        return "{}({})".format(
            type(self).__name__,
            ', '.join("{k}={v}".format(k=k[1:], v=self.__dict__[k])
                      for k in sorted(self.__dict__.keys())))
Giving:
>>> v = Vector(p=1, q=2)
>>> v
Vector(p=1, q=2)
>>> v.p
1
>>> v.x
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "examples/vector/vector.py", line 16, in __getattr__
raise AttributeError('{!r} object has no attribute {!r}'.format(type(self).__name__, \
name))
AttributeError: 'Vector' object has no attribute 'x'
Of course, the EAFP version has the same behaviour as the LBYL version.
There are problems at the other end of the attribute lifecycle, too. Nothing
prevents a client from reaching in and deleting the underlying private
attribute:

>>> del v._q
>>>
>>> v
Vector()
To prevent deletion of both the public and private names, we can override
__delattr__():

class Vector:

    # ...

    def __delattr__(self, name):
        raise AttributeError("Can't delete attribute {!r}".format(name))

    def __repr__(self):
        return "{}({})".format(
            type(self).__name__,
            ', '.join("{k}={v}".format(k=k[1:], v=self.__dict__[k])
                      for k in sorted(self.__dict__.keys())))

Here's how that code behaves when we try to delete either public or private
attributes:
>>> v = Vector(p=9, q=12)
>>>
>>> v
Vector(p=9, q=12)
>>>
>>> del v.q
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "example/vector/vector.py", line 19, in __delattr__
raise AttributeError("Can't delete attribute {!r}".format(name))
AttributeError: Can't delete attribute 'q'
>>> del v._q
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "example/vector/vector.py", line 19, in __delattr__
raise AttributeError("Can't delete attribute {!r}".format(name))
AttributeError: Can't delete attribute '_q'
Things get trickier in subclasses. Consider a ColoredVector subclass which
stores its three color channels in a list bound directly to the public 'color'
key of __dict__ (we'll see the relevant line shortly). The inherited
__repr__() strips the first character from every key, including 'color',
producing a mangled result:

>>> cv
ColoredVector(p=9, q=14, olor=[50, 44, 238])

We can fix this by overriding __repr__() in ColoredVector. The override works
by removing the attribute name 'color' from the list of keys before using the
same logic as the superclass to produce the string of sorted co-ordinates. The
color channel values are accessed in the normal way, which will invoke
__getattr__().
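The original listing of the override isn't shown here, but following the
description above it looks something like this sketch. The __getattr__()
mapping from channel names to list indexes, and the exact output format, are
our assumptions:

class ColoredVector(Vector):

    COLOR_INDEXES = ('red', 'green', 'blue')

    def __init__(self, red, green, blue, **coords):
        super().__init__(**coords)
        self.__dict__['color'] = [red, green, blue]

    def __getattr__(self, name):
        try:
            channel = ColoredVector.COLOR_INDEXES.index(name)
        except ValueError:
            return super().__getattr__(name)
        return self.__dict__['color'][channel]

    def __repr__(self):
        keys = set(self.__dict__.keys())
        keys.discard('color')   # don't let 'color' be mangled by k[1:]
        coords = ', '.join("{k}={v}".format(k=k[1:], v=self.__dict__[k])
                           for k in sorted(keys))
        # red, green and blue are looked up normally, via __getattr__()
        return "{}({}, red={}, green={}, blue={})".format(
            type(self).__name__, coords, self.red, self.green, self.blue)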
It’s worth bearing in mind that this example demonstrates the awkwardness of
inheriting from classes which were not deliberately designed as base classes. Our
code serves it’s purpose in demonstrating customised attribute access, but we
couldn’t recommend such use of inheritance in production code.
Rather than accessing the __dict__ attribute of an object directly, we can
write:

vars(obj)

Arguably, this is more Pythonic than accessing __dict__ directly, for much the
same reason that calling:

len(collection)

is preferable to invoking collection.__len__() directly. That said, the length
returned by len() is just an immutable number, whereas vars() hands back the
object's actual, mutable __dict__. In our opinion it is much clearer that the
internal state of an object is being modified when we directly modify the
__dict__ attribute, like this:

self.__dict__['color'] = [red, green, blue]

Whichever you use (and we don't feel strongly either way) you should be aware
of this use of vars() with an argument.
Overriding __getattribute__()
Recall that __getattr__() is called only in cases where "normal attribute
lookup fails"; it is the last chance for us to intervene before the Python
runtime raises an AttributeError. But what if we want to intercept all
attribute access? In that case, we can override __getattribute__(). We use the
term "override" advisedly, because it is the implementation of
__getattribute__() in the ultimate base class object that is responsible for
the normal lookup behaviour, including calling __getattr__(). This level of
control is seldom required, and you should always consider whether
__getattr__() is sufficient for your needs. That said, __getattribute__() does
have its uses.
To demonstrate, we'll build a logging proxy: a class which wraps any target
object, forwards attribute requests to it, and logs each access along the way:

class LoggingProxy:

    def __init__(self, target):
        self.target = target

    def __getattribute__(self, name):
        # Use the base class lookup to avoid recursing into this method
        target = super().__getattribute__('target')
        try:
            value = getattr(target, name)
        except AttributeError as e:
            raise AttributeError("{} could not forward request {} to {}"
                                 .format(
                                     super().__getattribute__('__class__').__name__,
                                     name,
                                     target)
                                 ) from e
        print("Retrieved attribute {!r} = {!r} from {!r}"
              .format(name, value, target))
        return value
So far, so good. But what happens when we write to an attribute through the
proxy? In this example both writes appear to be accepted without error, although
only one of them should be:
>>> cw.p = 19
>>> cw.red = 5
What’s happening here is that our attribute writes to the cw proxy are invoking
__setattr__() on the object base class, which is actually creating new
attributes in the LoggingProxy instance __dict__. However, reads through the
proxy correctly bypass this __dict__ and are redirected to the target. In effect,
the proxy __dict__ has become write-only!
The solution is to override __setattr__() as well, forwarding attribute writes
to the target in the same way. Note that __init__() must now bypass our own
__setattr__() when it stores the target, which it can do by calling the base
class implementation directly:

class LoggingProxy:

    def __init__(self, target):
        # Bypass our own __setattr__(), which forwards to the target
        super().__setattr__('target', target)

    def __getattribute__(self, name):
        target = super().__getattribute__('target')
        try:
            value = getattr(target, name)
        except AttributeError as e:
            raise AttributeError("{} could not forward request {} to {}"
                                 .format(
                                     super().__getattribute__('__class__').__name__,
                                     name,
                                     target)
                                 ) from e
        print("Retrieved attribute {!r} = {!r} from {!r}"
              .format(name, value, target))
        return value

    def __setattr__(self, name, value):
        target = super().__getattribute__('target')
        try:
            setattr(target, name, value)
        except AttributeError as e:
            raise AttributeError("{} could not forward request {} to {}"
                                 .format(
                                     super().__getattribute__('__class__').__name__,
                                     name,
                                     target)
                                 ) from e
        print("Set attribute {!r} = {!r} on {!r}"
              .format(name, value, target))
There's a catch here, though: when Python invokes special methods implicitly,
as the built-in repr() does with __repr__(), it looks the method up on the
type, bypassing __getattribute__() entirely, so our proxy is not consulted. If
we call the __repr__() method directly, however, the call is routed via the
proxy and is dispatched successfully:
>>> cw.__repr__()
Retrieved attribute '__repr__' = <bound method ColoredVector.__repr__ of ColoredVector(re\
d=39, green=22, blue=89, s=45, t=12)> from ColoredVector(red=39, green=22, blue=89, s=45,\
t=12)
'ColoredVector(red=39, green=22, blue=89, s=45, t=12)'
What this means in practice is that if you want to write a proxy object such as
LoggingProxy which transparently proxies an object including its repr or other
special methods, it’s up to you to provide an implementation of __repr__() that
forwards the call appropriately:
    def __repr__(self):
        target = super().__getattribute__('target')
        repr_callable = getattr(target, '__repr__')
        return repr_callable()
This now works when called via the built-in repr() function:
>>> from vector import *
>>> from loggingproxy import *
>>>
>>> cv = ColoredVector(red=39, green=22, blue=89, s=45, t=12)
>>> cv
ColoredVector(red=39, green=22, blue=89, s=45, t=12)
>>>
>>> cw = LoggingProxy(cv)
>>>
>>> repr(cw)
'ColoredVector(red=39, green=22, blue=89, s=45, t=12)'
So where do the methods live? They don't appear in the instance __dict__ at
all. The answer is that methods are attributes of another object: the class
object associated with our instance. As we already know, we can get to the
class object via the __class__ attribute, and sure enough it too has a
__dict__ attribute which contains references to the callable objects which are
the manifestations of the methods of our class.
Of course we can retrieve the callable and pass our instance to it:
>>> v.__class__.__dict__['__repr__'](v)
'Vector(x=3, y=7)'
It’s well worth spending some experimental time on your own poking around
with these special attributes to get a good sense of how the Python object model
hangs together.
It’s worth noting that the __dict__ attribute of a class object is not a regular
dict, but is instead of type mappingproxy, a special mapping type used
internally in Python, which does not support item assignment:
>>> v.__class__.__dict__['a_vector_class_attribute'] = 5
TypeError: 'mappingproxy' object does not support item assignment
The machinery of setattr(), on the other hand, does know how to insert
attributes into the class dictionary:

>>> setattr(v.__class__, 'a_vector_class_attribute', 5)
>>> v.__class__.__dict__
mappingproxy({'__weakref__': <attribute '__weakref__' of 'Vector' objects>,
'__setattr__': <function Vector.__setattr__ at 0x101a5fa60>,
'a_vector_class_attribute': 5, '__doc__': None, '__repr__':
<function Vector.__repr__ at 0x101a5fb70>, '__init__':
<function Vector.__init__ at 0x101a5f620>, '__getattr__':
<function Vector.__getattr__ at 0x101a5f6a8>, '__module__':
'vector', '__dict__': <attribute '__dict__' of 'Vector' objects>,
'__delattr__': <function Vector.__delattr__ at 0x101a5fae8>})
Slots
We’ll finish off this part of the course with a brief look at a mechanism in Python
for reducing memory use: slots. As we’ve seen, each and every object stores its
attributes in a dictionary. Even an empty Python dictionary is quite a hefty
object, weighing in at 288 bytes:
>>> d = {}
>>> import sys
>>> sys.getsizeof(d)
288
If you have thousands or millions of objects this quickly adds up, causing your
programs to need megabytes or gigabytes of memory. Given contemporary
computer architectures this tends to lead to reduced performance as CPU caches
can hold relatively few objects.
Techniques to solve the high memory usage of Python programs can get pretty
involved, such as implementing Python objects in lower-level languages such as
C or C++, but fortunately Python provides the slots mechanism which can
provide some big wins for low effort, with tradeoffs that will be acceptable in
most cases.
Let’s take a look! Consider the following class to describe the type of electronic
component called a resistor:
class Resistor:
It’s difficult to determine the size in memory of Python objects, but with care,
we can use the getsizeof() function in the sys module. To get the size of an
instance of Resistor, we need to account for the size of the Resistor object
itself, and the size of its __dict__:
>>> from resistor import *
>>> r10 = Resistor(10, 5, 0.25)
>>>
>>> import sys
>>> sys.getsizeof(r10) + sys.getsizeof(r10.__dict__)
152
This is quite a big object, especially when you consider that the equivalent
struct in the C programming language would weigh in at no more than 64 bytes,
with very generous precision on the number types.
Let’s see if we can improve on this using slots. To use slots we must declare a
class attribute called __slots__ to which we assign a sequence of strings
containing the fixed names of the attributes we want all instances of the class to
contain:
class Resistor:

    __slots__ = ['resistance_ohms', 'tolerance_percent', 'power_watts']

    def __init__(self, resistance_ohms, tolerance_percent, power_watts):
        self.resistance_ohms = resistance_ohms
        self.tolerance_percent = tolerance_percent
        self.power_watts = power_watts
Now let’s look at the space performance of this new class. We can instantiate
Resistor just as before:
However, it’s size is much reduced, from 152 bytes down to 64 bytes, less than
half the size:
>>> import sys
>>> sys.getsizeof(r10)
64
For most applications, slots won’t be required, and you shouldn’t use them
unless measurements indicate that they may help, as slots can interact with other
Python features and diagnostic tools in surprising ways. In an ideal world, slots
wouldn’t be necessary and in our view they’re quite an ugly language feature,
but at the same time we’ve worked on applications where the simple addition of
a __slots__ attribute has made the difference between the pleasure of
programming in Python, and the pain of programming in a lower-level, but more
efficient language. Use wisely!
Summary
Let's summarise what we've covered in this chapter:

Every regular object stores its attributes in a dictionary exposed through the special __dict__ attribute.
The built-in functions getattr(), setattr(), hasattr() and delattr(), and vars(), should generally be preferred to manipulating __dict__ directly.
__getattr__() is invoked only when normal attribute lookup fails, and must be implemented carefully to avoid unbounded recursion.
__getattribute__() intercepts all attribute access; overriding it is rarely needed, but enables techniques such as logging proxies.
Methods and other class attributes live in the class's __dict__, which is exposed as a read-only mappingproxy.
The __slots__ mechanism eliminates the per-instance __dict__, trading flexibility for substantially reduced memory use.
A review of properties
As promised, we’ll start with a very brief review of properties, our entry point
into the world of descriptors. To explain descriptors, we’ll be building a simple
class to model planets, focusing on particular physical attributes such as size,
mass, and temperature.
Let’s start with this basic class definition for a planet in planet.py, consisting of
little more than an initializer. There are no properties here yet, we’ll add them in
a moment:
# planet.py

class Planet:

    def __init__(self,
                 name,
                 radius_metres,
                 mass_kilograms,
                 orbital_period_seconds,
                 surface_temperature_kelvin):
        self.name = name
        self.radius_metres = radius_metres
        self.mass_kilograms = mass_kilograms
        self.orbital_period_seconds = orbital_period_seconds
        self.surface_temperature_kelvin = surface_temperature_kelvin
This is simple enough to use:
>>> pluto = Planet(name='Pluto', radius_metres=1184e3,
... mass_kilograms=1.305e22, orbital_period_seconds=7816012992,
... surface_temperature_kelvin=55)
>>> pluto.radius_metres
1184000.0
There's nothing here, though, to stop clients from assigning nonsensical
values, so let's encapsulate each attribute behind a validating property:

# planet.py

class Planet:

    def __init__(self,
                 name,
                 radius_metres,
                 mass_kilograms,
                 orbital_period_seconds,
                 surface_temperature_kelvin):
        self.name = name
        self.radius_metres = radius_metres
        self.mass_kilograms = mass_kilograms
        self.orbital_period_seconds = orbital_period_seconds
        self.surface_temperature_kelvin = surface_temperature_kelvin

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not value:
            raise ValueError("Cannot set empty Planet.name")
        self._name = value

    @property
    def radius_metres(self):
        return self._radius_metres

    @radius_metres.setter
    def radius_metres(self, value):
        if value <= 0:
            raise ValueError("radius_metres value {} is not "
                             "positive.".format(value))
        self._radius_metres = value

    @property
    def mass_kilograms(self):
        return self._mass_kilograms

    @mass_kilograms.setter
    def mass_kilograms(self, value):
        if value <= 0:
            raise ValueError("mass_kilograms value {} is not "
                             "positive.".format(value))
        self._mass_kilograms = value

    @property
    def orbital_period_seconds(self):
        return self._orbital_period_seconds

    @orbital_period_seconds.setter
    def orbital_period_seconds(self, value):
        if value <= 0:
            raise ValueError("orbital_period_seconds value {} is not "
                             "positive.".format(value))
        self._orbital_period_seconds = value

    @property
    def surface_temperature_kelvin(self):
        return self._surface_temperature_kelvin

    @surface_temperature_kelvin.setter
    def surface_temperature_kelvin(self, value):
        if value <= 0:
            raise ValueError("surface_temperature_kelvin value {} is not "
                             "positive.".format(value))
        self._surface_temperature_kelvin = value
From a robustness standpoint, this code is much better. For example, we can no
longer construct massless planets:
>>> planet_x = Planet(name='X', radius_metres=10e3, mass_kilograms=0,
... orbital_period_seconds=-7293234, surface_temperature_kelvin=-5)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "examples/descriptors/properties.py", line 12, in __init__
self.mass_kilograms = mass_kilograms
File "examples/descriptors/properties.py", line 43, in mass_kilograms
raise ValueError("mass_kilograms value {} is not positive.".format(value))
ValueError: mass_kilograms value 0 is not positive.
The trade-off, though, is that the amount of code has exploded, and worse,
there is a lot of duplicated code checking that all those numeric attribute
values are positive.
Descriptors will ultimately provide a way out of this, but first we need to do a
little more unravelling of properties to aid our understanding.
The property object also has an attribute called setter which is in fact
another decorator. When a setter method is decorated by the setter decorator,
the original property object is modified to bind an attribute called fset to
the setter method.
The property object effectively aggregates the getter and setter into a single
property object which behaves like an attribute, and it behaves like an attribute
because it is a descriptor. Shortly, we’ll learn how it is able to appear so
attribute-like, but first let’s unravel properties a bit more.
Remember that function decorators are just regular functions which process an
existing function and return a new object — usually a new function which wraps
the decorated function.
Given that decorators are functions, let’s rework our code to apply property
explicitly using regular function-call syntax, avoiding the special decorator
application syntax using the @ symbol. When doing this, it’s important to note
that the property() function supports several arguments for simultaneously
supplying the getter, setter and deleter functions, along with a docstring
value. In fact, help(property) makes this quite clear:
>>> help(property)
class property(object)
| property(fget=None, fset=None, fdel=None, doc=None) -> property attribute
|
| fget is a function to be used for getting an attribute value, and likewise
| fset is a function for setting, and fdel a function for del'ing, an
| attribute. Typical use is to define a managed attribute x:
|
| class C(object):
| def getx(self): return self._x
| def setx(self, value): self._x = value
| def delx(self): del self._x
| x = property(getx, setx, delx, "I'm the 'x' property.")
|
| Decorators make defining new properties or modifying existing ones easy:
|
| class C(object):
| @property
| def x(self):
| "I am the 'x' property."
| return self._x
| @x.setter
| def x(self, value):
| self._x = value
| @x.deleter
| def x(self):
| del self._x
As you can see, we can separately define our getter and setter functions, then call
the property() constructor within the class definition to produce a class
attribute. Let’s use this form in our Planet class for the numerical attributes.
We’ll leave the name attribute using the decorator form so you can compare
side-by-side.
class Planet:

    def __init__(self,
                 name,
                 radius_metres,
                 mass_kilograms,
                 orbital_period_seconds,
                 surface_temperature_kelvin):
        self.name = name
        self.radius_metres = radius_metres
        self.mass_kilograms = mass_kilograms
        self.orbital_period_seconds = orbital_period_seconds
        self.surface_temperature_kelvin = surface_temperature_kelvin

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not value:
            raise ValueError("Cannot set empty Planet.name")
        self._name = value

    def _get_radius_metres(self):
        return self._radius_metres

    def _set_radius_metres(self, value):
        if value <= 0:
            raise ValueError("radius_metres value {} is not "
                             "positive.".format(value))
        self._radius_metres = value

    radius_metres = property(fget=_get_radius_metres,
                             fset=_set_radius_metres)

    def _get_mass_kilograms(self):
        return self._mass_kilograms

    def _set_mass_kilograms(self, value):
        if value <= 0:
            raise ValueError("mass_kilograms value {} is not "
                             "positive.".format(value))
        self._mass_kilograms = value

    mass_kilograms = property(fget=_get_mass_kilograms,
                              fset=_set_mass_kilograms)

    def _get_orbital_period_seconds(self):
        return self._orbital_period_seconds

    def _set_orbital_period_seconds(self, value):
        if value <= 0:
            raise ValueError("orbital_period_seconds value {} is not "
                             "positive.".format(value))
        self._orbital_period_seconds = value

    orbital_period_seconds = property(fget=_get_orbital_period_seconds,
                                      fset=_set_orbital_period_seconds)

    def _get_surface_temperature_kelvin(self):
        return self._surface_temperature_kelvin

    def _set_surface_temperature_kelvin(self, value):
        if value <= 0:
            raise ValueError("surface_temperature_kelvin value {} is not "
                             "positive.".format(value))
        self._surface_temperature_kelvin = value

    surface_temperature_kelvin = property(fget=_get_surface_temperature_kelvin,
                                          fset=_set_surface_temperature_kelvin)
If you think this form of property set up is a retrograde step compared to the
decorator form, we’d agree with you; there’s a good reason we introduce
properties as decorators first. Nevertheless, seeing this alternative style
reinforces the notion that property is simply a function, which returns an object
called a descriptor, which is in turn bound to a class attribute.
The runtime behaviour of this code hasn't changed at all. We can still create
objects, retrieve attribute values through properties, and attempt to set
attribute values through properties, with rejection of nonsensical values:
>>> pluto = Planet(name='Pluto', radius_metres=1184e3,
... mass_kilograms=1.305e22, orbital_period_seconds=7816012992,
... surface_temperature_kelvin=55)
>>> pluto.radius_metres
1184000.0
>>> pluto.radius_metres = -13
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "examples/descriptors/properties.py", line 31, in _set_radius_metres
raise ValueError("radius_metres value {} is not positive.".format(value))
ValueError: radius_metres value -13 is not positive.
Implementing a descriptor
We’ve seen that property is a descriptor which wraps three functions. Let’s
create a more specialised descriptor useful for modelling the strictly positive
numeric values in our Planet class.
from weakref import WeakKeyDictionary

class Positive:

    def __init__(self):
        self._instance_data = WeakKeyDictionary()
The descriptor class implements the three functions which comprise the
descriptor protocol: __get__(), __set__(), and __delete__(). These are called
when we get a value from a descriptor, set a value through a descriptor, or delete
a value through a descriptor, respectively. In addition, the Positive class
implements __init__() to configure new instances of the descriptor. Before we
look in more detail at each of these methods, let’s make use of our new
descriptor to refactor our Planet class.
We remove the setters and getters for radius_metres and replace the call to
the property constructor with a call to the Positive constructor. We do the
same for the mass_kilograms, orbital_period_seconds and
surface_temperature_kelvin quantities:
class Planet:

    def __init__(self,
                 name,
                 radius_metres,
                 mass_kilograms,
                 orbital_period_seconds,
                 surface_temperature_kelvin):
        self.name = name
        self.radius_metres = radius_metres
        self.mass_kilograms = mass_kilograms
        self.orbital_period_seconds = orbital_period_seconds
        self.surface_temperature_kelvin = surface_temperature_kelvin

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not value:
            raise ValueError("Cannot set empty Planet.name")
        self._name = value

    radius_metres = Positive()
    mass_kilograms = Positive()
    orbital_period_seconds = Positive()
    surface_temperature_kelvin = Positive()
With the Positive descriptor on hand, the Planet class shrinks by a huge
amount!
At first sight, this may appear confusing. It looks like we're assigning to
radius_metres twice: once in the initializer and once in the body of the
class. In fact, the assignment in the body of the class is binding an instance
of a Positive descriptor object to a class attribute of Planet. The assignment
in __init__() is then apparently assigning to an instance attribute, although
as we'll see in a moment, this assignment is actually invoking a method on the
descriptor object. At this point the new Planet instance isn't yet bound to
the name pluto, only to self.
At first, this seems easy. The __get__() call is handed a reference to the
instance (pluto in this case) so why not store the value in pluto's __dict__?
The problem is that the descriptor doesn't know the name of the class
attribute to which it is bound, so it has no obvious key under which to store
the value.

This apparent shortcoming of descriptors, not knowing the name of the class
attribute to which they are bound, is also evident in the fact that our value
validation error message in __set__() no longer mentions the attribute name;
this is a clear regression in capabilities. It is fixable, but the solution
will have to wait until the next chapter, when we look at metaclasses.

So, how do we associate the values with the instances? Let's look at the
solution pictorially first, then we'll reiterate with the code.
We use a special collection type from the Python Standard Library's weakref
module called WeakKeyDictionary. This works pretty much like a regular
dictionary, except that it holds only weak references to its keys: if the only
remaining reference to a key object is the one held by the dictionary, the
entry is discarded rather than keeping the key alive. A WeakKeyDictionary
owned by each descriptor instance is used to associate planet instances with
the values of the quantity represented by that descriptor, although the
descriptor itself doesn't know which quantity is being represented.
Because the dictionary keys are weak references, if a planet instance is destroyed
— let’s pretend the Earth is vapourised to make way for a hyperspace bypass —
the corresponding entries in all the weak-key-dictionaries are also removed.
Examining the implementation
Let’s look at how this is all put together in code. A WeakKeyDictionary instance
called _instance_data is created in the descriptor initializer:
class Positive:

    def __init__(self):
        self._instance_data = WeakKeyDictionary()
As a result, our Planet class indirectly aggregates four such dictionaries, one in
each of the four descriptors:
class Planet:

    # ...

    radius_metres = Positive()
    mass_kilograms = Positive()
    orbital_period_seconds = Positive()
    surface_temperature_kelvin = Positive()
Within the __set__() method we associate the attribute value with the planet
instance by inserting a mapping from the instance as key to the attribute as
value:
class Positive:

    # ...

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("Value {} is not positive".format(value))
        self._instance_data[instance] = value
As such, a single dictionary will contain all the radius_metres values for all
Planet instances. Another dictionary will contain all mass_kilograms values for
all Planet instances, and so on. We’re storing the instance attribute values
completely outside the instances, but in such a way that we can reliably retrieve
them in __get__.
Attribute access through a descriptor doesn't always originate with an
instance, though: the attribute can also be retrieved on the class itself, as
in Planet.radius_metres. In such cases, the instance argument of __get__()
will be set to None. Because we cannot create a weak reference to None, this
would cause a failure with the WeakKeyDictionary used for attribute storage.
By testing the instance argument against None we can detect when a descriptor
value is being retrieved via a class attribute:
class Positive:

    def __init__(self):
        self._instance_data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self._instance_data[instance]
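With the guard in place, retrieving the attribute through the class gives us
the descriptor object itself (an illustrative session; the memory address will
of course differ):

>>> Planet.radius_metres
<planet.Positive object at 0x10215efd0>
>>> pluto.radius_metres
1184000.0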
If you need to query or manipulate the class which contains the descriptor
objects you can get hold of this through the owner argument of the __get__
method, which in this case will contain a reference to the Planet class. In many
cases though, you won’t need to use owner, so you can do as we’ve done and just
ignore it.
There's one more important wrinkle to descriptors: the distinction between
data and non-data descriptors, which controls lookup precedence. A data
descriptor, which defines __set__() or __delete__() in addition to __get__(),
takes precedence over an entry of the same name in the instance __dict__. For
a non-data descriptor, which defines only __get__(), the instance __dict__
entry takes precedence instead.

Here are two minimal tracing descriptors and a class which owns one of each;
each __get__() simply reports how it was called:

class DataDescriptor:

    def __get__(self, instance, owner):
        print("DataDescriptor.__get__(\n{!r},\n{!r},\n{!r})".format(
            self, instance, owner))

    def __set__(self, instance, value):
        print("DataDescriptor.__set__(\n{!r},\n{!r},\n{!r})".format(
            self, instance, value))


class NonDataDescriptor:

    def __get__(self, instance, owner):
        print("NonDataDescriptor.__get__(\n{!r},\n{!r},\n{!r})".format(
            self, instance, owner))


class Owner:

    a = DataDescriptor()
    b = NonDataDescriptor()
Let’s try this in a REPL session. After we’ve created an instance of Owner, we’ll
retrieve the attribute a, set an item in the instance dictionary with the same name,
and retrieve a again:
>>> from descriptors import *
>>> obj = Owner()
>>> obj.a
DataDescriptor.__get__(
<precedence.DataDescriptor object at 0x102071748>,
<precedence.Owner object at 0x102071550>,
<class 'precedence.Owner'>)
>>> obj.__dict__['a'] = 196883
>>> obj.a
DataDescriptor.__get__(
<precedence.DataDescriptor object at 0x102071748>,
<precedence.Owner object at 0x102071550>,
<class 'precedence.Owner'>)
Since a is a data descriptor, the first rule applies and the data descriptor takes
precedence when we reference obj.a.
Now let’s try that with the non-data attribute b:
>>> obj.b
NonDataDescriptor.__get__(
<precedence.NonDataDescriptor object at 0x102071780>,
<precedence.Owner object at 0x102071550>,
<class 'precedence.Owner'>)
>>> obj.__dict__['b'] = 744
>>> obj.b
744
The first time we access obj.b there is no entry of the same name in the instance
dictionary. As a result, the non-data descriptor takes precedence. After we’ve
added a b entry into __dict__, the second rule applies and the dictionary entry
takes precedence over the non-data descriptor.
Summary
We have seen that the descriptor protocol is itself very simple, yet it
enables concise, declarative code which hugely reduces duplication. At the
same time, implementing descriptors correctly can be tricky and requires
careful testing.
Instance Creation
So what does happen when you create an object?
To illustrate, we’ll an 8×8 chess board. Consider this simple class which
represents a coordinate on a chess board consisting of a file (column) letter from
‘a’ to ‘h’ inclusive and a rank (row) number from 1 to 8 inclusive:
class ChessCoordinate:

    def __init__(self, file, rank):
        if len(file) != 1:
            raise ValueError("{} component file {!r} does not have a length of one."
                             .format(type(self).__name__, file))
        # ... further validation of file and rank ...
        self._file = file
        self._rank = rank

    @property
    def file(self):
        return self._file

    @property
    def rank(self):
        return self._rank

    def __repr__(self):
        return "{}(file={}, rank={})".format(
            type(self).__name__, self.file, self.rank)

    def __str__(self):
        return '{}{}'.format(self.file, self.rank)
This class implements an immutable value type; the initialiser establishes the
invariants and the property accessors prevent inadvertent modification of the
encapsulated data.
Note also that __init__() doesn’t return anything; it simply mutates the
instance it has been given.
If we inspect the attributes of our class with dir(), we can see a special
method we haven't discussed before, __new__():

>>> dir(ChessCoordinate)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__gt__',
'__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'file',
'rank']
But what is the signature of __new__()? Don't bother looking in the help()
because, frankly, the answer isn't very helpful:

>>> help(object.__new__)
Help on built-in function __new__:

__new__(*args, **kwargs)
    Create and return a new object.  See help(type) for accurate signature.

To see what's really going on, let's override __new__() in ChessCoordinate
with a version which simply reports its arguments and delegates allocation to
the base class:

class ChessCoordinate:

    def __new__(cls, *args, **kwargs):
        print("args =", repr(args))
        print("kwargs =", repr(kwargs))
        obj = super().__new__(cls)
        print("id(obj) =", id(obj))
        return obj

    def __init__(self, file, rank):
        if len(file) != 1:
            raise ValueError("{} component file {!r} does not have a length of one."
                             .format(type(self).__name__, file))
        self._file = file
        self._rank = rank

    # ...

Notice that __new__() appears to be implicitly a class method. It accepts cls
as its first argument, rather than self. In fact __new__() is a
specially-cased static method that happens to take the type of the class as
its first argument, but the distinction isn't important here.
The cls argument is the class of the new object which will be allocated. In
our case, that will be ChessCoordinate, but in the presence of inheritance it
isn't necessarily the case that the cls argument will be set to the class
enclosing the __new__() definition.
In general, besides the class object, __new__() also accepts whatever arguments
have been passed to the constructor. In this case, we’ve soaked up any such
arguments using *args and **kwargs although we could have used specific
argument names here, just as we have with __init__(). We print these
additional argument values to the console.
Remember that the purpose of __new__() is to allocate a new object. There’s no
special command for that in Python — all object allocation must ultimately be
done by object.__new__(). Rather than call the object.__new__()
implementation directly, we’ll call it via super(). Should our immediate base
class change in the future, this is more maintainable. The return value from the
call to object.__new__() is a new, uninitialised instance of ChessCoordinate.
We print its id() (remember we can’t expect repr() to work yet) and then
return.
Customising allocation
We’ve shown the essential mechanics of overriding __new__(): we accept the
class type and the constructor arguments and return an instance of the correct
type. Ultimately the only means we have for creating new instances is by calling
object.__new__(). This is all well and good, but what are some practical uses?
One use for controlling instance creation is a technique called interning, which
can dramatically reduce memory consumption. We’ll demonstrate this by
extending our program to allocate some chess boards in the start-of-game
configuration. In our implementation each board is represented as a dictionary
mapping the name of a piece to one of our ChessCoordinate objects. For fun,
we’ve used the Unicode chess code points to represent our pieces. In our
program '♛♜' means “black queen's rook” and '♔♗♙' means “white king's
bishop's pawn”. We need to be this specific because, as you'll remember,
dictionaries require that keys are distinct.
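The starting_board() function used below isn't reproduced in full; it's
essentially a literal dictionary with one entry per piece, along these lines
(the exact key scheme is our guess based on the piece names above):

def starting_board():
    return {
        '♛♜': ChessCoordinate('a', 8),   # black queen's rook
        '♛♞': ChessCoordinate('b', 8),   # black queen's knight
        # ... entries for the remaining pieces, 32 in total ...
        '♔♖': ChessCoordinate('h', 1),   # white king's rook
    }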
Let’s go to the REPL and create a board using our new function:
>>> board = starting_board()
After creating a single board this way, our system reports that our Python
process is using about 7 MB of memory. Creating 10,000 chess-boards utilises
some 75 MB of memory to store the 320,000 instances of ChessCoordinate
contained by the 10,000 dictionaries that have been created to represent all the
boards:
>>> boards = [starting_board() for _ in range(10000)]
Bear in mind though, that there are only 64 distinct positions on a chess board,
and given that our ChessCoordinate objects are deliberately immutable value
types, we should never need more than 64 instances. In our specific case, in fact,
we should never need more than 32 instances.
Here's a reworked ChessCoordinate which interns its instances:

class ChessCoordinate:

    _interned = {}

    def __new__(cls, file, rank):
        if len(file) != 1:
            raise ValueError("{} component file {!r} does not have a length of one."
                             .format(cls.__name__, file))
        key = (file, rank)
        if key not in cls._interned:
            obj = super().__new__(cls)
            obj._file = file
            obj._rank = rank
            cls._interned[key] = obj
        return cls._interned[key]
We’re now using named positional arguments for the file and rank arguments
to __new__(), and we’ve moved the validation logic from __init__() to
__new__().
Once the arguments are validated, we use them to create a single tuple object
from file and rank. We use this tuple as a key and check if there is an entry
against this key tuple in a dictionary called _interned which we’ve attached as a
class-attribute. Only if the tuple key is not present in the dictionary do we
allocate a new instance; we do this by calling object.__new__(). We then
configure the new instance — doing the remainder of the work that used to be
done in __init__() — and insert the newly minted instance into the dictionary.
Of course, the instance we return is an element in the dictionary, either the one
we just created or one created earlier.
Our __init__() method is now empty, and in fact we could remove it entirely.
With these changes in place allocating 10,000 boards takes significantly less
memory than before — we’re down to about 23 MB.
Interning is a powerful tool for managing memory usage — in fact Python uses
it internally for integers and strings — but it should only be used with immutable
value types such as our ChessCoordinate where instances can safely be shared
between data structures.
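You can observe CPython's own interning at the REPL. As an implementation
detail of CPython (not a language guarantee), small integers in the range -5
to 256 are cached, and identifier-like string literals are interned:

>>> a = 256
>>> b = 256
>>> a is b
True
>>> s = 'hello'
>>> t = 'hello'
>>> s is t
True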
Now that we understand __new__(), in the next chapter we can move on to
another advanced Python topic: metaclasses.
Summary
We’ve covered an crucial topic in this short chapter: the distinction between the
allocation and initialization of instances. Let’s summarise what we covered:
We showed how the static method __new__() is called to allocate and return a new instance.
__new__() implicitly behaves as a static method which accepts the class of the new instance as its first argument, and it doesn't require either the @classmethod or @staticmethod decorators.
Ultimately, object.__new__() is responsible for allocating all instances.
One use for overriding __new__() is to support instance interning, which can be useful when certain values of immutable value types are very common, or when the domain of values is small and finite, such as with the squares of a chess board.
This isn’t the whole story though, and Python offers yet more control over
instance creation at the class level. Before we get to that, though, we need to
understand metaclasses.
Chapter 6 - Metaclasses
We’ve mentioned metaclasses several times. We owe you an explanation!
We all know that in Python, the type of an instance is its class. So what is
the type of its class? Let's define a trivial class and see:

>>> class Widget:
...     pass
...
>>> w = Widget()
>>> type(w)
<class '__main__.Widget'>
>>> type(Widget)
<class 'type'>

Given that the type of the Widget class is type, we can say that the metaclass
of Widget is type. In general, the type of any class object in Python is its
metaclass, and the default metaclass is type.
We can get the same results by drilling down through special attributes:
>>> w.__class__
<class '__main__.Widget'>
>>> w.__class__.__class__
<class 'type'>
>>> w.__class__.__class__.__class__
<class 'type'>
The name, bases and namespace arguments contain the information collected
during execution of the class definition — normally the class attributes and
method definitions inside the class block — although in our case the class-block
is logically empty.
Here's the __prepare__() method of TracingMeta, a metaclass which traces how
it is called:

class TracingMeta(type):

    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        print("TracingMeta.__prepare__(name, bases, **kwargs)")
        print("mcs =", mcs)
        print("name =", name)
        print("bases =", bases)
        print("kwargs =", kwargs)
        namespace = super().__prepare__(name, bases)
        print("<-- namespace =", namespace)
        print()
        return namespace
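The narrative below tells us that __new__() and __init__() also run during
class creation. Those listings aren't reproduced above, but tracing versions
of them would follow the same pattern as __prepare__(), something like this
sketch:

class TracingMeta(type):

    # __prepare__() as above

    def __new__(mcs, name, bases, namespace, **kwargs):
        print("TracingMeta.__new__(mcs, name, bases, namespace, **kwargs)")
        print("mcs =", mcs)
        print("name =", name)
        print("bases =", bases)
        print("namespace =", namespace)
        print("kwargs =", kwargs)
        cls = super().__new__(mcs, name, bases, namespace)
        print("<-- cls =", cls)
        print()
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        print("TracingMeta.__init__(cls, name, bases, namespace, **kwargs)")
        print("cls =", cls)
        print("name =", name)
        print("bases =", bases)
        print("namespace =", namespace)
        print("kwargs =", kwargs)
        super().__init__(name, bases, namespace)
        print()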
Notice that although __new__() is implicitly a class-method, we must
explicitly decorate the __prepare__() class-method with the appropriate
decorator.
Now we’ll define a class containing a simple method called action() which
prints a message and a single class attribute the_answer with the cosmically
inevitable value 42. We’ll do this at the REPL so you can see clearly when the
metaclass machinery is invoked:
>>> from tracing import TracingMeta
>>> class Widget(metaclass=TracingMeta):
... def action(message):
... print(message)
... the_answer = 42
...
TracingMeta.__prepare__(name, bases, **kwargs)
mcs = <class 'tracing.TracingMeta'>
name = Widget
bases = ()
kwargs = {}
<-- namespace = {}
The instant we complete the class definition we can see from the tracing
output that Python executes the __prepare__(), __new__() and __init__()
methods in turn.
The name argument contains the name of our Widget class as a string.
The bases argument is an empty tuple. We didn’t declare any base classes for
Widget, and the ultimate object base-class is implicit.
The kwargs argument is an empty dictionary. We’ll cover the significance of this
shortly.
The most important aspect of __prepare__ is that when it calls its superclass
implementation in type, the return value is a dictionary — or more generally a
mapping type. In this case it’s a regular empty dictionary. This dictionary will be
the namespace associated with the nascent class.
The __module__ attribute is mapped to the name of the module in which the
class was defined; because we used the REPL, this is builtins in this case.
The __qualname__ attribute contains the fully-qualified name of the class,
including parent modules and packages. In this case it contains just the class
name, as the builtins module used by the REPL, being the last namespace in the
LEGB lookup hierarchy, is available everywhere.
As you can see, the arguments are dutifully forwarded to the metaclass methods,
and the argument values could have been used to configure the class object. This
allows the class statement to be used as a kind of class factory.
As a practical example, let's build a metaclass called EntriesMeta which
fabricates class attributes from a keyword argument supplied in the class
statement. Its __new__() method will:

Print kwargs.
Extract the num_entries value from kwargs.
Print num_entries.
Use a dictionary comprehension to generate a dictionary mapping letters to number values.
Update the namespace dictionary with our dictionary items.
Pass the modified namespace object on to type.__new__() via a call to super().

A sketch of this __new__() is shown below.
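Here's a sketch of EntriesMeta implementing those steps; the exact
letters-to-numbers mapping is our assumption:

class EntriesMeta(type):

    def __new__(mcs, name, bases, namespace, **kwargs):
        print(kwargs)
        num_entries = kwargs['num_entries']
        print(num_entries)
        # map 'a' -> 0, 'b' -> 1, and so on, for num_entries letters
        entries = {chr(ord('a') + i): i for i in range(num_entries)}
        namespace.update(entries)
        return super().__new__(mcs, name, bases, namespace)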
The problem we’ve set up here is that both __init__() and __new__() must
accept any additional arguments — they must have the same signature. We need
to add a do-nothing __init__() to keep Python happy:
    def __init__(cls, name, bases, namespace, **kwargs):
        super().__init__(name, bases, namespace)
We can see the num_entries item arrive in kwargs, and by the time we get to
__init__() we can see the namespace object with additional entries.
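For example (the class name here is ours), a class statement using this
metaclass behaves like this at the REPL, assuming the sketch above:

>>> class AtoC(metaclass=EntriesMeta, num_entries=3):
...     pass
...
{'num_entries': 3}
3
>>> AtoC.a
0
>>> AtoC.c
2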
Let's also add a regular method, metamethod(), to our tracing metaclass:

class TracingMeta(type):

    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        print("TracingMeta.__prepare__(name, bases, **kwargs)")
        print("mcs =", mcs)
        print("name =", name)
        print("bases =", bases)
        print("kwargs =", kwargs)
        namespace = super().__prepare__(name, bases)
        print("<-- namespace =", namespace)
        print()
        return namespace

    def metamethod(cls):
        print("TracingMeta.metamethod(cls)")
        print("cls = ", cls)
        print()
It turns out that regular methods of the metaclass can be accessed similarly to
class methods of the Widget class, and they will have the class passed to them as
the first (and implicit) argument:
>>> Widget.metamethod()
TracingMeta.metamethod(cls)
cls = <class 'Widget'>
Class method definitions in the regular class and its base classes will take
precedence over looking up a method in its metaclass.
Regular methods of the metaclass accept cls as their first argument. This
makes sense because cls (the class) is the 'instance' of the metaclass; it is
analogous to self. On the other hand, class methods of the metaclass accept
mcs as their first argument, the metaclass itself, analogous to the cls
argument of a class method in an actual class.
We have learned that behind the scenes this will call Widget.__new__() to
allocate a Widget followed by Widget.__init__() to do any further
initialisation. Let’s pull back the curtain, and see exactly what is “behind the
scenes”.
class TracingMeta(type):

    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        print("TracingMeta.__prepare__(name, bases, **kwargs)")
        print("  mcs =", mcs)
        print("  name =", name)
        print("  bases =", bases)
        print("  kwargs =", kwargs)
        namespace = super().__prepare__(name, bases)
        print("<-- namespace =", namespace)
        print()
        return namespace

    def metamethod(cls):
        print("TracingMeta.metamethod(cls)")
        print("  cls = ", cls)
        print()
Notice that when we import the module into the REPL, the metaclass trifecta of
__prepare__(), __new__(), and __init__() is invoked as soon as TracingClass is
defined.
Look carefully at the control flow here. Our call to the constructor invokes
__call__() on the metaclass, which receives the arguments we passed to the
constructor in addition to the type we’re trying to construct.
It’s very rare to see the __call__() metamethod overridden. It’s pretty low level
in Python terms and provides some of the most basic Python machinery. That
said, it can be powerful.
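To give a flavour, here's a sketch of a metaclass which uses __call__() to
reject positional constructor arguments. The name KeywordsOnlyMeta appears in
the example which follows; the body here is our reconstruction:

class KeywordsOnlyMeta(type):

    def __call__(cls, *args, **kwargs):
        if args:
            raise TypeError("Constructor for class {!r} does not "
                            "accept positional arguments.".format(cls.__name__))
        # Delegate to type.__call__(), which runs __new__() and __init__()
        return super().__call__(**kwargs)

Any class which uses this metaclass can then only be constructed with keyword
arguments: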
class ConstrainedToKeywords(metaclass=KeywordsOnlyMeta):
    # ...
We’ve covered a lot of the theory and practice behind metaclasses. Now we’ll
build on those ideas with some useful applications.
Consider a class, Dodgy, which accidentally defines the same method twice:

class Dodgy:

    def method(self):
        return "first definition"

    def method(self):
        return "second definition"

Which definition wins? In fact, the second definition takes precedence,
because it overwrites the first entry in the namespace dictionary as the class
definition is processed:

>>> from duplicatesmeta import *
>>> dodgy = Dodgy()
>>> dodgy.method()
'second definition'
Let’s write a metaclass which detects and prevents this unfortunate situation
from occurring. To do this, rather than using a regular dictionary as the
namespace object used during class construction, we need a dictionary which
raises an error when we try to assign to an existing key.
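The original OneShotDict listing isn't reproduced here, but the traceback
below shows its __setitem__() raising a ValueError, so it looks something like
this sketch:

class OneShotDict(dict):

    def __setitem__(self, key, value):
        if key in self:
            raise ValueError("Cannot assign to existing key {!r} in {!r}"
                             .format(key, type(self).__name__))
        super().__setitem__(key, value)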
We can now design a very simple metaclass which uses OneShotDict for the
namespace object:

class ProhibitDuplicatesMeta(type):

    @classmethod
    def __prepare__(mcs, name, bases):
        return OneShotDict()
If we try to define a class with duplicate methods using this metaclass, we get an
error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<input>", line 4, in Dodgy
File "/Users/sixtynorth/sandbox/metaclasses/duplicatesmeta.py", line 12, in __setitem__
raise ValueError("Cannot assign to existing key {!r} in {!r}".format(key, type(self).\
__name__))
ValueError: Cannot assign to existing key 'method' in 'OneShotDict'
The main shortcoming here is that the error message isn't hugely informative.
Unfortunately, we don't have access to the part of the runtime machinery which
reads our class definition and populates the dictionary, so we can't intercept
the ValueError and emit a more useful error instead. The best we can do,
rather than using a general-purpose collection like OneShotDict, is to create
a functional equivalent called something like OneShotClassNamespace with a
more specific error message. This has the benefit that we can pass additional
diagnostic information, such as the name of the class currently being defined,
into the namespace object on construction, which helps us emit a more useful
message:
class OneShotClassNamespace(dict):

    def __init__(self, name):
        super().__init__()
        self._name = name

    def __setitem__(self, key, value):
        if key in self:
            raise TypeError("Cannot reassign existing class attribute {!r} of {!r}"
                            .format(key, self._name))
        super().__setitem__(key, value)


class ProhibitDuplicatesMeta(type):

    @classmethod
    def __prepare__(mcs, name, bases):
        return OneShotClassNamespace(name)
When we try to execute the module containing the class with the duplicate
method definition, we get a much more useful error message “Cannot reassign
existing class attribute ‘method’ of ‘Dodgy’”:
>>> from duplicatesmeta import *
Traceback (most recent call last):
File "/Users/sixtynorth/sandbox/metaclasses/duplicatesmeta.py", line 38, in <module>
class Dodgy(metaclass=ProhibitDuplicatesMeta):
File "/Users/sixtynorth/sandbox/metaclasses/duplicatesmeta.py", line 43, in Dodgy
def method(self):
File "/Users/sixtynorth/sandbox/metaclasses/duplicatesmeta.py", line 27, in __setitem__
raise TypeError("Cannot reassign existing class attribute {!r} of {!r}".format(key, s\
elf._name))
TypeError: Cannot reassign existing class attribute 'method' of 'Dodgy'
Much better!
Let's return to the Positive descriptor and the Planet class from the previous
chapter:

class Positive:

    def __init__(self):
        self._instance_data = WeakKeyDictionary()

    # __get__() and __set__() as before
    # ...


class Planet:

    def __init__(self,
                 name,
                 radius_metres,
                 mass_kilograms,
                 orbital_period_seconds,
                 surface_temperature_kelvin):
        self.name = name
        self.radius_metres = radius_metres
        self.mass_kilograms = mass_kilograms
        self.orbital_period_seconds = orbital_period_seconds
        self.surface_temperature_kelvin = surface_temperature_kelvin

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not value:
            raise ValueError("Cannot set empty Planet.name")
        self._name = value

    radius_metres = Positive()
    mass_kilograms = Positive()
    orbital_period_seconds = Positive()
    surface_temperature_kelvin = Positive()
Recall that we implemented a new descriptor type called Positive which would
only admit positive numeric values. This saved a lot of boilerplate code in the
definition of our Planet class, but we lost an important capability along the way,
because there is no way for a descriptor instance to know to which class attribute
it has been bound. One of the instances of Positive is bound to
Planet.radius_metres, but it has no way of knowing that. The default Python
machinery for processing class definitions just doesn’t set up that association.
The error message doesn’t — and in fact can’t — tell us which attribute
triggered the exception.
Now we’ll show how we can modify the class creation machinery by defining
metaclasses which should be able to intervene in the process of defining the
Planet class in order to give descriptor instances the right name.
We’ll start by introducing a new base-class for our descriptors called Named. This
is very simple and just has name as a public instance attribute. The constructor
defines a default value of None because we won’t be in a position to assign the
attribute value until after the descriptor object has been constructed:
class Named:
We’ve modfied the argument list to __init__(), ensured that the superclass
initialiser is called, and made use of the new name attribute in the error message
raised by __set__() when we try to assign a non-positive number to the
descriptor.
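For reference, the reworked Positive now looks something like this (a sketch,
reconstructed from the description above and the error message shown further
below):

class Positive(Named):

    def __init__(self, name=None):
        super().__init__(name)
        self._instance_data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self._instance_data[instance]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("Attribute value {} {} is not positive"
                             .format(self.name, value))
        self._instance_data[instance] = value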
Now we need a metaclass which can detect the presence of descriptors which are
Named and assign class attribute names to them:

class DescriptorNamingMeta(type):

    def __new__(mcs, name, bases, namespace, **kwargs):
        for attr_name, attr in namespace.items():
            if isinstance(attr, Named):
                attr.name = attr_name
        return super().__new__(mcs, name, bases, namespace)
Again, this is fairly straightforward. In __new__ we iterate over the names and
attributes in the namespace dictionary, and if the attribute is an instance of Named
we assign the name of the current item to its public name attribute.
Having modified the contents of the namespace, we then call the superclass
implementation of __new__() to actually allocate the new class object.
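A sketch matching that description looks like this (the parameter names are conventional rather than taken from the original listing):

class DescriptorNamingMeta(type):

    def __new__(mcs, name, bases, namespace):
        # Bind each Named descriptor to the attribute name under
        # which it appears in the class namespace.
        for attr_name, attr in namespace.items():
            if isinstance(attr, Named):
                attr.name = attr_name
        return super().__new__(mcs, name, bases, namespace)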
The only change we need to make to our Planet class is to refer to the metaclass
on the opening line. There's no need for us to modify our uses of the Positive
descriptor; the optional name argument will default to None when the class
definition is read, before the metaclass __new__() is invoked:
class Planet(metaclass=DescriptorNamingMeta):

    def __init__(self,
                 name,
                 radius_metres,
                 mass_kilograms,
                 orbital_period_seconds,
                 surface_temperature_kelvin):
        self.name = name
        self.radius_metres = radius_metres
        self.mass_kilograms = mass_kilograms
        self.orbital_period_seconds = orbital_period_seconds
        self.surface_temperature_kelvin = surface_temperature_kelvin

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not value:
            raise ValueError("Cannot set empty Planet.name")
        self._name = value

    radius_metres = Positive()
    mass_kilograms = Positive()
    orbital_period_seconds = Positive()
    surface_temperature_kelvin = Positive()
By trying to set a non-positive mass for the planet Mercury, we can see that each
descriptor object now knows the name of the attribute to which it has been
bound, so it can emit a much more helpful diagnostic message:
>>> from planet import *
>>> mercury, venus, earth, mars = make_planets()
>>> mercury.mass_kilograms
3.3022e+23
>>> mercury.mass_kilograms = -10000
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/sixtynorth/sandbox/metaclasses/planet_08.py", line 23, in __set__
raise ValueError("Attribute value {} {} is not positive".format(self.name, value))
ValueError: Attribute value mass_kilograms -10000 is not positive
In metainheritance.py let’s define two metaclasses related only by the fact that
they both subclass type:
# metainheritance.py

class MetaA(type):
    pass

class MetaB(type):
    pass
We’ll also define two regular classes, A and B which use the MetaA and MetaB as
their respective metaclasses:
class A(metaclass=MetaA):
    pass

class B(metaclass=MetaB):
    pass
Now we’ll introduce a third regular class D which derives from class A:
class D(A):
    pass
The metaclass of class D is MetaA, which was inherited from regular class A. So,
metaclasses are inherited.
What happens if we try to create a new class C which inherits from both regular
classes A and B with their different metaclasses? Let’s give it a go:
class C(A, B):
    pass
When we try to execute this code by importing it at the REPL, we get a type
error:
>>> from metainheritance import *
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_import_hook.py", line 21, \
in do_import
module = self._system_import(name, *args, **kwargs)
File "/Users/rjs/training/p4/courses/pluralsight/advanced-python/source/advanced-python\
-m06-metaclasses/examples/metainheritance.py", line 22, in <module>
class C(A, B):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) su\
bclass of the metaclasses of all its bases
With the message “metaclass conflict: the metaclass of a derived class must be a
(non-strict) subclass of the metaclasses of all its bases” Python is telling us that it
doesn’t know what to do with the unrelated metaclasses. Which metaclass
__new__() should be used to allocate the class object?
To resolve this we need a single metaclass — let’s call it MetaC — which we can
create by inheriting from both MetaA and MetaB:
class MetaC(MetaA, MetaB):
    pass
In the definition of C we must override the metaclass to specify MetaC:
class C(A, B, metaclass=MetaC):
    pass
With these changes we can successfully import C and check that its metaclass is
MetaC:
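>>> from metainheritance import *
>>> type(C)
<class 'metainheritance.MetaC'>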
So we've persuaded Python to accept our code, but our metaclasses are empty, so
they combine trivially. Metaclasses which do real work can be composed in
exactly the same way, by deriving a combined metaclass from each of them. For
example, combining ProhibitDuplicatesMeta with KeyWordsOnlyMeta:
class ProhibitDuplicatesAndKeyWordsOnlyMeta(
        ProhibitDuplicatesMeta,
        KeyWordsOnlyMeta):
    pass
The Planet class definition is executed when we import the module. This is
when the metaclasses do their work, and we can see that the tracing works as
expected:
>>> from planet import *
TracingMeta.__prepare__(name, bases, **kwargs)
mcs = <class 'tracing.TracingMeta'>
name = TracingClass
bases = ()
kwargs = {}
<-- namespace = {}
Summary
We’ve covered a lot of ground in this chapter, and you should now know more
than the majority of Python developers about the customisation of class creation
using metaclasses.
All classes have a metaclass which is the type of the class object. The
default type of class objects is type.
The metaclass is responsible for processing the class definition, turning the
parsed source code into a class object.
The __prepare__() metaclass method must return a mapping object which
the Python runtime will populate with namespace items collected from
parsing and executing the class definition.
The __new__() metaclass method must allocate and return a class object
and configure it using the contents of the class namespace, the list of base
classes passed from the definition, and any additional keyword arguments
passed in the definition.
The __init__() metaclass method can be used to further configure a class
object, and must have the same signature as __new__().
The __call__() metaclass method in effect implements the constructor for
class instances, and is invoked when we construct an instance.
An important use case for metaclasses is to support so-called named
descriptors, whereby we can configure descriptor objects such as properties
with the name of the class attribute to which they are assigned.
Strict rules control how multiple metaclasses interact in the presence of
inheritance relationships between the regular classes which use them.
Judicious metaclass design using super() to delegate via the MRO can yield
metaclasses which compose gracefully.
For a solution to this problem we’ll use Python’s abstract base-class mechanism,
a topic we’ll explore in the next chapter.
Abstract refers to the fact that the class cannot be instantiated in isolation; that is,
it makes no sense to create an object of the type of the base-class alone. It only
makes sense to instantiate the class as part of an object of a derived type. Ideally,
it should not be possible to instantiate an abstract base-class directly. The
opposite of abstract is concrete, and in this example the printer driver is a
concrete class, so it makes sense to instantiate it.
In Python, this is true both in theory and practice, but determining whether a
particular object supports the required interface in advance of exercising that
interface can be quite awkward. For example, what does it mean in Python to be
a mutable sequence? We know that list is a mutable sequence, but we cannot
assume that all mutable sequences are lists. In fact, the mutable sequence
protocol requires that at least sixteen methods be implemented. When relying on
duck-typing it can be difficult to be sure that you’ve met the requirements, and if
clients do need to determine whether a particular object is admissible as a
mutable sequence with a look-before-you-leap approach, the check is messy and
awkward to perform robustly.
For instance, we can determine that list is a mutable sequence using the built-in
issubclass() function:
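>>> from collections.abc import MutableSequence
>>> issubclass(list, MutableSequence)
True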
This much may not be surprising, but let’s look at the base-class of list. In fact,
we’ll look at the transitive base-classes of list by examining its method
resolution order using the __mro__ attribute:
>>> list.__mro__
(<class 'list'>, <class 'object'>)
This reveals that list has only one base-class, object, and that
MutableSequence is nowhere to be seen. Further reflection — if you’ll excuse
the pun — might lead you to wonder how it is that such a fundamental type as
Python’s list can be a subclass of a type defined in a library module.
Attempting to instantiate a subclass which doesn't override the abstract
methods fails with a useful TypeError listing the five methods we must supply to
implement the mutable sequence protocol. The reason we don't need to
implement all sixteen is that eleven of them can be implemented in terms of the
other five, and the MutableSequence abstract base-class contains code to do
exactly this. Note, however, that these implementations may not be the most
efficient since they can’t exploit knowledge of the concrete class, but must
instead work entirely through the interface of the abstract class.
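To make this concrete, here's a minimal sketch of a list-backed mutable sequence (the class name SimpleSequence is ours) which supplies only the five required methods and inherits the remaining eleven:

from collections.abc import MutableSequence

class SimpleSequence(MutableSequence):
    """A mutable sequence backed by a plain list."""

    def __init__(self, iterable=()):
        self._items = list(iterable)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)

With only these five methods defined, inherited mixin methods such as append(), extend(), remove(), and index() all work on SimpleSequence instances.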
Yet issubclass(list, MutableSequence) evaluates to True. The resolution to this
puzzle lies in the virtual subclass machinery, which we'll look at now.
A simple example
The __subclasscheck__() method on the metaclass of the virtual base class can
do pretty much anything it likes to determine whether its argument is to be
considered a subclass. Consider the code in weapons.py:
class SwordMeta(type):

    def __subclasscheck__(cls, sub):
        return (hasattr(sub, 'swipe') and callable(sub.swipe)
                and
                hasattr(sub, 'sharpen') and callable(sub.sharpen))

class Sword(metaclass=SwordMeta):
    pass

class BroadSword:

    def swipe(self):
        print("Swoosh!")

    def sharpen(self):
        print("Shink!")

class SamuraiSword:

    def swipe(self):
        print("Slice!")

    def sharpen(self):
        print("Shink!")

class Rifle:

    def fire(self):
        print("Bang!")
In this module we have defined a Sword class with a metaclass SwordMeta.
SwordMeta defines the __subclasscheck__() method to check for the existence
of callable swipe and sharpen attributes on the class. In this situation Sword will
play the role of a virtual base-class. A few simple tests at the REPL confirm that
BroadSword and SamuraiSword are indeed considered subclasses of Sword even
though there is no explicit relationship through inheritance:
>>> issubclass(BroadSword, Sword)
True
>>> issubclass(SamuraiSword, Sword)
True
>>> issubclass(Rifle, Sword)
False
__instancecheck__()
This isn’t the whole story though, as tests of instances using isinstance() will
return inconsistent results:
>>> samurai_sword = SamuraiSword()
>>> isinstance(samurai_sword, Sword)
False
This is because the isinstance() machinery checks for the existence of the
__instancecheck__() metamethod which we have not yet implemented. Let’s
do so now; a natural implementation delegates the instance check to the
subclass check on the type of the instance:
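class SwordMeta(type):

    def __subclasscheck__(cls, sub):
        return (hasattr(sub, 'swipe') and callable(sub.swipe)
                and
                hasattr(sub, 'sharpen') and callable(sub.sharpen))

    def __instancecheck__(cls, instance):
        # A sketch: an object is an instance of Sword if its type
        # passes the subclass check.
        return cls.__subclasscheck__(type(instance))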
This surprising technique is used in Python for some of the collection abstract
base-classes, including Sized:
>>> from collections.abc import Sized
>>>
>>> class SizedCollection:
...     def __init__(self, size):
...         self._size = size
...     def __len__(self):
...         return self._size
...
>>>
>>> issubclass(SizedCollection, Sized)
True
One glaring example from Python revolves around the Hashable virtual base-
class from collections.abc:
>>> from collections.abc import Hashable
>>> issubclass(object, Hashable)
True
>>> issubclass(list, object)
True
>>> issubclass(list, Hashable)
False
Some further investigation reveals that the list class sets the __hash__ attribute
to None. The Hashable.__subclasscheck__() implementation checks for this
eventuality and uses it to signal non-hashability.
This example is also interesting because it demonstrates the fact that even the
ultimate base-class object can be considered a subclass of Hashable —
underlining the lack of symmetry between superclass and subclass relationships
in Python.
Returning to our sword example, suppose we add a regular method to the Sword
virtual base-class:

class Sword(metaclass=SwordMeta):

    def thrust(self):
        print("Thrusting...")

Because a virtual subclass such as BroadSword has no inheritance relationship
with Sword, Sword never appears in its MRO, and the thrust() method is not
inherited. For the same reason, it's not possible to call virtual base-class
methods using super(), since super() works by searching the MRO.
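Indeed, although a BroadSword instance passes the isinstance() check, the method is simply absent; a session would go something like this:

>>> broad_sword = BroadSword()
>>> isinstance(broad_sword, Sword)
True
>>> broad_sword.thrust()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'BroadSword' object has no attribute 'thrust'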
The standard library abc module provides the ABCMeta metaclass, which supplies
working implementations of both __subclasscheck__() and __instancecheck__().
Let's rebuild Sword to use it:

from abc import ABCMeta

class Sword(metaclass=ABCMeta):
    pass
ABCMeta doesn’t, of course, know what it means to be a sword, so the test that
was previously in SwordMeta.__subclasscheck__() needs to be relocated
elsewhere. The ABCMeta.__subclasscheck__() method calls the special
__subclasshook__() method on our actual class to perform the test.
The default implementation inherited from object simply returns
NotImplemented:

>>> object.__subclasshook__()
NotImplemented
Let’s try implementing __subclasshook__() for our Sword class with pretty
much the same definition we used for __subclasscheck__() previously:
class Sword(metaclass=ABCMeta):

    @classmethod
    def __subclasshook__(cls, sub):
        return (hasattr(sub, 'swipe') and callable(sub.swipe)
                and
                hasattr(sub, 'sharpen') and callable(sub.sharpen))
Notice that the register() metamethod returns the class which was registered
— a point we’ll return to in a moment. Now we can even retrofit base-classes
(albeit virtual ones) to the built-in types:
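Text here is an abstract base-class defined at the REPL; a minimal version, followed by registration of str against it, might look like this:

>>> from abc import ABC
>>> class Text(ABC):
...     pass
...
>>> Text.register(str)
<class 'str'>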
>>> issubclass(str, Text)
True
>>> isinstance("Is this text?", Text)
True
Here we demonstrate that the built-in str class is now considered a subclass of
our Text class defined at the REPL, and str objects are instances of Text.
Using register as a decorator
Because the register() metamethod returns its argument, we can even use it as
a class decorator. Let’s register a class Prose as a virtual subclass of Text:
>>> @Text.register
... class Prose:
...     pass
...
>>> issubclass(Prose, Text)
True
To see this in action, let’s add a LightSaber which has no sharpen() method to
our example. This class won’t satisfy the __subclasshook__() test we defined
in Sword, but we still want it identified as a virtual subclass of Sword, so we’ve
registered it using the decorator form of Sword.register:
@Sword.register
class LightSaber:

    def swipe(self):
        print("Ffffkrrrrshhzzzwooooom..woom..woooom..")
Even though we've registered LightSaber with Sword, the subclass test returns
False:
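>>> issubclass(LightSaber, Sword)
False

This is because our __subclasshook__() returns False when the structural check fails, and a definite False answer prevents ABCMeta from ever consulting its registry of virtual subclasses. The remedy is to return NotImplemented instead when the check fails, by appending or NotImplemented to the returned expression, so that ABCMeta falls back to its default logic, including the registry.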
With this change in place — which exploits shortcut evaluation of the logical
operators — subclass detection now works as expected for implicitly detected
subclasses, explicitly registered subclasses, and non-subclasses:
>>> issubclass(BroadSword, Sword)
True
>>> issubclass(LightSaber, Sword)
True
>>> issubclass(Rifle, Sword)
False
class ABC(metaclass=ABCMeta):
    """Helper class that provides a standard way to create an ABC using
    inheritance.
    """
    pass
This makes it even easier to declare abstract base-classes without having to put
the metaclass mechanism on show. This may be an advantage when coding for
audiences who haven’t been exposed to the concept of metaclasses. Using ABC
our Sword class becomes:
from abc import ABC

class Sword(ABC):

    @classmethod
    def __subclasshook__(cls, sub):
        return ((hasattr(sub, 'swipe') and callable(sub.swipe)
                 and
                 hasattr(sub, 'sharpen') and callable(sub.sharpen))
                or NotImplemented)
@abstractmethod
def an_abstract_method(self):
    raise NotImplementedError  # Method body syntactically required.
Let’s add abstract methods for swiping and thrusting to our Sword abstract base-
class. We’ll also update __subclasshook__() to match:
from abc import ABC, abstractmethod

class Sword(ABC):

    @classmethod
    def __subclasshook__(cls, sub):
        return ((hasattr(sub, 'swipe') and callable(sub.swipe)
                 and
                 hasattr(sub, 'thrust') and callable(sub.thrust)
                 and
                 hasattr(sub, 'parry') and callable(sub.parry)
                 and
                 hasattr(sub, 'sharpen') and callable(sub.sharpen))
                or NotImplemented)

    @abstractmethod
    def swipe(self):
        raise NotImplementedError

    @abstractmethod
    def thrust(self):
        print("Thrusting...")

    @abstractmethod
    def parry(self):
        raise NotImplementedError
We’ll take this opportunity to remind you of the distinction between NotImplemented and
NotImplementedError. NotImplemented is a value returnable from predicate functions which
are unable to make a determination of True or False. On the other hand,
NotImplementedError is an exception type to be raised in place of missing code.
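Consider first a version of BroadSword which implements only some of these abstract methods: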
class BroadSword(Sword):

    def swipe(self):
        print("Swipe!")

    def sharpen(self):
        print("Shink!")
We must implement all abstract methods for the class to be considered concrete:
class BroadSword(Sword):

    def swipe(self):
        print("Swoosh!")

    def thrust(self):
        super().thrust()

    def parry(self):
        print("Parry!")

    def sharpen(self):
        print("Shink!")
@staticmethod
@abstractmethod
def an_abstract_static_method():
    raise NotImplementedError

@classmethod
@abstractmethod
def an_abstract_class_method(cls):
    raise NotImplementedError

@property
@abstractmethod
def an_abstract_property(self):
    raise NotImplementedError

@an_abstract_property.setter
@abstractmethod
def an_abstract_property(self, value):
    raise NotImplementedError
@abstractmethod
def __get__(self, instance, owner):
    # ...
    pass

@abstractmethod
def __set__(self, instance, value):
    # ...
    pass

@abstractmethod
def __delete__(self, instance):
    # ...
    pass

@property
def __isabstractmethod__(self):
    return True  # or False if not abstract
Let’s see this in action with a very simple example. We’ll define a class called
AbstractBaseClass which inherits from ABC. Within this class we’ll define two
properties called abstract_property and concrete_property using the
appropriate combinations of decorators:
>>> from abc import (ABC, abstractmethod)
>>> class AbstractBaseClass(ABC):
...     @property
...     @abstractmethod
...     def abstract_property(self):
...         raise NotImplementedError
...     @property
...     def concrete_property(self):
...         return "sand, cement, water"
...
>>>
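Because abstract_property is abstract, the class cannot be instantiated. The message below is representative of what you'll see:

>>> AbstractBaseClass()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: Can't instantiate abstract class AbstractBaseClass with abstract methods abstract_property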
The problem here is that our class decorator is detecting specifically property
instances with this fragment:
property_names = [name for name, attr
                  in vars(cls).items()
                  if isinstance(attr, property)]

for name in property_names:
    _wrap_property_with_invariant_checking_proxy(cls, name, predicate)
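Once a property has been wrapped in an InvariantCheckingPropertyProxy it is no longer an instance of property, so applying a second invariant decorator silently skips the already-wrapped attribute. What we need is a broader notion of what counts as a property, one which covers both the built-in property class and our proxy: an abstract base-class for property data descriptors.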
class PropertyDataDescriptor(ABC):

    @abstractmethod
    def __get__(self, instance, owner):
        raise NotImplementedError

    @abstractmethod
    def __set__(self, instance, value):
        raise NotImplementedError

    @abstractmethod
    def __delete__(self, instance):
        raise NotImplementedError

    @property
    @abstractmethod
    def __isabstractmethod__(self):
        raise NotImplementedError
Having defined an abstract base class, we now need some subclasses. The first
will be a virtual subclass — the built-in property class — which we’ll register
with the base-class:
PropertyDataDescriptor.register(property)
The second subclass will be a real subclass. We’ll modify our existing property
proxy InvariantCheckingPropertyProxy to inherit from
PropertyDataDescriptor, which will also require that we override the
__isabstractmethod__ property:
class InvariantCheckingPropertyProxy(PropertyDataDescriptor):

    @property
    def __isabstractmethod__(self):
        return self._referent.__isabstractmethod__
def invariant(predicate):
    """Create a class decorator which checks a class invariant.

    Args:
        predicate: A callable to which, after every method invocation,
            the object on which the method was called will be passed.
            The predicate should evaluate to True if the class invariant
            has been maintained, or False if it has been violated.

    Returns:
        A class decorator for checking the class invariant tested by
        the supplied predicate function.
    """
    def invariant_checking_class_decorator(cls):
        """A class decorator for checking invariants."""
        # ...

    return invariant_checking_class_decorator
With these changes in place both invariants are enforced on property writes:
>>> t = Temperature(42)
>>>
>>> t.celsius = -300
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "class_decorators.py", line 86, in __set__
result = self._referent.__set__(instance, value)
File "class_decorators.py", line 88, in __set__
raise RuntimeError("Class invariant {!r} violated for {!r}".format(self._predicate.__\
doc__, instance))
RuntimeError: Class invariant 'Temperature not below absolute zero' violated for <class_d\
ecorators.Temperature object at 0x103012cc0>
>>>
>>> t.celsius = 1e34
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "class_decorators.py", line 88, in __set__
raise RuntimeError("Class invariant {!r} violated for {!r}".format(self._predicate.__\
doc__, instance))
RuntimeError: Class invariant 'Temperature below absolute hot' violated for <class_decora\
tors.Temperature object at 0x103012cc0>
There’s a lot going on in this code with decorators, metaclasses, abstract base-
classes, and descriptors, and it may seem somewhat complicated. All this
complexity is well encapsulated in the invariant class decorator, however, so
take a step back and enjoy the simplicity of the client code in the Temperature
class.
Summary
In this chapter we’ve explained Python’s system of abstract base-classes, which
is rather more flexible than similar concepts in other languages. In
particular we covered these topics:
Subclass/instance checking
How the behaviour of the built-in issubclass() and isinstance()
functions can be specialised for a base-class by defining the
__subclasscheck__() and __instancecheck__() methods on the
metaclass of that base-class.
Specialised subclass checks allow us to centralise the definition of
what it means to be a subclass by gathering look-before-you-leap
protocol checks into one place. Any class which implements the
required protocol will become at least a virtual subclass of a virtual
base-class.
The standard library abc module contains tools for assisting in the
definition of abstract base-classes.
Most important amongst those tools is the ABCMeta metaclass which
can be used as the metaclass for abstract base-classes.
Slightly more conveniently, you can simply inherit from the ABC class
which has ABCMeta as its metaclass.
ABCMeta provides default implementations of both
__subclasscheck__() and __instancecheck__() which support two
means of identifying subclasses: A special __subclasshook__()
classmethod on abstract base-classes and a registration method.
__subclasshook__() accepts a candidate subclass as its only argument
and should return True or NotImplemented. False should only be
returned if it is desired to disable subclass registration.
Passing any class — even a built-in class — to the register()
metamethod of an abstract base-class will register the argument as a
virtual subclass of the base-class.
An @abstractmethod decorator can be used to prevent instantiation of
abstract classes. It requires methods marked as such to be overridden
in real — although not virtual — subclasses.
The @abstractmethod decorator can be combined with other
decorators such as @staticmethod, @classmethod, and @property, but
@abstractmethod should always be the innermost decorator.
Descriptors should propagate abstractness from underlying methods by
exposing the __isabstractmethod__ attribute.
Afterword: Continue the journey
Python is a large and complicated language with many moving parts. We find it
remarkable that much of this complexity is hidden so well in Python. We hope in
this book we’ve given you deeper insight into some important mechanisms in
Python which, while a bit trickier to understand, can deliver great expressive
power. You have reached the end of this book on advanced Python, and now is
the time to take what you have learned and apply these powerful techniques in
your work and play with Python. No matter where your journey goes, though,
remember that, above all else, it’s great fun to write Python software, so enjoy
yourselves!
Notes
1 At least Python 3.5, though any version greater than that will work as well
(e.g. 3.6).↩
2 See https://mail.python.org/pipermail/python-ideas/2009-
October/006157.html↩
3 See http://citeseerx.ist.psu.edu/viewdoc/download?
doi=10.1.1.103.6084&rep=rep1&type=pdf↩
5 Note we are required to pass the length parameter to specify how many bytes
we want in the result and the byteorder parameter to specify whether we want
the bytes returned in big-endian order with the most significant byte first, or
little-endian order with the least significant byte first.↩
6 We’ve taken additional steps in this book to make the output more readable
by wrapping the interleaved lines of indices and buffer data. The code we
present outputs all indices on one line, and all buffer bytes on the following
line.↩
9 You’ll notice that in our view, whatever the International Astronomical Union
says, Pluto is a planet!↩
10 Of course your system will likely report different numbers. A lot of factors
are involved in memory usage. The real point is that allocating 10,000 boards
should show a marked increase in memory usage.↩
11 Recall from chapter 4 that a descriptor is any object supporting any of the
__get__(), __set__(), or __delete__() methods.↩