
Pydon’ts

Write beautiful Python code

Rodrigo Girão Serrão

11-04-2021
Contents

Foreword
Pydon't disrespect the Zen of Python
Chaining comparison operators
Assignment expressions and the walrus operator :=
Truthy, Falsy, and bool
Deep unpacking
Unpacking with starred assignments
EAFP and LBYL coding styles
Zip up
Enumerate me
str and repr
Structural pattern matching tutorial
Structural pattern matching anti-patterns
Watch out for recursion
Foreword

I’m glad that you are reading these words.


For a long time now, teaching and sharing knowledge has been a passion of
mine. If you are reading this book, then it probably means that you hope to find
new knowledge in here, knowledge that I am sharing with you. This brings me
great joy.
I have been writing Python since I was 15 years old, which means I have now written Python for more than a third of my life (if you do the maths, that means I am at least 22.5 years old at the time of writing). This is not necessarily something that you find impressive, but I do appreciate that fact about myself.
Python was not my first programming language, and I remember picking it up
as a friend of mine recommended it to me. Now, many years later, I still enjoy
writing Python code, whether for work-related reasons or for my own projects
(just take a look at https://mathspp.com/blog/tag:python). In programming, much like in mathematics (my main area of expertise), there is a sense of elegance in the code (or proofs) we write. As I learned more and more about
programming in general and Python in particular, I developed a sense for what
I consider to be elegant Python programs. This is one of the things I intend to
share in this book: tips on how to write beautiful Python programs.
Of course, the notion of elegance is a subjective one, so it may very well be the
case that what I find elegant is not what you find elegant, and that is perfectly
fine. In general, neither one of us will be wrong.
Tied to my effort of sharing my interpretation of what elegant Python programs
look like, I also want you to learn about all the nooks and crannies of the core
language. Python is a very, very rich language, and the more you learn about it, the better equipped you will be to use it to its full potential. That is why every
chapter focuses on exploring a single feature of the core language of Python,
which is always accompanied by usage examples of said feature. Sometimes we will look at how Python's own Standard Library makes use of that feature, other times I will show some of my own code, and other times I will even come up with random examples.
For now, this book is being expanded as I write one chapter per week and publish it to https://mathspp.com/blog/pydonts, where you can read the contents of this book for free.
If you would like to share your feedback, let me know of any typos or errors you find, or otherwise just get in touch, feel free to drop a line to rodrigo@mathspp.com.
— Rodrigo, https://mathspp.com

This book is a WIP. Original articles at https://mathspp.com/blog/pydonts.


Pydon’t disrespect the Zen
of Python

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/pydont-disrespect-the-zen-of-python.)
To kick off the Pydon't series, we start with a set of guidelines that all Pythonistas should be aware of: the Zen of Python.
The Zen of Python is like a meta style guide. While you have things like PEP 8 that tell you how you should format your code, how to name your variables, etc., the Zen of Python provides you with the guidelines that you should follow when thinking about (Python) code and when designing a program.

Zen of Python
You can read the Zen of Python by executing import this in your REPL, which
should print the following text:
The Zen of Python, by Tim Peters

Beautiful is better than ugly.


Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Take a look at those guidelines and try to appreciate their meaning. If you want
to write truly Pythonic code, then you should try to embrace these guidelines as
much as possible.
Digging into the references of PEP 20 – The Zen of Python shows that Tim Peters (a major contributor to Python in its early days) thinks that these guidelines are "fundamental idiomatic recommendations for operating within the spirit of the [Python] language", which goes to show that these recommendations are serious and should not be taken lightly, especially if you are willing to go the extra mile.
If you have seen Kung Fu Panda, think of it this way: the Zen of Python is to Python programmers what the Dragon Scroll is to kung fu practitioners. Po was only able to take his kung fu skills to the next level, becoming truly amazing, after embracing the Dragon Scroll. You will only become a true Pythonista after you embrace the Zen of Python.



My advice would be to read this from time to time, and to try and remember the
Zen of Python while you code and while you go over code that has already been
written (by you or someone else). I don’t know about you, but whenever I write
a (text) document, like a letter or a blog post, I never get it right on the first try.
I usually write a first draft and then go over it, editing as I see fit: sometimes
reworking whole sections. Writing code is the same: chances are, the first thing
you write can be greatly improved upon.
This Pydon’t was more of a “meta” Pydon’t, with subjective advice on how to
code. This might seem useless to you at first, but the more you dwell on it the
more helpful it will become. The next Pydon’ts will show you objective, practical
tips on how to write more Pythonic code.

References
• PEP 20 – The Zen of Python, https://www.python.org/dev/peps/pep-0020/
• "The Way of Python" mailing thread, https://groups.google.com/g/comp.lang.python/c/B_VxeTBClM0/m/L8W9KlsiriUJ
• Tim Peters (software engineer), Wikipedia, https://en.wikipedia.org/wiki/Tim_Peters_(software_engineer)



Chaining comparison
operators

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/chaining-comparison-operators.)

Introduction
In this Pydon't we will go over the chaining of comparison operators:
• how they work;
• useful usages; and
• weird cases to avoid.

Chaining of comparison operators


One of the things I enjoy about Python is that some of its features make so much
sense that you don’t even notice that you are using a feature until someone
points out that such code wouldn’t work in other languages. One such example
is comparison chaining! Look at this snippet of code and tell me if it doesn’t
look natural:
>>> a = 1
>>> b = 2
>>> c = 3
>>> if a < b < c:
...     print("Increasing seq.")
...
Increasing seq.
When Python sees two comparison operators in a row, like in a < b < c, it behaves as if you had written a < b and b < c, except that b only gets evaluated once (which is relevant if b is an expression like a function call). In my opinion, this feature makes a lot of sense and does not look surprising. Instead, I feel kind of sad that most languages do not support this behaviour.
Another example usage is for when you want to make sure that three values are
all the same:
>>> a = b = 1
>>> c = 2
>>> if a == b == c:
...     print("all same")
... else:
...     print("some are diff")
...
some are diff
>>> c = 1
>>> if a == b == c:
...     print("all same")
... else:
...     print("some are diff")
...
all same
Did you know that you can actually chain an arbitrary number of comparison operators? For example, a == b == c == d == e checks if all five variables are the same, while a < b < c < d < e checks if you have a strictly increasing sequence.

Pitfalls
Even though this feature looks very sensible, there are a couple of pitfalls you have to look out for.

Non-transitive operators
We saw above that we can use a == b == c to check if a, b and c are all the
same. How would you check if they are all different?
If you thought about a != b != c, then you just fell into the first pitfall!
Look at this code:
>>> a = c = 1
>>> b = 2
>>> if a != b != c:
...     print("a, b, and c all different:", a, b, c)
...
a, b, and c all different: 1 2 1
The problem here is that a != b != c is a != b and b != c, which checks
that b is different from a and from c, but says nothing about how a and c relate.
From the mathematical point of view, != isn’t transitive, i.e., knowing how a
relates to b and knowing how b relates to c doesn’t tell you how a relates to c.
As for a transitive example, you can take ==: if a == b and b == c then it is also
true that a == c.
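Since no single chain can express "all different", a correct check has to compare every pair. One sketch (my own, assuming the values are hashable) counts the distinct values instead:

```python
def all_different(*values):
    # A chain like a != b != c only compares neighbours; comparing the
    # number of distinct values with the total number of values checks
    # every pair at once.  (Assumes the values are hashable.)
    return len(set(values)) == len(values)

print(all_different(1, 2, 1))  # False, even though 1 != 2 != 1 is True
print(all_different(1, 2, 3))  # True
```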

Non-constant expressions or side-effects


Recall that in a chain of comparisons, like a < b < c, the expression b in the middle is only evaluated once, whereas if you were to write the expanded expression, a < b and b < c, then b would get evaluated twice. If b contains an expression with side-effects, or if it is something that isn't constant, then the two expressions are not equivalent and you should think about what you are doing.
This snippet shows the difference in number of evaluations of the expression in
the middle:
>>> def f():
...     print("hey")
...     return 3
...
>>> if 1 < f() < 5:
...     print("done")
...
hey
done
>>> if 1 < f() and f() < 5:
...     print("done")
...
hey
hey
done
This snippet shows that an expression like 1 < f() < 0 can actually evaluate
to True when it is unfolded:
>>> l = [-2, 2]
>>> def f():
...     global l
...     l = l[::-1]
...     return l[0]
...
>>> if 1 < f() and f() < 0:
...     print("ehh")
...
ehh
Of course, 1 < f() < 0 should never be True, so this just shows that the chained comparison and the unfolded one aren't always equivalent.

Ugly chains
This feature looks really natural, but some particular cases aren't so great. This is a fairly subjective matter, but I personally don't love chains where the operators aren't "aligned", so chains like
• a == b == c
• a < b <= c
• a <= b < c
look really good, but in my opinion chains like
• a < b > c
• a <= b > c
• a < b >= c
don’t look that good. One can argue, for example, that a < b > c reads nicely
as “check if b is larger than both a and c”, but you could also write max(a, c)
< b or b > max(a, c).
Now, there are some other chains that are just confusing:


• a < b is True
• a == b in l
• a in l is True
In Python, is, is not, in, and not in are comparison operators, so you can
also chain them with the other operators. This creates weird situations like
>>> a = 3
>>> l = [3, 5]
>>> if a in l == True:
...     print("Yeah :D")
... else:
...     print("Hun!?")
...
Hun!?
What is happening above is that a in l == True evaluates to the same value as a in l and l == True. a in l is True, but l == True is False, so a in l == True is True and False, which is False. The one who wrote a in l == True probably meant (a in l) == True, but that is also the same as a in l.
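If you really do want one of the unchained meanings, parentheses make the intent explicit. A quick sketch of the three readings:

```python
a = 3
l = [3, 5]

chained = a in l == True          # (a in l) and (l == True): False
parenthesised = (a in l) == True  # compares a bool with True: True
plain = a in l                    # what was probably meant: True

print(chained, parenthesised, plain)  # False True True
```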

Examples in code
Inequality chain
Having a simple utility function that ensures that a given value is between two
bounds becomes really simple, e.g.
def ensure_within(value, bounds):
    return bounds[0] <= value <= bounds[1]
or if you want to be a little bit more explicit, while also ensuring bounds is a
vector with exactly two items, you can also write
def ensure_within(value, bounds):
    m, M = bounds
    return m <= value <= M

Equality chain
Straight from Python's enum module, we can find a helper function (that is not exposed to the user) that reads as follows:
def _is_dunder(name):
    """Returns True if a __dunder__ name, False otherwise."""
    return (len(name) > 4 and
            name[:2] == name[-2:] == '__' and
            name[2] != '_' and
            name[-3] != '_')
This function checks if a string is the name of a dunder method or not; one of the first things it does is check if the beginning and the ending of the string are the same and equal to "__":
>>> _is_dunder("__str__")
True
>>> _is_dunder("__bool__")
True
>>> _is_dunder("_dnd__")
False
>>> _is_dunder("_______underscores__")
False
You have seen the __str__ and __repr__ dunder methods in the "str and repr" Pydon't, and the __bool__ dunder method in the "Truthy, Falsy, and bool" Pydon't. I will be writing about dunder methods in general in a later Pydon't, so feel free to subscribe to stay tuned.

Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“Chaining comparison operators feels so natural, you don’t even no-
tice it is a feature. However, some chains might throw you off if you
overlook them.”
This Pydon’t showed you that:
• you can chain comparisons, and do so arbitrarily many times;
• chains with expressions that have side-effects or with non-deterministic
outputs are not equivalent to the extended version; and
• some chains using is or in can look really misleading.

References
• Python 3 Documentation, The Python Language Reference, https://docs.python.org/3/reference/expressions.html#comparisons;
• Python 3 Documentation, The Python Standard Library, enum, https://docs.python.org/3/library/enum.html;
• Reddit, comment on "If they did make a python 4, what changes from python 3 would you like to see?", https://www.reddit.com/r/Python/comments/ltaf3y/if_they_did_make_a_python_4_what_changes_from/gowuau5?utm_source=share&utm_medium=web2x&context=3.



Online references last consulted on the 1st of March of 2021.



Assignment expressions and
the walrus operator :=

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/assignment-expressions-and-the-walrus-operator.)

Walrus operator and assignment expressions


The walrus operator is written as := (a colon and an equals sign) and was first introduced in Python 3.8. The walrus operator is used in assignment expressions, which means assignments can now be used as part of an expression, whereas before Python 3.8 assignments were only possible as statements. An assignment statement assigns a value to a variable name, and that is it. With an assignment expression, that value can then be immediately reused. Here is an example of the difference:
>>> a = 3
>>> print(a)
3
>>> print(b = 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'b' is an invalid keyword argument for print()
>>> print(b := 3)
3
>>> b
3
As shown in PEP 572, a good usage of assignment expressions can help write
better code: code that is clearer and/or runs faster.
Assignment expressions should be avoided when they make the code too convoluted, even if they save you a couple of lines of code. You don't want to disrespect the Zen of Python, and the Zen of Python recommends writing readable code.
The snippet of code below features what is, in my opinion, a fairly unreadable
usage of an assignment expression:
import sys

if (i := input())[0] == "q" or i == "exit":
    sys.exit()
I think a better alternative would have been
import sys

i = input()
if i[0] == "q" or i == "exit":
    sys.exit()
The second alternative (without :=) is much easier to read than the first one,
even though using := saved one line of code.
However, good uses of assignment expressions can
• make your code faster,
• make it more readable/expressive, and
• make your code shorter.

Examples in code
Here are a couple of examples of good usages of assignment expressions.

Controlling a while loop with initialisation


Consider the following while loop:
inp = input()
while inp:
    eval(inp)
    inp = input()
This code can be used to create a very basic Python REPL inside your program, and the REPL stops once you give it an empty input. Notice, however, that it features some repetition: first, you have to initialise inp, because you want to use it in your while condition, but then you also have to update inp inside the while loop.
With an assignment expression, the above can be rewritten as:
while inp := input(" >> "):
    eval(inp)
This not only makes the code shorter, but it makes it more expressive, by making
it blatantly clear that it is the user input provided by input() that is controlling
the while loop.

Reducing visual noise


Say you want to count the number of trailing zeroes in an integer. An easy way to do so would be to convert the integer to a string, find its length, and then subtract from it the length of the same string with all its trailing zeroes removed. You could write it like so:
def trailing_zeroes(n):
    s = str(n)
    return len(s) - len(s.rstrip("0"))
However, for a function so simple and so short, it kind of looks sad to have such
a short s = str(n) line represent half of the body of the trailing_zeroes
function. With an assignment expression, you can rewrite the above as
def trailing_zeroes(n):
    return len(s := str(n)) - len(s.rstrip("0"))
The function above can be read as “return the length of the string s you get from
n, minus the length of s without trailing zeroes”, so the assignment expression
doesn’t hurt the readability of the function and, in my opinion, improves it. Feel
free to disagree, of course, as this is not an objective matter.

Reuse computations in list comprehensions


Suppose you are writing a list comprehension with an if filter, but the filter test
in the comprehension uses a value that you also want to use in the list itself.
For example, you have a list of integers, and want to keep the factorials of the
numbers for which the factorial has more than 50 trailing zeroes.
You could do this like so:
from math import factorial as fact

l = [3, 17, 89, 15, 58, 193]

facts = [fact(num) for num in l if trailing_zeroes(fact(num)) > 50]
The problem is that the code above computes the factorial for each number
twice, and if the numbers get big, this can become really slow. Using assignment
expressions, this could become
from math import factorial as fact

l = [3, 17, 89, 15, 58, 193]


facts = [f for num in l if trailing_zeroes(f := fact(num)) > 50]
The use of := allows us to reuse the expensive computation of the factorial of num. Two other similar alternatives, without assignment expressions, would be
from math import factorial as fact

l = [3, 17, 89, 15, 58, 193]

# Alternative 1
facts = [fact(num) for num in l]
facts = [num for num in facts if trailing_zeroes(num) > 50]

# Alternative 2
facts = [num for num in map(fact, l) if trailing_zeroes(num) > 50]
Notice that the second one can be more memory efficient if your list l is large:
the first alternative first computes the whole list of factorials, whereas the
second alternative only computes the factorials as they are needed. (I’ll write
more about this in a later Pydon’t, subscribe so you don’t miss it!)
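To sketch that laziness with the tools shown above: alternative 2's map call can equally be written as a generator expression, which also only produces each factorial when the outer comprehension asks for it. (This reuses the trailing_zeroes function from earlier in the book.)

```python
from math import factorial as fact

def trailing_zeroes(n):
    s = str(n)
    return len(s) - len(s.rstrip("0"))

l = [3, 17, 89, 15, 58, 193]

# The inner generator expression is lazy, just like map: factorials
# are produced one at a time, never stored in an intermediate list.
facts = [num for num in (fact(n) for n in l) if trailing_zeroes(num) > 50]
```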

Flattening related logic


Imagine you reach a point in your code where you need to pick an operation to
do to your data, and you have a series of things you would like to try. But you
also would like to stick to the first one that works. As a very simple example,
suppose we have a string that may contain an email or a phone number, and
you would like to extract the email and, in case you find none, you look for the
phone number. (For the sake of simplicity, let’s assume phone numbers are 9
digits long and let’s also consider simple .com emails with only letters.)
You could do something like:
import re

string = input("Your contact info: >> ")

email = re.search(r"\b(\w+@\w+\.com)\b", string)
if email:
    print(f"Your email is {email.group(1)}.")
else:
    phone = re.search(r"\d{9}", string)
    if phone:
        print(f"Your phone is {phone.group(0)}.")
    else:
        print("No info found...")
Notice the code above is nested, but the logic is flat: we look for successive
things and stop as soon as we find something. With assignment expressions
this could be rewritten as:
import re

string = input("Your contact info: >> ")

if email := re.search(r"\b(\w+@\w+\.com)\b", string):
    print(f"Your email is {email.group(1)}.")
elif phone := re.search(r"\d{9}", string):
    print(f"Your phone is {phone.group(0)}.")
else:
    print("No info found...")

Conclusion
Assignment expressions allow the binding of a name to a part of an expression, which can be used to great benefit in clarifying the flow of some programs or saving time on expensive computations, for example. Bad usages of assignment expressions, however, can make code very unreadable, so it is crucial to judge whether or not an assignment expression is a good fit for a particular task.

References
• Python 3 Documentation, What's New in Python, What's new in Python 3.8 – Assignment expressions, https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions.
• PEP 572 – Assignment Expressions, https://www.python.org/dev/peps/pep-0572.
• Real Python, "Assignment Expressions: The Walrus Operator", https://realpython.com/lessons/assignment-expressions/.
Online references consulted on the 26th of January of 2021.



Truthy, Falsy, and bool

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/truthy-falsy-and-bool.)

“Truthy” and “Falsy”


Quoting the Python documentation,
"Any object can be tested for truth value, for use in an if or while condition or as operand of the Boolean operations below [or, and and not]."
What does that mean? It means that we can use any Python object we want
whenever a boolean value is expected. Boolean values (True and False) are
used in conditions, which pop up in if statements and while statements, as
well as in expressions that make use of the Boolean operators or, and and not.
As a very basic example, consider this Python session:
>>> if True:
...     print("Hello, World!")
...
Hello, World!
>>> if False:
...     print("Go away!")
...
>>>
This piece of code should not surprise you, as it is very standard Python code:
there are a couple of if statements that make use of explicit Boolean values.
The next step is using an expression that evaluates to a Boolean value:
>>> 5 > 3
True
>>> if 5 > 3:
...     print("Hello, World!")
...
Hello, World!
The next step is using an object that is not a Boolean value, which is what this
blog post is all about:
>>> l = [1, 2, 3]
>>> if l:
...     print(l)
...
[1, 2, 3]
This is the part that could be surprising if you have never encountered it. The reason this if statement gets executed is that the list [1, 2, 3] is Truthy, that is, the list [1, 2, 3] can be interpreted as True in a Boolean context.
How can you know if an object is “Truthy” or “Falsy”? The simplest way is to use
the built-in bool function that converts any Python object to a Boolean:
>>> bool(l)
True
The way this works is really simple! There are a couple of rules that specify how this works, but those rules can be simplified even further with a simple heuristic:
"A value of a given type is Falsy when it is "empty" or "without any useful value"."
Examples of built-in types and their Falsy values include the empty list, empty
set, empty tuple, empty dictionary, the number 0, None and the empty string. For
example:
>>> bool([])
False
>>> bool("")
False
Of course, “without any useful value” definitely depends on what you intend to
do with the value you have, so I should really specify the objective rules:
• By default, a value is Truthy (that is, is interpreted as True).
• An object has a Falsy value (that is, is interpreted as False) if calling len
on it returns 0.
Notice that the previous rule tells us that, in general, types that are containers or sequences (types of objects on which it generally makes sense to use len) are considered Falsy when they are empty, i.e., when they have length equal to zero. But there is one more case that gives a Falsy value:

The __bool__ dunder method


• An object has a Falsy value (that is, is interpreted as False) if it defines a
__bool__ method that returns False.
__bool__ is a dunder method (dunder stands for double underscore) that you
can use to tell your objects if they are Truthy or Falsy in Boolean contexts, by
implementing it in your own classes. (You have seen other dunder methods
already.)
If you are not acquainted with Python's dunder methods, you may want to subscribe to the Pydon't newsletter; I will write more about them later. Until then, you may want to have a look at the Python 3 Docs and what they say about the data model.
Here is a simple example showing an object that is always taken to be Truthy:
>>> class A:
...     pass
...
>>> a = A()
>>> if a:
...     print("Hello, World!")
...
Hello, World!
On the opposite end, we can consider a class whose objects will always be taken
to be Falsy:
>>> class A:
...     def __bool__(self):
...         return False
...
>>> a = A()
>>> if a:
...     print("Go away!")
...
In general, your use case may be such that your object sometimes is Truthy and
sometimes is Falsy.
Finally, it is very important to state the order in which the rules apply! When given an arbitrary Python object that needs to be tested for a truth value, Python first tries to call bool on it, in an attempt to use its __bool__ dunder method. If the object does not implement a __bool__ method, then Python tries to call len on it. Finally, if that also fails, Python defaults to giving the object a Truthy value.
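The fallback order can be seen with two small classes; the names here are made up for this example:

```python
class Queue:
    """Implements __len__ but not __bool__, so Python falls back on len()."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

class Plain:
    """Implements neither __bool__ nor __len__, so it is always Truthy."""

print(bool(Queue([])))      # False, because the length is 0
print(bool(Queue([1, 2])))  # True, because the length is non-zero
print(bool(Plain()))        # True, the default
```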

Remarks
Now a couple of remarks about the functioning of Truthy and Falsy values.

A note about containers with falsy objects


We said that things like the empty list, zero, and the empty dictionary are Falsy.
However, things like a list that only contains zeroes or a dictionary composed
of zeroes and empty lists are not Falsy, because the containers themselves are
no longer empty:
>>> bool([])
False
>>> bool({})
False
>>> bool(0)
False
>>> bool([0, 0, 0]) # A list with zeroes is not an empty list.
True
>>> bool({0: []}) # A dict with a 0 key is not an empty dict.
True

This book is a WIP. Original articles at https://mathspp.com/blog/pydonts. 22


A note about checking for None
As mentioned above, None is Falsy:
>>> bool(None)
False
>>> if None:
...     print("Go away!")
...
This seems about right, as None is the go-to value for a function to return when it has nothing meaningful to return.
Imagine someone implemented the following function to return the integer
square root of a number, returning None for negative inputs (because negative
numbers do not have a square root in the usual sense):
import math

def int_square_root(n):
    if n < 0:
        return None
    return math.floor(math.sqrt(n))
When you use the function above you know it returns None if the computation
fails, so now you might be tempted to use your newfound knowledge about the
Falsy value of None, and you might write something like the following, to check
if the computation succeeded:
n = float(input("Compute the integer square root of what? >> "))
int_sqrt = int_square_root(n)
if not int_sqrt:
    print("Negative numbers do not have an integer square root.")
Now, what happens if n is 0 or 0.5?
>>> n = 0.5
>>> int_sqrt = int_square_root(n)
>>> if not int_sqrt:
...     print("Negative numbers do not have an integer square root.")
...
Negative numbers do not have an integer square root.
Which is clearly wrong, because n = 0.5 is certainly positive. Let us inspect
int_sqrt:
>>> int_sqrt
0
The problem is that int_square_root returned a meaningful value (that is, it did not return None) but that meaningful value is still Falsy. When you want to check if a function returned None or not, do not rely on the Truthy/Falsy value of the return value. Instead, check explicitly whether the return value is None or not:
# Use                        # Avoid
if returned is None:         # if not returned:
    ...                      #     ...

if returned is not None:     # if returned:
    ...                      #     ...
This recommendation is to avoid problems like the one outlined above.
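Applying that recommendation to int_square_root, here is a sketch of the fixed check (the describe helper is just for illustration):

```python
import math

def int_square_root(n):
    if n < 0:
        return None
    return math.floor(math.sqrt(n))

def describe(n):
    int_sqrt = int_square_root(n)
    # Compare with None explicitly: 0 is a valid result that is Falsy.
    if int_sqrt is None:
        return "Negative numbers do not have an integer square root."
    return f"The integer square root of {n} is {int_sqrt}."

print(describe(0.5))  # now correctly reports an integer square root of 0
print(describe(-3))
```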

Examples in code
Now I will show you some examples of places where using the Truthy and Falsy
values of Python objects allows you to write more Pythonic code.

2D point
Let us implement a simple class to represent points in a 2D plane, which could
be an image, a plot or something else. Retrieving what we already had in the
article about __str__ and __repr__, we can add a __bool__ method so that
the origin (the point Point2D(0, 0)) is Falsy and all other points are Truthy:
# Retrieved from https://mathspp.com/blog/pydonts/pydont-confuse-str-and-repr
class Point2D:
    """A class to represent points in a 2D space."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        """Provide a good-looking representation of the object."""
        return f"({self.x}, {self.y})"

    def __repr__(self):
        """Provide an unambiguous way of rebuilding this object."""
        return f"Point2D({repr(self.x)}, {repr(self.y)})"

    def __bool__(self):
        """The origin is Falsy and all other points are Truthy."""
        # bool(...) is needed because __bool__ must return a bool,
        # not the int that self.x or self.y could evaluate to.
        return bool(self.x or self.y)

print(bool(Point2D(0, 1)))  # True
print(bool(Point2D(0, 0)))  # False
print(bool(Point2D(1, 0)))  # True
print(bool(Point2D(4, 2)))  # True
Notice how we defined the Truthy/Falsy value of the Point2D in terms of the
Truthy/Falsy values of its components! We want the Point2D to be Falsy when
self.x is 0 and self.y is also 0, which means a Point2D is Truthy if any of
self.x or self.y are Truthy (that is, different from 0)!

Handling error codes or error messages


It is quite common for functions to return "error codes": integers that encode specific things that did not go quite right, or for such functions to return error messages as strings when things don't go right. These error codes are usually such that returning 0 means everything went OK, while other integers can mean all sorts of problems.
If you are calling such a function, you can use the Truthy value of strings and/or
integers to check if something went wrong, and to handle it accordingly.
As a generic example, this is the pattern we are looking for:
return_value, error_code = some_nice_function()
if error_code:
    # Something went wrong, act accordingly.

# Alternatively, something like:
return_value, error_msg = some_other_nice_function()
if error_msg:
    print(error_msg)
    # Something went wrong, act accordingly.
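To make the pattern concrete, here is a self-contained sketch; parse_port is a made-up function that returns a value together with an error message, where an empty message means success:

```python
def parse_port(text):
    """Hypothetical function: return a (port, error_msg) pair."""
    try:
        port = int(text)
    except ValueError:
        return None, f"not an integer: {text!r}"
    if not (0 <= port < 65536):
        return None, f"port out of range: {port}"
    return port, ""  # An empty (Falsy) message means success.

port, error_msg = parse_port("8080")
if error_msg:  # Only a non-empty (Truthy) message triggers handling.
    print(error_msg)
else:
    print(port)  # 8080
```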

Processing data
It is also very common to use Truthy and Falsy values to check if there is still data to be processed.
For example, when I talked about the walrus operator :=, we saw a while loop
vaguely similar to this one:
input_lines = []
while (s := input()):
    input_lines.append(s)
# No more lines to read.
print(len(input_lines))
This while loop essentially reads input lines while there are lines to be read. As
soon as the user inputs an empty line "", the loop stops and we print the number
of lines we read:



>>> input_lines = []
>>> while (s := input()):
...     input_lines.append(s)
...
Line 1
Line 2

>>> print(len(input_lines))
2
Another common pattern is when you have a list containing data that you have to process, where the list itself gets modified as you process the data.
Consider the following example:
import pathlib

def print_file_sizes(dir):
    """Print file sizes in a directory, recursing into subdirectories."""

    paths_to_process = [dir]
    while paths_to_process:
        path, *paths_to_process = paths_to_process
        path_obj = pathlib.Path(path)
        if path_obj.is_file():
            print(path, path_obj.stat().st_size)
        else:
            paths_to_process += path_obj.glob("*")
This is not necessarily the way to go about doing this, but notice the while
statement, and then the if: ... else: ... block that either prints something,
or extends the paths_to_process list.
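The same “shrinking work list” pattern works outside the filesystem, too. Here is a filesystem-free sketch that flattens a nested list, where the while condition relies on the Falsy value of an empty list:

```python
# Flatten a nested list with a work list that shrinks as we process it.
work = [1, [2, 3], [4, [5]]]
flat = []
while work:  # An empty list is Falsy, so the loop stops when work runs out.
    item, *work = work
    if isinstance(item, list):
        work = item + work  # Push the nested items back onto the work list.
    else:
        flat.append(item)
print(flat)  # [1, 2, 3, 4, 5]
```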

Conclusion
• Python’s Truthy and Falsy values allow you to rewrite common conditions
in a way that is more readable and, therefore, Pythonic.
• You can implement your own Truthy and Falsy values in custom classes by
implementing the __bool__ dunder method.
• You should also be careful when checking if a given variable is None or not,
and avoid using the Falsy value of None in those particular cases.



References
• Python 3 Documentation, The Python Language Reference, Data model, bool, https://docs.python.org/3/reference/datamodel.html#object.__bool__.
• Python 3 Documentation, The Python Standard Library, Truth Value Testing, https://docs.python.org/3/library/stdtypes.html#truth-value-testing.
• Python 3 Documentation, The Python Standard Library, Built-in Functions, bool, https://docs.python.org/3/library/functions.html#bool.
• PEP 8 – Style Guide for Python Code, https://www.python.org/dev/peps/pep-0008/.
• Python 3 Documentation, The Python Standard Library, File and Directory Access, pathlib, https://docs.python.org/3/library/pathlib.html.
• Stack Overflow, Listing of all files in directory?, https://stackoverflow.com/a/40216619/2828287.
• Stack Overflow, How can I check file size in Python?, https://stackoverflow.com/a/2104107/2828287.
• freeCodeCamp, Truthy and Falsy Values in Python: A Detailed Introduction, https://www.freecodecamp.org/news/truthy-and-falsy-values-in-python/.
Online references last consulted on the 9th of February of 2021.



Deep unpacking

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/deep-unpacking.)

Introduction
In this Pydon’t we will go over deep unpacking:
• what it is;
• how it works;
• how to use it to improve code readability; and
• how to use it to help debug your code.
Learning about deep unpacking will be very helpful to pave the road for structural pattern matching, a feature to be introduced in Python 3.10.

Assignments
Before showing you how deep unpacking works, let’s have a quick look at two other nice features of Python’s assignments.

Multiple assignment
In Python, multiple assignment is what allows you to write things like
>>> x = 3
>>> y = "hey"
>>> x, y = y, x # Multiple assignment to swap variables.
>>> x
3
>>> y
'hey'
or
>>> rgb_values = (45, 124, 183)
>>> r, g, b = rgb_values # Multiple assignment unpacks the tuple.
>>> g
124
With multiple assignment you can assign, well, multiple variables at the same time, provided the right-hand side has as many items as the left-hand side expects.

Starred assignment
Starred assignment, which I covered in depth in this Pydon’t, allows you to write things like
>>> l = [0, 1, 2, 3, 4]
>>> head, *body = l
>>> print(head)
0
>>> print(body)
[1, 2, 3, 4]
>>> *body, tail = l
>>> print(tail)
4
>>> head, *body, tail = l
>>> print(body)
[1, 2, 3]
With starred assignment you can tell Python that you are not sure how many
items the right-hand side will have, but all of them can be stored in a single place.

Deep unpacking
Deep unpacking, or nested unpacking, is similar to multiple assignment in a
sense. Multiple assignment allows you to match the length of an iterable, on
the right-hand side of an assignment, and get each element into a variable. In
a similar fashion, deep unpacking allows you to match the shape of what is on
the right-hand side of an assignment; in particular, if there are nested iterables,
you can unpack those iterables at once.
For example, using multiple assignment twice in a row, you could do this:
>>> colour_info = ("AliceBlue", (240, 248, 255))
>>> name, rgb_values = colour_info
>>> name
'AliceBlue'
>>> r, g, b = rgb_values
>>> g
248
But if you already know you want to get to the separate RGB values, you could
use deep unpacking:
>>> colour_info = ("AliceBlue", (240, 248, 255))
>>> name, (r, g, b) = colour_info
>>> name
'AliceBlue'
>>> g
248
Notice how we group the r, g, and b variables with () to create a tuple, mimicking
the shape of the colour_info variable. If we had simply written name, r, g, b
= colour_info then Python would think we are trying to do multiple assignment,
and would expect colour_info to have four items to unpack:
>>> colour_info = ("AliceBlue", (240, 248, 255))
>>> name, r, g, b = colour_info
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 4, got 2)
Our use of parentheses in (r, g, b) tells Python we actually want to go into the nested structure of colour_info.
This might be clearer if we actually include the outer set of parentheses that is usually omitted:
>>> colour_info = ("AliceBlue", (240, 248, 255))
>>> (name, (r, g, b)) = colour_info



>>> name
'AliceBlue'
>>> g
248
Now if we put the left-hand side of the assignment, (name, (r, g, b)), next
to the value it is getting, it becomes very clear what values go where:
>>> (name, (r, g, b)) = ("AliceBlue", (240, 248, 255))
Did you know that in Python 2 you could use deep unpacking in function signatures? For example, this would be valid Python 2 code:

def print_some_colour_info(name, (r, g, b)):
    print name + " has g value of " + str(g)

print_some_colour_info("AliceBlue", (240, 248, 255)) # prints 'AliceBlue has g value of 248'

This was removed with PEP 3113.

In loops
Deep unpacking can also be used in the implicit assignments of for loops; it doesn’t have to be in explicit assignments with an equals sign! The examples below will show you that.
Deep unpacking, when used well, can improve the readability of your code, by removing indexing clutter and by making the intent more explicit, and it can help you test your code for some errors and bugs.
Nothing better than showing you some code, so you can see for yourself.

Examples in code
Increasing expressiveness
Given the RGB values of a colour, you can apply a basic formula to convert it to
greyscale, which weighs the R, G, and B components differently. We could write
a function that takes the colour information like we have been using, and then
computes its greyscale value:
def greyscale(colour_info):
    return 0.2126*colour_info[1][0] + 0.7152*colour_info[1][1] + 0.0722*colour_info[1][2]

(This formula we are using, 0.2126R + 0.7152G + 0.0722B, is usually the first step of a slightly more involved formula, but it will be good enough for our purposes.)
Now you can use your function:



colour = ("AliceBlue", (240, 248, 255))
print(greyscale(colour)) # prints 246.8046
But I think we can all agree that the function definition could surely be improved.
The long formula with the additions and multiplications doesn’t look very nice.
In fact, if we use deep unpacking to extract the r, g, and b values, the formula
will be spelled out pretty much like if it were the original mathematical formula
I showed:
def greyscale(colour_info):
    name, (r, g, b) = colour_info
    return 0.2126*r + 0.7152*g + 0.0722*b

colour = ("AliceBlue", (240, 248, 255))
print(greyscale(colour)) # still prints 246.8046
Of course, more cunning or suspicious readers might say “That is all well and
good, but you could have just defined the function to take the separate r, g, and
b values as arguments from the get-go.”. And those people are right! You could
have defined your function to be
def greyscale(r, g, b):
    return 0.2126*r + 0.7152*g + 0.0722*b
But sometimes you are writing code that interacts with other people’s code, and
sometimes there are already types and formats of data that are in use, and it is
just simpler to adhere to whatever the standards are.
Now imagine that you have a list with some colours and want to compute the
greyscales. You can use deep unpacking in a for loop (and in a list comprehen-
sion too):
colours = [
    ("AliceBlue", (240, 248, 255)),
    ("Aquamarine", (127, 255, 212)),
    ("DarkCyan", (0, 139, 139)),
]
greyscales = [
    round(0.2126*r + 0.7152*g + 0.0722*b, 2) for name, (r, g, b) in colours
]
print(greyscales) # [246.8, 224.68, 109.45]

Catching bugs
I said earlier that deep unpacking can also help you find bugs in your code.
It is not hard to believe that the colours list of the previous example could
have come from some other function, for example a function that scrapes the
webpage I have been checking, and creates those tuples with colour information.



Let us pretend for a second that my web scraper isn’t working 100% well yet,
and so it ended up producing the following list, where it read the RGB values of
two colours into the same one:
colours = [
    ("AliceBlue", (240, 248, 255, 127, 255, 212)),
    ("DarkCyan", (0, 139, 139)),
]
If we were to apply the original greyscale function to colours[0], the function
would just work:
def greyscale(colour_info):
    return 0.2126*colour_info[1][0] + 0.7152*colour_info[1][1] + 0.0722*colour_info[1][2]

colours = [
    ("AliceBlue", (240, 248, 255, 127, 255, 212)),
    ("DarkCyan", (0, 139, 139)),
]

print(greyscale(colours[0])) # 246.8046
However, if you were to use the function that uses deep unpacking, then this
would happen:
def greyscale(colour_info):
    name, (r, g, b) = colour_info
    return 0.2126*r + 0.7152*g + 0.0722*b

colours = [
    ("AliceBlue", (240, 248, 255, 127, 255, 212)),
    ("DarkCyan", (0, 139, 139)),
]

print(greyscale(colours[0])) # ValueError: too many values to unpack (expected 3)


Deep unpacking expects the shapes to be correct, and so the part (r, g, b) tells Python it expects a nested iterable with three elements; when Python suddenly tries to give it six numbers, it complains! Hitting this error, you would realise something is weird in your code, and eventually you would find the bug!
All in all, deep unpacking (or the chance to use it) isn’t something you come
across very often, but when you do, it is nice knowing how to use it to your
advantage.

Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:



“Use deep unpacking to improve readability and to keep the shape
of your variables in check.”
This Pydon’t showed you that:
• Python’s assignments have plenty of interesting features;
• deep unpacking can prevent cluttering your code with hardcoded indexing;
• deep unpacking improves the readability of your code; and
• some bugs related to iterable shape can be caught if using deep unpack-
ing.

References
• PEP 634 – Structural Pattern Matching: Specification, https://www.python.org/dev/peps/pep-0634/;
• PEP 3113 – Removal of Tuple Parameter Unpacking, https://www.python.org/dev/peps/pep-3113/;
• Multiple assignment and tuple unpacking improve Python code readability, https://treyhunner.com/2018/03/tuple-unpacking-improves-python-code-readability/#Using_a_list-like_syntax;
• Unpacking Nested Data Structures in Python, https://dbader.org/blog/python-nested-unpacking;
• W3Schools, HTML Color Names, https://www.w3schools.com/colors/colors_names.asp;
• Wikipedia, Grayscale, Converting color to grayscale, https://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale.
Online references last consulted on the 23rd of February of 2021.



Unpacking with starred
assignments

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/unpacking-with-starred-assignments.)

Starred Assignment
It is fairly common to have a list or another iterable that you want to split into the first element and the rest. You can do this by using slicing in Python, but the most explicit way is with starred assignments.

This feature was introduced in PEP 3132 – Extended Iterable Unpacking and
allows for the following:
>>> l = [1, 2, 3, 4, 5]
>>> head, *tail = l
>>> head
1
>>> tail
[2, 3, 4, 5]
This starred assignment is done by placing one * to the left of a variable name in
a multiple assignment, and by having any iterable on the right of the assignment.
All variable names get a single element and the variable name with the “star”
(the asterisk *) gets all other elements as a list:
>>> string = "Hello!"
>>> *start, last = string
>>> start
['H', 'e', 'l', 'l', 'o']
>>> last
'!'
You can have more than two variable names on the left, but only one asterisk:
>>> a, b, *c, d = range(5) # any iterable works
>>> a
0
>>> b
1
>>> c
[2, 3]
>>> d
4
When you use the starred assignment, the starred name might get an empty list,
>>> a, *b = [1]
>>> a
1
>>> b
[]
and an error is issued if there are not enough items to assign to the names that
are not starred:
>>> a, *b = []
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected at least 1, got 0)



Examples in code
Here are a couple of examples in some actual code, to give more context.

reduce from functools


Imagine you wanted to implement a function akin to the reduce function from functools (you can read its documentation here).
Here is what an implementation might look like, using slices:
def reduce(function, list_):
    """Reduce the elements of the list by the binary function."""

    if not list_:
        raise TypeError("Cannot reduce empty list.")
    value = list_[0]
    list_ = list_[1:]
    while list_:
        value = function(value, list_[0])
        list_ = list_[1:]
    return value
And here is an equivalent implementation using starred assignment:
def reduce(function, list_):
    """Reduce the elements of the list by the binary function."""

    if not list_:
        raise TypeError("Cannot reduce empty list.")
    value, *list_ = list_
    while list_:
        val, *list_ = list_
        value = function(value, val)
    return value
The usage of the starred assignment here makes it abundantly clear that we
wish to unpack the list into an item to be used now and the rest to be used later.
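As a quick sanity check, we can compare the starred-assignment version with functools.reduce from the standard library; the function below is the same one defined above:

```python
from functools import reduce as ft_reduce

def reduce(function, list_):
    """Reduce the elements of the list by the binary function."""
    if not list_:
        raise TypeError("Cannot reduce empty list.")
    value, *list_ = list_
    while list_:
        val, *list_ = list_
        value = function(value, val)
    return value

values = [1, 2, 3, 4]
print(reduce(lambda a, b: a + b, values))     # 10
print(ft_reduce(lambda a, b: a + b, values))  # 10, matching the stdlib
```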
Another similar example, but with the starred name in the beginning, follows.

Credit card check digit


The Luhn Algorithm is used to compute a check digit for things like credit card
numbers or bank accounts.
Let’s implement a function that verifies if the check digit is correct, according
to the Luhn Algorithm, and using starred assignment to separate the check digit
from all the other digits:



def verify_check_digit(digits):
    """Use the Luhn algorithm to verify the check digit."""

    *digits, check_digit = digits

    weight = 2
    acc = 0
    for digit in reversed(digits):
        value = digit * weight
        acc += (value // 10) + (value % 10)
        weight = 3 - weight # 2 -> 1 and 1 -> 2
    return (9 * acc % 10) == check_digit

# Example from Wikipedia.
print(verify_check_digit([7, 9, 9, 2, 7, 3, 9, 8, 7, 1, 3])) # True
Maybe it is not obvious to you what the function does just by looking at it, but it
should be very clear that the line *digits, check_digit = digits splits the
list digits into the items in the beginning and the final digit.
How would you implement the function above, using slices and indexing? An
example could be like so:
def verify_check_digit(digits):
    """Use the Luhn algorithm to verify the check digit."""

    weight = 2
    acc = 0
    for digit in reversed(digits[:-1]):
        value = digit * weight
        acc += (value // 10) + (value % 10)
        weight = 3 - weight # 2 -> 1 and 1 -> 2
    return (9 * acc % 10) == digits[-1]

# Example from Wikipedia.
print(verify_check_digit([7, 9, 9, 2, 7, 3, 9, 8, 7, 1, 3])) # True
This also works, but looks a bit more confusing. Notice we have two similar in-
dexing operations, but one is actually a slice while the other is a proper indexing.
In the for loop we have a reversed(digits[:-1]) while in the return value we
have ... == digits[-1]. If I am not paying enough attention, I won’t notice those are different things. Of course it is my fault for not paying enough attention, but when I write code I prefer it to be as clear as possible: I don’t want the reader to spend too much time reading the code; I prefer them to spend time studying the algorithms.



References
• PEP 3132 – Extended Iterable Unpacking, https://www.python.org/dev/peps/pep-3132/.
• Python 3.9.1 Documentation, The Python Standard Library, Functional Programming Modules, functools, https://docs.python.org/3/library/functools.html#functools.reduce [consulted on the 12th of January of 2021].
• Luhn Algorithm, Wikipedia, https://en.wikipedia.org/wiki/Luhn_algorithm.



EAFP and LBYL coding styles

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/eafp-and-lbyl-coding-styles.)

EAFP and LBYL


“EAFP” is an acronym that stands for “Easier to Ask for Forgiveness than Permission”, a coding practice that is more or less the opposite of “LBYL”, which stands for “Look Before You Leap”.
LBYL means you first check if a given operation can be made successfully, and
then proceed to do it. For example, if you want to ask the user for a number
whose default value will be 1, you can use the code
print("Type a positive integer (defaults to 1):")
s = input(" >> ")
if s.isnumeric():
    n = int(s)
else:
    n = 1
(In the code above, we use the method str.isnumeric to check if the string
is a valid integer. Try running print(str.isnumeric.__doc__) in your Python
REPL.)
With EAFP, you first try to perform whatever operation it is you want to do, and then use a try block to capture any exception that your operation might throw in case it is not successful. In our example, this means we simply try to convert s into an integer and, in case a ValueError exception is raised, we set the default value:
print("Type a positive integer (defaults to 1):")
s = input(" >> ")
try:
    n = int(s)
except ValueError:
    n = 1
We use except ValueError because a ValueError is the exception that is raised when you try to convert a string that does not represent an integer:
>>> int("345")
345
>>> int("3.4")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '3.4'
>>> int("asdf")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'asdf'

EAFP instead of LBYL?


Writing code that follows the EAFP style can be advantageous in several situations, and I will present them now.

Avoid redundancy
Sometimes, coding with EAFP in mind allows you to avoid redundancy in your
code. Imagine you have a dictionary from which you want to extract a value
associated with a key, but that key might not exist.
With LBYL, you would do something like:



d = {"a": 1, "b": 42}
print("What key do you want to access?")
key = input(" >> ")
if key in d:
    print(d[key])
else:
    print(f"Cannot find key '{key}'")
If the key that was entered exists in the dictionary, this code performs two accesses to the dictionary: the first checks if key exists as a key, and the second retrieves its value. This is more or less like you opening a box to see if it contains something and closing it. Then, if the box was not empty, you open it again and remove whatever is inside. Would you do this in real life?
With EAFP, you can open the box and immediately empty it if you find something
inside:
d = {"a": 1, "b": 42}
print("What key do you want to access?")
key = input(" >> ")
try:
    print(d[key])
except KeyError:
    print(f"Cannot find key '{key}'")
Still aligned with the EAFP mindset is a method that you should know about: dict.get! The operation I described is so common that dictionaries even come with a method that has an EAFP-like behaviour for when you want to take a value associated with a key, and use a default value if the key is not present:
d = {"a": 1, "b": 42}
print("What key do you want to access?")
key = input(" >> ")
print(d.get(key, None))
Try running the code above and type in keys that don’t exist in your dictionary d.
Notice that None gets printed in those cases.
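It is also worth noting that the second argument of dict.get already defaults to None, so passing None explicitly is optional:

```python
d = {"a": 1, "b": 42}
print(d.get("a"))           # 1
print(d.get("missing"))     # None; the default default is None
print(d.get("missing", 0))  # 0; or pick your own default
```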

EAFP can be faster


If failing is expected to happen not very often, then EAFP is faster: you just run
a piece of code (your operation) instead of two (the “look” and the “leap”).
As an example, let’s go over the code from the earlier example, using the timeit module to see which option is faster when the input can be converted to an integer:
>>> import timeit
>>> eafp = """s = "345"
... try:
...     n = int(s)
... except ValueError:
...     n = 0"""
>>> timeit.timeit(eafp)
0.1687019999999393
Here we hardcode s as a string holding an integer, so that the timing does not have to take into account the time it takes for me to type an integer. Also, the timeit function runs the code a bunch of times (one million, by default) and I don’t want to have to type one million integers in the console.
Now, compare it with the LBYL approach:
>>> lbyl = """s = "345"
... if s.isnumeric():
...     n = int(s)
... else:
...     n = 0"""
>>> timeit.timeit(lbyl)
0.30682630000001154
The LBYL approach took almost twice the time. If you can make it so that the operation fails very rarely, then you are saving time by using an EAFP approach.

LBYL may still fail


When interacting with the environment, for example with the Internet or with the OS, circumstances may change between the moment you do your safety check and the moment you perform the operation, so the operation may no longer be viable.
For example, imagine you have a script that is reading some files. You can only
read a file that exists, obviously, so an LBYL approach could entail writing code
like
import pathlib

print("What file should I read?")
filepath = input(" >> ")
if pathlib.Path(filepath).exists():
    with open(filepath, "r") as f:
        contents = f.read()
    # Do something with the contents.
else:
    print("Woops, the file does not exist!")
If your script is in a computer that can be accessed by several users, or if there
are other scripts working with the file system, your if statement might evaluate



to True because the file was found, but then an external agent might delete the file and your with statement will fail, raising an error and breaking your code. If you are writing critical code, or if the code you execute after the check takes a long time to run, this possibility has to be taken into account.
If you use an EAFP approach, the code either reads the file or doesn’t, but both
cases are covered:
print("What file should I read?")
filepath = input(" >> ")
try:
    with open(filepath, "r") as f:
        contents = f.read()
except FileNotFoundError:
    print("Woops, the file does not exist!")
else:
    # Do something with the contents.
    pass
The else in the try block above ensures you only run the code that processes
the contents if you are able to read the file. (I’ll write a Pydon’t about this, don’t
worry!)

Catch many types of fails


If you are trying to perform a complex operation that might fail in several ways, it
might be easier to just enumerate the exceptions that might be raised instead of
writing a really, really long if statement that performs all the necessary checks
in advance.
For example, if you want to call a third party function that might throw several
different exceptions, it is fairly simple to write an elegant try block that covers
all the cases that might arise.
Imagine you have a function that takes a string, representing an integer, and
then returns its inverse, but the person who wrote it performs no checks: just
assumes the string represents an integer, converts it with int and then divides
1 by that integer:
def get_inverse(num_str):
    return 1 / int(num_str)
You want to use that function in your code after asking for user input, but you
notice the user might type something that is not an integer, or the user might
type a 0, which then gives you a ZeroDivisionError. With an EAFP approach,
you write:
print("Type an integer:")
s = input(" >> ")
try:
    print(get_inverse(s))
except ValueError:
    print("I asked for an integer!")
except ZeroDivisionError:
    print("0 has no inverse!")
How would you do this with LBYL? Maybe
print("Type an integer:")
s = input(" >> ")
if s.isnumeric() and s != "0":
    print(get_inverse(s))
elif not s.isnumeric():
    print("I asked for an integer!")
else:
    print("0 has no inverse!")
But now you are using the function isnumeric twice. And isnumeric doesn’t
even work for negative integers. And what if the user types something like "
3"? isnumeric fails, but this is still an integer that int can convert! Or what if
the user types "000"? This still evaluates to 0… I hope you get my point by now.
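You can check these edge cases for yourself:

```python
# str.isnumeric rejects strings that int() happily converts...
print(" 3".isnumeric())   # False
print("-3".isnumeric())   # False
print(int(" 3"))          # 3; int ignores surrounding whitespace
print(int("-3"))          # -3
# ...and accepts a representation of zero that the LBYL check above misses.
print("000".isnumeric())  # True
print(int("000"))         # 0
```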

Conclusion
EAFP code is a very good alternative to LBYL code, and it is even superior in various situations, like the ones I mentioned above. When writing code, try to weigh the pros and cons of the different approaches you can take, and don’t forget to consider writing EAFP code!
EAFP is not the absolute best way to go in every single situation, but EAFP code
can be very readable and performant!

References
• PEP 463 – Exception-catching expressions, https://www.python.org/dev/peps/pep-0463/.
• Python 3 Documentation, The Python Standard Library, Debugging and Profiling, timeit, https://docs.python.org/3/library/timeit.html.
• Python 3 Documentation, The Python Tutorial, Errors and Exceptions, https://docs.python.org/3/tutorial/errors.html.
• Microsoft Devblogs, Idiomatic Python: EAFP versus LBYL, https://devblogs.microsoft.com/python/idiomatic-python-eafp-versus-lbyl/.
• Stack Overflow, “What is the EAFP principle in Python?”, https://stackoverflow.com/questions/11360858/what-is-the-eafp-principle-in-python.
• Stack Overflow, “Ask forgiveness not permission - explain”, https://stackoverflow.com/questions/11360858/what-is-the-eafp-principle-in-python.
Online references consulted on the 19th of January of 2021.



Zip up

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/zip-up.)

Introduction
One of the things I appreciate most about Python, when compared to other programming languages, is its for loops. Python allows you to write very expressive loops, and part of that is because of the built-in zip function.
In this article you will
• see what zip does;
• get to know a new feature of zip that is coming in Python 3.10;
• learn how to use zip to create dictionaries; and
• see some nice usage examples of zip.

How zip works
In a simple for loop, you generally have an iterable it and you just write something like
for elem in it:
    # Do something with elem
    print(elem)
An “iterable” is something that can be traversed linearly, of which a list is the simplest example. Another very common iterable used in Python’s for loops is a range:
for n in range(10):
    print(n**2)
Sometimes you will have two or more iterables that contain related information, and you need to loop over them to do something with the different bits of information you got.
In the example below, we have lists with the first and last names of some people, and we want to print their full names. The naïve solution would be to use a range to traverse all the indices and then index into the lists:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans"]
>>> for i in range(len(firsts)):
...     print(f"'{firsts[i]} {lasts[i]}'")
...
'Anna Smith'
'Bob Doe'
'Charles Evans'
This does the job, but a for loop like this only hints at the fact that you are probably going to access the values in firsts, because you wrote range(len(firsts)), but it turns out you also want to access the items in lasts. This is what zip is for: you use it to pair up iterables that you want to traverse at the same time:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans"]
>>> for first, last in zip(firsts, lasts):
...     print(f"'{first} {last}'")
...
'Anna Smith'
'Bob Doe'
'Charles Evans'



Notice that you can specify two iterating variables in the for loop, in our case first and last, and each variable will take the successive values of the respective iterable.
This is a special case of an unpacking assignment, because zip is actually producing tuples with the names in them:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans"]
>>> for z in zip(firsts, lasts):
...     print(z)
...
('Anna', 'Smith')
('Bob', 'Doe')
('Charles', 'Evans')
What we are doing is taking that tuple and assigning each portion:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans"]
>>> for z in zip(firsts, lasts):
...     first, last = z
...     print(f"'{first} {last}'")
...
'Anna Smith'
'Bob Doe'
'Charles Evans'
But instead of the intermediate step, we unpack right in the for statement. This
unpacking, tied with good naming of variables, allows for loops to be read in
plain English.
For example, the loop from before was
for first, last in zip(firsts, lasts):
and that can be read as
“For each first and last [name] in the lists firsts and lasts…”

Zip is lazy
One thing to keep in mind is that zip doesn’t create the tuples immediately. zip
is lazy, and that means it will only compute the tuples when you ask for them, for
example when you iterate over them in a for loop (like in the examples above)
or when you convert the zip object into a list:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans", "Rivers"]
>>> z = zip(firsts, lasts)
>>> z
<zip object at 0x0000019F56702680>
>>> list(z)
[('Anna', 'Smith'), ('Bob', 'Doe'), ('Charles', 'Evans')]
zip being lazy also means that zip by itself isn’t that similar to a list. For example, you cannot ask for the length of a zip object:
>>> len(z)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'zip' has no len()
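Another consequence of zip returning an iterator is that it is single-use: once you have gone over it, it is exhausted and yields nothing more. Here is a small sketch of that behaviour:

```python
# zip returns a one-shot iterator: it is consumed as you traverse it.
z = zip([1, 2, 3], "abc")
print(list(z))  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(z))  # [] -- the zip object is already exhausted
```

If you need to traverse the pairs more than once, convert the zip object to a list first and reuse that list.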

Three is a crowd
We have seen zip with two arguments, but zip can take an arbitrary number of
iterators and will produce a tuple of the appropriate size:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> middles = ["Z.", "A.", "G."]
>>> lasts = ["Smith", "Doe", "Evans"]
>>> for z in zip(firsts, middles, lasts):
... print(z)
...
('Anna', 'Z.', 'Smith')
('Bob', 'A.', 'Doe')
('Charles', 'G.', 'Evans')

>>> prefixes = ["Dr.", "Mr.", "Sir"]


>>> for z in zip(prefixes, firsts, middles, lasts):
... print(z)
...
('Dr.', 'Anna', 'Z.', 'Smith')
('Mr.', 'Bob', 'A.', 'Doe')
('Sir', 'Charles', 'G.', 'Evans')

Mismatched lengths
zip will always return a tuple with as many elements as the arguments it received,
so what happens if one of the iterators is shorter than the others?
If zip’s arguments have unequal lengths, then zip will keep going until it ex-
hausts one of the iterators. As soon as one iterator ends, zip stops producing
tuples:
>>> firsts = ["Anna", "Bob", "Charles"]

>>> lasts = ["Smith", "Doe", "Evans", "Rivers"]
>>> for z in zip(firsts, lasts):
... print(z)
...
('Anna', 'Smith')
('Bob', 'Doe')
('Charles', 'Evans')
Starting with Python 3.10, zip will be able to receive a keyword argument named
strict that you can use to tell zip to error if the lengths of the iterators do not
match:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans", "Rivers"]
>>> for z in zip(firsts, lasts, strict=True): # strict=True available in Python >= 3.10
... print(z)
...
('Anna', 'Smith')
('Bob', 'Doe')
('Charles', 'Evans')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: zip() argument 2 is longer than argument 1
Notice that zip only errors when it actually finds the length mismatch; it doesn't
do the check in the beginning, because the arguments to zip may themselves
be lazy iterators.
! (Lazy) iterators will be covered in further Pydon’ts, ! so be sure to subscribe to
the Pydon’t newsletter to stay tuned!
In general, zip is used with iterators that are expected to have the same length.
If that is the case – if you expect your iterators to have the same length – then
it is a good idea to always set strict=True, because that will help you catch
bugs in your code.
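If, on the other hand, the mismatch is expected and you want zip to keep going until the longest argument runs out, the standard library has you covered: itertools.zip_longest pads the shorter arguments with a fillvalue of your choosing. A quick sketch:

```python
from itertools import zip_longest

firsts = ["Anna", "Bob", "Charles"]
lasts = ["Smith", "Doe", "Evans", "Rivers"]
# The shorter list is padded with the fillvalue instead of cutting zip short.
print(list(zip_longest(firsts, lasts, fillvalue="?")))
# [('Anna', 'Smith'), ('Bob', 'Doe'), ('Charles', 'Evans'), ('?', 'Rivers')]
```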

Create a dictionary with zip


You can create dictionaries in Python by feeding key-value pairs to the dict
function, which means zip is a prime way of creating dictionaries when you
have all the keys in an iterator and all the values in another iterator:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans"]
>>> dict(zip(firsts, lasts))
{'Anna': 'Smith', 'Bob': 'Doe', 'Charles': 'Evans'}
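The inverse trick also exists: because zip accepts any number of iterables, unpacking a list of pairs into zip with the * operator "unzips" them back into separate sequences. A short sketch:

```python
pairs = [("Anna", "Smith"), ("Bob", "Doe"), ("Charles", "Evans")]
# zip(*pairs) feeds each pair as a separate argument, transposing the data.
firsts, lasts = zip(*pairs)
print(firsts)  # ('Anna', 'Bob', 'Charles')
print(lasts)   # ('Smith', 'Doe', 'Evans')
```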

Examples in code
Now you will see some usages of zip in actual Python code.

Snake game
A friend of mine is learning Python and he started creating a replica of the game
of Snake. There is a certain point in the game where he has a menu and he wants
to display thumbnails of the “maps” that that can be played on, and he has those
images in a list called lvlpictures. At the same time, he has the positions of
where those images should go in a list called self.buttons. In order to display
the thumbnails in the correct positions, he has to call a function called blit,
which expects the image and the position the image should go to.
Here is the loop he wrote before knowing about zip:
for i in range(len(lvlpictures)):
self.surface.blit(lvlpictures[i], (self.buttons[i][0]+2,self.buttons[i][1]+2))
And here is the loop he wrote after knowing about zip:
for pic, btn in zip(lvlpictures, self.buttons):
self.surface.blit(pic, (btn[0] + 2, btn[1] + 2))
Notice that using zip makes your code shorter and makes the intent of
processing the pictures and the buttons together clearer. Finally, when Python
3.10 is released, he may even add the strict=True keyword argument, because
he expects lvlpictures and self.buttons to have the same length.

Matching paths
If you are not aware of it, then you might be interested in knowing that Python
has a module named pathlib that provides facilities to deal with filesystem
paths.
When you create a path, you can then check if it matches a given pattern:
>>> from pathlib import PurePath
>>> PurePath('a/b.py').match('*.py')
True
>>> PurePath('/a/b/c.py').match('b/*.py')
True
>>> PurePath('/a/b/c.py').match('a/*.py')
False
If you take a look at this match function, you find this:
class PurePath(object):
# ...

def match(self, path_pattern):
"""
Return True if this path matches the given pattern.
"""
# code omitted for brevity
for part, pat in zip(reversed(parts), reversed(pat_parts)):
if not fnmatch.fnmatchcase(part, pat):
return False
return True
The code omitted does some checks that allow the function to tell right away
that there is no match. The for loop that I am showing you makes use of zip
to pair each part of the path with each part of the pattern, and then we check if
those match with fnmatch.fnmatchcase.
Try adding a couple of prints here:
class PurePath(object):
# ...

def match(self, path_pattern):


"""
Return True if this path matches the given pattern.
"""
# code omitted for brevity

print(parts) # added by hand to check what is going on.


print(pat_parts) # same here.
for part, pat in zip(reversed(parts), reversed(pat_parts)):
if not fnmatch.fnmatchcase(part, pat):
return False
return True
And then rerun the examples from the documentation:
>>> from pathlib import PurePath
>>> PurePath('a/b.py').match('*.py')
['a', 'b.py'] # parts
['*.py'] # pat_parts
True
>>> PurePath('/a/b/c.py').match('b/*.py')
['\\', 'a', 'b', 'c.py']
['b', '*.py']
True
>>> PurePath('/a/b/c.py').match('a/*.py')
['\\', 'a', 'b', 'c.py']
['a', '*.py']
False

It should become clearer what parts and pat_parts actually do, and it should
become clearer why we zip them up together.
This is a nice example of when using strict=True makes no sense, because it
may happen that the path and the pattern have a different number of parts, and
that is perfectly fine.
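To see the pattern in isolation, here is a stripped-down sketch of the same idea (the helper name simple_match is mine, not pathlib's): the path parts are compared against the pattern parts from the right, and zip implicitly discards the extra leading parts of the path.

```python
import fnmatch

def simple_match(parts, pat_parts):
    # A pattern with more parts than the path can never match.
    if len(pat_parts) > len(parts):
        return False
    # Compare from the right; zip stops when the (shorter) pattern runs out,
    # which is exactly why strict=True would be wrong here.
    for part, pat in zip(reversed(parts), reversed(pat_parts)):
        if not fnmatch.fnmatchcase(part, pat):
            return False
    return True

print(simple_match(["a", "b.py"], ["*.py"]))            # True
print(simple_match(["a", "b", "c.py"], ["b", "*.py"]))  # True
print(simple_match(["a", "b", "c.py"], ["a", "*.py"]))  # False
```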

Writing a CSV file


The Python Standard Library comes with a module, csv, to read and write CSV
files. Among other things, it provides you with the classes DictReader and
DictWriter for when you want to use the header of the CSV file to read the
data rows like dictionaries or for when you have the data rows as dictionaries.
Here is an example of how you might take several dictionaries and write them
as a CSV file:
import csv

with open('names.csv', 'w', newline='') as csvfile:


fieldnames = ['first_name', 'last_name']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

writer.writeheader()
writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
The fieldnames variable will establish the header of the CSV file and is then
used by the writerow method to know the order in which the values of the
dictionary should be written in the file.
The writeheader function is the function that writes the header of the CSV file,
and here is what it looks like:
class DictWriter:
# ...

def writeheader(self):
header = dict(zip(self.fieldnames, self.fieldnames))
return self.writerow(header)
Basically, what this function is doing is using zip to transform the header names
into a dictionary where the keys and the values are the same, pretending that
the header is just a regular data row:
>>> fieldnames = ['first_name', 'last_name']
>>> dict(zip(fieldnames, fieldnames))
{'first_name': 'first_name', 'last_name': 'last_name'}

Therefore, the writeheader function just needs to create this dictionary and
can then defer the actual writing to the writerow function.
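While DictWriter writes dictionaries out, its counterpart DictReader reads each data row back in as a dictionary keyed by the header. A minimal sketch, reading from an in-memory string instead of a file:

```python
import csv
import io

data = "first_name,last_name\nBaked,Beans\nLovely,Spam\n"
reader = csv.DictReader(io.StringIO(data))
for row in reader:
    # Each row is a dictionary keyed by the fields of the header row.
    print(row["first_name"], row["last_name"])
```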

Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“zip is your friend whenever you need to traverse two or more iter-
ables at the same time.”
This Pydon’t showed you that:
• zip can be used to traverse several iterables at the same time;
• zip by itself returns a zip object which must then be iterated or converted
explicitly to a list if you want the tuples it produces;
• if the arguments to zip have uneven lengths, zip will stop as soon as one
of the iterators is exhausted;
• starting with Python 3.10, you can use the keyword argument strict=True
to tell zip to error if the arguments to zip have different lengths; and
• zip can provide for a really simple way to create dictionaries.

References
• Python 3 Documentation, The Python Standard Library, zip, docs.python.org/3/library/functions.html#zip
[last accessed 30-03-2021];
• Python 3.10 Documentation, The Python Standard Library, zip,
docs.python.org/3.10/library/functions.html#zip [last accessed 30-
03-2021];
• Python 3 Documentation, The Python Standard Library, csv, docs.python.org/3/library/csv.html
[last accessed 30-03-2021].
• Python 3 Documentation, The Python Standard Library, pathlib,
docs.python.org/3/library/pathlib.html [last accessed 30-03-2021].

Enumerate me

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/enumerate-me.)

Introduction
Following up on last week’s Pydon’t about zip, today we are talking about
enumerate.
One of the things I appreciate most about Python, when compared to other pro-
gramming languages, is its for loops. Python allows you to write very expressive
loops, and some of that expressiveness comes from the built-in enumerate func-
tion.
In this article you will
• see what enumerate does;
• take a look at its underrated optional start argument;
• learn a couple of neat use cases for enumerate;

• see some nice examples of code using enumerate.

How enumerate works


Python newcomers are usually exposed to this type of for loop very early on:
>>> for i in range(3):
... print(i)
...
0
1
2
This leads them to “learning” this anti-pattern of for loops to go over, say, a list:
>>> words = ["Hey", "there"]
>>> for i in range(len(words)):
... print(f"'<{words[i]}> has {len(words[i])} letters.'")
...
'<Hey> has 3 letters.'
'<there> has 5 letters.'
The Pythonic way of writing such a loop is by iterating directly over the list:
>>> words = ["Hey", "there"]
>>> for word in words:
... print(f"'<{word}> has {len(word)} letters.'")
...
'<Hey> has 3 letters.'
'<there> has 5 letters.'
However, the final step in this indices vs. elements stand-off comes when you
need to know the index of each element but also access the element at the same
time. The naïve approach would be to loop over the range of the length and then
index to get the element:
for i in range(len(words)):
word = words[i]
# ...
or, if you read my Pydon’t on zip and are feeling imaginative, you could also do
for i, word in zip(range(len(words)), words):
# ...
but the Pythonic way of doing so is by using the built-in enumerate:
>>> words = ["Hey", "there"]
>>> for i, word in enumerate(words):
... print(f"'Word #{i}: <{word}> has {len(word)} letters.'")

...
'Word #0: <Hey> has 3 letters.'
'Word #1: <there> has 5 letters.'

Optional start argument


The enumerate function can also accept an optional argument that specifies the
first index it returns. For example, if we are counting words (like in the example
above), we might want to start counting from 1:
>>> words = ["Hey", "there"]
>>> for i, word in enumerate(words, 1):
... print(f"'Word #{i}: <{word}> has {len(word)} letters.'")
...
'Word #1: <Hey> has 3 letters.'
'Word #2: <there> has 5 letters.'
This optional argument can come in really handy as it saves you from having to
manually offset the index.
By the way, the argument has to be an integer but can be negative:
>>> for i, v in enumerate("abc", start=-3243):
... print(i)
...
-3243
-3242
-3241
Can you come up with a sensible situation where it would make sense to use
enumerate with a negative integer as the optional argument? Comment down
below if you come up with something nice!

Unpacking when iterating


The enumerate function produces a lazy iterator, which means the items you
iterate over only become available as you need them. This prevents Python from
spending a lot of memory if you use enumerate on a large argument, e.g. a really
long list.
This laziness of enumerate is something common in Python; for example,
range and zip are also lazy. Be sure to subscribe to the newsletter so you
don’t miss the Pydon’ts where I cover these concepts.
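Because enumerate returns a lazy iterator, you can also step through it by hand with next, which makes the laziness visible:

```python
e = enumerate("abc")
# Items are produced one at a time, only when requested.
print(next(e))  # (0, 'a')
print(next(e))  # (1, 'b')
print(list(e))  # [(2, 'c')] -- only what was not yet consumed
```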
The items that enumerate returns are 2-item tuples, where the first element is
the index and the second element is the value:

>>> for tup in enumerate("abc"):
... print(tup)
...
(0, 'a')
(1, 'b')
(2, 'c')
What we usually do is unpack that tuple right in the loop statement, with some-
thing like
for i, letter in enumerate("abc"):
# use i and letter for whatever
which is roughly equivalent to
for tup in enumerate("abc"):
i, letter = tup
# use i and letter for whatever

Deep unpacking
Things can get even more interesting when you use enumerate, for example, on
a zip:
>>> chapter_pages = [5, 17, 31, 50] # book ends at page 50
>>> for i, (start, end) in enumerate(zip(chapter_pages, chapter_pages[1:]), start=1):
... print(f"'{i}: {end-start} pages long.'")
...
'1: 12 pages long.'
'2: 14 pages long.'
'3: 19 pages long.'
(Here I explicitly named the start= argument in the enumerate so that it was
visually easier to separate it from the argument to zip.)
This code snippet takes a list of pages where chapters of a book start and prints
the length of each chapter. Notice how enumerate returns tuples with indices
and values, but those values are extracted from a zip, which itself returns tuples:
>>> chapter_pages = [5, 17, 31, 50] # book ends at page 50
>>> for tup in enumerate(zip(chapter_pages, chapter_pages[1:]), start=1):
... print(tup)
...
(1, (5, 17))
(2, (17, 31))
(3, (31, 50))
What we do is use deep unpacking to access all these values directly:

>>> chapter_pages = [5, 17, 31, 50] # book ends at page 50
>>> for tup in enumerate(zip(chapter_pages, chapter_pages[1:]), start=1):
... i, (start, end) = tup
... print(f"'{i}: {end-start} pages long.'")
...
'1: 12 pages long.'
'2: 14 pages long.'
'3: 19 pages long.'
If you don’t know what deep unpacking is or how it works, go ahead and take
a look at my Pydon’t about unpacking.
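Deep unpacking with enumerate is not limited to zip; any iterable of tuples works. For instance, dictionary items pair up nicely with it:

```python
prices = {"apple": 3, "pear": 2, "fig": 5}
# .items() yields (key, value) tuples, which we unpack next to the index.
for i, (name, price) in enumerate(prices.items(), start=1):
    print(f"{i}. {name} costs {price}")
# 1. apple costs 3
# 2. pear costs 2
# 3. fig costs 5
```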

Examples in code
Now you will see some usages of enumerate in real Python code.

Vanilla enumerate
I took a look at the Python Standard Library and by and large the most common
usage of enumerate is just a vanilla enumerate(iter) to access iterable values
and indices at the same time. Let me share a textbook example with you:
The doctest module allows you to write simple tests for your code inside the
docstrings for your functions, classes, etc. The way you write these tests is
in the form of an interactive session in the REPL. doctest then locates those
“interactive sessions” in your docstrings and plays them to see if the actual
output of the code matches what your docstring showed.
If you open your Python REPL, you will see that it starts with the prompt >>>
which has a blank space after the triple >. You cannot delete that blank space;
it is part of the prompt. When parsing a docstring to extract the actual tests, the
parser performs a check to see if the prompts have that leading blank space or
not, and here is the code that does it:
# from Lib\doctest.py in Python 3.9
class DocTestParser:
# ...

def _check_prompt_blank(self, lines, indent, name, lineno):


"""
Given the lines of a source string (including prompts and
leading indentation), check to make sure that every prompt is
followed by a space character. If any line is not followed by
a space character, then raise ValueError.
"""
for i, line in enumerate(lines):
if len(line) >= indent+4 and line[indent+3] != ' ':

raise ValueError('line %r of the docstring for %s '
'lacks blank after %s: %r' %
(lineno+i+1, name,
line[indent:indent+3], line))
Notice how the top for loop uses enumerate to traverse the lines of the inter-
active examples. If, inside the loop, we encounter a line that does not have the
extra blank space after >>> then we raise a ValueError where we use i to com-
pute the actual line number where the error occurred, which is the lineno+i+1
bit in the second to last line.
Want to see this in action? Try running this short script:
def sum_nats(n):
"""Sums the first n natural numbers.

>>> sum_nats(1)
1
>>> sum_nats(10)
55
>>>sum_nats(100)
5050
"""

return int(n*(n+1)/2)

if __name__ == "__main__":
import doctest
doctest.testmod()
Notice how I intentionally wrote the third example without a space between >>>
and sum_nats(100). Running this script should throw a ValueError at your face,
which should go away when you put a blank space there.

Using the optional argument


Line numbers in docstring tests
If you were paying attention, maybe you noticed that the enumerate usage of
the previous example called for the optional argument of enumerate!
Take a look at the code again:
# from Lib\doctest.py in Python 3.9
class DocTestParser:
# ...

def _check_prompt_blank(self, lines, indent, name, lineno):


# docstring elided.

for i, line in enumerate(lines):
if len(line) >= indent+4 and line[indent+3] != ' ':
raise ValueError('line %r of the docstring for %s '
'lacks blank after %s: %r' %
(lineno+i+1, name,
line[indent:indent+3], line))
Notice that in the string formatting at the end we compute lineno+i+1 to raise
the error message with the correct line number for the prompt that was faulty…
But this is the same as rewriting the loop to use the start= argument:
class DocTestParser:
# ...
def _check_prompt_blank(self, lines, indent, name, lineno):
# docstring elided.
for line_n, line in enumerate(lines, start=lineno+1):
if len(line) >= indent+4 and line[indent+3] != ' ':
raise ValueError('line %r of the docstring for %s '
'lacks blank after %s: %r' %
(line_n, name,
line[indent:indent+3], line))

Counting days of the week


Definitely not as frequent as the plain enumerate(iter) usage, but there were
also quite some places that made use of the optional argument to avoid com-
puting unnecessary offsets.
An interesting use I found was in the calendar module, in the function
calendar.Calendar.itermonthdays2, which does the following:
• you give it a year and a month, e.g. 2021 and 4 (for April); and
• it returns a generator with the days of the month paired with the days of
the week (0 to 6).
(There’s the little caveat that the iterator returns sequences of whole weeks,
so it may pad the results in the beginning and/or end.)
Here is an example:
>>> import calendar as c
>>> for arg in c.Calendar().itermonthdays2(2021, 4):
... print(arg)
...
(0, 0)
(0, 1)
(0, 2)
(1, 3)
(2, 4)
(3, 5)
(4, 6)
(5, 0)

(6, 1)
(7, 2)
# ... cut for brevity
(28, 2)
(29, 3)
(30, 4)
(0, 5)
(0, 6)
The numbers on the left show the day of the month and the numbers on the right
encode the day of the week, where 0 is Monday, up to 6 which is Sunday. The 6th
of April of 2021 (the day I wrote this article on) was a Tuesday, which is encoded
by the (6, 1) in the output above.
Here is the code that implements itermonthdays2:
# from Lib\calendar.py in Python 3.9
class Calendar(object):
# ...

def itermonthdays2(self, year, month):


"""
Like itermonthdates(), but will yield (day number, weekday number)
tuples. For days outside the specified month the day number is 0.
"""
for i, d in enumerate(self.itermonthdays(year, month), self.firstweekday):
yield d, i % 7
This function relies heavily on itermonthdays(year, month) that just returns
a sequence of month days with some leading and/or trailing zeroes, so that the
sequence represents the whole weeks in which that month fell.
For example, look at my desktop calendar for the month of April of 2021:

If I tell the Calendar class to start counting weeks on Sundays (day 6), like my
desktop calendar does, here is what itermonthdays produces:
>>> for d in c.Calendar(6).itermonthdays(2021, 4):
... print(d)
...
0
0
0
0
1
2
3
4
# ... cut for brevity
30
0
The first four 0s are the four March days that show up in the top week, and the
final 0 corresponds to the 1st of May that is shown in the bottom right corner of
my calendar.
In order to return these days together with the respective day of the week,
enumerate is being fed a start= argument, which is self.firstweekday, to
sync up the days of the month to what the Calendar sees as the first day of the
week.
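We can check that synchronisation directly: with firstweekday set to 6 (Sunday), the enumerate counter starts at 6, so the very first (possibly padded) day is reported as a Sunday:

```python
import calendar

cal = calendar.Calendar(6)  # weeks start on Sunday (day 6)
days = list(cal.itermonthdays2(2021, 4))
print(days[:5])
# [(0, 6), (0, 0), (0, 1), (0, 2), (1, 3)] -- four padding days starting on a
# Sunday (6), then the 1st of April 2021, a Thursday (3).
```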

Filtering the indices
A really neat usage of enumerate I found while probing the Python Standard
Library was to filter a list in search for the indices of the elements that satisfy a
certain predicate.
For example, say you have a list of integers and you have a function that tells
you if a number is odd or not:
>>> nums = [4071, 53901, 96045, 84886, 5228, 20108, 42468, 89385, 22040, 18800, 4071]
>>> odd = lambda x: x%2
What code do you write to figure out the indices of the numbers that are odd?
Notice that solutions making use of nums.index in general won’t work because
the list may contain duplicates (cf. nums[0] and nums[-1] above).
In a helper file in the standard library described as, and I quote, “Shared OS X
support functions”, I found a really elegant solution:
>>> [i for i, n in enumerate(nums) if odd(n)]
[0, 1, 2, 7, 10]
Of course the file I am talking about (Lib\_osx_support.py) didn’t have this
code, but it did have the pattern I just showed. The actual code there is the
following:
# from Lib\_osx_support.py in Python 3.9
def compiler_fixup(compiler_so, cc_args):
# ...
indices = [i for i,x in enumerate(compiler_so) if x.startswith('-isysroot')]
# ...
While I have no clue what the code is doing from the semantic point of view,
we can clearly see that indices is collecting the indices of the elements in
compiler_so that start with "-isysroot".
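A close cousin of this pattern, also built on enumerate, grabs only the first matching index instead of all of them, by feeding the generator to next with a default:

```python
nums = [4071, 53901, 96045, 84886, 5228]
odd = lambda x: x % 2
# All indices of odd elements...
print([i for i, n in enumerate(nums) if odd(n)])  # [0, 1, 2]
# ...or just the first such index (None if no element matches).
print(next((i for i, n in enumerate(nums) if odd(n)), None))  # 0
```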

Making the most out of the tuples


Another interesting usage of the enumerate function I found was to create dic-
tionaries directly. For example, if we take a look at the mailbox module we can
find a line of code that is building a table of contents as a dictionary, where the
keys are the integers given by enumerate and the values are tuples built by zip:
# from Lib\mailbox.py in Python 3.9
class mbox(_mboxMMDF):
# ...
def _generate_toc(self):
"""Generate key-to-(start, stop) table of contents."""
starts, stops = [], []
last_was_empty = False

self._file.seek(0)
while True:
# process self._file
self._toc = dict(enumerate(zip(starts, stops)))
# ...
Notice how the code initialises empty lists starts and stops, which are then
populated inside the while loop I deleted because it was fairly long and would
distract us from the main point: the line
self._toc = dict(enumerate(zip(starts, stops)))
Because enumerate returns 2-item tuples, dict can take that and build a
dictionary. Curiously enough, we actually want starts and stops to be paired
up together, so we end up calling enumerate on a zip; this is what the result
could look like:
>>> starts = [1, 10, 21, 30]
>>> stops = [9, 15, 28, 52]
>>> dict(enumerate(zip(starts, stops)))
{0: (1, 9), 1: (10, 15), 2: (21, 28), 3: (30, 52)}

Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“enumerate is your best friend if you need to traverse an iterator
to deal with its data and also need access to information about its
index.”
This Pydon’t showed you that:
• enumerate gives you access to an iterable’s elements and indices at the
same time;
• enumerate by itself returns a lazy enumerate object that must be then
iterated or converted explicitly to a list (or something else that suits
your needs) if you want its values;
• enumerate takes a second argument to set an offset for the indexing;
– and, in particular, that argument can be a negative integer;
• the result of enumerate can be fed directly to dict to create a dictionary
whose keys are the indices;
• using enumerate we get a nice idiom to find the indices of an iterable that
point to the elements that satisfy a given condition; and
• coupling zip, enumerate, and deep unpacking allows you to loop over sev-
eral iterables elegantly.

References
• Python 3 Documentation, The Python Standard Library, enumerate,
docs.python.org/3/library/functions.html#enumerate [last accessed
06-04-2021];
• Python 3 Documentation, The Python Standard Library, calendar.Calendar,
docs.python.org/3/library/calendar.html#calendar.Calendar [last ac-
cessed 06-04-2021].
• Python 3 Documentation, The Python Standard Library, doctest,
docs.python.org/3/library/doctest.html [last accessed 06-04-2021].
• Python 3 Documentation, The Python Standard Library, mailbox,
docs.python.org/3/library/mailbox.html [last accessed 06-04-2021].

str and repr

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/str-and-repr.)

str and repr


Python has two built-in mechanisms that allow you to convert an object to a
string, so that you can look at it and print it. I am talking about the str class
and the built-in repr function.
There is often confusion as to what the differences between these two built-ins
are, but the difference is simple and clear. The str class is used when you
want to convert something to the string type, and is also used when you need a
readable representation of your object. On the other hand, the repr function is
used to create an unambiguous representation of its argument.
End users generally use str because they want to print readable and good look-
ing text, whereas developers may use repr because they need to debug code
and need to make sure they know what they are looking at. For example, take a
look at the following interactive session:
>>> print(3)
3
>>> print("3")
3
>>> 3
3
>>> "3"
'3'
The print function calls str on its argument and then displays it, so both the
integer 3 and the string "3" get printed the same way: you have no way to tell
if the original object is an integer or a string. After that, you see that simply
writing the integer 3 and the string "3" in the REPL returns an unambiguous
representation of the object: you can tell integers and strings apart, because
the REPL is using repr under the hood to show objects. repr is also used when
your object is inside a container, like a list or a dictionary, because containers
usually defer their str behaviour to repr, as you can see by looking at PEP 3140
and at the following session:
>>> [3, "3"]
[3, '3']
>>> print([3, "3"])
[3, '3']
>>> str([3, "3"]) == repr([3, "3"])
True

The __str__ and __repr__ dunder methods


When you are defining your own classes in Python you will probably want to spe-
cify how your objects should look when printed, given that the default behaviour
in Python is not very helpful:
>>> class A:
... pass
...

>>> a = A()
>>> print(a)
<__main__.A object at 0x012DF640>
>>> a
<__main__.A object at 0x012DF640>
If you want to display your objects properly, you will want to implement the
__str__ and __repr__ dunder methods (dunder stands for double underscore),
and the implementations should follow the use case of str and repr outlined
above: the implementation of __str__ should provide a nice, readable
representation of your object, and __repr__ should unambiguously represent
your object, preferably by providing an expression that could be used to
rebuild it.
If you are not acquainted with Python’s dunder methods, you may want to
subscribe to the Pydon’t newsletter, as I will write more about them later. Until
then, you may want to have a look at the Python 3 Docs and what they say about
the data model.
When implementing custom classes, I suggest you start by implementing
__repr__, as __str__ will default to calling __repr__ if no custom imple-
mentation is given, but only implementing __str__ still leaves you with rather
unhelpful representations of your objects.
If you just implement __str__:
>>> class A:
... def __str__(self):
... return "A"
...
>>> a = A()
>>> a
<__main__.A object at 0x01600760>
>>> print(a)
A
if you just implement __repr__:
>>> class A:
... def __repr__(self):
... return "A"
...
>>> a = A()
>>> a
A
>>> print(a)
A

Examples in code
datetime
Python’s datetime module supplies classes for manipulating dates and times.
A simple date could be created like so:
>>> import datetime
>>> date = datetime.datetime(2021, 2, 2)
Now that we have your date object of type datetime.datetime, we can see what
its repr looks like and compare it to its str version:
>>> print(repr(date))
datetime.datetime(2021, 2, 2, 0, 0)
>>> print(str(date))
2021-02-02 00:00:00
We can see that repr(date) could be used to create the same exact object:
>>> date == datetime.datetime(2021, 2, 2, 0, 0)
True
>>> date == eval(repr(date))
True
Whereas str(date) creates a nice-looking representation of the date in ques-
tion. Notice that from its str we can’t even tell that we were dealing with a
datetime.datetime object.

2D point
An example custom usage of the __str__ and __repr__ dunder methods could
come into play if you were to implement a simple class that represents 2D
points, for example because you have to deal with images or a game or maps,
or whatever your use case is.
Ignoring all other methods you would certainly implement, your class could look
like this:
class Point2D:
"""A class to represent points in a 2D space."""

def __init__(self, x, y):


self.x = x
self.y = y

def __str__(self):
"""Provide a good-looking representation of the object."""
return f"({self.x}, {self.y})"

def __repr__(self):
"""Provide an unambiguous way of rebuilding this object."""
return f"Point2D({repr(self.x)}, {repr(self.y)})"

p = Point2D(0, 0) # the origin.


print(f"To build the point {p} in your code, try writing {repr(p)}.")
Running this code prints To build the point (0, 0) in your code, try
writing Point2D(0, 0). to your console. Your end user may be accustomed
to 2D points, and thus they may need nothing more than the standard (x,
y) representation of a 2D point. During debugging, the Point2D prefix is
useful because it helps you distinguish between a tuple and a custom Point2D
instance.
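If __repr__ really does return a valid expression, you can verify the round trip with eval, just like we did for datetime above. A sketch, assuming we also give Point2D an __eq__ so that equal points compare equal (the class below is a trimmed copy of the listing above, not a replacement for it; the !r conversion in the f-string is equivalent to the repr(...) calls used earlier):

```python
class Point2D:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Using !r keeps the expression valid even for e.g. string coordinates.
        return f"Point2D({self.x!r}, {self.y!r})"

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

p = Point2D(3, 4)
# The repr is executable Python, so eval rebuilds an equal object.
print(eval(repr(p)) == p)  # True
```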

Conclusion
When implementing custom classes you will probably want to give a custom
implementation of the __repr__ dunder method, and also a __str__ if you need
your instances to look good when printed to the end user. __str__ and str are
used when you need good looking strings, while the purpose of __repr__ and
repr is to create unambiguous representations of your objects.
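One consequence of this relationship that is worth spelling out: if you implement __repr__ but not __str__, then str falls back to __repr__ automatically, so implementing __repr__ alone already gives you reasonable output everywhere. A minimal sketch (the OnlyRepr class is made up for the illustration):

```python
class OnlyRepr:
    """Defines __repr__ but not __str__."""
    def __repr__(self):
        return "OnlyRepr()"

o = OnlyRepr()
# With no __str__ defined, str() falls back to __repr__:
print(str(o))   # OnlyRepr()
print(repr(o))  # OnlyRepr()
```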

References
• Python 3 Documentation, The Python Language Reference, Data model, repr
and str, https://docs.python.org/3/reference/datamodel.html#object.__repr__.
• Python 3 Documentation, The Python Standard Library, Built-in Functions,
https://docs.python.org/3/library/functions.html.
• Python 3 Documentation, The Python Standard Library, Built-in Types, str,
https://docs.python.org/3/library/stdtypes.html#str.
• PEP 3140 – str(container) should call str(item), not repr(item),
https://www.python.org/dev/peps/pep-3140/.
• Stack Overflow, “Purpose of Python’s repr”,
https://stackoverflow.com/questions/1984162/purpose-of-pythons-repr.
• dbader.org, “Python String Conversion 101: Why Every Class Needs a
“repr””, https://dbader.org/blog/python-repr-vs-str.
Online references last consulted on the 2nd of February of 2021.



Structural pattern matching
tutorial

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/structural-pattern-matching-tutorial.)

Introduction
Structural pattern matching is coming to Python, and while it may look like a
plain switch statement like many other languages have, Python’s match statement
was not introduced to serve as a simple switch statement.
PEPs 634, 635, and 636 have plenty of information on what structural pattern
matching is bringing to Python, how to use it, the rationale for adding it to Python,
etc. In this article I will try to focus on using this new feature to write beautiful
code.
At the time of writing, Python 3.10 is still a pre-release, so you have to look
in the right place if you want to download Python 3.10 and play with it.

Structural pattern matching Python could already do
Structural pattern matching isn’t completely new in Python. For a long time now,
we have been able to do things like starred assignments:
>>> a, *b, c = [1, 2, 3, 4, 5]
>>> a
1
>>> b
[2, 3, 4]
>>> c
5
And we can also do deep unpacking:
>>> name, (r, g, b) = ("red", (250, 23, 10))
>>> name
'red'
>>> r
250
>>> g
23
>>> b
10
I covered these in detail in “Unpacking with starred assignments” and “Deep
unpacking”, so go read those Pydon’ts if you are unfamiliar with how to use these
features to write Pythonic code.
The match statement will use ideas from both starred assignments and deep
unpacking, so knowing how to use them is going to be helpful.

Your first match statement


For your first match statement, let’s implement the factorial function. A factorial
function is a textbook example when introducing people to recursion, and you
could write it like so:
def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n-1)

factorial(5)  # 120
Instead of using an if statement, we could use a match:
def factorial(n):
    match n:
        case 0 | 1:
            return 1
        case _:
            return n * factorial(n - 1)

factorial(5)
Notice a couple of things here: we start our match statement by typing match
n, meaning we will want to do different things depending on what n is. Then, we
have case statements that can be thought of as the different possible scenarios
we want to handle. Each case must be followed by a pattern that we will try to
match n against.
Patterns can also contain alternatives, denoted by the | in case 0 | 1, which
matches if n is either 0 or 1. The second pattern, case _:, is the go-to way of
matching anything (when you don’t care about what you are matching), so it is
acting more or less like the else of the first definition.

Pattern matching the basic structure


While match statements can be used like plain if statements, as you have seen
above, they really shine when you are dealing with structured data:
def normalise_colour_info(colour):
    """Normalise colour info to (name, (r, g, b, alpha))."""

    match colour:
        case (r, g, b):
            name = ""
            a = 0
        case (r, g, b, a):
            name = ""
        case (name, (r, g, b)):
            a = 0
        case (name, (r, g, b, a)):
            pass
        case _:
            raise ValueError("Unknown colour info.")
    return (name, (r, g, b, a))

print(normalise_colour_info((240, 248, 255)))                      # ('', (240, 248, 255, 0))
print(normalise_colour_info((240, 248, 255, 0)))                   # ('', (240, 248, 255, 0))
print(normalise_colour_info(("AliceBlue", (240, 248, 255))))       # ('AliceBlue', (240, 248, 255, 0))
print(normalise_colour_info(("AliceBlue", (240, 248, 255, 0.3))))  # ('AliceBlue', (240, 248, 255, 0.3))
Notice here that each case contains an expression like the left-hand side of an
unpacking assignment, and when the structure of colour matches the structure
that the case exhibits, then the values get assigned to the variable names in
the case.
This is a great improvement over the equivalent code with if statements:
def normalise_colour_info(colour):
    """Normalise colour info to (name, (r, g, b, alpha))."""

    if not isinstance(colour, (list, tuple)):
        raise ValueError("Unknown colour info.")

    if len(colour) == 3:
        r, g, b = colour
        name = ""
        a = 0
    elif len(colour) == 4:
        r, g, b, a = colour
        name = ""
    elif len(colour) != 2:
        raise ValueError("Unknown colour info.")
    else:
        name, values = colour
        if not isinstance(values, (list, tuple)) or len(values) not in [3, 4]:
            raise ValueError("Unknown colour info.")
        elif len(values) == 3:
            r, g, b = values
            a = 0
        else:
            r, g, b, a = values
    return (name, (r, g, b, a))
I tried writing a decent, equivalent piece of code to the one using structural pat-
tern matching, but this doesn’t look that good. Someone else has suggested,
in the comments, another alternative that also doesn’t use match. That sug-
gestion looks better than mine, but is much more complex and larger than the
alternative with match.



The match version becomes even better when we add type validation to it, by
asking for the specific values to actually match Python’s built-in types:
def normalise_colour_info(colour):
    """Normalise colour info to (name, (r, g, b, alpha))."""

    match colour:
        case (int(r), int(g), int(b)):
            name = ""
            a = 0
        case (int(r), int(g), int(b), int(a)):
            name = ""
        case (str(name), (int(r), int(g), int(b))):
            a = 0
        case (str(name), (int(r), int(g), int(b), int(a))):
            pass
        case _:
            raise ValueError("Unknown colour info.")
    return (name, (r, g, b, a))

print(normalise_colour_info(("AliceBlue", (240, 248, 255))))  # ('AliceBlue', (240, 248, 255, 0))
print(normalise_colour_info(("Red", (255, 0, "0"))))          # ValueError: Unknown colour info.
How do you reproduce all this validation with if statements..?
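To get a feel for the answer, here is a sketch of what it takes to replicate just the first pattern, (int(r), int(g), int(b)), by hand — the helper name matches_rgb is made up for the illustration, and checking against list and tuple is only an approximation of what sequence patterns accept:

```python
def matches_rgb(colour):
    """Hand-rolled check mimicking the pattern (int(r), int(g), int(b))."""
    return (
        isinstance(colour, (list, tuple))
        and len(colour) == 3
        and all(isinstance(channel, int) for channel in colour)
    )

print(matches_rgb((240, 248, 255)))  # True
print(matches_rgb((255, 0, "0")))    # False
```

And that is a single case out of five; the match statement performs all of these checks for you, one pattern per line.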

Matching the structure of objects


Structural pattern matching can also be used to match the structure of class
instances. Let us recover the Point2D class I have used as an example in a
couple of posts, in particular the Pydon’t about __str__ and __repr__:
class Point2D:
    """A class to represent points in a 2D space."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        """Provide a good-looking representation of the object."""
        return f"({self.x}, {self.y})"

    def __repr__(self):
        """Provide an unambiguous way of rebuilding this object."""
        return f"Point2D({repr(self.x)}, {repr(self.y)})"
Imagine we now want to write a little function that takes a Point2D and writes
a little description of where the point lies. We can use pattern matching to
capture the values of the x and y attributes and, what is more, we can use short
if statements to help narrow down the type of matches we want to succeed!
Take a look at the following:
def describe_point(point):
    """Write a human-readable description of the point position."""

    match point:
        case Point2D(x=0, y=0):
            desc = "at the origin"
        case Point2D(x=0, y=y):
            desc = f"in the vertical axis, at y = {y}"
        case Point2D(x=x, y=0):
            desc = f"in the horizontal axis, at x = {x}"
        case Point2D(x=x, y=y) if x == y:
            desc = f"along the x = y line, with x = y = {x}"
        case Point2D(x=x, y=y) if x == -y:
            desc = f"along the x = -y line, with x = {x} and y = {y}"
        case Point2D(x=x, y=y):
            desc = f"at {point}"

    return "The point is " + desc

print(describe_point(Point2D(0, 0)))   # The point is at the origin
print(describe_point(Point2D(3, 0)))   # The point is in the horizontal axis, at x = 3
print(describe_point(Point2D(3, -3)))  # The point is along the x = -y line, with x = 3 and y = -3
print(describe_point(Point2D(1, 2)))   # The point is at (1, 2)

__match_args__
Now, I don’t know if you noticed, but didn’t all the x= and y= in the code snippet
above annoy you? Every time I wrote a new pattern for a Point2D instance, I
had to specify what argument was x and what was y. For classes where this
order is not arbitrary, we can use __match_args__ to tell Python how we would
like match to match the attributes of our object.
Here is a shorter version of the example above, making use of __match_args__
to let Python know the order in which arguments to Point2D should match:
class Point2D:
    """A class to represent points in a 2D space."""

    __match_args__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

def describe_point(point):
    """Write a human-readable description of the point position."""

    match point:
        case Point2D(0, 0):
            desc = "at the origin"
        case Point2D(0, y):
            desc = f"in the vertical axis, at y = {y}"
        case Point2D(x, 0):
            desc = f"in the horizontal axis, at x = {x}"
        case Point2D(x, y):
            desc = f"at {point}"

    return "The point is " + desc

print(describe_point(Point2D(0, 0)))  # The point is at the origin
print(describe_point(Point2D(3, 0)))  # The point is in the horizontal axis, at x = 3
print(describe_point(Point2D(1, 2)))  # The point is at (1, 2)

Wildcards
Another cool thing you can do when matching things is to use wildcards.

Asterisk *
Much like you can do things like
>>> head, *body, tail = range(10)
>>> print(head, body, tail)
0 [1, 2, 3, 4, 5, 6, 7, 8] 9
where the *body tells Python to put in body whatever does not go into head or
tail, you can use * and ** wildcards in your patterns. You can use * with lists
and tuples to match the remainder of the sequence:
def rule_substitution(seq):
    new_seq = []
    while seq:
        match seq:
            case [x, y, z, *tail] if x == y == z:
                new_seq.extend(["3", x])
            case [x, y, *tail] if x == y:
                new_seq.extend(["2", x])
            case [x, *tail]:
                new_seq.extend(["1", x])
        seq = tail
    return new_seq

seq = ["1"]
print(seq[0])
for _ in range(10):
    seq = rule_substitution(seq)
    print("".join(seq))

"""
Prints:
1
11
21
1211
111221
312211
13112221
1113213211
31131211131221
13211311123113112211
11131221133112132113212221
"""
This builds the sequence I showed above, where each number is derived from
the previous one by looking at its digits and describing what you are looking at.
For example, when you find three equal digits in a row, like "222", you rewrite
that as "32" because you are seeing three twos. With the match statement this
becomes much cleaner. In the case statements above, the *tail part of the
pattern matches the remainder of the sequence, as we are only using x, y, and
z to match in the beginning of the sequence.

Plain dictionary matching


Similarly, we can use ** to match the remainder of a dictionary. But first, let us
see what is the behaviour when matching dictionaries:
d = {0: "oi", 1: "uno"}
match d:
    case {0: "oi"}:
        print("yeah.")
## prints yeah.
While d has a key 1 with a value "uno", and that is not specified in the only
case statement, there is a match and we enter the statement. When matching
with dictionaries, we only care about matching the structure that was explicitly
mentioned, and any other extra keys that the original dictionary has are ignored.
This is unlike matching with lists or tuples, where the match has to be perfect if
no wildcard is mentioned.

Double asterisk **
However, if you want to know what the original dictionary had that was not spe-
cified in the match, you can use a ** wildcard:
d = {0: "oi", 1: "uno"}
match d:
    case {0: "oi", **remainder}:
        print(remainder)
## prints {1: 'uno'}
Finally, you can use this to your advantage if you want to match a dictionary that
contains only what you specified:
d = {0: "oi", 1: "uno"}
match d:
    case {0: "oi", **remainder} if not remainder:
        print("Single key in the dictionary")
    case {0: "oi"}:
        print("Has key 0 and extra stuff.")
## Has key 0 and extra stuff.
You can also use variables to match the values of given keys:
d = {0: "oi", 1: "uno"}
match d:
    case {0: zero_val, 1: one_val}:
        print(f"0 mapped to {zero_val} and 1 to {one_val}")
## 0 mapped to oi and 1 to uno

Naming sub-patterns
Sometimes you may want to match against a more structured pattern, but then
give a name to a part of the pattern, or to the whole thing, so that you have
a way to refer back to it. This may happen especially when your pattern has
alternatives, which you add with |:
def go(direction):
    match direction:
        case "North" | "East" | "South" | "West":
            return "Alright, I'm going!"
        case _:
            return "I can't go that way..."



print(go("North")) # Alright, I'm going!
print(go("asfasdf")) # I can't go that way...
Now, imagine that the logic to handle that “going” somewhere is nested inside
something more complex:
def act(command):
    match command.split():
        case "Cook", "breakfast":
            return "I love breakfast."
        case "Cook", *wtv:
            return "Cooking..."
        case "Go", "North" | "East" | "South" | "West":
            return "Alright, I'm going!"
        case "Go", *wtv:
            return "I can't go that way..."
        case _:
            return "I can't do that..."

print(act("Go North"))        # Alright, I'm going!
print(act("Go asdfasdf"))     # I can't go that way...
print(act("Cook breakfast"))  # I love breakfast.
print(act("Drive"))           # I can't do that...
And, not only that, we want to know where the user wants to go, in order to
include that in the message. We can do this by leaving the options up, but then
capturing the result of the match in a variable:
def act(command):
    match command.split():
        case "Cook", "breakfast":
            return "I love breakfast."
        case "Cook", *wtv:
            return "Cooking..."
        case "Go", "North" | "East" | "South" | "West" as direction:
            return f"Alright, I'm going {direction}!"
        case "Go", *wtv:
            return "I can't go that way..."
        case _:
            return "I can't do that..."

print(act("Go North"))     # Alright, I'm going North!
print(act("Go asdfasdf"))  # I can't go that way...



Traversing recursive structures
Another type of situation in which structural pattern matching is expected to
succeed quite well is in handling recursive structures.
I have seen great examples of this use-case in the references I included below,
and will now share one of my own.
Imagine you want to transform a mathematical expression into prefix notation,
e.g. "3 * 4" becomes "* 3 4" and 1 + 2 + 3 becomes + + 1 2 3 or + 1 +
2 3, depending on whether + associates from the left or from the right.
You can write a little match to deal with this:
import ast

def prefix(tree):
    match tree:
        case ast.Expression(expr):
            return prefix(expr)
        case ast.Constant(value=v):
            return str(v)
        case ast.BinOp(lhs, op, rhs):
            match op:
                case ast.Add():
                    sop = "+"
                case ast.Sub():
                    sop = "-"
                case ast.Mult():
                    sop = "*"
                case ast.Div():
                    sop = "/"
                case _:
                    raise NotImplementedError()
            return f"{sop} {prefix(lhs)} {prefix(rhs)}"
        case _:
            raise NotImplementedError()

print(prefix(ast.parse("1 + 2 + 3", mode="eval")))      # + + 1 2 3
print(prefix(ast.parse("2*3 + 6", mode="eval")))        # + * 2 3 6
print(prefix(ast.parse("1 + 2*3 - 5/7", mode="eval")))  # - + 1 * 2 3 / 5 7

Careful with the hype


Now, here is a word of caution: match isn’t always the best solution. Looking back
at the prefix notation example above, perhaps there are better ways to transform
each possible binary operator to its string representation..? The current solution
spends two lines for each different operator, and if we add support for many
more binary operators, that part of the code will become unbearably long.
In fact, we can (and probably should) do something else about that. For example,
import ast

def op_to_str(op):
    ops = {
        ast.Add: "+",
        ast.Sub: "-",
        ast.Mult: "*",
        ast.Div: "/",
    }
    return ops.get(op.__class__, None)

def prefix(tree):
    match tree:
        case ast.Expression(expr):
            return prefix(expr)
        case ast.Constant(value=v):
            return str(v)
        case ast.BinOp(lhs, op, rhs):
            sop = op_to_str(op)
            if sop is None:
                raise NotImplementedError()
            return f"{sop} {prefix(lhs)} {prefix(rhs)}"
        case _:
            raise NotImplementedError()

print(prefix(ast.parse("1 + 2 + 3", mode="eval")))      # + + 1 2 3
print(prefix(ast.parse("2*3 + 6", mode="eval")))        # + * 2 3 6
print(prefix(ast.parse("1 + 2*3 - 5/7", mode="eval")))  # - + 1 * 2 3 / 5 7

Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“Structural pattern matching introduces a feature that can simplify
and increase the readability of Python code in many cases, but it will
not be the go-to solution in every single situation.”
This Pydon’t showed you that:
• structural pattern matching with the match statement greatly extends the
power of the already-existing starred assignment and structural assignment
features;



• structural pattern matching can match literal values and arbitrary patterns;
• patterns can include additional conditions with if statements;
• patterns can include wildcards with * and **;
• match statements are very powerful when dealing with the structure of
class instances;
• __match_args__ lets you define a default order for the arguments to be
matched in when a custom class is used in a case; and
• built-in Python classes can be used in case statements to validate types.

References
• PEP 622 – Structural Pattern Matching, https://www.python.org/dev/peps/pep-0622/;
• PEP 634 – Structural Pattern Matching: Specification, https://www.python.org/dev/peps/pep-0634/;
• PEP 635 – Structural Pattern Matching: Motivation and Rationale, https://www.python.org/dev/peps/pep-0635/;
• PEP 636 – Structural Pattern Matching: Tutorial, https://www.python.org/dev/peps/pep-0636/;
• Dynamic Pattern Matching with Python, https://gvanrossum.github.io/docs/PyPatternMatching.pdf;
• Python 3.10 Pattern Matching in Action, YouTube video by “Big Python”, https://www.youtube.com/watch?v=SYTVSeTgL3s.



Structural pattern matching
anti-patterns

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/structural-pattern-matching-anti-patterns.)

Introduction
Structural pattern matching is coming to Python, and while it may look like a
plain switch statement like many other languages have, Python’s match statement
was not introduced to serve as a simple switch statement.

In the previous Pydon’t I explored plenty of use cases for the new match statement,
and in this one I will try to explore some use cases for which a match is not
the answer. This article will assume you know how structural pattern matching
works in Python, so if you are unsure how that works feel free to read my “Pattern
matching tutorial for Pythonic code”.
At the time of writing, Python 3.10 is still a pre-release, so you have to look
in the right place if you want to download Python 3.10 and play with it.

There should be only one obvious way to do it


As per the Zen of Python,
“There should be one– and preferably only one –obvious way to do
it.”
The introduction of the match statement may seem to violate this principle…
However, you have to remember that the point of the match statement is not
to serve as a basic switch statement, as we already have plenty of alternatives
for that – if match was supposed to be a simple switch, then it would probably
be called “switch” and not “match”. No, the point of the match statement is to
do structural pattern matching, so there’s plenty of basic types of matching and
casing that can be done with the traditional tools that we have been using up
until Python 3.10.
Below I will share some of these tools with you, and I’ll try to describe the situ-
ations in which they are helpful. Of course, this is much easier to understand
with code so there will be plenty of code examples along the way.

A short and sweet if statement


The Collatz conjecture is a mathematical conjecture that says that the following
function terminates for any positive integer given as input:
def collatz_path(n):
    path = [n]
    while n != 1:
        match n % 2:
            case 0:
                n //= 2
            case 1:
                n = 3*n + 1
        path.append(n)
    return path
Which gives the following two example outputs:



>>> collatz_path(8)
[8, 4, 2, 1]
>>> collatz_path(15)
[15, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1]
If we look at the usage of match above, we see it basically served as a simple
switch to match either 0 or 1, the only two values that the operation n % 2 could
result in for a positive integer n. Notice that if we use a plain if we can write
exactly the same code and save one line of code:
def collatz_path(n):
    path = [n]
    while n != 1:
        if n % 2:
            n = 3*n + 1
        else:
            n //= 2
        path.append(n)
    return path
We saved one line of code and reduced the maximum depth of our indentation:
with the match we had code that was indented four times, whereas the imple-
mentation with the if only has three levels of depth. When you only have a
couple of options and you are checking for explicit equality, a short and sweet
if statement is most likely the way to go.

Be smart(er)
Sometimes you will feel like you have to list a series of cases and corresponding
values, so that you can map one to the other. However, it might be the case that
you could make your life much simpler by looking for an alternative algorithm or
formula and implementing that instead. I’ll show you an example.
In case you never heard of it, Rule 30 is an “elementary cellular automaton”.
You can think of it as a rule that receives three bits (three zeroes/ones) and
produces a new bit, depending on the three bits it received. Automatons are
really, really, interesting, but discussing them is past the point of this article.
Let us just look at a possible implementation of the “Rule 30” automaton:
def rule30(bits):
    match bits:
        case 0, 0, 0:
            return 0
        case 0, 0, 1:
            return 1
        case 0, 1, 0:
            return 1
        case 0, 1, 1:
            return 1
        case 1, 0, 0:
            return 1
        case 1, 0, 1:
            return 0
        case 1, 1, 0:
            return 0
        case 1, 1, 1:
            return 0
This seems like a sensible use of the match statement, except that we just wrote
16 lines of code… Ok, you are right, let us put together the rules that return the
same values, that should make the code shorter:
def rule30(bits):
    match bits:
        case (0, 0, 0) | (1, 0, 1) | (1, 1, 0) | (1, 1, 1):
            return 0
        case (0, 0, 1) | (0, 1, 0) | (0, 1, 1) | (1, 0, 0):
            return 1
Yup, much better. But now we have four options on each case, and I have to
squint to figure out where each option starts and ends, and the long strings of
zeroes and ones aren’t really that pleasant to the eye… Can we make it better..?
With just a little bit of research you can find out that the “Rule 30” can be written
as a closed formula that depends on the three input bits, which means we don’t
have to match the input bits with all the possible inputs, we can just compute
the output:
def rule30(bits):
    p, q, r = bits
    return (p + q + r + q*r) % 2
You might argue that this formula obscures the relationship between the several
inputs and their outputs. You are right in principle, but having the explicit “Rule
30” written out as a match doesn’t tell you much about why each input maps to
each output either way, so why not make it short and sweet?
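If you want reassurance that the formula really is “Rule 30”, the eight possible inputs can be checked exhaustively against the automaton’s usual definition, left XOR (centre OR right):

```python
from itertools import product

def rule30(bits):
    p, q, r = bits
    return (p + q + r + q*r) % 2

# Compare the closed formula with the standard definition
# of Rule 30 on all eight possible inputs:
for bits in product((0, 1), repeat=3):
    p, q, r = bits
    assert rule30(bits) == p ^ (q | r)
print("formula agrees with Rule 30 on all 8 inputs")
```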

Basic mappings
Getting from dictionaries
There are many cases in which you just want to take a value in and map it to
something else. As an example, take this piece of code that takes an expression
and writes it in prefix notation:



import ast

def prefix(tree):
    match tree:
        case ast.Expression(expr):
            return prefix(expr)
        case ast.Constant(value=v):
            return str(v)
        case ast.BinOp(lhs, op, rhs):
            match op:
                case ast.Add():
                    sop = "+"
                case ast.Sub():
                    sop = "-"
                case ast.Mult():
                    sop = "*"
                case ast.Div():
                    sop = "/"
                case _:
                    raise NotImplementedError()
            return f"{sop} {prefix(lhs)} {prefix(rhs)}"
        case _:
            raise NotImplementedError()

print(prefix(ast.parse("1 + 2 + 3", mode="eval")))      # + + 1 2 3
print(prefix(ast.parse("2*3 + 6", mode="eval")))        # + * 2 3 6
print(prefix(ast.parse("1 + 2*3 - 5/7", mode="eval")))  # - + 1 * 2 3 / 5 7
Notice the inner match to convert the op inside a BinOp to a string? For starters,
that nested match takes up too much vertical space and distracts us from what
really matters, which is the traversal of the recursive structure of the tree. This
means we could actually refactor that bit as a utility function:
import ast

def op_to_str(op):
    match op:
        case ast.Add():
            sop = "+"
        case ast.Sub():
            sop = "-"
        case ast.Mult():
            sop = "*"
        case ast.Div():
            sop = "/"
        case _:
            raise NotImplementedError()
    return sop

def prefix(tree):
    match tree:
        case ast.Expression(expr):
            return prefix(expr)
        case ast.Constant(value=v):
            return str(v)
        case ast.BinOp(lhs, op, rhs):
            return f"{op_to_str(op)} {prefix(lhs)} {prefix(rhs)}"
        case _:
            raise NotImplementedError()

print(prefix(ast.parse("1 + 2 + 3", mode="eval")))      # + + 1 2 3
print(prefix(ast.parse("2*3 + 6", mode="eval")))        # + * 2 3 6
print(prefix(ast.parse("1 + 2*3 - 5/7", mode="eval")))  # - + 1 * 2 3 / 5 7
This makes it easier to read and interpret the prefix function, but now we
have another problem that really annoys me: a simple but long function, the
op_to_str function. For every type of operator you support, your function grows
by two lines… If you replace the match with a chain of if and elif statements
you only save one line at the top…
The fix I suggested in the original article was using a dictionary to map the type
of op to its string representation:
def op_to_str(op):
    ops = {
        ast.Add: "+",
        ast.Sub: "-",
        ast.Mult: "*",
        ast.Div: "/",
    }
    return ops.get(op.__class__, None)
This usage pattern of a dictionary is quite common in Python, using the get
method to compute the mapping of a value to another value. In case you are
wondering, you can use the second argument of the get method to provide a
default value, which might be useful if the dictionary hasn’t listed every single
possible value or in case you want to have a fallback value.
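To make the default argument concrete, here is get with and without it (the toy dictionary is just for illustration):

```python
ops = {"+": "Add", "-": "Sub"}

print(ops.get("+"))             # Add
# Missing key, no default: get returns None instead of raising KeyError.
print(ops.get("*"))             # None
# Missing key, with default: get returns the fallback instead.
print(ops.get("*", "Unknown"))  # Unknown
```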

getattr
Another useful mechanism that we have available is the getattr function, which
is part of a trio of Python built-in functions: hasattr, getattr and setattr.
I will be writing about this trio in a future Pydon’t; be sure to subscribe to
the Pydon’t newsletter so you don’t miss it! For now, I’ll just show you briefly
what getattr can do for you.
I am writing an APL interpreter called RGSPL, and there is a function named
visit_F where I need to map APL symbols like +, -, and ⍴ to the corresponding
Python function that implements it. These Python functions, implementing the
behaviour of the symbols, live in the functions.py file. If I were using a match
statement, here is what this visit_F could look like:
import functions

def visit_F(self, func):
    """Fetch the callable function."""

    name = func.token.type.lower()  # Get the name of the symbol.

    match name:
        case "plus":
            function = functions.plus
        case "minus":
            function = functions.minus
        case "reshape":
            function = functions.reshape
        case _:
            function = None
    if function is None:
        raise Exception(f"Could not find function {name}.")
    return function
This is a similar problem to the one I showed above, where we wanted to get a
string for each type of operator we got, so this could actually be written with the
dictionary mapping. I invite you to do it, as a little exercise.
However, here’s the catch: I still have a long way to go in my RGSPL project,
and I already have a couple dozen of those primitives, so my match statement
would be around 40 lines long, if I were using that solution, or 20 lines long if I
were using the dictionary solution, with a key, value pair per line.
Thankfully, Python’s getattr can be used to get an attribute from an object, if I
have the name of that attribute. It is no coincidence that the value of the name
variable above is supposed to be exactly the same as the name of the function
defined inside functions.py:
import functions

getattr(functions, "plus", None)     # returns functions.plus
getattr(functions, "reshape", None)  # returns functions.reshape
getattr(functions, "fasfadf", None)  # returns None
With the getattr function that Python provides, my visit_F keeps a constant
size, regardless of how many functions I add to the functions.py file:



def visit_F(self, func):
    """Fetch the callable function."""

    name = func.token.type.lower()  # Get the name of the symbol.

    function = getattr(functions, name, None)
    if function is None:
        raise Exception(f"Could not find function {name}.")
    return function
The getattr function can also be used to get attributes from an instance of a
class, e.g.,
class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

foo = Foo(3, 4)
print(getattr(foo, "a"))  # prints 3
bar = Foo(10, ";")
print(getattr(bar, "b"))  # prints ';'
This goes to show that it is always nice to know the tools you have at your dis-
posal. Not everything has very broad use cases, but that also means that the
more specialised tools are the ones that make the most difference when they
are brought in.
Speaking of knowing your tools, the last use case in this article for which match
is a bad alternative is related to calling different functions when your data has
different types.

Single-dispatch generic functions


If you have programming experience in a programming language like Java, you
will be familiar with the concept of overloading a function: you implement the
same function several times, but you get to specify the behaviour of the function
for different types of arguments and/or number of arguments.
For example, you might want to implement a function to pretty-print a series of
different types of objects:
def pretty_print(arg):
    if isinstance(arg, complex):
        print(f"{arg.real} + {arg.imag}i")
    elif isinstance(arg, (list, tuple)):
        for i, elem in enumerate(arg):
            print(i, elem)
    elif isinstance(arg, dict):
        for key, value in arg.items():
            print(f"{key}: {value}")
    else:
        print(arg)
Which then works like so:
>>> pretty_print(3)
3
>>> pretty_print([2, 5])
0 2
1 5
>>> pretty_print(3+4j)
3.0 + 4.0i
You can see that the branching introduced by the if statement is merely to
separate the different types that the arg could have, and while the handling
logic might be different, the final purpose is always the same: to pretty-print
an object. But what if the code to handle each type of argument took 10 or 20
lines? You would be getting a really long function with what would essentially
be embedded subfunctions.
You can separate all these subfunctions by making use of the functools.singledispatch
decorator:
import functools

@functools.singledispatch
def pretty_print(arg):
    print(arg)

@pretty_print.register(complex)
def _(arg):
    print(f"{arg.real} + {arg.imag}i")

@pretty_print.register(list)
@pretty_print.register(tuple)
def _(arg):
    for i, elem in enumerate(arg):
        print(i, elem)

@pretty_print.register(dict)
def _(arg):
    for key, value in arg.items():
        print(f"{key}: {value}")
And this can then be used exactly like the original function:
>>> pretty_print(3)

3
>>> pretty_print([2, 5])
0 2
1 5
>>> pretty_print(3+4j)
3.0 + 4.0i
The pretty_print example isn’t the best showcase because you spend as many
lines decorating as you do defining the actual subfunctions, but it shows you
the pattern that you can now be on the lookout for. You can read more about
singledispatch in the docs.
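As a side note, since Python 3.7 the register method can also infer the type
to dispatch on from a type annotation, which saves you the explicit type
argument. Here is a minimal sketch (the describe function below is a made-up
example, not from the text above):

```python
import functools

@functools.singledispatch
def describe(arg):
    # Fallback for types without a registered implementation.
    return f"object: {arg!r}"

# Since Python 3.7, register can infer the type from the annotation.
@describe.register
def _(arg: int):
    return f"integer: {arg}"

@describe.register
def _(arg: str):
    return f"string of length {len(arg)}"

print(describe(3))        # integer: 3
print(describe("hello"))  # string of length 5
print(describe([1, 2]))   # object: [1, 2]
```

The annotation-based form and the explicit register(type) form can be mixed
freely; they register implementations on the same dispatcher.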

Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“The new match statement is great, but that does not mean the match
statement will always be the best alternative and, in particular, the
match statement is generally being misused if you use it as a simple
switch.”
This Pydon’t showed you that:
• match isn’t necessarily always the best way to implement control flow;
• short and basic match statements could be vanilla if statements;
• sometimes there is a way to compute what you need, instead of having to
list many different cases and their respective values;
• built-in tools like dict.get and getattr can also be used to fetch different
values depending on the matching key; and
• you can use functools.singledispatch when you need to execute dif-
ferent subfunctions when the input has different types.

References
• PEP 622 – Structural Pattern Matching, https://www.python.org/dev/peps/pep-0622/;
• PEP 634 – Structural Pattern Matching: Specification, https://www.python.org/dev/peps/pep-0634/;
• PEP 635 – Structural Pattern Matching: Motivation and Rationale, https://www.python.org/dev/peps/pep-0635/;
• PEP 636 – Structural Pattern Matching: Tutorial, https://www.python.org/dev/peps/pep-0636/;
• PEP 443 – Single-dispatch generic functions, https://www.python.org/dev/peps/pep-0443/;
• Python 3 Documentation, The Python Standard Library, getattr, https://docs.python.org/3/library/functions.html#getattr;

• Python 3 Documentation, The Python Standard Library, functools.singledispatch, https://docs.python.org/3/library/functools.html#functools.singledispatch;
• Wikipedia, “Collatz Conjecture”, https://en.wikipedia.org/wiki/Collatz_conjecture;
• WolframAlpha, “Rule 30”, https://www.wolframalpha.com/input/?i=rule+30;
• Wikipedia, “Rule 30”, https://en.wikipedia.org/wiki/Rule_30;

This book is a WIP. Original articles at https://mathspp.com/blog/pydonts. 96


Watch out for recursion

(Thumbnail of the original article at https://mathspp.com/blog/pydonts/watch-out-for-recursion.)

Introduction
In this Pydon’t I am going to talk a little bit about when and why recursion
might not be the best strategy to solve a problem. This discussion will entail
some particularities of Python, but will also cover broader topics and concepts
that encompass many programming languages. After this brief discussion, I
will show you some examples of recursive Python code and its non-recursive
counterparts.
Despite what I just said, don’t get me wrong: the purpose of this Pydon’t is
not to make you dislike recursion or to say that recursion sucks. I really like
recursion and I find it very elegant.

Watch out for recursion


Now that you know what the purpose of this Pydon’t is, let me mention some
things that can influence the suitability of recursion for solving a problem.

RecursionError
The first thing we will discuss is the infamous recursion depth limit that Python
enforces.
If you have no idea what I am talking about, then either
• you never wrote a recursive function in your life, or
• you are really, really good and never made a mistake in your recursive
function definitions.
The recursion depth limit is something that makes your code raise a
RecursionError if you make too many recursive calls. To see what I am
talking about, just do the following in your REPL:
>>> def f():
... return f()
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in f
File "<stdin>", line 2, in f
File "<stdin>", line 2, in f
[Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded
>>>
In many cases, this limit is useful, because it helps you find recursive
functions for which you did not define the base case properly.
There are, however, cases in which 1000 recursive calls isn’t enough to finish
your computations. A classical example is that of the factorial function:
>>> def fact(n):
... if n == 0:
... return 1
... return n*fact(n-1)

...
>>> fact(10)
3628800
>>> fact(2000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in fact
File "<stdin>", line 5, in fact
File "<stdin>", line 5, in fact
[Previous line repeated 995 more times]
File "<stdin>", line 2, in fact
RecursionError: maximum recursion depth exceeded in comparison
Our function is properly defined, but by default Python does not allow us to
make enough recursive calls.
If you must, you can always set your own recursion depth:
>>> import sys
>>> sys.setrecursionlimit(3000)
>>> fact(2000)
33162... # (omitted for brevity)
>>> sys.getrecursionlimit()
3000
Just be careful with it. I never tried, but you are probably not interested
in having Python run out of memory because of an obscenely large number of
recursive calls.
Hence, if your function is such that it will be constantly trying to recurse more
than the recursion depth allowed, you might want to consider a different solution
to your problem.
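If you do decide to bump the limit, it is a good idea to restore the previous
value when you are done. Here is one possible way of doing that with a context
manager (recursion_limit is a helper I am making up for illustration, not
something from the standard library):

```python
import sys
from contextlib import contextmanager

@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, restoring the old one after."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(old_limit)

def fact(n):
    if n == 0:
        return 1
    return n * fact(n - 1)

with recursion_limit(3000):
    result = fact(2000)  # deep enough recursion is allowed inside the block
print(len(str(result)))  # number of digits of 2000!
```

Outside the with block the old limit is back in place, so a stray runaway
recursion elsewhere in your program still gets caught early.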

No tail recursion elimination


In some programming languages, the factorial function shown above could be
tweaked – so as to perform a tail call – and that would prevent some problems
while saving memory: tail calls happen when the recursive call is the very last
thing that is done inside the function, which more or less means that you do not
need to keep any information whatsoever about the context you are in when you
recurse.
In the factorial function above, after recursing with fact(n-1) we still have to
perform a multiplication before returning from the function. If we rewrote the
function to carry the partial factorial as an accumulator, we could have a factorial
function that performs tail calls:
>>> def fact(n, partial=1):
... if n <= 1:

... return partial
... return fact(n-1, n*partial)
...
>>> fact(10)
3628800
As you can see, the very last thing done inside the fact function is to call it-
self, so in theory Python could “forget everything about its surroundings” when
making the recursive call, and save a lot of memory in the process.
In practice, Python intentionally does not do this, and I refer you to the two
articles on the Neopythonic blog (by Guido van Rossum) in the references to
read more on why Python does not have such a feature.
Converting recursive functions into tail recursive functions is an interesting exer-
cise and I challenge you to do so, but you won’t get speed gains for it. However,
it is very easy to remove the recursion of a tail recursive function, and I will show
you how to do it in the examples below.

Branching overlap
Another thing to take into account when considering a recursive solution to a
problem is: is there going to be much overlap in the recursive calls?
If your recursive function branches in its recursive calls and the recursive calls
overlap, then you may be wasting plenty of time recalculating the same values
over and over again. More often than not this can be fixed easily, but just be-
cause a problem probably has a simple solution, it doesn’t mean you can out-
right ignore it.
A classical example of recursion that leads to plenty of wasted computations is
the Fibonacci sequence example:
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n-1) + fibonacci(n-2)
A simple modification to this function shows that there are many recursive calls
being made:
call_count = 0
def fibonacci(n):
global call_count
call_count += 1
if n <= 1:
return n
return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(10))
print(call_count) # 177
If your function is more involved, then the time you waste on recalculations can
become unbearable.
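As a teaser for what is coming below: the standard library already provides a
decorator that caches results for you, functools.lru_cache (and the simpler
alias functools.cache from Python 3.9 onwards). A minimal sketch with the
Fibonacci function above:

```python
import functools

@functools.lru_cache(maxsize=None)
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))  # 55
# Only 11 distinct values (fibonacci(0) through fibonacci(10)) were computed;
# every other call was answered from the cache.
print(fibonacci.cache_info().misses)  # 11
```

cache_info() reports hits and misses, which is a convenient way to see how
much recomputation the cache is saving you.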

Depth-first versus breadth-first


Something else to take into consideration when writing recursive solutions to
your problems is that recursive solutions are inherently depth-first in nature,
whereas your problem might warrant a breadth-first solution.
This is unlikely to be a large concern, but it just goes to show that
sometimes, even though a problem has a very clear recursive solution, you are
better off not implementing a purely-recursive one.
A very good example of this distinction popped up when I solved the water
bucket riddle: I wanted to write code that solved (a more generic version of)
that riddle where you have a bucket that can hold A litres, another one that
holds B litres, and you have to move water around to get one of the buckets to
hold exactly T litres. The solution can be easily expressed in recursive terms,
but my implementation actually used a while loop and a BFS algorithm.
If you don’t know what this means, the best thing to do is to google it. For ex-
ample, visit the Wikipedia pages on Depth-first Search and Breadth-first Search.
In a short and imprecise sentence, Depth-First Search (DFS) means that when
you are traversing some structure, you prioritise exploring in depth, and only
then you look around, whereas in Breadth-First Search (BFS) you first explore
the level you are at, and only then go a level deeper.
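To make the distinction concrete, here is a small sketch that traverses a
made-up tree both ways (the tree dictionary below is hypothetical example
data):

```python
from collections import deque

# Each node maps to the list of its children.
tree = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}

def dfs(node):
    """Depth-first: fully explore a child before moving to its sibling."""
    order = [node]
    for child in tree[node]:
        order.extend(dfs(child))
    return order

def bfs(root):
    """Breadth-first: visit a whole level before going one level deeper."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

print(dfs("A"))  # ['A', 'B', 'D', 'C']
print(bfs("A"))  # ['A', 'B', 'C', 'D']
```

Notice that the recursive dfs goes down to D before ever looking at C,
whereas bfs finishes the level containing B and C first.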

Examples in code
I will now show some recursive code that can run into some of the problems
mentioned above, and will also share non-recursive versions of those same
pieces of code.

Factorials
The toy example of the factorial is great because it lends itself to countless
different implementations, and the ideas that these implementations exhibit
can then be adapted to more complex recursions.
The main characteristic here is that the recursion of the factorial is a “linear” re-
cursion, where each call only performs a single recursive call, and each recursive
call is for a simpler problem.
The vanilla recursion follows:

def factorial(n):
if n <= 1:
return 1
return n * factorial(n-1)
As we have seen above, we could use an accumulator to write a tail recursive
version of the factorial, even though Python won’t optimise that in any way:
def factorial(n, partial=1):
if n <= 1:
return partial
return factorial(n-1, n*partial)
Now that we have this function written in a tail recursive way, we can actually
remove the recursion altogether by following a simple recipe:
def factorial(n):
partial = 1
while n > 1:
n, partial = n-1, n*partial
return partial
This is a generic transformation you can do for any tail recursive function and
I’ll present more examples below.
Still on the factorial, because this is a linear recursion (and a fairly simple one,
yes), there are many ways in which this function can be rewritten. I present a
couple, pretending for a second that math.factorial doesn’t exist:
import math
def factorial(n):
return math.prod(i for i in range(1, n+1))

import functools, operator

def factorial(n):
    # The initial value 1 also makes factorial(0) return 1, instead of
    # reduce raising an error on an empty range.
    return functools.reduce(operator.mul, range(1, n+1), 1)

def factorial(n):
fact = 1
for i in range(1, n+1):
fact *= i
return fact
If you are solving a problem and come up with different solutions, don’t be afraid
to try them out.
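If you want to actually compare your alternatives, the standard library’s
timeit module makes quick measurements easy. A possible sketch (the exact
timings depend on your machine):

```python
import timeit

def factorial_rec(n):
    # Plain recursive version.
    return 1 if n <= 1 else n * factorial_rec(n - 1)

def factorial_iter(n):
    # Plain iterative version.
    fact = 1
    for i in range(1, n + 1):
        fact *= i
    return fact

# Sanity check: both versions agree.
assert factorial_rec(100) == factorial_iter(100)

for impl in ("factorial_rec(100)", "factorial_iter(100)"):
    print(impl, timeit.timeit(impl, globals=globals(), number=10_000))
```

On a typical machine the iterative version comes out ahead because it avoids
the per-call overhead of recursion, but measure before you commit to one.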

More on tail recursion
Let me show you a couple of simple recursive functions, their tail recursive equi-
valents and then their non-recursive counterparts. I will show you the generic
transformation, so that you too can rewrite any tail recursive function as an im-
perative one with ease.

List sum
You can implement your own sum recursively:
def sum(l):
if not l:
return 0
return l[0] + sum(l[1:])
If you carry a partial sum down the recursive calls, you can make this tail recurs-
ive:
def sum(l, partial=0):
if not l:
return partial
return sum(l[1:], l[0] + partial)
From the tail recursive function to the while solution is simple:
def sum(l):
partial = 0
while l:
l, partial = l[1:], l[0] + partial
return partial
Notice what happened:
• the default value of the auxiliary variable becomes the first statement of
the function;
• you write a while loop whose condition is the complement of the base case
condition;
• you update your variables just like you did in the tail recursive call,
except now you assign them explicitly; and
• after the while you return the auxiliary variable.
Of course there are simpler implementations of the sum; the point here is that
this transformation is generic and always works.

Sorting a list
Here is another example where we sort a list with selection sort. First, “regular”
recursion:
def selection_sort(l):
if not l:
return []
m = min(l)

idx = l.index(m)
return [m] + selection_sort(l[:idx]+l[idx+1:])
Now a tail recursive version:
def selection_sort(l, partial=None): # partial=[] is bad!
    if partial is None:
        partial = []
    if not l:
        return partial
    m = min(l)
    idx = l.index(m)
    return selection_sort(l[:idx]+l[idx+1:], partial + [m])
In the above we just have to be careful with something: the default value of
partial is supposed to be the empty list, but you should avoid mutable types in
your arguments’ default values, so we go with None and then the very first thing
we do is set partial = [] in case it was None.
Finally, applying the recipe, we can remove the recursion:
def selection_sort(l):
partial = []
while l:
m = min(l)
idx = l.index(m)
l, partial = l[:idx]+l[idx+1:], partial + [m]
return partial

Traversing (a directory)
The Depth-first versus Breadth-first distinction is more likely to pop up when
you have to traverse something.
In this example, we will traverse a full directory, printing file names and file sizes.
A simple, purely recursive solution follows:
import pathlib

def print_file_sizes(path):
"""Print file sizes in a directory."""

path_obj = pathlib.Path(path)
if path_obj.is_file():
print(path, path_obj.stat().st_size)
else:
for path in path_obj.glob("*"):
print_file_sizes(path)

If you apply that function to a directory tree like this one,
- file1.txt
- subdir1
| - file2.txt
| - subdir2
| | - file3.txt
| | - subdir3
| | | - deep_file.txt
then the first file you will see printed is deep_file.txt, because this recursive
solution traverses your file-system depth first. If you wanted to traverse the
directory breadth-first, so that you first found file1.txt, then file2.txt, then
file3.txt, and finally deep_file.txt, you could rewrite your function to look
like the following:
import pathlib

def print_file_sizes(dir):
"""Print file sizes in a directory, recursing into subdirectories."""

paths_to_process = [dir]
while paths_to_process:
path, *paths_to_process = paths_to_process
path_obj = pathlib.Path(path)
if path_obj.is_file():
print(path, path_obj.stat().st_size)
else:
paths_to_process += path_obj.glob("*")
This example that I took from my “Truthy, Falsy, and bool” Pydon’t uses the
paths_to_process list to keep track of the, well, paths that still have to be
processed, which mimics recursion without actually having to recurse.
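A small aside on performance: the line path, *paths_to_process =
paths_to_process rebuilds the list on every iteration. If the directory tree
is large, a collections.deque with popleft does the same job without the
copying. A possible variant:

```python
import pathlib
from collections import deque

def print_file_sizes(dir):
    """Print file sizes in a directory, breadth-first, using a deque."""
    paths_to_process = deque([dir])
    while paths_to_process:
        path_obj = pathlib.Path(paths_to_process.popleft())
        if path_obj.is_file():
            print(path_obj, path_obj.stat().st_size)
        else:
            # glob returns an iterable of child paths; extend queues them all.
            paths_to_process.extend(path_obj.glob("*"))
```

The traversal order is the same breadth-first order as before; only the queue
data structure changed.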

Keeping branching in check


Overlaps
When your recursive function branches out a lot, and those branches overlap,
you can save some computational effort by saving the values you computed so
far. This can be as simple as having a dictionary inside which you check for
known values and where you insert the base cases.
This technique is often called memoisation and will be covered in depth in a
later Pydon’t, so stay tuned!
call_count = 0

fibonacci_values = {0: 0, 1: 1}

def fibonacci(n):
global call_count
call_count += 1

try:
return fibonacci_values[n]
except KeyError:
fib = fibonacci(n-1) + fibonacci(n-2)
fibonacci_values[n] = fib
return fib

print(fibonacci(10))
print(call_count) # 19
Notice that this reduced the recursive calls from 177 to 19. We can even count
the number of times we have to perform calculations:
computation_count = 0

fibonacci_values = {0: 0, 1: 1}
def fibonacci(n):
try:
return fibonacci_values[n]
except KeyError:
global computation_count
computation_count += 1
fib = fibonacci(n-1) + fibonacci(n-2)
fibonacci_values[n] = fib
return fib

print(fibonacci(10))
print(computation_count) # 9
This shows that saving partial results can really pay off!

Writing recursive branching as loops


To show you how you can rewrite a recursive, branching function as a function
that uses while loops we will take a look at another sorting algorithm, called
merge sort. The way merge sort works is simple: to sort a list, you start by
sorting its first and second halves separately, and then you merge the two
sorted halves.
Written recursively, this might look something like this:
def merge(l1, l2):
result = []
while l1 and l2:

if l1[0] < l2[0]:
h, *l1 = l1
else:
h, *l2 = l2
result.append(h)
# One of the two lists is empty, the other contains the larger elements.
result.extend(l1)
result.extend(l2)
return result

def merge_sort(l):
"""Sort a list recursively with the merge sort algorithm."""

# Base case.
if len(l) <= 1:
return l
# Sort first and last halves.
m = len(l)//2
l1, l2 = merge_sort(l[:m]), merge_sort(l[m:])
# Now put them together.
return merge(l1, l2)
If you don’t want to have all this recursive branching, you can use a generic list
to keep track of all the sublists that are still to be sorted:
def merge(l1, l2):
"""Merge two lists in order."""

result = []
while l1 and l2:
if l1[0] < l2[0]:
h, *l1 = l1
else:
h, *l2 = l2
result.append(h)
# One of the two lists is empty, the other contains the larger elements.
result.extend(l1)
result.extend(l2)
return result

def merge_sort(l):
"""Sort a list with the merge sort algorithm."""

# Save all sorted sublists.


already_sorted = []
# Keep track of sublists that need sorting:
to_sort = [l]

while to_sort:
# Pick a list to be sorted.
lst, *to_sort = to_sort
# Base case.
if len(lst) <= 1:
already_sorted.append(lst)
else:
# Split in halves to sort each half.
m = len(lst) // 2
to_sort.append(lst[:m])
to_sort.append(lst[m:])

# Merge all the sublists.


while len(already_sorted) > 1:
l1, l2, *already_sorted = already_sorted
# Factored out the `merge` to keep this short.
already_sorted.append(merge(l1, l2))

return already_sorted[0]
If you don’t really know what the h, *l1 = l1, h, *l2 = l2, lst, *to_sort
= to_sort and l1, l2, *already_sorted = already_sorted lines are doing, you
might want to have a look at the Pydon’t about unpacking with starred
assignments.
In this particular example, my translation of the merge sort to a non-recursive
solution ended up being noticeably larger than the recursive one. This just goes
to show that you need to judge all situations by yourself: would this be worth it?
Is there an imperative implementation that is better than this direct translation?
The answers to these questions will always depend on the programmer and the
context they are in.
This also shows that the way you think about the problem has an effect on the
way the code looks: even though this last implementation is imperative, it is a
direct translation of a recursive implementation and so it may not look as good
as it could!
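Whichever version you end up with, a quick way to gain confidence in it is to
check it against the built-in sorted on random inputs. A small sketch, using
the recursive version from above:

```python
import random

def merge(l1, l2):
    """Merge two sorted lists into a single sorted list."""
    result = []
    while l1 and l2:
        if l1[0] < l2[0]:
            h, *l1 = l1
        else:
            h, *l2 = l2
        result.append(h)
    # One of the lists is empty; the other holds the largest elements.
    result.extend(l1)
    result.extend(l2)
    return result

def merge_sort(l):
    """Recursive merge sort."""
    if len(l) <= 1:
        return l
    m = len(l) // 2
    return merge(merge_sort(l[:m]), merge_sort(l[m:]))

# Compare against the built-in sorted on random lists of random lengths.
for _ in range(100):
    l = [random.randint(0, 100) for _ in range(random.randint(0, 30))]
    assert merge_sort(l) == sorted(l)
print("all checks passed")
```

This kind of randomised spot-check is cheap to write and catches most slips
you might make while refactoring between the recursive and imperative forms.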

Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“Pydon’t recurse mindlessly.”
This Pydon’t showed you that:
• Python has a hard limit on the number of recursive calls you can make and
raises a RecursionError if you cross that limit;
• Python does not optimise tail recursive calls, and probably never will;

• tail recursive functions can easily be transformed into imperative func-
tions;
• recursive functions that branch can waste a lot of computation if no care
is taken;
• traversing something with pure recursion tends to create depth first tra-
versals, which might not be the optimal way to solve your problem; and
• direct translation of recursive functions to imperative ones and vice-versa
will probably produce sub-optimal code, so you need to align your mindset
with what you want to accomplish.

References
• Stack Overflow, “What is the maximum recursion depth in Python, and how to increase it?”, https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-in-python-and-how-to-increase-it.
• Stack Overflow, “Does Python optimize tail recursion?”, https://stackoverflow.com/questions/13591970/does-python-optimize-tail-recursion.
• Neopythonic, Tail Recursion Elimination, http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html.
• Neopythonic, Final Words on Tail Calls, http://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html.
• Documentation, The Python Standard Library, Functional Programming Modules, operator, https://docs.python.org/3/library/operator.html.
Online references last consulted on the 16th of February of 2021.
