Zero To Py (2023 - New Release)
Michael Green
* * * * *
This is a Leanpub book. Leanpub empowers authors and publishers with the
Lean Publishing process. Lean Publishing is the act of publishing an in-
progress ebook using lightweight tools and many iterations to get reader
feedback, pivot until you have the right book and build traction once you
do.
* * * * *
And finally, in Part VI, we'll dig into mechanisms for profiling and
debugging python. When we identify specific pain points in our
implementation, we'll see how to refactor our projects to use
performant C code for a boost in execution speed. We'll also look into how
to use C debuggers to run the python interpreter, so we can debug our C
extensions.
Part I: A Whirlwind Tour
Chapter 0: System Setup
If you have a working instance of python installed and are comfortable
using it, feel free to skip this chapter.
This book is written against CPython 3.11, and many of the features
discussed are version-dependent. If your system comes with a prepackaged
version of python that is older than 3.11, you may consider looking into a
python version manager. Personally I recommend using pyenv to manage
your local and global python versions. My typical workflow is to use pyenv
to switch local versions to a target distribution, and then use that version to
create a local virtual environment using venv. This however is just one of
many opinions, and how you choose to configure your setup is completely
up to you.
Installing the python interpreter is the first step to getting started with
programming in python. The installation process is different for every
operating system. To install the latest version, Python 3.11, you’ll want to
go to the downloads page of the official website for the Python Software
Foundation (PSF), and click the “download” button for the latest release
version. This will take you to a page where you can select a version specific
for your operating system.
Python Versions
Python is versioned by release number, <major>.<minor>.<patch>. The
latest version is the one with the greatest major version, followed by the
greatest minor version, followed by the greatest patch version. So for
example, 3.1.0 is greater than 2.7.18, and 3.11.2 is greater than 3.10.8.
(This is worth noting because on the download page at python.org, a patch
release for an older version may sometimes be listed above the latest patch
release for a newer version.)
Windows
The PSF provides several installation options for Windows users. The first
two options are for 32-bit and 64-bit versions of the python interpreter. Your
default choice is likely to be the 64-bit interpreter - the 32-bit interpreter is
only necessary for hardware which doesn't support 64-bit, which nowadays
is atypical for standard desktops and laptops. The second two options are for
installing offline vs. over the web. If you intend to hold onto the original
installer for archival purposes, the offline version is what you want.
Otherwise, the web installer is what you want.
When you start the installer, a dialog box will appear. There will be four
clickable items in this installer: an "Install Now" button, a "Customize
installation" button, an "Install launcher for all users" checkbox, and an
"Add python to PATH" checkbox. In order to launch python from the
terminal, you will need to check the "Add python to PATH" checkbox. With
that, click "Install Now" to install Python.
Linux
For linux users, installing the latest version of python can be done by
compiling the source code. First, download a tarball from the downloads
page and extract into a local directory. Next, install the build dependencies,
and run the ./configure script to configure the workspace. Run make to
compile the source code and finally use make altinstall to install the
python version as python<version> (this is so your latest version doesn’t
conflict with the system installation).
root@f26d333a183e:~# apt-get install wget tar make \
> gcc build-essential gdb lcov pkg-config libbz2-dev \
> libffi-dev libgdbm-dev libgdbm-compat-dev liblzma-dev \
> libncurses5-dev libreadline6-dev libsqlite3-dev \
> libssl-dev lzma lzma-dev tk-dev uuid-dev zlib1g-dev
root@f26d333a183e:~# wget https://www.python.org/ftp/python/3.11.2/Python-3.11.2.tar.xz
root@f26d333a183e:~# file="Python-3.11.2.tar.xz"; \
> tar -xvf $file && rm $file
root@f26d333a183e:~# cd Python-3.11.2
root@f26d333a183e:~/Python-3.11.2# ./configure
root@f26d333a183e:~/Python-3.11.2# make
root@f26d333a183e:~/Python-3.11.2# make altinstall
root@f26d333a183e:~/Python-3.11.2# python3.11
Python 3.11.2 (main, Feb 13 2023, 18:44:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
What’s in a PATH?
Before moving forward, it’s worthwhile to talk a bit about the PATH variable
and what it’s used for. When you type a name into the terminal, python for
example, the computer looks for an executable on your machine that
matches this name. But it doesn’t look just anywhere; it looks in a specific
list of folders, and the PATH environment variable is that list.
root@2711ea43ad26:~# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@2711ea43ad26:~# which python3.11
/usr/local/bin/python3.11
So for example, on this linux machine, the $PATH variable is a list of folders
separated by a colon (on Windows the separator is a semicolon, but the
concept is the same). One of those folders is the /usr/local/bin folder,
which is where our python3.11 binary was installed. This means we can
simply type python3.11 into our terminal, and the OS will find this binary
and execute it.
Many developers hold strong opinions on the best setup for writing Python,
and the debate over the optimal setup is a contentious one. Some prefer a
lightweight text editor like Sublime Text or Neovim, while others prefer an
IDE like PyCharm or Visual Studio Code. Some developers prefer a
minimalist setup with only a terminal and the interpreter itself, while others
prefer a more feature-rich environment with additional tools built in for
debugging and testing. Ultimately, the best setup for writing Python will
vary depending on the individual developer's needs and preferences.
With that being said, if you’re new to python and writing software, it might
be best to keep your setup simple and focus on learning the basics of the
language. When it comes down to it, all you really need is an interpreter
and a .py file. This minimal setup allows you to focus on the core concepts
of the programming language without getting bogged down by the plethora
of additional tools and features. As you progress and gain more experience,
you can explore more advanced tools and setups. But when starting out,
keeping things simple will allow you to quickly start writing and running
your own code.
Running Python
There are a few ways to run python code; the first mode is called a REPL.
REPL is an acronym that stands for "read, evaluate, print, loop". These are
the four stages of a cycle which python can use to collect input from a user,
execute that input as code, print any results, and finally loop back to repeat
the cycle.
To enter the python REPL, execute the python binary with no arguments.
root@2711ea43ad26:~# python3.11
Python 3.11.2 (main, Feb 14 2023, 05:47:57) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Within the REPL we can write code, submit it, and the python interpreter
will execute that code and update its state accordingly. To exit the REPL,
simply type exit() or press Ctrl-D.
Many examples in this textbook are depicted as code which was written in
the python REPL. If you see three right-pointing angle brackets >>>,
sometimes known as a chevron, this is meant to represent the prompt of the
python REPL. Furthermore, any blocks of code which require multiple lines
will be continued with three dots ..., which is the default configuration of
the python REPL.
root@2711ea43ad26:~# python3.11
Python 3.11.2 (main, Feb 14 2023, 05:47:57) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(2):
...     print(i)
...
0
1
>>>
A second mode is where you write python code in a file, and then use the
interpreter to execute that file. This is referred to as scripting, where you
execute your python script from the terminal using python ./script.py,
where the only argument is a filepath to your python script, which is a file
with a .py extension. Many examples in this textbook are depicted as
scripts. They all start with the first line as a comment # which will depict
the relative filepath of the script.
# ./script.py

for i in range(2):
    print(i)
There are other ways to interact with the python interpreter and execute
python code, and the choice of which one to use will depend on your
specific needs and preferences. But for just starting off, these two means of
interacting with the interpreter are sufficient for our use cases.
Chapter 1. Fundamental data types
In the Python programming language, the data types are the fundamental
building blocks for creating and manipulating data structures.
Understanding how to work with different data structures is essential for
any programmer, as it allows you to store, process, and manipulate data in a
variety of ways. In this chapter, we will cover the basic data types of
Python; including integers, floats, strings, booleans, and various container
types. We will explore their properties, how to create and manipulate them,
and how to perform common operations with them. By the end of this
chapter, you will have a solid understanding of the basic data types in
Python and will be able to use them effectively in your own programs.
But before we talk about data structures, we should first talk about how we
go about referencing a given data structure in Python. This is done by
assigning our data structures to variables using the assignment operator, =.
Variables are references to values, like data structures, and the type of the
variable is determined by the type of the value that is assigned to the
variable. For example, if you assign an integer value to a variable, that
variable will be of the type int. If you assign a string value to a variable,
that variable will be of the type str. You can check the type of a variable by
using the built-in type() function.
>>> x = "Hello!"
>>> x
'Hello!'
>>> type(x)
<class 'str'>
Note: this example using type() is our first encounter with what's called a
"function" in Python. We'll talk more about functions at a later point in this
book. For now, just know that to use a function, you "call" it using
parentheses, and you pass it arguments. The type() function can take one
variable as an argument, and it returns the "type" of the value it was
passed - in this case x is a str type, so type(x) returns the str type.
It’s also important to note that, in Python, variables do not have a set data
type; they are simply a name that refers to a value. The data type of a
variable is determined by the type of the value that is currently assigned to
the variable.
Variables are not constant, meaning that their values and types can be
changed or reassigned after they are created. In other words, once you have
created a variable and assigned a value to it, you can later change that value
to something else. For example, you might create a variable x and assign it
the value of 9, and later on in your program change the value of x to
"message".
>>> x = 9
>>> x
9
>>> x = "message"
>>> x
'message'
The primitive data types, also referred to as scalar data types, represent a
single value and are indivisible. Examples of these data types include
the bool, the int, the float, the complex, and to a certain extent, the str
and bytes types. Primitive types are atomic, in that they do not divide into
smaller units of more primitive types. With the exception of the string
types, because they represent a single value, they cannot be indexed or
iterated.
On the other hand, container data types, also known as non-scalar data
types, represent multiple values, and are divisible. Some examples of
container data types include list, tuple, set, and dict. They are used to
store collections of values and allow for indexing and iteration (more on
that later). These types are built on top of the primitive data types and
provide a way to organize and manipulate more primitive data in a
structured way.
Python str and bytes types are somewhat unique in that they have
properties of both primitive and container data types. They can be used like
primitive data types, as a single, indivisible value; but they also behave like
container types in that they are sequences. They can be indexed and
iterated, where each character can be accessed individually. In this sense,
these types can be said to be a hybrid that combines the properties of both
primitive and container data types.
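As a quick sketch of this hybrid behavior, a string can be used as a single value, but it can also be indexed and iterated like a container:

```python
word = "hi"

# Used as a single value:
greeting = word + " there"

# Used as a sequence: indexing and iteration both work.
print(word[0])        # h
for letter in word:
    print(letter)
```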
Mutability vs Immutability
Below is a short introduction to the fundamental data types which are found
in Python. We’ll cover each data type more in depth as we progress
throughout this book.
Primitive Types
Booleans
The python boolean, represented by the bool data type, is a primitive data
type that has two possible values: True and False. It is used to represent
logical values and is commonly used to check whether a certain condition is
met.
>>> x = True
>>> type(x)
<class 'bool'>
Integers
The python integer, represented by the int data type, is a primitive data
type that is used to represent whole numbers. Integers can be positive or
negative and are commonly used in mathematical operations such as
addition, subtraction, multiplication, and division.
>>> x = 5
>>> type(x)
<class 'int'>
Integers are by default coded in base 10 (decimal). However, Python
integers can also be represented in other number systems, such as
hexadecimal, octal, and binary. Hexadecimal representation uses the base
16 number system with digits 0-9 and letters A-F. In Python, you can
represent a hexadecimal number by prefixing it with 0x. For example, the
decimal number 15 can be represented as 0xF in hexadecimal. Octal
representation uses the base 8 number system with digits 0-7. In Python,
you can represent an octal number by prefixing it with 0o. For example, the
decimal number 8 can be represented as 0o10 in octal. Binary representation
uses the base 2 number system with digits 0 and 1. In Python, you can
represent a binary number by prefixing it with 0b. For example, the decimal
number 5 can be represented as 0b101 in binary.
>>> 0xF
15
>>> 0o10
8
>>> 0b101
5
Floats
Complex
Strings
The python string, represented by the str data type, is a primitive data type
that is used to represent sequences of characters, such as "hello" or
"world". Strings are one of the most commonly used data types in Python
and are used to store and manipulate text. Strings can be created by
enclosing a sequence of characters in 'single' or "double" quotes. They
also can be made into multi-line strings using triple quotes (""" or ''').
Strings are immutable, meaning that once they are created, their value
cannot be modified.
>>> x = "Hello"
>>> type(x)
<class 'str'>
>>> _str = """
... this is a
... multiline string.
... """
>>> print(_str)

this is a
multiline string.

>>>
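The immutability of strings can be demonstrated directly; attempting to assign to a character of an existing string raises a TypeError:

```python
s = "hello"
try:
    s[0] = "H"            # strings are immutable; item assignment fails
except TypeError as error:
    print(error)          # 'str' object does not support item assignment
```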
Strings can contain special characters that are represented using backslash \
as an escape character. These special characters include the newline
character \n, the tab character \t, the backspace character \b, the carriage
return \r, the form feed character \f, the single quote ', the double quote ",
and the backslash character itself \.
Raw strings, represented by the prefix r before the string, are used to
represent strings that should not have special characters interpreted. For
example, a raw string like r"c:\Windows\System32" will include the
backslashes in the string, whereas a normal string would interpret them as
escape characters.
>>> "\n"
'\n'
>>> r"\n"
'\\n'
Byte strings, represented by the bytes data type, are used to represent
strings as a sequence of bytes. They are typically used when working with
binary data or when working with encodings that are not Unicode. Byte
strings can be created by prefixing a string with a b, for example, b"Hello".
>>> type(b"")
<class 'bytes'>
>>>
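Byte strings commonly arise from encoding text. A small sketch, assuming the UTF-8 encoding:

```python
data = "hello".encode("utf-8")   # str -> bytes
print(type(data))                # <class 'bytes'>
print(data.decode("utf-8"))      # bytes -> str: hello
```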
Container Types
The python tuple, represented by the tuple data type, is a container data
type that is used to store an ordered collection of items. A tuple is
immutable; once it is created, its items cannot be modified. Tuples are
created by writing a sequence of values separated by commas. While
technically optional, it's common to also include a set of enclosing
parentheses. For example, a tuple of integers can be written as (1, 2, 3)
and a tuple of strings as ("one", "two", "three"). The values contained
within a tuple can be of varying types.
>>> my_tuple = (1, "two", 3)
>>> type(my_tuple)
<class 'tuple'>
>>> # parentheses are optional (though recommended)
>>> 4, 5, 6
(4, 5, 6)
The python list, represented by the list data type, is a container data type
that is used to store an ordered collection of items. A list is similar to a
tuple, but it is mutable, meaning that the items in the container can be
modified after it is created. Lists are created by enclosing a sequence of
values in square brackets, separated by commas. For example, a list of
integers [1, 2, 3] or a list of strings ["one", "two", "three"]. Again,
these values can be of varying types.
>>> my_list = [4, "five", 6]
>>> type(my_list)
<class 'list'>
Both tuples and lists in Python can be indexed to access the individual
elements within their collection. Indexing is done by using square brackets
[] and providing the index of the element you want to access. Indexing
starts at 0, so the first element in the tuple or list has an index of 0, the
second element has an index of 1, and so on.
Both tuples and lists use this convention for getting items from their
respective collections. Lists, given their mutability, also use this convention
for setting items.
>>> my_tuple = ("Peter", "Theo", "Sarah")
>>> my_tuple[0]
'Peter'
>>> my_list = ["Bob", "Karen", "Steve"]
>>> my_list[1]
'Karen'
>>> my_list[1] = "Sarah"
>>> my_list
['Bob', 'Sarah', 'Steve']
Negative indexing can also be used to access the elements of the tuple or
list in reverse order. For example, given the tuple my_tuple = ("Peter",
"Theo", "Sarah"), you can access the last element by using the index
my_tuple[-1]. Similarly, if you have a list my_list = ["Bob", "Karen",
"Steve"], you can access the second-to-last element by using the index
my_list[-2].
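Using the collections from above, negative indexing looks like this:

```python
my_tuple = ("Peter", "Theo", "Sarah")
print(my_tuple[-1])   # last element: Sarah

my_list = ["Bob", "Karen", "Steve"]
print(my_list[-2])    # second-to-last element: Karen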
When you try to access an index that is out of bounds of a list or a tuple,
Python raises an exception called IndexError (we’ll talk more about
exceptions later, but for now just consider exceptions as how python
indicates that something went wrong). Or in other words, if you try to
access an element at an index that does not exist, Python will raise an
IndexError.
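For example, indexing past the end of a list raises the exception:

```python
items = [1, 2, 3]
try:
    items[3]              # valid indices are 0..2 (or -1..-3)
except IndexError as error:
    print(error)          # list index out of range
```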
Dictionaries
The python dictionary, represented by the dict data type, is a container data
type that stores key-value pairs. Dictionaries are created by enclosing a
sequence of items in curly braces {}, with each key-value pair separated by
a colon. For example, a dictionary of integers {1: "one", 2: "two", 3:
"three"} or a dictionary of strings {"apple": "fruit", "banana":
"fruit", "cherry": "fruit"}. Empty dictionaries can be created simply
using {}.
To get an item from a dictionary, you can once again use the square bracket
notation [] and provide a key as the index. For example, if you have a
dictionary my_dict = {"apple": "fruit", "banana": "fruit",
"cherry": "fruit"}, you can get the value associated with the key
"apple" by using the notation my_dict["apple"], which would return
"fruit". Requesting a value from a dictionary using a key which is not
indexed causes python to raise a KeyError exception.
>>> my_dict = {"apple": "fruit", "banana": "fruit", "cherry": "fruit"}
>>> my_dict["apple"]
'fruit'
>>> my_dict["orange"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'orange'
Sets
The python set, represented by the set data type, is a container data type that
is used to store an unordered collection of unique items. Sets are created by
enclosing a sequence of items in curly braces {}, separated by commas.
For example, a set of integers {1, 2, 3} or a set of strings {"bippity",
"boppity", "boop"}. It is important to note that you cannot create an
empty set without using a constructing function (a constructor), as {} by
default creates a dictionary. The constructor for creating an empty set is
set().
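A quick check confirms that {} produces a dictionary, while set() produces an empty set:

```python
print(type({}))       # <class 'dict'>
print(type(set()))    # <class 'set'>
```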
Python also includes a data type called frozenset that is similar to the set
data type, but with one key difference: it is immutable. This means that
once a frozenset is created, its elements cannot be added, removed, or
modified. Frozensets are created by using the frozenset() constructor; for
example, frozenset({1, 2, 3}) or frozenset([4, 5, 6]).
>>> my_set = {"Michael", "Theo", "Michael"}
>>> my_set
{'Theo', 'Michael'}
>>> frozenset(my_set)
frozenset({'Theo', 'Michael'})
Python Objects
Every object in Python has an identity, which can be inspected with the
built-in id() function. For example, the following code creates two
variables, x and y, and assigns the value 257 to both of them. Even though
the values of x and y are the same, they are different objects in memory and
therefore have different identities - so id(x) and id(y) return two
different values.
>>> x = 257
>>> y = 257
>>> id(x)
140097007292784
>>> id(y)
140097007292496
It is generally the case that variable assignment creates an object that has a
unique identity. However, there are some exceptions to this rule.
There are specific values which Python has elected to make what is called a
singleton, meaning that there will only ever be a single instance of the
object. For example, the integers -5 through 256 are singletons, because
of how common they are in everyday programming. The values None,
False, and True are singletons as well.
>>> x = 1
>>> y = 1
>>> id(x)
140097008697584
>>> id(y)
140097008697584
References to Objects
In Python, variables are references to objects, which means that when you
create a variable and assign a value to it, the variable does not store the
value directly, it instead stores a reference to the object that contains the
value.
It’s worth noting that when you work with mutable objects, like lists or
dictionaries, you have to be aware that if you assign a variable to another
variable, both of the variables fundamentally reference the same object in
memory. This means that any modification made to the object from one
variable will be visible to the other variable.
x = {}           y = x             x["fizz"] = "buzz"
------           -----             -----
x -- {}          x                 x
                  \                 \
                   {}                {"fizz": "buzz"}
                  /                 /
                 y                 y
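The same scenario can be reproduced as runnable code:

```python
x = {}
y = x                  # y references the same dict object as x
x["fizz"] = "buzz"     # mutate the object through x...
print(y)               # ...and the change is visible through y: {'fizz': 'buzz'}
print(x is y)          # True: one object, two names
```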
Arithmetic
>>> x = 5
>>> y = 2
>>> x + y
7
>>> x - y
3
>>> x * y
10
>>> x / y
2.5
>>> x // y
2
>>> x % y
1
>>> x ** y
25
Assignment
Assignment operators are used to assign values to variables. The most basic
assignment operator in Python is the = operator (of which we’ve made
plenty of use so far).
For example:
>>> my_var = "Hello!"
+= - adds the right operand to the left operand and assigns the result to
the left operand
-= - subtracts the right operand from the left operand and assigns the
result to the left operand
*= - multiplies the left operand by the right operand and assigns the
result to the left operand
/= - divides the left operand by the right operand and assigns the result
to the left operand
%= - takes the modulus of the left operand by the right operand and
assigns the result to the left operand
//= - floor divides the left operand by the right operand and assigns the
result to the left operand
**= - raises the left operand to the power of the right operand and
assigns the result to the left operand
:= - expresses a value while also assigning it
For example:
>>> # () can contain expressions: this assigns 2 to y,
>>> # and expresses the 2 which is added to 3 (more on this later)
>>> x = 3 + (y := 2)
>>> y
2
>>> x
5
>>> x += 5  # is equivalent to x = x + 5
>>> x
10
>>> x *= 2  # is equivalent to x = x * 2
>>> x
20
Packing operators
Python also defines two unpacking operators which can be used to assign
the elements of a sequence to multiple variables in a single expression.
These include:
* - iterable unpacking, unpacks a sequence of items
** - dictionary unpacking, unpacks dictionaries into key-value pairs
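The * operator, for instance, can gather the remainder of a sequence during assignment; a brief sketch:

```python
first, *rest = [1, 2, 3, 4]
print(first)   # 1
print(rest)    # the remaining items, collected into a list: [2, 3, 4]
```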
The ** operator can also be used for unpacking operations, though in this
case the operator expects operands which are dictionaries. For example, we
can merge two smaller dictionaries into one large dictionary by unpacking
the two dictionaries during construction of the merged dictionary.
>>> dict_one = {1: 2, 3: 4}
>>> dict_two = {5: 6}
>>> {**dict_one, **dict_two}
{1: 2, 3: 4, 5: 6}
Comparison
Comparison operators are used to compare two values and return a Boolean
value (True or False) based on the outcome of the comparison. These
include:
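Each comparison expresses a True or False result; for example:

```python
x = 4
y = 2
print(x > y)    # True
print(x == y)   # False
print(x != y)   # True
print(x <= y)   # False
```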
Logical
>>> x = 4
>>> y = 2
>>> z = 8
>>> (x > 2 or y > 5) and z > 6
True
What is Truthy?
False
None
0
0.0
Empty collections (i.e. [], (), {}, set(), etc.)
Objects which define a __bool__() method that returns falsy
Objects which define a __len__() method that returns falsy
Short-Circuits
It's important to note that the and and or operators are short-circuit
operators, which means that they stop evaluating as soon as the result is
known. In the case of the and operator, the second operand is only evaluated
if the first operand is truthy; and in the case of the or operator, the second
operand is only evaluated if the first operand is falsy.
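A small sketch makes the short-circuit behavior visible (the noisy helper here is purely illustrative):

```python
def noisy(value):
    print("evaluating", value)
    return value

# `and` stops at the first falsy operand: only noisy(0) runs here.
noisy(0) and noisy(1)

# `or` stops at the first truthy operand: only noisy(1) runs here.
noisy(1) or noisy(0)
```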
Logical Assignments
This example assigns the variable x the value 0 because zero is falsy, so
and short-circuits and returns its first operand.
>>> x = 0 and 6
>>> x
0
This next example, by contrast, assigns the variable x the value 6.
>>> x = 3 and 6
>>> x
6
This example assigns the variable x the value 6 because 3 is truthy, and
since and requires the first operand to be truthy, the resulting assignment
falls to the second operand.
We can also coerce a value to its boolean equivalent using only logical
operators, and without a function call.
>>> x = not not (3 and 6)
>>> x
True
Membership
Bitwise
It’s also worth noting that Python also has support for the analogous bitwise
assignment operators, such as &=, |=, ^=, <<=, and >>=, which allow you to
perform a bitwise operation and assignment in a single statement.
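For example:

```python
x = 0b1100
x &= 0b1010        # bitwise AND and assignment in one statement
print(bin(x))      # 0b1000
x |= 0b0001        # bitwise OR and assignment
print(bin(x))      # 0b1001
```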
Identity
is - returns True if the operands are the same object, False otherwise
is not - returns True if the operands are not the same object, False
otherwise
You can use this identity operator to check against singletons, such as None,
False, True, etc.
# ./script.py

x = None
if x is None:
    print("x is None!")
It’s important to note that two objects can have the same value but different
memory addresses; in this case the == operator can be used to check if they
have the same value and the is operator can be used to check if they have
the same memory address.
>>> x = [1, 2, 3]
>>> y = [1, 2, 3]
>>> z = x
>>> z is x
True
>>> x is y
False
>>> x == y
True
This example shows that even though x and y contain the same elements,
they are two different lists, so x is not y returns True. The variables x and
z however point to the same list, so x is z returns True.
Chapter 3. Lexical Structure
Up until now, we’ve been working with Python in the context of single-line
statements. However, in order to proceed to writing full python scripts and
complete programs, we need to take a moment to discuss the lexical
structure which goes into producing valid Python code.
Lexical Structure refers to the smallest units of code that the language is
made up of, and the rules for combining these units into larger structures. In
other words, it is the set of rules that define the syntax and grammar of the
language, and they determine how programs written by developers will be
interpreted.
Line Structure
In this example, we see how the python script 'script.py' looks in our editor.
We can contrast that with how the computer interprets the file: as a sequence
of characters separated by newline characters (as this is on a linux machine,
the newline character is \n).
Comments
A comment in Python begins with the hash character # and continues until
the end of the physical line. A comment indicates the end of the logical line
unless the rules for implicit line joining are applied. Comments are ignored
by the interpreter and are not executed.
>>> # this is a comment, and the next
>>> # line is executable code.
>>> this = True
Two or more physical lines in Python can be joined into a single logical line
using backslash characters \. This is done by placing a backslash at the end
of a physical line, which will be followed by the next physical line. The
backslash and the end-of-line character are removed, resulting in a single
logical line.
>>> # you can join multiple physical lines to make a single logical line
>>> x = 1 + 2 + \
... 3 + 4
>>> x
10
In this example, the logical line “x = 1 + 2 + 3 + 4” is created by
combining two physical lines, separated by the backslash.
It should be noted that a line that ends with a backslash cannot contain a
comment.
>>> x = 1 + 2 \ # comment
  File "<stdin>", line 1
    x = 1 + 2 \ # comment
               ^
SyntaxError: unexpected character after line continuation character
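Physical lines can also be joined implicitly, without backslashes, whenever an expression is enclosed in parentheses, square brackets, or curly braces:

```python
# no backslash needed inside parentheses...
x = (1 + 2 +
     3 + 4)

# ...or inside square brackets
values = [1, 2,
          3, 4]

print(x)        # 10
print(values)   # [1, 2, 3, 4]
```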
In this example, all the expressions are spread across multiple physical
lines, but they are still considered as a single logical line, because they are
enclosed in parentheses, square brackets, or curly braces.
Indentation
For example, in the following code snippet, the lines of code that are
indented under the if statement are considered to be within the scope of the
if statement and will only be executed if the condition x > 0 is True.
>>> x = 5
>>> if x > 0:
...     print("x is positive")
...     x = x - 1
...     print("x is now", x)
...
x is positive
x is now 4
The indentation of the python code is crucial; if it is not correct, it will raise
an error, or worse, will do something you did not expect, leading to bugs.
Finally, the indentation must be consistent throughout the program, usually
4 spaces or a tab.
Chapter 4. Control Flow
Control flow is a fundamental concept in programming that allows a
developer to control the order in which code is executed. In Python, control
flow is achieved through the use of conditional statements (such as if/else)
and looping constructs (such as for and while loops). By using these
constructs, developers can create logical conditions for code execution and
repeat blocks of code as necessary. Understanding control flow is essential
for creating Python programs. In this chapter, we will cover the basics of
control flow, focusing on conditional statements, loops, and other related
concepts.
In addition to the if statement, Python also provides the elif (short for
“else if”) and else statements, which can be used to specify additional
blocks of code to be executed if the initial condition is not met.
if condition:
    # code to be executed if `condition` is truthy
elif other_condition:
    # code to be executed if `condition` is falsy
    # and `other_condition` is truthy
else:
    # code to be executed if both conditions are falsy
Question 1:
What values would trigger the if statement to execute?
1. condition = ["", None] and other_condition = True
2. condition = "" and other_condition = True
3. condition = "" and other_condition = None
Question 2:
What values would trigger the elif statement to execute?
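As a concrete sketch of this dispatch on truthiness (the condition values here are illustrative, not the answers to the questions above), note that an empty string is falsy, so the elif branch is taken:

```python
condition = ""        # falsy
other_condition = True  # truthy

if condition:
    branch = "if"
elif other_condition:
    branch = "elif"
else:
    branch = "else"

print(branch)  # prints "elif"
```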
while
break
The break statement is a control flow construct that can be used to exit
loops, including the while loop. When the interpreter encounters a break
statement within a loop, it immediately exits the loop and continues
execution at the next line of code following the loop.
1 >>> condition = 10
2 >>> while condition > 0:
3 ...     if condition == 5:
4 ...         break
5 ...     condition -= 1
6 ...
7 >>> condition
8 5
continue
In instances where you would like to skip an iteration of a loop and move
onto the next, you can use a continue statement. When the interpreter
encounters a continue statement within a loop, it immediately skips the rest
of the code in the current iteration of the block, and continues on to the next
iteration of the loop.
1 >>> i = 0
2 >>> skipped = {2, 4}
3 >>> while i < 5:
4 ...     i += 1
5 ...     if i in skipped:
6 ...         continue
7 ...     print(i)
8 ...
9 1
10 3
11 5
for
Here, the variable item will take on the value of each item in the sequence
in turn, allowing you to operate on each item of the collection one at a time.
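A minimal sketch of this behavior, iterating over an illustrative list of numbers:

```python
numbers = [10, 20, 30]
collected = []
for item in numbers:
    # item is bound to each element of numbers in turn
    collected.append(item)

print(collected)  # [10, 20, 30]
```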
What is an Iterable?
In this case, numbers is an iterable, where each item is a number, and the
items can be accessed in order by their index.
A string is also a collection of items, where each item is a letter, and the
items can also be accessed by their index.
1 >>> word = "hello"
2 >>> for letter in word:
3 ... print(letter)
4 ...
5 h
6 e
7 l
8 l
9 o
In this case, word is an iterable where each item is a character, and the items
can be accessed by their position in the string.
The for/else construct is a special combination of the for loop and the
else statement. It allows you to specify a block of code (the else block)
that will be executed after the for loop completes, but only if the loop
completes without a break. This means that if the loop is exited early using
a break statement, the else block will not be executed.
1 >>> numbers = [1, 2, 3, 4, 5]
2 >>> for number in numbers:
3 ... if number == 4:
4 ... print("Found 4, exiting loop.")
5 ... break
6 ... print(number)
7 ... else:
8 ... print("Loop completed normally.")
9 ...
10 1
11 2
12 3
13 Found 4, exiting loop.
In this example, the for loop iterates over the numbers in the list. When it
encounters the number 4, it exits the loop using the break statement. Since
the loop was exited early, the else block is not executed, and the message
“Loop completed normally” is not printed.
The for/else construct can be useful in certain situations where you want
to take different actions depending on whether the loop completed normally
or was exited early. However, it’s not a very common construct, and in most
cases, it can be replaced by a simple if statement after the loop.
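For comparison, here is a sketch of the same logic written without for/else, using a flag variable and a plain if after the loop:

```python
numbers = [1, 2, 3, 4, 5]
found = False
for number in numbers:
    if number == 4:
        print("Found 4, exiting loop.")
        found = True
        break
    print(number)

# plays the role of the for/else's else block
if not found:
    print("Loop completed normally.")
```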
Exception Handling
Each try block can have one or more except blocks, so as to handle
multiple different types of errors. Each except block can have multiple
exceptions to match against, by using a tuple of exceptions. The exception
in an except block can be assigned to a variable using the as keyword.
When an Exception is raised in the try block, the interpreter checks each
except block in the order which they were defined to see if a particular
exception handler matches the type of exception that was raised. If a match
is found, the code in the corresponding except block is executed, and the
exception is considered handled. If no except block matches the exception
type, the exception continues to bubble up the call stack until it is either
handled elsewhere or the program terminates.
1 >>> try:
2 ... raise ArithmeticError
3 ... except (KeyError, ValueError):
4 ... print("handle Key and Value errors")
5 ... except ArithmeticError as err:
6 ... print(type(err))
7 ...
8 <class 'ArithmeticError'>
Most exceptions are derived from the Exception class. As such, a general
except Exception will match most raised exceptions. There are some
instances where this may not be the desired effect; it is generally better to
be precise with your exception handling, so that unexpected errors aren’t
caught accidentally. In a similar vein, it’s worth noting that not all
exceptions need to be handled, and sometimes it may be better to let the
program crash if it encounters an error that it cannot recover from.
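To sketch why precision matters, consider a handler that catches Exception broadly: a programming error inside the try block (here a deliberately misspelled name) is silently swallowed, whereas a precise handler lets it surface. The parse functions below are illustrative:

```python
def parse(value):
    try:
        return int(valu)  # deliberate typo: raises NameError
    except Exception:     # broad handler swallows the bug
        return None

def parse_precise(value):
    try:
        return int(valu)  # same deliberate typo
    except ValueError:    # precise handler lets the NameError surface
        return None

print(parse("123"))  # None -- the bug is silently hidden
try:
    parse_precise("123")
except NameError as err:
    print("bug surfaced:", err)
```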
raise from
The raise from idiom can be used to re-raise an exception which was
caught in an except block. This is useful in instance where we want to add
more context to an exception, so as to give more meaning to a given
exception that was raised.
1 >>> try:
2 ... x = int("abc")
3 ... except ValueError as e:
4 ...     raise ValueError(f"Variable `x` received invalid input: {e}") from e
5 ...
6 Traceback (most recent call last):
7   File "<stdin>", line 2, in <module>
8 ValueError: invalid literal for int() with base 10: 'abc'
9
10 The above exception was the direct cause of the following exception:
11
12 Traceback (most recent call last):
13   File "<stdin>", line 4, in <module>
14 ValueError: Variable `x` received invalid input: invalid literal for int() with base 10: 'abc'
In the second case, the pattern is (0, y), which matches if the first value of
the tuple is 0, and the second value can be any value, represented by the
variable y. If this pattern is matched, the string “coordinate is on the x axis,
y=<value of y>” is printed, where <value of y> is replaced with the value of
y. This pattern is matched to the coordinate case, so the variable y is
assigned, and the case block executes, printing the formatted string literal.
In the third case statement, the pattern is simply _, which is the wildcard
pattern that matches any value. This case would be executed if no prior
cases were to match the expression. Since the second case was a match,
this block was passed over.
In this example, we’re using a match statement to check both the type and
the value of the given item. The first case defines a class pattern for
matching int types if the item matches the pattern of 0 or -1. This pattern
fails to match because item = 1. The second case defines a class pattern for
matching all int types, since a pattern is not provided in the type call. This
pattern matches the item, so the case block is executed.
guards
In this example, the first case matches because both the pattern of the
coordinate matches the case, and the expression y == 1 is True.
Or Patterns
If an or pattern is used for variable binding, each pattern should define the
same variable names. If different patterns define different variable names,
the case will throw a SyntaxError.
1 >>> coordinate = (0, 1)
2 >>> match coordinate:
3 ... case (0, y) | (x, 0):
4 ... print((
5 ... "coordinate is on an axis,"
6 ... f" x={x}, y={y}"
7 ... ))
8 ...
9 File "<stdin>", line 2
10 SyntaxError: alternative patterns bind different names
as
In this example, the or pattern is used in the case definition, and each
pattern in the case maps the values of the coordinate to the variables x and
y. Both variables are defined in each pattern, so the case does not throw a
syntax error. The as keyword is used to bind matching values to variables,
which can subsequently be used in the case block when the pattern is
matched.
Chapter 5. Functions
Functions are some of the most fundamental building blocks in
programming. They allow you to encapsulate pieces of code and reuse it
multiple times throughout your program. This is beneficial, in that it helps
you avoid repeating the same code in multiple places, which can lead to
bugs and make your code harder to maintain. In short, functions are a
powerful tool that allow you to write better, more efficient, and more
maintainable code.
For example, the following code defines a simple function called greet that
takes in a single argument, name, and returns a greeting using that
parameter:
1 >>> def greet(name):
2 ... greeting = f"Hello, {name}!"
3 ... return greeting
4 ...
5 >>> greet("Ricky")
6 'Hello, Ricky!'
Functions are not executed immediately when they are defined. Instead,
they are executed only when they are called. This means that when a script
or program is running, the interpreter will read the function definition, but it
will not execute the code within the function block until the program calls
the function.
When this script is run, the Python interpreter first reads the function
definition. The function is defined, but the function code is not executed
until the function is called on the line my_function(). As a result, the
output will be:
1 root@b854aeada00a:~/code# python script.py
2 Script start
3 Function called
4 Script end
Once a function is defined, it can be called multiple times, and each time it
will execute the code inside the function.
Function Signatures
A function signature is the combination of the function’s name and its input
parameters. In Python, the function signature includes the name of the
function after the def keyword, followed by the names of the function
parameters contained in parentheses.
1 function name
2 |
3 | /--/------- function parameters
4 def add(a, b):
The function name is the identifier that is used to call the function. It should
be chosen to be descriptive and meaningful, so other developers can
ascertain the function’s purpose.
The parameters of a function are variables that are assigned in the scope of
the body of the function when it is called. In Python, the parameters are
defined in parentheses following the function name. Each parameter has a
name, which is used to reference values within the function. It’s also
possible to define a function with no parameters by using empty
parentheses.
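The function discussed in the following paragraphs is a simple two-parameter subtraction; a minimal definition consistent with the calls described:

```python
def subtract(a, b):
    return a - b

# positional: 1 -> a, 2 -> b
print(subtract(1, 2))      # -1
# positional, reversed: 2 -> a, 1 -> b
print(subtract(2, 1))      # 1
# keyword arguments: explicit assignment by name
print(subtract(b=2, a=1))  # -1
```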
When we call this function using positional arguments only, the first
argument is assigned to the name a and the second argument is assigned to
the name b. In the first function call, we pass the values 1, 2. Since 1 is the
argument passed first, it is assigned to the name a, and subsequently 2 is
assigned to b. The function returns a - b, which in this case is 1 - 2 which
results to -1.
If we reverse the order of the arguments in the second function call, passing
the values 2, 1, the 2 is now assigned to a and the 1 is assigned to b. The
function still returns the result of a - b, but in this case the expression is 2
- 1, so the function returns 1.
In the final function call, we explicitly assign values to the argument names
using the key=value syntax, by passing the values b=2, a=1. Keyword
arguments take preferential assignment over positional arguments, so even
though the 2 is passed first in the function signature, it is explicitly assigned
to b, where a is explicitly assigned to 1. This again results in returning the
expression 1 - 2, so the function returns -1.
Function calls can make use of both positional and keyword arguments, but
any positional arguments must be listed first when calling the function.
Given this flexibility, it’s possible to accidentally call a function with a
keyword argument which references a name which was already assigned a
value via a positional argument. Yet multiple arguments cannot be assigned
to the same name, so doing this will cause the interpreter to raise a
TypeError.
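A sketch of that TypeError, using an illustrative subtract(a, b) function:

```python
def subtract(a, b):
    return a - b

try:
    # 1 is assigned positionally to `a`, then a=2 tries to
    # assign `a` a second time via keyword
    subtract(1, a=2)
except TypeError as err:
    print(err)  # ... got multiple values for argument 'a'
```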
Explicitly positional/key-value
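Python provides signature markers for constraining how arguments may be passed: parameters before a / are positional-only, and parameters after a * are keyword-only. A brief sketch (the divide function here is illustrative):

```python
def divide(numerator, denominator, /, *, precision=2):
    # numerator and denominator must be passed positionally;
    # precision must be passed by keyword
    return round(numerator / denominator, precision)

print(divide(1, 3))               # 0.33
print(divide(1, 3, precision=4))  # 0.3333
```

Calling `divide(numerator=1, denominator=3)` or `divide(1, 3, 4)` would each raise a TypeError, because the markers forbid those calling styles.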
Default Values
Unless this shared state is desired, it’s better to use immutable state for
default values, or a unique object to serve as an identity check.
1 >>> def my_function(
2 ... value,
3 ... default=(UNDEFINED := object())
4 ... ):
5 ... if default is UNDEFINED:
6 ... default = []
7 ... default.append(value)
8 ... return default
9 ...
10 >>> x = my_function(0)
11 >>> y = my_function(1)
12 >>> y
13 [1]
14 >>> x
15 [0]
Scope
Up until this point, the code we’ve written has made no sort of distinction
between when access to a variable is considered valid. Once a variable has
been defined, later code is able to access its value and make use of it.
In this example, the variable x is defined within the local scope of the
function my_function, and it can only be accessed within the block scope
of the function. If you try to access it outside the function, you will get an
error.
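A minimal sketch of that behavior:

```python
def my_function():
    x = 10  # x exists only in my_function's local scope
    return x

print(my_function())  # 10

try:
    print(x)  # x is not defined in the enclosing scope
except NameError as err:
    print("not accessible outside the function:", err)
```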
Nested Scopes
In Python, the block scopes can be nested. A nested scope is a scope that is
defined within another scope. Any scope which is nested can read variables
from scopes which enclose the nested scope.
Closures
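A closure is a nested function that captures variables from its enclosing scope and retains access to them even after the enclosing function has returned. A small sketch (the make_counter name is illustrative):

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count  # rebind the enclosing scope's variable
        count += 1
        return count
    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2 -- count persists between calls
```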
Anonymous Functions
Anonymous functions are functions that are defined without a name. They
are also known as lambda functions. Anonymous functions are defined
using the lambda keyword, followed by one or more arguments, a colon :
and finally an expression that defines the function’s behavior.
For example, the following code defines an anonymous function that takes
one argument and returns its square:
1 >>> square = lambda x: x**2
2 >>> square(5)
3 25
Lambda functions are typically used when you need to define a function
will be used only once, or when you need to pass a small function as an
argument to another function.
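For example, a lambda is a convenient way to supply a one-off key function to sorted():

```python
words = ["banana", "fig", "cherry"]
# pass a small, single-use function as the sort key
print(sorted(words, key=lambda w: len(w)))  # ['fig', 'banana', 'cherry']
```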
Decorators
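A decorator is a function that takes a function and returns a replacement for it; the @ syntax applies it at definition time. A minimal sketch (the logged decorator is illustrative):

```python
import functools

def logged(func):
    @functools.wraps(func)  # preserve the wrapped function's metadata
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__} with {args}")
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    return a + b

print(add(2, 3))  # logs the call, then prints 5
```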
The pass keyword is used as a placeholder and does not do anything, but it
is needed in this case to create an empty class.
Inside the block scope of a class, you can define methods that are associated
with an object or class. Methods are used to define the behavior of an object
and to perform operations on the properties of the object.
Methods are defined inside a class using the def keyword, followed by the
name of the method, a set of parentheses which contain the method’s
arguments, and finally a colon :. The first parameter of a standard method
is always a variable which references the instance of the class. By
convention, this is typically given the name self.
For example:
1 >>> class MyClass:
2 ... def my_method(self):
3 ... print("Hello from my_method!")
4 ...
Methods can also take additional parameters and return values, similar to
functions.
1 >>> class MyClass:
2 ... def add(self, a, b):
3 ... return a + b
4 ...
5 ...
6 >>> my_object = MyClass()
7 >>> result = my_object.add(2, 3)
8 >>> result
9 5
Finally, classes can subclass other classes, which allows them to derive
functionality from a superclass. This concept is generally known as
inheritance, which will be discussed in more depth at a later time.
1 >>> class YourClass:
2 ... def subtract(self, a, b):
3 ... return a - b
4 ...
5 >>> class MyClass(YourClass):
6 ... pass
7 ...
8 >>> my_object = MyClass()
9 >>> result = my_object.subtract(2, 3)
10 >>> result
11 -1
You might have noticed that earlier in the book when we called the type()
function on an object, the return specified that the type was oftentimes a
class.
1 >>> x = "Hello!"
2 >>> x
3 "Hello!"
4 >>> type(x)
5 <class 'str'>
In Python, all of the built-in data types, such as integers, strings, lists, and
dictionaries, are implemented in a manner similar to classes. And similar to
user-defined classes, all of the built-in data types in Python have associated
methods that can be used to manipulate their underlying data structures. For
example, and as we just saw, the str class defines a method called join().
This join method takes the instance of the calling object, a string, as self,
and a collection of other strings which are joined using self as the
separator.
1 >>> " ".join(('a', 'b', 'c'))
2 'a b c'
Given this, calling the method on the class directly is equally valid; in
this case we’re simply passing the self parameter explicitly.
1 >>> str.join(" ", ('a', 'b', 'c'))
2 'a b c'
__dunder__ methods
For now we’re only going to focus on one particular method, the __init__
method, and we’ll return to review the others in later chapters.
Once you have defined the __init__ method, you can create an instance of
the class, and initialize any attributes you wish to assign the instance. This
is done by passing in the values as arguments when you create the object by
calling it. Those values will be passed into the call of the __init__ method,
where they can be assigned to the instance self via an instance attribute.
1 >>> class MyClass:
2 ... def __init__(self, my_value):
3 ... self.my_value = my_value
4 ...
5 >>> my_object = MyClass(5)
6 >>> my_object.my_value
7 5
Attributes
In Python, attributes are a way to define variables that are associated with
an object or class. They are used to store the state of an object and can be
accessed or modified using the dot notation.
There are different ways to define attributes in Python, but the most
common approach is to use instance variables, which can be defined inside
any class method using the self keyword.
You can access, assign, or modify attribute values using the dot notation.
For example, the following code creates an instance of the MyClass class
and sets its my_value attribute to 10:
1 >>> my_object = MyClass(5)
2 >>> my_object.my_value
3 5
4 >>> my_object.my_value = 10
5 >>> my_object.my_value
6 10
Class Attributes
Class attributes are variables that are defined at the class level, rather than
the instance level. They are shared amongst all instances of a class and can
be accessed using either the class name, or an instance of the class.
1 >>> class MyClass:
2 ... class_attribute = "I am a class attribute."
3 ...
4 >>> obj1 = MyClass()
5 >>> obj2 = MyClass()
6
7 >>> obj1.class_attribute
8 'I am a class attribute.'
9 >>> obj2.class_attribute
10 'I am a class attribute.'
11 >>> MyClass.class_attribute
12 'I am a class attribute.'
It’s worth noting that the same precaution about using default function
arguments also applies to class attributes. Since class attributes are shared
across all instances of the class, mutable state should likewise be avoided.
1 >>> class One:
2 ... items = []
3 ...
4 >>> a = One()
5 >>> b = One()
6 >>> a.items.append(1)
7 >>> b.items
8 [1]
A Functional Approach
@staticmethod
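@staticmethod is a decorator that defines a plain function namespaced inside a class; it receives neither the instance (self) nor the class (cls), and can be called on the class or on an instance. A brief sketch (the Temperature class is illustrative):

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def celsius_to_fahrenheit(celsius):
        # no self/cls parameter: just a function that lives on the class
        return celsius * 9 / 5 + 32

print(Temperature.celsius_to_fahrenheit(100))   # 212.0
print(Temperature(0).celsius_to_fahrenheit(0))  # 32.0
```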
@classmethod
@classmethod is a decorator that is used to define class methods. A class
method is a method that is bound to the class and not the instance of the
object. It can be called on the class itself, as well as on any instance of the
class. A class method takes the class as its first argument, typically aliased
as cls.
1 >>> class MyClass:
2 ... def __init__(self, value=None):
3 ... self.value = value
4 ...
5 ... @classmethod
6 ... def class_method(cls, value):
7 ... return cls(value)
8 ...
9 >>> MyClass.class_method("Hello World!")
10 <__main__.MyClass object at 0x7fe83b29c040>
11 >>> obj = MyClass()
12 >>> obj.class_method("Hello World!")
13 <__main__.MyClass object at 0x7fe83b39c6d0>
14 >>> class YourClass(MyClass): pass
15 >>> YourClass.class_method("Hello World!")
16 <__main__.YourClass object at 0x7fe83bdec3d0>
Generator Expressions
Generator expressions are useful when working with large data sets or when
the values in the expression are the result of some expensive computation.
Because they generate values on-the-fly, they can be more memory-efficient
than creating a list in memory.
They can also be used to build powerful and efficient iterator pipelines.
Using iterators in this fashion allows you to perform multiple operations on
data without necessitating that each operation hold intermediary values in
memory.
1 >>> numbers = range(10)
2 >>> # generator expression, doesn't
3 >>> #create intermediary state
4 >>> even_numbers = (
5 ... n for n in numbers if n % 2 == 0
6 ... )
7 >>> # another generator expression, doesn't
8 >>> # create intermediary state
9 >>> squared_numbers = (n**2 for n in even_numbers)
10 >>> # state only materializes in the final list
11 >>> list(squared_numbers)
12 [0, 4, 16, 36, 64]
Generator Functions
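A generator function is any function containing a yield statement; calling it returns a generator object that produces values lazily, one per request. A minimal sketch (the countdown function is illustrative):

```python
def countdown(n):
    while n > 0:
        yield n  # execution pauses here until the next value is requested
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(list(gen))  # [2, 1] -- the remaining values
```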
yield from
The yield from statement can be used to pass the context of an iteration
from one generator to the next. When a generator executes yield from, the
control of an iteration passes to the called generator or iterator. When that
generator is exhausted, control is yielded back to the calling generator for
further iteration.
1 >>> def _generator():
2 ... yield 2
3 ...
4 >>> def generator():
5 ... yield 1
6 ... yield from _generator()
7 ... yield 3
8 ...
9 >>> for i in generator():
10 ... print(i)
11 1
12 2
13 3
List Comprehensions
List comprehensions are generally faster than their equivalent python for
loop, because the code that generates the list is fully implemented in C.
1 >>> [x**2 for x in range(4)]
2 [0, 1, 4, 9]
Dictionary Comprehensions
This however is not strictly necessary; for example the following dictionary
comprehension creates a dictionary that maps some letters of the alphabet to
their corresponding ASCII values:
1 >>> {chr(i): i for i in range(97, 100)}
2 {'a': 97, 'b': 98, 'c': 99}
Type Conversions
Type conversions are the process of converting one data type to another
data type. Python provides the built-in functions int(), float(),
complex(), str(), bytes(), list(), dict(), set(), and frozenset() to
use for type conversions.
1 >>> complex(1, 2)
2 (1+2j)
Mathematical Functions
The min() function returns the smallest item in an iterable or the smallest of
two or more arguments. An optional key keyword argument can be
provided, which is a function to be applied to each item before comparison.
When called with an iterable, a default keyword argument can also be
provided, whose value is returned if the iterable is empty.
1 >>> min([1, 2, 3, 4, 5])
2 1
3 >>> min(1, 2, 3, 4, 5)
4 1
5 >>> min('abc', key=lambda x: ord(x))
6 'a'
7 >>> min((), default=0)
8 0
The max() function returns the largest item in an iterable or the largest of
two or more arguments. The same optional keyword values from min apply.
1 >>> max([1, 2, 3, 4, 5])
2 5
3 >>> max(1, 2, 3, 4, 5)
4 5
5 >>> max('abc', key=lambda x: ord(x))
6 'c'
7 >>> max((), default=0)
8 0
The sum() function returns the sum of all items in an iterable. It also
accepts an optional second argument which is used as the starting value.
1 >>> sum([1, 2, 3, 4, 5])
2 15
3 >>> sum([1, 2, 3, 4, 5], 10)
4 25
The all() and any() functions are used to check if all or any of the
elements in an iterable are truthy. The all() function returns True if all
elements in an iterable are truthy and False otherwise. The any() function
returns True if any of the elements in an iterable are truthy, and False
otherwise.
1 >>> all([True, True, True])
2 True
3 >>> all([True, True, False])
4 False
5 >>> all([0, 1, 2])
6 False
7 >>> all([])
8 True
9
10 >>> any([True, True, True])
11 True
12 >>> any([True, True, False])
13 True
14 >>> any([0, 1, 2])
15 True
16 >>> any([])
17 False
all() and any() can also be used with an expression to check if all or any
elements in a sequence meet a certain condition.
1 >>> my_list = [1, 2, 3, 4]
2 >>> all(i > 0 for i in my_list)
3 True
4 >>> any(i < 0 for i in my_list)
5 False
dir()
The dir() function is used to find out the attributes and methods of an
object. When called without an argument, dir() returns a list of names in the
current local scope or global scope. When called with an argument, it
returns a list of attribute and method names in the namespace of the object.
1 >>> dir()
2 ['__annotations__', '__builtins__', '__doc__',
3 '__loader__', '__name__', '__package__', '__spec__']
4 >>> dir(list)
5 ['__add__', '__class__', '__contains__', '__delattr__',
6 '__delitem__', '__dir__', '__doc__', '__eq__',
7 '__format__', '__ge__', '__getattribute__',
8 '__getitem__', '__gt__', '__hash__', '__iadd__',
9 '__imul__', '__init__', '__init_subclass__',
10 '__iter__', '__le__', '__len__', '__lt__', '__mul__',
11 '__ne__', '__new__', '__reduce__', '__reduce_ex__',
12 '__repr__', '__reversed__', '__rmul__', '__setattr__',
13 '__setitem__', '__sizeof__', '__str__',
14 '__subclasshook__', 'append', 'clear', 'copy', 'count',
15 'extend', 'index', 'insert', 'pop', 'remove',
16 'reverse', 'sort']
enumerate()
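The enumerate() function pairs each item of an iterable with a running index, starting at 0 by default, or at an optional start value:

```python
letters = ["a", "b", "c"]
print(list(enumerate(letters)))           # [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(enumerate(letters, start=1)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```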
The eval() and exec() functions are built-in Python functions that are used
to evaluate and execute code, respectively.
The eval() function takes at least one argument, which is a string
containing a valid Python expression, and evaluates it, returning the result
of the expression. It also takes optional globals and locals arguments
which act as global and local namespaces. If these values aren’t provided,
the global and local namespaces of the current scope are used.
1 >>> x = 1
2 >>> y = 2
3 >>> eval("x + y")
4 3
5 >>> eval("z + a", {}, {"z": 2, "a": 3})
6 5
1 >>> x = 1
2 >>> y = 2
3 >>> exec("result = x + y")
4 >>> result
5 3
It’s important to note that eval() and exec() can execute any code that
could be written in a Python script. If the strings passed to these functions
are not properly sanitized, a hacker could use them to execute arbitrary code
with the permissions of the user running the script.
map()
The map() function applies a given function to all items of an input iterable.
It returns a map object as a lazy iterable, meaning that the values of the
mapping are only produced when the map object is consumed.
1 >>> numbers = [1, 2, 3, 4, 5]
2 >>> # the squares are not calculated
3 >>> # when the map() is created
4 >>> squared_numbers = map(lambda x: x**2, numbers)
5 >>> # the squares are only calculated once
6 >>> # the squared_numbers iterable is consumed,
7 >>> # in this case during the construction of a list.
8 >>> list(squared_numbers)
9 [1, 4, 9, 16, 25]
filter()
The filter() function takes a function f and an iterable, and returns a filter
object. This filter object is a lazy iterable, only yielding values as it is
consumed. The values it yields are all items i of the iterable where the
application of the function f(i) returns a truthy value.
1 >>> numbers = [1, 2, 3, 4, 5]
2 >>> even_numbers = filter(lambda x: x%2==0, numbers)
3 >>> list(even_numbers)
4 [2, 4]
5 >>> numbers = (-1, 0, 1)
6 >>> # 0 is falsy, so filter will drop it
7 >>> # and keep both -1 and 1
8 >>> list(filter(lambda x: x, numbers))
9 [-1, 1]
The input() and print() functions are used to read input from the user
and print output to the console, respectively.
The input() function reads a line of text from the standard input (usually
the keyboard) and returns it as a string. A prompt can be passed as an
optional argument that will be displayed to the end user.
1 >>> name = input("What is your name? ")
2 What is your name? TJ
3 >>> name
4 'TJ'
Conversely, the print() function writes to the standard output (usually
the console). It takes one or more objects to be printed, along with
several optional keyword arguments: sep, a separator inserted between
objects, which defaults to a space; end, a string appended after the last
object, which defaults to a newline; file, the file-like object to write
to, which defaults to sys.stdout; and flush, a boolean specifying whether
the output buffer is flushed after printing, which defaults to False.
1 print(
2 *objects,
3 sep=' ',
4 end='\n',
5 file=sys.stdout,
6 flush=False
7 )
open()
The open() function takes the name of the file or a file-like object as an
argument and opens it in a specified mode. Different modes can be selected
by passing a second mode argument, or the mode keyword. Modes include r
for read mode (default), w for write mode, a for append mode, x for
exclusive creation mode, b for binary mode, t for text mode (which is the
default), and + for both reading and writing. Modes which don’t conflict can
be used in concert.
1 >>> file = open("./file.txt", mode="wb")
2 >>> file.write(b"Hello!\n")
3 7
4 >>> file.close()
buffering - the buffering policy for the file. The default is to use the
system default buffering policy (-1). A value of 1 configures the file
object to buffer per-line in text mode. A value greater than 1
configures a buffer size, in bytes, for the file object to use. Finally, a
value of 0 switches buffering off (though this option is only available
with binary mode).
encoding - the encoding to be used for the file. The default is None,
which sets the file object to use the default encoding for the platform.
Can be any string which python recognizes as a valid codec.
errors - the error handling policy for the file. The default is None,
which means that errors will be handled in the default way for the
platform.
newline - configures the file object to anticipate a specific newline
character. The default is None, which means that universal newlines
mode is disabled.
closefd - whether the file descriptor should be closed when the file is
closed. The default is True.
opener - a custom opener for opening the file. The default is None,
which means that the built-in opener will be used.
range()
1 >>> tuple(range(2))
2 (0, 1)
3 >>> list(range(1, 3))
4 [1, 2]
5 >>> for i in range(2, 8, 2):
6 ...     print(i)
7 2
8 4
9 6
sorted()
The sorted() function is used to produce a sorted list of elements. It takes
an argument for the sequence of elements to be sorted, an optional
argument for a function used to extract a key from each element for sorting
and an optional argument, a boolean indicating whether the elements should
be sorted in descending order, which defaults to False. It returns a new list,
instead of mutating the original.
1 >>> numbers = ["3", "2", "1", "4", "5"]
2 >>> sorted_numbers = sorted(
3 ... numbers,
4 ... key=lambda x: int(x),
5 ... reverse=True
6 ... )
7 >>> print(sorted_numbers)
8 ['5', '4', '3', '2', '1']
9 >>> numbers
10 ['3', '2', '1', '4', '5']
reversed()
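The reversed() function returns a lazy iterator that yields a sequence's items in reverse order, without copying or mutating the original:

```python
numbers = [1, 2, 3]
print(list(reversed(numbers)))   # [3, 2, 1]
print(numbers)                   # [1, 2, 3] -- unchanged
print("".join(reversed("abc")))  # cba
```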
zip()
The zip() function creates an iterator which consumes iterables. The first
value yielded by the iterator is a tuple containing the first item yielded by
each iterable provided. The second item yielded is a tuple containing the
second item yielded by each iterable. This process of yielding tuples
continues until one of the iterables provided to the iterator is exhausted. An
optional strict keyword can be provided to zip() which will cause the
iterator to raise a ValueError should one iterable be exhausted before the
others.
1 >>> a, b = (1,2,3), (4,5,6)
2 >>> for items in zip(a, b):
3 ...     print(items)
4 (1, 4)
5 (2, 5)
6 (3, 6)
7 >>> list(zip([1, 2], [3, 4, 5]))
8 [(1, 3), (2, 4)]
9 >>> list(zip([1, 2], [3, 4, 5], strict=True))
10 Traceback (most recent call last):
11 File "<stdin>", line 1, in <module>
12 ValueError: zip() argument 2 is longer than argument 1
Chapter 9. The Python Data Model
The underlying infrastructure that defines how Python objects interact with one
another is referred to as the Python Data Model. It is the set of rules and
conventions that govern how Python objects can be created, manipulated,
and used.
__new__ and __init__ are two special methods in Python that are used in
the process of creating and initializing new objects.
Singletons
The basic idea is to override the __new__ method in the singleton class so
that it only creates a new instance if one does not already exist. If an
instance already exists, the __new__ method simply returns that instance,
instead of creating a new one.
1 >>> class Singleton:
2 ... _instance = None
3 ...
4 ... def __new__(cls, *args, **kwargs):
5 ... if cls._instance is None:
6 ... cls._instance = object.__new__(cls)
7 ... return cls._instance
8 ...
9 ... def __init__(self, my_value):
10 ... self.my_value = my_value
11 ...
12 >>> my_object = Singleton(1)
13 >>> my_object.my_value
14 1
15 >>> other_object = Singleton(2)
16 >>> other_object.my_value
17 2
18 >>> my_object.my_value
19 2
In this example, the __new__ method first checks if an instance of the class
already exists on the class attribute cls._instance. If it does not exist, the
method creates a new instance using the __new__ method on object as its
constructor, and it assigns the new instance to the _instance class variable.
If an instance already exists on cls._instance, the __new__ method simply
returns the existing instance. This ensures that the class can only ever have
one instance.
Rich Comparisons
In Python, rich comparison methods are special methods that allow you to
define custom behavior for comparison operators, such as <, >, ==, !=, <=,
and >=.
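The Money class that the following paragraph analyzes can be sketched as follows (the exact error messages are illustrative):

```python
class Money:
    def __init__(self, amount: int, currency: str):
        self.amount = amount
        self.currency = currency

    def __eq__(self, other):
        return (
            self.amount == other.amount
            and self.currency == other.currency
        )

    def __lt__(self, other):
        if self.currency != other.currency:
            raise ValueError("Can't compare money of differing currencies")
        return self.amount < other.amount

    def __le__(self, other):
        if self.currency != other.currency:
            raise ValueError("Can't compare money of differing currencies")
        return self.amount <= other.amount

print(Money(5, "USD") == Money(5, "USD"))  # True
print(Money(5, "USD") < Money(10, "USD"))  # True
```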
In this example, the Money class has two attributes: amount, which is the
monetary value, and currency, which is the currency type. The __eq__
method compares the amount and currency of two Money objects, and
returns True if they are the same, and False otherwise. The __lt__ and
__le__ methods compare the amount value between two Money objects with
the same currency, and raises an error if they have different currencies.
You are typically only required to define half of the rich comparison
methods of any given object. For == and !=, Python derives one from the
other by default; for the ordering operators, Python falls back to the
reflected method on the other operand (for example, a > b can be satisfied
by b.__lt__(a)). The functools.total_ordering decorator can fill in the
remaining methods from __eq__ and a single ordering method.
Operator Overloading
Operator overloading refers to the ability to define custom behavior for
operators such as +, -, *, /, etc. when they are used with objects of a certain
class. This is achieved by specific special methods, shown below.
By using operator overloading, we can create classes that have natural and
intuitive behavior when used as operands.
1 >>> class Money:
2 ... def __init__(self, amount: int, currency: str):
3 ... self.amount = amount
4 ... self.currency = currency
5 ...
6 ... def __add__(self, other):
7 ... if self.currency != other.currency:
8 ... raise ValueError(
9 ... "Can't add money of differing currencies"
10 ... )
11 ... return Money(
12 ... self.amount + other.amount,
13 ... self.currency
14 ... )
15 ...
16 ... def __sub__(self, other):
17 ... if self.currency != other.currency:
18 ... raise ValueError(
19 ...                 "Can't subtract money of differing currencies"
20 ...             )
21 ... return Money(
22 ... self.amount - other.amount,
23 ... self.currency
24 ... )
25 ...
26 ... def __mul__(self, other):
27 ... if not isinstance(other, int):
28 ... raise ValueError(
29 ... "Can't multiply by non-int value"
30 ... )
31 ... return Money(self.amount * other, self.currency)
32 ...
33 ... def __truediv__(self, other):
34 ... if not isinstance(other, int):
35 ... raise ValueError(
36 ... "Can't divide by non-int value"
37 ... )
38 ... # divmod() is a builtin which does
39 ... # // and % at the same time
40 ... quotient, remainder = divmod(self.amount, other)
41 ... return (
42 ... Money(quotient, self.currency),
43 ... Money(remainder, self.currency)
44 ... )
45 ...
In this example, the Money class has two attributes: amount, which is the
monetary value, and currency, which is the currency type.
The __add__ method overloads the + operator, allowing you to add two
Money objects. It also checks that the operands are of the same currency,
and raises a ValueError otherwise.
String Representations
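This section's example appears to be missing from this copy. The two special methods involved are __repr__ and __str__; a minimal sketch (the class and attribute names are illustrative):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # unambiguous, developer-facing representation,
        # used by the REPL and by repr()
        return f"Point(x={self.x}, y={self.y})"

    def __str__(self):
        # readable, user-facing representation,
        # used by print() and str()
        return f"({self.x}, {self.y})"
```

If only __repr__ is defined, str() falls back to it, so __repr__ is the more important of the two to implement.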
Emulating Containers
Container types such as lists, tuples, and dictionaries have built-in behavior
for certain operations, such as the ability to iterate over elements, check if
an element is in their collection, and retrieve the length of the container. We
can emulate this behavior using special methods.
The [] operator can be overloaded using the __getitem__, __setitem__, and
__delitem__ methods, for element access, assignment, and deletion
respectively.
By using these methods, we can create classes that have similar behavior to
built-in container types, making our code more efficient and Pythonic.
1 >>> class MutableString:
2 ... def __init__(self, text: str):
3 ... self._text = list(text)
4 ...
5 ... def __getitem__(self, idx):
6 ... return self._text[idx]
7 ...
8 ... def __setitem__(self, idx, value):
9 ... self._text[idx] = value
10 ...
11 ... def __delitem__(self, idx):
12 ... del self._text[idx]
13 ...
14 ... def __str__(self):
15 ... return "".join(self._text)
16 ...
17 ... def __len__(self):
18 ... return len(self._text)
19 ...
20 >>> my_str = MutableString("fizzbuzz")
21 >>> my_str[0] = "F"
22 >>> str(my_str)
23 'Fizzbuzz'
24 >>> len(my_str)
25 8
Emulating Functions
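The example for this section appears to be missing from this copy. The relevant special method is __call__, which makes instances of a class callable like functions; a minimal sketch with illustrative names:

```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # invoked when the instance itself is called,
        # e.g. double(5)
        return self.factor * value

double = Multiplier(2)
```

Calling `double(5)` returns `10`, and `callable(double)` reports `True`, just as it would for an ordinary function.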
__slots__ is a special class attribute that you can define to reserve a fixed
amount of memory for each instance of the class. It is used to optimize
memory usage for classes that have a large number of instances, and that do
not need to add new attributes dynamically.
When you define __slots__ in a class, it creates a fixed-size array for each
instance to store the attributes defined in __slots__. This array is much
smaller than the dictionary that is used by default to store instance
attributes. The result is that __slots__ objects save a considerable amount
of memory, particularly if you have many instances of the class. Slots also
have the benefit of faster attribute access.
1 >>> class SlotsClass:
2 ... __slots__ = ['x', 'y']
3 ... def __init__(self, x, y):
4 ... self.x = x
5 ... self.y = y
6 ...
7 >>> obj = SlotsClass(1, 2)
8 >>> print(obj.x, obj.y)
9 1 2
There are two special methods for accessing the attributes of a given object.
The __getattribute__ method is the first attribute accessor to be called
when an attribute is accessed using the dot notation (e.g., obj.attribute)
or when using the getattr() built-in function. It takes the attribute name as
its parameter and should return the value of the attribute. The __getattr__
method is called only when an attribute is not found by __getattribute__,
i.e. when __getattribute__ raises an AttributeError. It takes the
attribute name as its parameter and should return the value of the attribute.
The benefit to this dual implementation is that you can get other known
attributes on the instance inside __getattr__ using dot notation, without
running the risk of recursion errors, while still hooking into the accessor
protocol before an object attribute is returned.
1 >>> class ADTRecursion:
2 ... def __init__(self, **kwargs):
3 ... self._data = kwargs
4 ...
5 ... def __getattribute__(self, key):
6 ... return self._data.get(key)
7 ...
8 >>> class AbstractDataType:
9 ... def __init__(self, **kwargs):
10 ... self._data = kwargs
11 ...
12 ... def __getattr__(self, key):
13 ... return self._data.get(key)
14 ...
15 >>> this = ADTRecursion(my_value=1)
16 >>> this.my_value
17 Traceback (most recent call last):
18   File "<stdin>", line 6, in __getattribute__
19   File "<stdin>", line 6, in __getattribute__
20   File "<stdin>", line 6, in __getattribute__
21   [Previous line repeated 996 more times]
22 RecursionError: maximum recursion depth exceeded
23 >>> that = AbstractDataType(my_value=1)
24 >>> that.my_value
25 1
The __setattr__ method is called when an attribute is set using the dot
notation (e.g., obj.attribute = value) or when using the setattr()
built-in function. It takes the attribute name and value as its parameters and
should set the attribute to the specified value.
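A sketch of a __setattr__ hook, assuming an illustrative "write-once" policy; note the delegation to object.__setattr__, which avoids the infinite recursion that self.name = value would cause:

```python
class Frozen:
    def __setattr__(self, name, value):
        # intercept every assignment on the instance
        if hasattr(self, name):
            raise AttributeError(f"{name} is read-only once set")
        # delegate the actual storage to the default machinery;
        # assigning via self here would recurse forever
        object.__setattr__(self, name, value)
```

The first assignment to an attribute succeeds; any subsequent reassignment raises an AttributeError.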
Iterators
The iterator protocol is a set of methods that allow Python objects to define
their own iteration behavior. The protocol consists of two methods:
__iter__ and __next__. The __iter__ method is called by the iter() built-in
(and implicitly by the for loop) and should return the iterator object itself.
The __next__ method is used to retrieve the next item from the iterator. It is
called automatically by the for loop or when the next() built-in function is
used on the iterator. It should return the next item or raise a StopIteration
exception when there are no more items.
1 >>> class CountByTwos:
2 ... def __init__(self, start, stop):
3 ... self.internal_value = start
4 ... self.stop = stop
5 ...
6 ... def __iter__(self):
7 ... return self
8 ...
9 ... def __next__(self):
10 ... snapshot = self.internal_value
11 ... if snapshot >= self.stop:
12 ... raise StopIteration
13 ... self.internal_value += 2
14 ... return snapshot
15 ...
16 >>> _iter = CountByTwos(start=5, stop=13)
17 >>> next(_iter)
18 5
19 >>> next(_iter)
20 7
21 >>> for i in _iter:  # exhausts the partially consumed iterator
22 ...     print(i)
23 9
24 11
In this example, the CountByTwos class has an __iter__ method that returns
self, and a __next__ method that lazily generates the next number in the
sequence. Calling the iter() function on our _iter value simply returns
self, as the instance already conforms to the iterator protocol. When
next() is called, either manually or in the context of a for loop, the iterator
yields the next number in the sequence. The class stops iterating when the
internal value reaches the stop value, as at this point the StopIteration
exception is raised.
Lazy Evaluation
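This section's body appears to be missing from this copy. One common data-model application of lazy evaluation is computing an expensive attribute only on first access and caching it on the instance, using the fact that __getattr__ only fires when normal lookup fails; a sketch with illustrative names:

```python
class Dataset:
    def __getattr__(self, name):
        # only called when normal attribute lookup fails,
        # i.e. before the value has been cached
        if name == "total":
            total = sum(range(1_000_000))  # stand-in for expensive work
            # cache on the instance: subsequent lookups find it in
            # __dict__ and never reach __getattr__ again
            self.total = total
            return total
        raise AttributeError(name)
```

The first access to `d.total` performs the computation; every later access is an ordinary, cheap attribute lookup.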
Context Managers
Context Managers provide a clean and convenient method for managing
resources that need to be acquired and released. The with statement ensures
that resources are acquired before the with block is executed, and released
after the block of code is exited, even if an exception is raised.
To hook into the context manager protocol, an object should define the
special methods __enter__ and __exit__. The __enter__ method is called
when the context is entered, and can be used to acquire the resources
needed by the block of code. It can return an object that will be used as the
context variable in the as clause of the with statement. The __exit__
method is called when the context is exited, and it can be used to release the
resources acquired by the __enter__ method. It takes three arguments: an
exception type, an exception value, and a traceback object. The __exit__
method can use these arguments to perform cleanup actions, suppress
exceptions, or log errors. If you don’t plan on using the exception values in
the object, they can be ignored.
1 >>> class ContextManager(object):
2 ... def __enter__(self):
3 ... print('entering!')
4 ... return self
5 ...
6 ... def __exit__(self, *args, **kwargs):
7 ... print('exiting!')
8 ...
9 ... def print(self):
10 ... print('in context')
11 ...
12 >>> with ContextManager() as cm:
13 ... cm.print()
14 ...
15 entering!
16 in context
17 exiting!
18
19 >>> with ContextManager():
20 ... raise ValueError
21 entering!
22 exiting!
23 Traceback (most recent call last):
24 File "<stdin>", line 7, in <module>
25 raise ValueError
26 ValueError
In this example, when the with statement is executed, an instance of
ContextManager() is created, and the __enter__ method is called, printing
“entering!”. The __enter__ object returns the instance self which is
assigned to the variable cm. The block of code inside the with statement is
then executed, where the print() method of the class is called and it prints
“in context”. Finally, the __exit__ method is called, printing “exiting!”.
Descriptors
Descriptors are objects that define one or more of the special methods
__get__, __set__, and __delete__. These methods are used to customize
the behavior of attribute access, such as getting, setting and deleting
attributes respectively.
Descriptors can be used to define attributes that have custom behavior, such
as computed properties, read-only properties, or properties that enforce
constraints. For example, we can write a descriptor which requires
attributes to be non-falsy.
1 >>> class MyDescriptor:
2 ... def __get__(self, instance, owner):
3 ... return getattr(instance, "_my_attr", None)
4 ...
5 ... def __set__(self, instance, value):
6 ... if not value:
7 ... raise AttributeError(
8 ... "attribute must not be falsy"
9 ... )
10 ... setattr(instance, "_my_attr", value)
11 ...
12 ... def __delete__(self, instance):
13 ... delattr(instance, "_my_attr")
14 ...
15 >>> class MyClass:
16 ... desc = MyDescriptor()
17 ...
18 >>> my_object = MyClass()
19 >>> my_object.desc
20 >>> my_object.desc = 1
21 >>> my_object.desc
22 1
23 >>> my_object.desc = 0
24 Traceback (most recent call last):
25 File "<stdin>", line 14, in <module>
26 AttributeError: attribute must not be falsy
27 >>> del my_object.desc
Inheritance
Here, the SubClass inherits from the SuperClass, and automatically has
access to all the properties and methods defined in the SuperClass.
A derived class can override or extend the methods of the super class by
redefining them in the derived class.
1 >>> class SuperClass:
2 ... def __init__(self, name):
3 ... self.name = name
4 ...
5 ... def print_name(self):
6 ... print(self.name)
7 ...
8 >>> class SubClass(SuperClass):
9 ... def print_name(self):
10 ... print(self.name.upper())
11 ...
In Python, a class can inherit from multiple classes by listing them in the
class definition, separated by commas. For example:
1 >>> class SubClass(SuperClass1, SuperClass2):
2 ... pass
When a class inherits from multiple classes, it can potentially have multiple
versions of the same method or property. This is known as the diamond
problem, and can lead to ambiguity about which version of the method or
property to use.
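The class definitions this passage refers to appear to be missing from this copy; the following is a minimal reconstruction consistent with the output shown next:

```python
class A:
    def method(self):
        print("A")

class B(A):
    # B does not override method
    pass

class C(A):
    def method(self):
        print("C")

class D(B, C):
    # inherits method via the MRO: D, B, C, A
    pass
```
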
Here, class D inherits from both B and C, which both inherit from A. If we
create an instance of D and call the method method, C is printed, because C
is the first class in the method resolution order that implements the method:
B does not implement it, and A sits later in the order than C.
1 >>> D().method()
2 C
3 >>> D.__mro__
4 (__main__.D, __main__.B, __main__.C, __main__.A, object)
Encapsulation
It should be noted that, unlike many other languages, Python doesn’t truly
keep methods and attributes private. The underscore convention is merely a
convention. Any user of the class can access these methods with impunity.
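A brief sketch of the convention in practice, with illustrative names; note that double-underscore attributes are name-mangled rather than hidden:

```python
class Account:
    def __init__(self):
        self._balance = 0     # "private" by convention only
        self.__token = "s3"   # name-mangled to _Account__token

acct = Account()
# both remain reachable from outside the class:
print(acct._balance)          # convention does not prevent access
print(acct._Account__token)   # mangling only obscures the name
```

Name mangling exists mainly to avoid accidental clashes in subclasses, not to enforce privacy.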
Polymorphism
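This section's example appears to be missing from this copy. Polymorphism in Python is most often expressed through duck typing, where any object providing the expected method works; a sketch with illustrative names:

```python
class Dog:
    def speak(self):
        return "Woof"

class Cat:
    def speak(self):
        return "Meow"

def announce(animal):
    # any object with a .speak() method is acceptable here;
    # no shared base class is required
    return animal.speak()
```

`announce(Dog())` and `announce(Cat())` both work, despite the classes sharing no ancestry beyond object.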
The syntax for creating a custom metaclass is to define a new class that
inherits from the type class. From there, we can overload the default
methods of the type class to create custom hooks that run when a class is
defined. This is typically done using the __new__ method to define the
behavior of class creation. The __init__ method is also commonly
overridden to define the behavior of the class initialization.
For example, the following code defines a custom metaclass that adds a
greeting attribute to the class definition, and prints to the console when this
is done:
1 >>> class MyMetaclass(type):
2 ... def __new__(cls, name, bases, attrs):
3 ... attrs["greeting"] = "Hello, World!"
4 ... print(f"creating the class: {name}")
5 ... return type.__new__(cls, name, bases, attrs)
6 ...
7 >>> class MyClass(metaclass=MyMetaclass):
8 ... pass
9 ...
10 creating the class: MyClass
11 >>> MyClass.greeting
12 Hello, World!
As we can see, the print() function is executed at the moment MyClass is
defined. We can also see that the class has a class attribute greeting whose
value is the string “Hello, World!”.
Metaclasses allow you to hook into user code from library code by
providing a way to customize the behavior of class creation. By defining a
custom metaclass and setting it as the metaclass of a user-defined class, you
can change the way that class is created and initialized, as well as add new
attributes and methods to the class.
In order to inspect these attributes and methods, we can use the built-in
function dir() to see a list of all the names of attributes and methods that
the object has. This can be a useful tool for exploring and understanding the
functionality of a particular object or module in code. The help() function
can also be used in order to show the official documentation for any method
not covered here.
1 >>> help(str.count)
2 Help on method_descriptor:
3
4 count(...)
5 S.count(sub[, start[, end]]) -> int
6
7 Return the number of non-overlapping occurrences
8 of substring sub in string S[start:end]. Optional
9 arguments start and end are interpreted as in slice
10 notation.
Numbers
The three major numeric types in Python are int, float, and complex.
While each data type is distinct, they share some attributes to make
interoperability easier. For example, the .conjugate() method returns the
complex conjugate of a complex number; for integers and floats, this is just
the number itself. Furthermore, the .real and .imag attributes hold the real
and imaginary parts of a given number. For floats and ints, the .real part
is the number itself, and the .imag part is always zero.
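These shared attributes can be demonstrated directly; a short sketch in script form:

```python
# ints and floats expose the same "numeric tower" attributes as complex
print((5).real, (5).imag)        # 5 0
print((2.5).conjugate())         # 2.5
print((3 + 4j).conjugate())      # (3-4j)
```
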
Integers
The int type represents integers, or whole numbers. Integers can be
positive, negative, or zero and have no decimal point. Python supports
arbitrarily large integers, so there is no limit on the size of an integer value,
unlike some other programming languages. CPython only caches single
instances of the integers -5 through 256, so when comparing integers,
especially outside this range, be sure to use == instead of is.
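The caching behavior can be demonstrated directly; a sketch assuming the CPython interpreter (int("...") is used here to defeat compile-time constant folding, which would otherwise merge the literals):

```python
a, b = int("256"), int("256")
c, d = int("257"), int("257")

print(a is b)  # True: 256 falls inside the cached range (-5..256)
print(c is d)  # False: each 257 is a freshly created object
print(c == d)  # True: == compares values, not identity
```
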
The Python int has several methods for convenience when it comes to bit
and byte representations. The int.bit_length() method returns the
number of bits necessary to represent an integer in binary, excluding the
sign and leading zeros. int.bit_count() returns the number of bits set to 1
in the binary representation of an integer. int.from_bytes() and
int.to_bytes() convert integer values to and from a bytestring
representation.
1 >>> int(5).bit_count()
2 2
3 >>> int(5).bit_length()
4 3
5 >>> int(5).to_bytes()
6 b'\x05'
7 >>> int.from_bytes(b'\x05')
8 5
Floats
The Python floating point value, represented by the float data type, is a
built-in data type used to represent decimal numbers. Floats are
numbers with decimal points, such as 3.14 or 2.718. There are also a few
special float values, such as float('inf'), float('-inf'), and
float('nan'), which are representations of infinity, negative infinity, and
“Not a Number”, respectively.
Float Methods
The float type defines a method .is_integer() which returns True if
the float represents a whole number, and False otherwise.
1 >>> float(5).is_integer()
2 True
The .as_integer_ratio() method returns a pair of integers whose ratio is
exactly equal to the float.
1 >>> float(1.5).as_integer_ratio()
2 (3, 2)
Hex Values
Complex Numbers
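This section's body appears to be missing from this copy. A brief sketch of the complex type's core behavior:

```python
z = 3 + 4j               # the j suffix marks the imaginary part
print(z.real, z.imag)    # 3.0 4.0
print(abs(z))            # 5.0, the magnitude
print(z.conjugate())     # (3-4j)
print(complex("1+2j"))   # construction from a string also works
```
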
Strings
As stated previously, strings are a built-in data type used to represent
sequences of characters. They are enclosed in either single quotes ' or
double quotes ", and can contain letters, numbers, and symbols. Strings are
immutable, which means that once they are created, their values cannot be
changed.
When operating on a method of the str class, the result of the method call
returns a new string. This is important to remember in cases where you
have multiple references to the initial string, and when you’re doing
comparisons between strings; be sure to use == instead of is.
Paddings
Formatting
Translating
Partitioning
Boolean Checks
A number of methods on the string class are checks that return True or False
depending on whether a certain condition holds. Those methods are as follows:
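The table for this section appears to be missing from this copy; a sampling of the boolean check methods in question:

```python
print("hello".isalpha())        # True: letters only
print("123".isdigit())          # True: digits only
print("abc123".isalnum())       # True: letters and digits
print("   ".isspace())          # True: whitespace only
print("Hello World".istitle())  # True: each word capitalized
print("HELLO".isupper())        # True: all cased characters uppercase
```
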
Case Methods
A number of methods on the string class are methods for formatting strings
based on the cases of individual characters. Those methods are as follows:
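The table for this section appears to be missing from this copy; a sampling of the case methods in question:

```python
print("hello world".upper())       # HELLO WORLD
print("HELLO".lower())             # hello
print("hello world".title())       # Hello World
print("hello world".capitalize())  # Hello world
print("Hello".swapcase())          # hELLO
print("Straße".casefold())         # strasse: aggressive lowering
```

Each call returns a new string; the original is unchanged, as strings are immutable.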
Encodings
The str.encode() method takes one argument, which is the name of the
encoding to be used, utf-8 by default. Some of the other encodings are
utf-16, ascii, and latin1. It returns a bytes object that contains the encoded
string.
1 >>> "Hello, World!".encode("utf8")
2 b'Hello, World!'
Bytes
So far in this text we’ve been referring to bytes as related to strings. And
this is with some justification; many of the methods for strings are available
for bytestrings, and the syntax is largely the same, save for the b'' prefix.
However, str and bytes are two distinct data types. A str is a sequence of
Unicode characters; bytes however are fundamentally a sequence of
integers, each of which is between 0 and 255. They represent binary data,
and it’s useful when working with data that needs to be stored or
transmitted in a specific format. If the value of an individual byte is 127 or
below, it may be represented using an ASCII character.
1 >>> "Maß".encode('utf8')
2 b'Ma\xc3\x9f'
There are a few distinct methods which are unique to the bytes type, and
we’ll discuss those here.
1 >>> [m for m in dir(bytes)
2 ... if not any((m.startswith("__"), m in dir(str)))]
3 ['decode', 'fromhex', 'hex']
Decoding
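This section's body appears to be missing from this copy. The bytes.decode() method is the inverse of str.encode(); a brief sketch:

```python
data = b'Ma\xc3\x9f'
print(data.decode("utf8"))   # Maß: bytes back to a str

# undecodable byte sequences raise UnicodeDecodeError by default,
# but the errors= parameter can relax this
print(b'\xff'.decode("utf8", errors="replace"))  # U+FFFD replacement char
```
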
Hex Values
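This section's body appears to be missing from this copy. The hex() and fromhex() methods convert between bytes and hexadecimal strings; a brief sketch:

```python
data = b'Ma\xc3\x9f'
h = data.hex()            # two hex digits per byte
print(h)                  # 4d61c39f
print(bytes.fromhex(h))   # round-trips back to b'Ma\xc3\x9f'
```
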
Tuples
The Python tuple data type has limited functionality, given its nature as an
immutable data type. The two methods it does define however are
tuple.count() and tuple.index().
The .index() method is used to find the index of the first occurrence of a
specific element in a tuple. The method takes one required argument, which
is the element to find, and returns the index of the first occurrence of that
element in the tuple. If the element is not found in the tuple, the method
raises a ValueError.
1 >>> t = (1, 2, 3, 2, 1)
2 >>> t.index(2)
3 1
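The .count() method, the tuple's other method, returns the number of occurrences of a value; a brief sketch:

```python
t = (1, 2, 3, 2, 1)
print(t.count(2))   # 2
print(t.count(9))   # 0: absent elements count zero, no error is raised
```
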
Lists
The Python list type defines a suite of methods which are used to inspect
and mutate the data structure.
The .index() method is used to find the index of the first occurrence of a
specific element in a list. The method takes one required argument, which is
the element to find, and returns the index of the first occurrence of that
element in the list. If the element is not found in the list, the method raises a
ValueError.
1 >>> l = [1, 2, 3, 2, 1]
2 >>> l.index(2)
3 1
Copying
The copy method is used to create a shallow copy of a list. It does not take
any arguments and it returns a new list that is a copy of the original list.
1 >>> l1 = [object()]
2 >>> l2 = l1.copy()
3 >>> l2
4 [<object at 0x7f910fec4630>]
5 >>> l1[0] is l2[0]
6 True
Mutations
The .append() method is used to add an element to the end of a list. The
method takes one required argument, which is the element to add, and adds
it to the end of the list.
1 >>> l = [0]
2 >>> l.append(1)
3 >>> l
4 [0, 1]
The .pop() method is used to remove an element from a list and return it.
The method takes one optional argument, which is the index of the element
to remove. If no index is provided, the method removes and returns the last
element of the list.
1 >>> l = [1, 2, 3, 4]
2 >>> l.pop(2)
3 3
4 >>> l
5 [1, 2, 4]
Finally, the .clear() method is used to remove all elements from a list. It
does not take any arguments and removes all elements from the list.
1 >>> l = [1, 2, 3, 4]
2 >>> l.clear()
3 >>> l
4 []
Orderings
Finally, the .sort() method is used to sort the elements in a list. The
method is an in-place operation and returns None. It takes no positional
arguments, but accepts two optional keyword arguments, key and
reverse. The key argument specifies a function that is used to extract a
comparison value from each element in the list. The reverse argument is a
Boolean value that specifies whether the list should be sorted in ascending
or descending order. The sort() method sorts the elements in ascending
order by default.
1 >>> l = [3, 2, 4, 1]
2 >>> l.sort()
3 >>> l
4 [1, 2, 3, 4]
Dictionaries
The Python dict type defines a suite of methods which are used to inspect
and mutate the data structure.
Iter Methods
Getter/Setter Methods
Mutations
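The bodies of the subsections above appear to be missing from this copy; a brief sampling of the methods in question, with illustrative values:

```python
d = {"a": 1, "b": 2}

# iteration views
print(list(d.keys()))        # ['a', 'b']
print(list(d.items()))       # [('a', 1), ('b', 2)]

# getters with defaults
print(d.get("c", 0))         # 0: fallback for a missing key
print(d.setdefault("c", 3))  # 3: inserts the key if absent

# mutations
d.update({"a": 10})          # merge in new values
print(d.pop("b"))            # 2: remove and return by key
print(d)                     # {'a': 10, 'c': 3}
```
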
Sets
Sets are used to store unique elements. They are useful for operations such
as membership testing, removing duplicates from a sequence and
mathematical operations. They are mutable and have a variety of built-in
methods to perform various set operations.
Mutations
The .add() method is used to add an element to a set. The method takes
one required argument, which is the element to add, and adds it to the set.
1 >>> s = {1, 2, 3}
2 >>> s.add(4)
3 >>> s
4 {1, 2, 3, 4}
The .remove() method is used to remove a specific element from a set. The
method takes one required argument, which is the element to remove. If the
element is not present in the set, it raises a KeyError.
1 >>> s = {1, 2, 3}
2 >>> s.remove(2)
3 >>> s
4 {1, 3}
The .pop() method is used to remove and return an arbitrary element from
a set. The method does not take any arguments and removes and returns an
arbitrary element from the set. If the set is empty, it raises a KeyError.
1 >>> s = {1, 2, 3}
2 >>> s.pop()
3 1
4 >>> s
5 {2, 3}
The .clear() method is used to remove all elements from a set. It does not
take any arguments and removes all elements from the set.
1 >>> s = {1, 2, 3}
2 >>> s.clear()
3 >>> s
4 set()
Set Theory
The .union() method returns a new set that contains all elements from both
the set and an iterable.
1 >>> s1 = {1, 2, 3}
2 >>> s2 = {2, 3, 4}
3 >>> s1.union(s2)
4 {1, 2, 3, 4}
The .intersection() and .intersection_update() methods compute a
new set that contains only the elements that are common to both the set and
the iterable. Calling .intersection() returns a new set, and
.intersection_update() does the operation in-place.
1 >>> s1 = {1, 2, 3}
2 >>> s2 = {2, 3, 4}
3 >>> s1.intersection(s2)
4 {2, 3}
5 >>> s1.intersection_update(s2)
6 >>> s1
7 {2, 3}
Boolean Checks
Several methods available for set objects allow you to check the
relationship between a pair of sets.
The .issubset() method returns True if all elements of a set are present in
another set.
1 >>> s1 = {1, 2, 3}
2 >>> s2 = {1, 2, 3, 4, 5, 6}
3 >>> s1.issubset(s2)
4 True
Copying
Finally, the .copy() method allows you to create a shallow copy of a given
set. It does not take any arguments and returns a new set that is a copy of
the original set. The references in each set are the same.
1 >>> s1 = {"Hello World",}
2 >>> s2 = s1.copy()
3 >>> s1.pop() is s2.pop()
4 True
Chapter 13. Type Hints
Type hints, also known as type annotations, are a recent addition to Python
which allow developers to add static type information for function
arguments, return values, and variables. The philosophy behind type hints is
to provide a way to improve code readability and catch certain types of
errors earlier in the development cycle.
Type hints make it easier for other developers to understand the expected
inputs and outputs of functions, without having to manually inspect the
code or drop into a debugger during test. They also allow static type
checkers, such as mypy or Language Server Protocols (LSP), to analyze
code and detect type-related issues before the code is executed.
It’s important to note that type hints are optional, and Python’s dynamic
typing is still maintained. Type hints do not affect runtime behavior and are
only used for static analysis and documentation purposes. It should also be
explicitly noted that, absent a mechanism for type checking, type hints are
not guaranteed by the interpreter to be accurate. They are hints in the
truest sense of the word.
One of the best places to start adding type hints is in existing functions.
Functions typically have explicit inputs and outputs, so adding type
definitions to I/O values is relatively straightforward. Once you have
identified a function that can be type annotated, determine the types of
inputs and outputs for that function. This should be done by inspecting the
code and looking at the context in which each variable of the function is
used. For example, if a function takes two integers as inputs and returns a
single integer as output, you would know that the inputs should have type
hints int and the return type hint should be int as well.
Type hints for functions go in the function signature. The syntax for these
hints is name: type[=default] for function arguments, and -> type (placed
before the closing colon of the signature) for the return value.
1 def multiply(a: int, b: int) -> int:
2 return a * b
In this example, the multiply function is given type annotations for the a
and b variables. In this case, those types are specified as integers. Since the
resulting type of multiplying two integers is always an integer, the return
type of this function is also int.
It's worth reiterating that this multiply function will not fail at runtime if it
is passed values which aren’t integers. However, static checkers like mypy
will fail if the function is called elsewhere in the code base with non-int
values, and text editors with LSP support will indicate that you are using
the function in error.
1 # ./script.py
2
3 def multiply(a: int, b: int) -> int:
4 return a * b
5
6 print(multiply(1, 2.0))
Union types
You can specify multiple possible types for a single argument or return
value by using a union type. This is done either by using the Union type
from the typing module, enclosing the possible types in square brackets, or
by using the bitwise or operator | as of python 3.10.
1 # ./script.py
2
3 def multiply(a: int|float, b: int|float) -> int|float:
4 return a * b
5
6 print(multiply(1, 2.0))
Optional
The Optional type is a type hint which is used to indicate that an argument
to a function or method is optional, meaning that it does not have to be
provided in order for the function or method to be called. It is typically
paired with a default value which is used when the function caller does not
pass an explicit argument.
1 # ./script.py
2
3 from typing import Optional
4
5 def greet(name: str, title: Optional[str] = None) -> str:
6 if title:
7 return f"Hello, {title} {name}"
8 return f"Hello, {name}"
9
10 print(greet("Justin", "Dr."))
11 print(greet("Cory"))
type|None
An optional argument is by definition a union between some type t and
None. In addition to using Optional as an explicit type definition, it is
possible to achieve the same type definition using the | operator as t|None.
Both methods of type annotation serve the same purpose, but the choice of
which to use often comes down to personal preference and the specific use
case.
1 # ./script.py
2
3 def greet(name: str, title: str|None = None) -> str:
4 if title:
5 return f"Hello, {title} {name}"
6 return f"Hello, {name}"
7
8 print(greet("Justin", "Dr."))
9 print(greet("Cory"))
Literal
The Literal type allows you to restrict the values that an argument or
variable can take to a specific set of literal values. This can be used to
ensure that a function or method is only called with one of a specific set of
allowed arguments.
1 # ./script.py
2
3 from typing import Literal
4
5 def color_picker(
6 color: Literal["red", "green", "blue"]
7 ) -> tuple[int, int, int]:
8 match color:
9 case "red":
10 return (255, 0, 0)
11 case "green":
12 return (0, 255, 0)
13 case "blue":
14 return (0, 0, 255)
15
16 print(color_picker("pink"))
Final
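This section's body appears to be missing from this copy. The Final type marks a name as one that should never be rebound; a brief sketch (the constant name is illustrative):

```python
from typing import Final

MAX_RETRIES: Final = 3

# rebinding MAX_RETRIES later would still run at runtime, but a
# static checker such as mypy would report it, e.g.:
#   MAX_RETRIES = 4  # Cannot assign to final name "MAX_RETRIES"
```

Like all type hints, Final is not enforced by the interpreter; it only guides static analysis.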
TypeAlias
In instances where inlining type hints becomes too verbose, you can use a
TypeAlias type to create an alias for a definition based on existing types.
1 # ./script.py
2
3 from typing import Literal, TypeAlias
4
5 ColorName = Literal["red", "green", "blue"]
6 Color: TypeAlias = tuple[int, int, int]
7
8 def color_picker(color: ColorName) -> Color:
9 match color:
10 case "red":
11 return (255, 0, 0)
12 case "green":
13 return (0, 255, 0)
14 case "blue":
15 return (0, 0, 255)
16
17 print(color_picker("green"))
NewType
In contrast to a TypeAlias, the NewType type allows you to define new types
based on existing types. This type is treated as its own distinct type,
separate from the original type it was defined with. A value of the new type
is created by calling the type with a value matching the underlying type's
definition.
1 # ./script.py
2
3 from typing import Literal, NewType
4
5 ColorName = Literal["red", "green", "blue"]
6 Color = NewType("Color", tuple[int, int, int])
7
8 def color_picker(color: ColorName) -> Color:
9 match color:
10 case "red":
11 return Color((255, 0, 0))
12 case "green":
13 return Color((0, 255, 0))
14 case "blue":
15 return Color((0, 0, 255))
16
17 print(color_picker("green"))
TypeVar
Generic types can be created as placeholders for specific types that are to
be determined at a later time. They still provide type safety by enforcing
type consistency, but they do not assert which concrete type is to be
expected. Generic types can be constructed using the TypeVar constructor.
1 # ./script.py
2
3 from typing import TypeVar
4
5 T = TypeVar('T')
6
7 def identity(item: T) -> T:
8 return item
9
10 identity(1)
1 # ./script.py
2
3 from typing import TypeVar
4
5 class Missing(int):
6 _instance = None
7 def __new__(cls, *args, **kwargs):
8 if not cls._instance:
9 cls._instance = super().__new__(cls, -1)
10 return cls._instance
11
12 # True, False, and Missing all satisfy this type
13 # because they are subclasses of int. However,
14 # int types also satisfy.
15 BoundTrinary = TypeVar("BoundTrinary", bound=int)
16
17 def is_falsy(a: BoundTrinary) -> bool:
18 return a is not True
19
20 print(
21 is_falsy(True),
22 is_falsy(False),
23 is_falsy(Missing()),
24 is_falsy(-1)
25 )
26
27 # True, False, and Missing all satisfy this type
28 # because the type is constrained. In this case
29 # however int does not satisfy, and throws a type
30 # error.
31 ConstrainedTrinary = TypeVar("ConstrainedTrinary", bool, Missing)
32
33 def is_truthy(a: ConstrainedTrinary) -> bool:
34 return a is True
35
36 print(
37 is_truthy(True),
38 is_truthy(False),
39 is_truthy(Missing()),
40 is_truthy(-1)
41 )
Protocols
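This section's body appears to be missing from this copy. Protocols provide structural ("duck") typing for static checkers: any class providing the required methods satisfies the protocol, with no inheritance needed. A brief sketch with illustrative names:

```python
from typing import Protocol

class SupportsGreet(Protocol):
    def greet(self) -> str: ...

class English:
    # no inheritance from SupportsGreet required: the matching
    # method signature alone satisfies the protocol
    def greet(self) -> str:
        return "Hello"

def welcome(thing: SupportsGreet) -> str:
    return thing.greet()

print(welcome(English()))
```

A static checker accepts `welcome(English())` because English structurally matches SupportsGreet; passing an object without a greet() method would be flagged.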
Generics
Generics are used to create type-safe data abstractions while remaining
type agnostic. To do this, we can create classes that subclass the Generic
type from the typing module. When the class is later instantiated, a
concrete type can then be supplied at the point of object construction using
square brackets.
1 # ./script.py
2
3 from typing import Generic, TypeVar
4
5 T = TypeVar("T")
6
7 class Queue(Generic[T]):
8 def __init__(self) -> None:
9 self._data: list[T] = []
10
11 def push(self, item: T) -> None:
12 self._data.append(item)
13
14 def pop(self) -> T:
15 return self._data.pop(0)
16
17 int_queue = Queue[int]()
18 str_queue = Queue[str]()
19
20 # fails type check, pushing a string to an `int` queue
21 int_queue.push('a')
TypedDict
The TypedDict class is used to create type-safe dictionaries. With
TypedDict, you can ensure that dictionaries have the right keys, and that
the associated values are type checked at instantiation.
1 # ./script.py
2
3 from typing import TypedDict
4
5 class Person(TypedDict):
6 name: str
7 age: int
8
9 person = Person(name="Chloe", age=27)
Classes which subclass TypedDict use class-level variables to specify the
keys of the dictionary key-value store. Each included type annotation
asserts, during static type checking, that the value associated with a given
key is of the specified type.
Finally, TypedDict objects can subclass the Generic type. This allows
dictionary values to be collections of types which are specified at
instantiation.
1 # ./script.py
2
3 from typing import TypedDict, Generic, TypeVar
4
5 T = TypeVar("T")
6
7 class Collection(TypedDict, Generic[T]):
8 name: str
9 items: list[T]
10
11 coll = Collection[int](name="uid", items=[1,2,3])
Chapter 14. Modules, Packages, and
Namespaces
Up until now, we have been utilizing the functionality that is built directly
into the Python runtime. We have been using literals, containers, functions,
and classes that are available by default. However, this is only a small
portion of the tools and functionality that are available to Python
developers. To access this additional functionality, we must delve into the
intricacies of Python’s module system and its packaging system.
Modules
Consider the following directory structure with two files. The first file is a
module file, and the second file is a script.
1 root@b9ba278f248d:/code# find . -type f
2 ./module.py
3 ./script.py
In the module file, we define the functions add() and subtract(). These
functions now exist in what is called the “namespace” of the module, i.e.
the name module is recognized to contain the functions add and subtract.
Everything defined in the top level scope of the module file is defined under
the module namespace.
1 # ./module.py
2
3 def add(a, b):
4 return a + b
5
6 def subtract(a, b):
7 return a - b
In the script file, we use the import statement to import our module. The
import statement will import the entire module and make all of its
definitions and statements available in the current namespace of the script,
under the name module. We can access all of the defined functionality of the
module using the dot syntax, similar to accessing the attributes of a class.
1 # ./script.py
2
3 import module
4
5 print(module.add(1, 2))
6 print(module.subtract(1, 2))
With this, we can run python ./script.py and the output to the console is
3, and then -1.
1 root@b9ba278f248d:/# cd code/
2 root@b9ba278f248d:/code# ls
3 module.py script.py
4 root@b9ba278f248d:/code# python script.py
5 3
6 -1
7 root@b9ba278f248d:/code#
If we want to instead import our module under a given alias, we can use the
as keyword in our import statement to rename the module in the scope of
the current namespace.
1 # ./script.py
2
3 import module as m
4
5 print(m.add(1, 2))
6 print(m.subtract(1, 2))
In instances where our modules are collected into folders within the root
directory, we can use dot syntax to specify the path to a module.
1 root@b9ba278f248d:/code# find . -type f
2 ./modules/module.py
3 ./script.py
1 # ./script.py
2 import modules.module as m
3
4 print(m.add(1, 2))
Module Attributes
There are several attributes that are automatically created for every module
in Python. Some of the most commonly used module attributes are:
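Among the most common are __name__ (the module's name), __doc__ (the module's docstring), __file__ (the path to the module's source, where one exists), and __package__ (the package the module belongs to). A minimal sketch, using the stdlib math module purely for illustration:

```python
import math

# Every imported module object carries automatically created
# attributes describing the module itself.
print(math.__name__)       # the module's name: 'math'
print(math.__doc__[:40])   # the start of the module's docstring

# All of the dunder attributes visible on the module object:
dunders = [attr for attr in dir(math) if attr.startswith("__")]
print(dunders)
```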
The process of importing a module causes the python interpreter to run the
module. In order to distinguish between a module being run as the main
process as compared to it being run as an import, the if __name__ ==
"__main__" idiom is used. __name__ will only be equal to __main__ when
the file is being executed as the main program. __name__ will be set to the
module name in all other instances.
1 # ./script.py
2
3 import module
4
5 def main():
6 print(module.add(1, 2))
7
8 if __name__ == "__main__":
9 main()
Packages
To create a package, you need to create a directory with the package name,
and within that directory, you can have one or more modules. The directory
should contain an __init__.py file, which is a special file that tells Python
that this directory should be treated as a package.
Outside the directory you should have a setup file. Typically these come in
the form of either a setup.py file or a pyproject.toml file. The setup.py
file is a setup script that the setuptools package uses to configure the build
of a python package. A typical setup.py file will define a function called
setup() that is used to specify build parameters and metadata. A
pyproject.toml file accomplishes much of the same functionality, but uses
a declarative configuration scheme to define the project’s metadata,
dependencies, and build settings. For now, we'll just use the
pyproject.toml toolchain, and introduce setup.py later when necessary.
1 # ./pyproject.toml
2
3 [build-system]
4 requires = ["setuptools", "wheel"]
1 # ./my_package/__main__.py
2
3 if __name__ == "__main__":
4 print("hello from main!")
Inside the my_package directory are two folders. The first is a folder called
logging, and inside that folder there is a log.py file. There is only one
function in this file, named fn. This function simply calls the print()
function.
1 # ./my_package/logging/log.py
2
3 def fn(*args, **kwargs):
4 print(*args, **kwargs)
Inside the second folder, math, are two files. One file, addition.py, contains a
function which adds two numbers. This file also calls the fn from
logging.log to print out the two numbers whenever it is called. The second
file, subtraction.py does the same, except it returns the result of
subtracting the two numbers, instead of adding them.
1 # ./my_package/math/addition.py
2
3 from ..logging.log import fn
4
5 def add(a, b):
6 fn(a, b)
7 return a + b
1 # ./my_package/math/subtraction.py
2
3 from my_package.logging.log import fn
4
5 def subtract(a, b):
6 fn(a, b)
7 return a - b
Packages allow for both relative and absolute imports across the package.
Relative imports specify the package and module names relative to the
current module using dot notation. For example, in addition.py the log
module's fn is imported relatively: a single leading . refers to the current
package (math), and each additional . steps one level further up, so
..logging refers to the logging package inside the parent package,
my_package. As another example, from the addition.py file you could import
the subtract function with the relative import from .subtraction import
subtract. Absolute imports, on the other hand, specify the full package and
module path, as seen in subtraction.py. This is more explicit and
straightforward, but in deeply nested package directories it can get verbose.
Installation
In order to make use of this package, we need to install it. Python’s default
package installer is pip, which stands for “pip installs packages”. Pip also
allows you to easily search for, download, and install packages from PyPI,
the Python Package Index, which is an online repository of Python
packages. Packages are installed from the package index by running python
-m pip install <package_name> in your terminal. For local packages,
you can install from the package directory using python -m pip install
./. The -e flag can be optionally passed so that the package is installed as -
-editable. This links the installed python package to the local developer
instance, so that any changes made to the local codebase are automatically
reflected in the imported module.
1 root@e08d854dfbfe:/code# ls
2 my_package pyproject.toml
3
4 root@e08d854dfbfe:/code# python -m pip install -e .
5 Obtaining file:///code
6 Installing build dependencies ... done
7 Checking if build backend supports build_editable ... done
8 Getting requirements to build editable ... done
9 Preparing editable metadata (pyproject.toml) ... done
10 Building wheels for collected packages: my-package
11 Building editable for my-package (pyproject.toml) ... done
12 Created wheel for my-package: filename=my_package-0.0.0-
0.edita\
13 ble-py3-none-any.whl size=2321
sha256=2137df7f84a48acdcb77e9d1ab3\
14 33bf838693910338918fe16870419cb351979
15 Stored in directory: /tmp/pip-ephem-wheel-cache-
qqft00bw/wheels\
16 /71/fd/a9/eb23a522d4ed2deb67e9d98937897b0b77b5bf9c1ac50a2378
17 Successfully built my-package
18 Installing collected packages: my-package
19 Successfully installed my-package-0.0.0
20 WARNING: Running pip as the 'root' user can result in broken
perm\
21 issions and conflicting behaviour with the system package
manager\
22 . It is recommended to use a virtual environment instead:
https:/\
23 /pip.pypa.io/warnings/venv
24
25 root@e08d854dfbfe:/code# python
26 Python 3.11.1 (main, Jan 17 2023, 23:30:27) [GCC 10.2.1
20210110]\
27 on linux
28 Type "help", "copyright", "credits" or "license" for more
informa\
29 tion.
30 >>> from my_package.math import addition
31 >>> addition.add(1, 2)
32 1 2
33 3
34 >>>
35 >>> from my_package.math import subtraction
36 >>> subtraction.subtract(3, 4)
37 3 4
38 -1
39 >>>
40 root@71c53592c77f:~# python -m my_package
41 hello from main!
Part III. The Python Standard Library
The Python Standard Library is a collection of modules and functions that
are included with every Python installation. It provides a range of
functionality that can be used to accomplish a wide variety of tasks, such as
string and text processing, file and directory manipulation, networking,
mathematics and statistics, multithreading and multiprocessing, date and
time operations, compression and archiving, and logging and debugging.
The inclusion of such an expansive standard library by default means that
developers can focus on writing their application code, rather than worrying
about recreating standard implementations from scratch. It is a big reason as
to why Python is often referred to as a “batteries included” language.
Chapter 15. Virtual Environments
As we start to create packages and integrate other packages into our
projects, we're quickly going to run into the issue of wanting to keep
project dependencies separate. This is where Python virtual environments
come in handy. Virtual environments allow us to create isolated
environments with their own independent set of dependencies, which can be
managed separately from other projects and the global Python environment.
Using virtual environments ensures that each project has access to the
specific versions of packages that it needs, without worrying about conflicts
or breaking other projects. It also allows for easier sharing and portability of
projects, as the environment and its dependencies can be packaged and
distributed together with the project.
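A virtual environment is typically created from the command line with python -m venv venv; the same machinery can equally be driven from Python via the stdlib venv module. A minimal sketch (the target directory ./venv is just the common naming convention):

```python
import os
import venv

# Build a virtual environment at ./venv; with_pip=True would also
# bootstrap pip into the new environment (slower to create).
builder = venv.EnvBuilder(with_pip=False)
builder.create("./venv")

# On POSIX the interpreter lands in ./venv/bin, on Windows
# in ./venv/Scripts.
bin_dir = "Scripts" if os.name == "nt" else "bin"
print(sorted(os.listdir(os.path.join("venv", bin_dir)))[:4])
```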
Inside the newly created venv directory, we’ll see a few folders. First is the
./venv/bin folder (on Windows this will be the ./venv/Scripts folder)
which is where executables will be placed. The two files within this folder
that are worth knowing about are the python file, which is a symlink to the
binary executable of the global python version, and the activate script,
which for POSIX systems (linux, mac) is generally ./venv/bin/activate
(or an analogous activate.fish or activate.csh if using either the fish
shell or csh/tcsh shells, respectively), and for Windows is
./venv/Scripts/Activate.ps1 if using PowerShell. The activate script
acts differently depending on which platform you are using, but in general
its responsibility is to prepend the executables folder to your shell's PATH.
Doing this ensures that when the shell goes to execute python, it finds the
virtual environment's executable before the global install. It also sets
environment variables such as VIRTUAL_ENV so that the interpreter knows
it is running inside a virtual environment, and it updates the shell prompt
to reflect that the virtual environment is active. Finally, it adds a
deactivate command to restore the shell to its original state once we are
done with the virtual environment.
1 root@cd9ed409ec97:~# which python
2 /usr/local/bin/python
3 root@cd9ed409ec97:~# echo $PATH
4
/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:\
5 /sbin:/bin
6 root@cd9ed409ec97:~# . venv/bin/activate
7 (venv) root@cd9ed409ec97:~# which python
8 /root/venv/bin/python
9 (venv) root@cd9ed409ec97:~# echo $PATH
10
/root/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr\
11 /sbin:/usr/bin:/sbin:/bin
12 (venv) root@cd9ed409ec97:~#
Apart from the ./venv/bin folder, the venv package also creates a
./venv/lib folder. Inside it, nested under a python version folder, is a
directory called site-packages. This is where third-party packages are
installed while the virtual environment is active.
1 (venv) root@959fd3c7f2e4:~# python -m pip install ciso8601
2 Collecting ciso8601
3 Downloading ciso8601-2.3.0-cp311-cp311-
manylinux_2_5_x86_64.man\
4 ylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
(39\
5 kB)
6 Installing collected packages: ciso8601
7 Successfully installed ciso8601-2.3.0
8
9 [notice] A new release of pip available: 22.3.1 -> 23.0.1
10 [notice] To update, run: pip install --upgrade pip
11 (venv) root@13be81b026f6:~# ls -1a ./venv/lib/python3.11/site-
pac\
12 kages/
13 .
14 ..
15 _distutils_hack
16 ciso8601
17 ciso8601-2.3.0.dist-info
18 ciso8601.cpython-311-x86_64-linux-gnu.so
19 distutils-precedence.pth
20 pip
21 pip-22.3.1.dist-info
22 pkg_resources
23 setuptools
24 setuptools-65.5.0.dist-info
If instead we want a separate and unique object, where state is not shared
between the two variables, we need to make a copy of that value and assign
the copy rather than the original variable.
Python provides a copy module in the standard library for just this purpose.
The copy module defines two functions, copy and deepcopy. The function
copy.copy(item) creates a shallow copy of a given item (meaning, only
make a copy of the object itself). Comparatively, copy.deepcopy(item)
creates a deeply nested copy (meaning, each mutable object which is
referenced by the object is also copied).
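A minimal sketch of the difference; the nested list used here is illustrative:

```python
import copy

outer = [[1, 2], [3, 4]]

shallow = copy.copy(outer)      # new outer list, shared inner lists
deep = copy.deepcopy(outer)     # new outer list, new inner lists

outer[0].append(99)

print(shallow[0])  # the inner list is shared, so it sees the mutation
print(deep[0])     # the inner list was copied, so it does not
```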
One of the main reasons for its existence is to make it easier to work with
large datasets or sequences of data without having to write complex and
repetitive code. The library also aims to promote a more functional
programming style, as many of its functions are designed to be used in
combination with other functions to build up more complex operations. The
result is that itertools allows for a more expressive and concise codebase.
Chaining Iterables
Finally, if you only wish to filter out the elements of an iterable while the
callback is satisfied, the itertools.dropwhile() function can be used.
This function creates an iterator that drops elements from the input iterator
as long as a given callback is truthy for those elements. Once the callback
returns a falsy value for an element, the remaining elements are included in
the output iterator. There’s also an analogous itertools.takewhile()
function, which instead of dropping values, it yields values until the
callback returns falsy.
1 >>> import itertools
2 >>> data = range(1, 10)
3 >>> list(itertools.dropwhile(lambda x: x%2 or x%3, data))
4 [6, 7, 8, 9]
5 >>> list(itertools.takewhile(lambda x: x < 5, data))
6 [1, 2, 3, 4]
Cycling Through Iterables
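The itertools.cycle() function returns an iterator that repeats the elements of an iterable indefinitely. Because the iterator never ends, it is usually bounded by pairing it with a finite iterable; a minimal sketch with illustrative values:

```python
import itertools

colors = itertools.cycle(["red", "green", "blue"])

# Pair the infinite cycle with a finite iterable; zip() stops
# as soon as the shorter (finite) input is exhausted.
labeled = list(zip(range(5), colors))
print(labeled)
```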
Creating Groups
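The itertools.groupby() function collects consecutive elements that share a key into groups. It only groups adjacent elements, so the input is typically sorted by the same key first; a minimal sketch with illustrative values:

```python
import itertools

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Group consecutive words by their first letter. Each group is a
# lazy iterator, so it is materialized with list() immediately.
groups = {
    key: list(group)
    for key, group in itertools.groupby(words, key=lambda w: w[0])
}
print(groups)
```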
Slicing Iterables
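The itertools.islice() function slices an arbitrary iterator the way square-bracket slicing works on sequences, which is particularly useful for bounding infinite iterators; a minimal sketch:

```python
import itertools

# An infinite generator of even numbers: 0, 2, 4, 6, 8, ...
evens = (x for x in itertools.count() if x % 2 == 0)

# Take elements at positions 2 through 4; islice accepts
# (iterable, stop) or (iterable, start, stop[, step]).
window = list(itertools.islice(evens, 2, 5))
print(window)  # [4, 6, 8]
```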
Zip Longest
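The itertools.zip_longest() function zips iterables of unequal length, padding the shorter inputs with a fillvalue rather than stopping at the shortest, as the builtin zip() does; a minimal sketch with illustrative values:

```python
import itertools

names = ["Ada", "Grace", "Alan"]
scores = [90, 85]

# zip() would stop after two pairs; zip_longest pads the missing
# score with the fillvalue instead.
pairs = list(itertools.zip_longest(names, scores, fillvalue=0))
print(pairs)  # [('Ada', 90), ('Grace', 85), ('Alan', 0)]
```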
Partials
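The functools.partial() function freezes a portion of a function's arguments, returning a new callable with a reduced signature; a minimal sketch, with an illustrative power() function:

```python
from functools import partial

def power(base, exponent):
    return base ** exponent

# Freeze the exponent, producing single-argument callables.
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

print(square(4), cube(2))  # 16 8
```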
Reduce
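The functools.reduce() function folds an iterable down to a single value by repeatedly applying a two-argument function to an accumulator and the next element; an optional third argument seeds the accumulator. A minimal sketch:

```python
from functools import reduce

# Sum with an explicit initial accumulator of 0.
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4], 0)

# Without a seed, the first element becomes the accumulator.
product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4])

print(total, product)  # 10 24
```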
Pipes
Caching
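The functools.lru_cache decorator memoizes a function's return values, so repeated calls with the same arguments are served from a cache; a minimal sketch using the classic naive Fibonacci example:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without caching this recursion is exponential; with it,
    # each value of n is computed only once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))          # 832040
print(fib.cache_info()) # hits/misses statistics for the cache
```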
Enums
An Enum can also be created using a factory function. In this case, the first
argument is the name of the Enum, and the second argument is an iterable
of member names. The member values are automatically assigned as
integers incrementing from 1.
1 from enum import Enum
2
3 class Weekends(Enum):
4 Saturday = object()
5 Sunday = object()
6
7 Weekdays = Enum(
8 "Weekdays",
9 (
10 "Monday",
11 "Tuesday",
12 "Wednesday",
13 "Thursday",
14 "Friday",
15 ),
16 )
Once an Enum is defined, you can use the enumerated values as if they were
variables. Enums in python are immutable, meaning that their values cannot
be changed after they are created, and they are also comparable, meaning
that you can use them in comparison operations.
1 match current_day:
2 case Weekdays.Monday:
3 print("Today is Monday")
4 case Weekdays.Tuesday:
5 print("Today is Tuesday")
6 case Weekdays.Wednesday:
7 print("Today is Wednesday")
8 case Weekdays.Thursday:
9 print("Today is Thursday")
10 case Weekdays.Friday:
11 print("Today is Friday")
12 case Weekends.Saturday | Weekends.Sunday:
13 print("Weekend!")
NamedTuples
NamedTuples are a special type of tuple in Python that allows you to give
names to the elements of the tuple. They are designed to be interoperable
with regular tuples, but they also allow you to access elements by name, in
addition to by index.
The NamedTuple class from the typing module can be used to create a
NamedTuple. The derived class defines attributes via type annotations.
Instance variables are defined at instantiation. A NamedTuple can be
instantiated using variadic arguments or keyword arguments; in the case of
using variadic args, the order of attributes defined in the class determine
which indexed value is assigned to the attribute.
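A minimal sketch of the behavior described above; the Point class and its fields are illustrative:

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

p = Point(1, 2)       # positional: order follows the class body
q = Point(y=4, x=3)   # keyword arguments work as well

print(p.x, p[0])      # elements are accessible by name or by index
print(p == (1, 2))    # interoperable with plain tuples: True
```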
Dataclasses
Mutable default values can be provided using the field function. field
takes an optional default_factory argument that must be a zero-argument
callable.
1 >>> from dataclasses import dataclass, field
2 >>> @dataclass
3 ... class Group:
4 ... people: list[str] = field(default_factory=list)
5 ...
6 >>> Group(["Jessie", "Ryan"])
7 Group(people=['Jessie', 'Ryan'])
Chapter 20. Multithreading and
Multiprocessing
Parallelism refers to the concept of running multiple tasks simultaneously,
in order to improve performance and speed up the overall execution of a
program. In Python, parallelism can be achieved through the use of
multithreading and multiprocessing.
Multithreading involves the use of multiple threads, which are small units
of a process that can run independently of each other. The threading module
can be used to create and manage threads. Multithreading can be useful for
tasks that involve a lot of waiting, such as waiting for input/output
operations to complete or waiting for a response from a network resource.
Multithreading
If you want to subclass the Thread object, you need to be sure to explicitly
call the parent class’s __init__ method inside the child class’s __init__
method, or use a @classmethod for instantiation. Furthermore, you should
not call the .run() method directly. Rather, use the .start() method to
start the thread. The .start() method from threading.Thread will create
a new native thread, and it will call the .run() method.
1 >>> import threading, time
2 >>> class MyThread(threading.Thread):
3 ... @classmethod
4 ... def create(cls, value):
5 ... self = cls()
6 ... self.value = value
7 ... self.start()
8 ... return self
9 ... def run(self):
10 ... time.sleep(self.value)
11 ... print(f"Thread slept {self.value} seconds")
12 ...
13 >>> threads = [MyThread.create(i) for i in range(5)]
14 >>> # .join() returns None, so any() will exhaust the expression
15 >>> any(t.join() for t in threads)
16 Thread slept 0 seconds
17 Thread slept 1 seconds
18 Thread slept 2 seconds
19 Thread slept 3 seconds
20 Thread slept 4 seconds
21 False
Thread Locks
The Lock() class defines two methods: .acquire() and .release() that
can be used to acquire and release the lock. The .acquire() method can
also take an optional blocking parameter, which defaults to True. When
blocking is True, the .acquire() method blocks the execution of a thread
until the lock can be acquired; when blocking is False, the .acquire()
method returns immediately with a boolean indicating whether the lock was
acquired or not. Additionally, the .acquire() method takes an optional
timeout parameter; if the lock cannot be acquired within the given number
of seconds, the call gives up and returns False rather than blocking
indefinitely.
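A minimal sketch of the acquire/release behavior described above; since the lock is already held, the non-blocking acquire returns False immediately:

```python
import threading

lock = threading.Lock()

lock.acquire()                         # blocking acquire; succeeds
second = lock.acquire(blocking=False)  # already held: returns False
print(second)                          # False
lock.release()

third = lock.acquire(timeout=0.1)      # the lock is free again: True
print(third)
lock.release()
```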
A lock object can also be used as a context manager. The arguments of the
__enter__ method are equivalent to the .acquire() method.
1 >>> import threading, time
2 >>> value = 0
3 >>> lock = threading.Lock()
4 >>> def increment():
5 ... with lock:
6 ... global value
7 ... tmp = value
8 ... time.sleep(0.1)
9 ... value = tmp + 1
10 ...
11
12 >>> threads = [
13 ...     threading.Thread(target=increment)
14 ...     for _ in range(2)
15 ... ]
16 ...
17 >>> for t in threads:
18 ... t.start()
19 ...
20 >>> for t in threads:
21 ... t.join()
22 ...
23 >>> value  # 2: the lock serializes each read-modify-write
24 2
Multiprocessing
The Process() class can be used to create and manage new processes. A
process is a separate execution environment, with its own memory space
and Python interpreter. This allows you to take advantage of multiple cores
on a machine, and to work around the Global Interpreter Lock (GIL) that
prevents multiple threads from executing Python code simultaneously. The
multiprocessing.Process class has the same API as the
threading.Thread class.
1 >>> import multiprocessing, time
2 >>> def f(name):
3 ...     print(f"Hello {name}")
4 ...
5 >>> multiprocessing.Process(target=f, args=("Peter",)).start()
6 Hello Peter
7
8
9 >>> class MyProcess(multiprocessing.Process):
10 ... @classmethod
11 ... def create(cls, value):
12 ... self = cls()
13 ... self.value = value
14 ... self.start()
15 ... return self
16 ... def run(self):
17 ... time.sleep(self.value)
18 ... print(f"Process slept {self.value} seconds")
19 ...
20 >>> ps = [MyProcess.create(i) for i in range(5)]
21 >>> any(p.join() for p in ps)
22 Process slept 0 seconds
23 Process slept 1 seconds
24 Process slept 2 seconds
25 Process slept 3 seconds
26 Process slept 4 seconds
27 False
The multiprocessing library also provides a Pool class that can be used to
orchestrate multiple tasks in parallel. A pool of worker processes is a group
of processes that can be reused to execute multiple tasks.
1 >>> import multiprocessing
2 >>> def double(n):
3 ... return n * 2
4 ...
5 >>> with multiprocessing.Pool(processes=4) as pool:
6 ... results = pool.map(double, range(8))
7 ... print(results)
8 ...
9 [0, 2, 4, 6, 8, 10, 12, 14]
In this example, a Pool() object is created with four processes and assigned
to the variable pool. The pool's .map() method is used to apply
double() to multiple inputs in parallel. The .map() method returns a
list of the results in the order of the input.
The Pool() class also provides several other methods for submitting tasks,
such as .imap(), .imap_unordered() and .apply(). The .imap() and
.imap_unordered() methods are similar to .map(), except these methods
return iterators which yield results lazily. The distinction between the two
is that .imap() will yield results in order, and .imap_unordered() will
yield results arbitrarily. Finally, the .apply() method is used to submit a
single task for execution, and it blocks the main process until the task is
completed.
Process Locks
concurrent.futures
It’s common to utilize as_completed when you want to process the results
of the tasks as soon as they complete, regardless of the order they were
submitted. On the other hand, wait is useful when you want to block until
the tasks are finished; it returns sets of done and not-done futures, so the
results it yields are not ordered by submission.
1 >>> import concurrent.futures
2 >>> def double(n):
3 ... return n * 2
4 ...
5 >>> with concurrent.futures.ThreadPoolExecutor() as exec:
6 ... results = [exec.submit(double, i) for i in range(10)]
7 ... values = [
8 ... future.result() for future in
9 ... concurrent.futures.as_completed(results)
10 ... ]
11 ...
12 >>> values
13 [12, 18, 10, 8, 2, 16, 0, 4, 6, 14]
14 >>> with concurrent.futures.ProcessPoolExecutor() as exec:
15 ... jobs = [exec.submit(double, i) for i in range(10)]
16 ... results = concurrent.futures.wait(jobs)
17 ... values = list(future.result() for future in results.done)
18 ...
19 >>> values
20 [0, 18, 6, 14, 2, 12, 16, 4, 8, 10]
Chapter 21. Asyncio
The asyncio library is a redesign of how you use threading in python to
handle I/O and network related tasks. Previously, in order to implement
threading, you would use the threading module, which relies on a
preemptive task switching mechanism for coordination between threads. In
preemptive multitasking, the scheduler is responsible for interrupting the
execution of a task and switching to another task. This means that the
interpreter can interrupt the execution of a thread and switch to another
thread at any point. This can happen when a thread exceeds its time slice or
when a higher-priority thread becomes ready to run.
It’s important to note that when using coroutines, you should avoid
blocking operations, such as waiting for file I/O or performing
computations, as it will block the event loop and prevent other tasks from
running. Instead, you should use non-blocking operations, such as
asyncio.sleep() or async with statements.
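A minimal sketch of the principle; asyncio.sleep() suspends only the awaiting task, so the three coroutines below (with illustrative names) wait concurrently rather than one after another:

```python
import asyncio
import time

async def fetch(i):
    await asyncio.sleep(0.1)  # non-blocking: yields to the event loop
    return i * 2

async def main():
    # gather() schedules all coroutines concurrently and collects
    # their results in submission order.
    return await asyncio.gather(*(fetch(i) for i in range(3)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)  # [0, 2, 4]
```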
The except* handler unpacks the exceptions of the exception group and
allows you to handle each exception in isolation from the other exceptions
caught by the group handler. In this case, the AssertionError
and the ValueError are printed to the console, and the main block exits.
Part VI. The Underbelly of the Snake
As Python developers, it’s important to have the skills and tools to find and
fix bugs, and optimize the performance of our code. As the codebase grows
and becomes more complex, it becomes increasingly difficult to identify
and fix issues. Additionally, as our respective platforms start to gain users,
or are deployed in more demanding environments, ensuring that they
perform well becomes even more critical.
In this section, we’ll cover the various techniques and tools available for
debugging and profiling Python, as well as how to optimize bottlenecks
with C extensions.
Chapter 22. Debugging
Debugging Python can be a somewhat different experience compared to
debugging code in a statically typed language. In a dynamically typed
language like Python, the type of a variable is determined at run-time,
rather than at compile-time. This can make it more difficult to catch certain
types of errors, such as type errors, before they occur.
However, Python also provides a number of tools and features that can
make debugging easier. By taking advantage of these tools and
familiarizing yourself with them, you can become more effective at finding
and fixing bugs.
pdb
The Python Debugger, also known as pdb, is a built-in module that allows
you to debug your code by stepping through it line by line and inspecting
the state of the program at each step. It is a command-line interface that
provides a set of commands for controlling the execution of the program
and for inspecting the program’s state.
When using pdb, you can set breakpoints in your code, which will cause the
interpreter to pause execution at that point, and drop you into a debugger
REPL. You can then use various commands to inspect the state of the
program, such as viewing the values of variables, inspecting the call stack,
and seeing the source code. You can also step through your code, line by
line, to see how each command mutates state.
1 # ./script.py
2
3 def my_closure(value):
4 def my_decorator(fn):
5 def wrapper(*args, **kwargs):
6 _enclosed = (fn, value)
7 breakpoint()
8 return wrapper
9 return my_decorator
10
11 @my_closure("this")
12 def my_function(*args, **kwargs):
13 pass
14
15 my_function(0, kwarg=1)
From here, we have a set of commands which we can use to inspect the
state of our program. Here are some of the most common:
1 root@e08d854dfbfe:~# ls
2 script.py
3 root@e08d854dfbfe:~# python ./script.py
4 --Return--
5 > /root/script.py(5)wrapper()->None
6 -> breakpoint()
7 (Pdb) help
8
9 Documented commands (type help <topic>):
10 ========================================
11 EOF c d h list q rv
\
12 undisplay
13 a cl debug help ll quit s
\
14 unt
15 alias clear disable ignore longlist r source
\
16 until
17 args commands display interact n restart step
\
18 up
19 b condition down j next return tbreak
w
20 break cont enable jump p retval u
\
21 whatis
22 bt continue exit l pp run unalias
\
23 where
24
25 Miscellaneous help topics:
26 ==========================
27 exec pdb
28 (Pdb) help whatis
29 whatis arg
30 Print the type of the argument.
In addition, the pdb REPL is able to run arbitrary executable python. This
includes builtin functions, comprehensions, etc.
1 (Pdb) list
2 1 def my_closure(value):
3 2 def my_decorator(fn):
4 3 def wrapper(*args, **kwargs):
5 4 _enclosed = (fn, value)
6 5 -> breakpoint()
7 6 return wrapper
8 7 return my_decorator
9 8
10 9 @my_closure("this")
11 10 def my_function(*args, **kwargs):
12 11 pass
13
14 (Pdb) where
15 /root/script.py(13)<module>()
16 -> my_function(0, kwarg=1)
17 > /root/script.py(5)wrapper()->None
18 -> breakpoint()
19
20 (Pdb) p args
21 (0,)
22
23 (Pdb) p value
24 'this'
25
26 (Pdb) dir(kwargs)
27 ['__class__', '__class_getitem__', '__contains__', '__delattr__',
28 '__delitem__', '__dir__', '__doc__', '__eq__', '__format__',
'__g\
29 e__',
30 '__getattribute__', '__getitem__', '__getstate__', '__gt__',
'__h\
31 ash__',
32 '__init__', '__init_subclass__', '__ior__', '__iter__',
'__le__',\
33 '__len__',
34 '__lt__', '__ne__', '__new__', '__or__', '__reduce__',
'__reduce_\
35 ex__',
36 '__repr__', '__reversed__', '__ror__', '__setattr__',
'__setitem_\
37 _',
38 '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy',
'fr\
39 omkeys',
40 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault',
'update',\
41 'values']
42
43 (Pdb) [x for x in dir(kwargs) if not x.startswith('__')]
44 ['clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop',
'pop\
45 item',
46 'setdefault', 'update', 'values']
47
48 (Pdb) n
49 --Return--
50 > /root/script.py(13)<module>()->None
51 -> my_function(0, kwarg=1)
52
53 (Pdb) l
54 8
55 9 @my_closure("this")
56 10 def my_function(*args, **kwargs):
57 11 pass
58 12
59 13 -> my_function(0, kwarg=1)
60 14
Python programs can be run within the context of the python debugger. This
allows the debugger to catch unexpected exceptions, dropping you into the
debugger REPL at the point where an Exception is raised, as opposed to
python just printing the stack trace on exit. To do this, run your program via
the pdb module by executing python -m pdb ./script.py.
1 # ./script.py
2
3 def my_closure(value):
4 def my_decorator(fn):
5 def wrapper(*args, **kwargs):
6 _enclosed = (fn, value)
7 raise ValueError
8 return wrapper
9 return my_decorator
10
11 @my_closure("this")
12 def my_function(*args, **kwargs):
13 pass
14
15 my_function(0, kwarg=1)
Other Debuggers
There are several third-party debuggers in the python ecosystem that
are also worth discussing. Pdb++ (pdbpp) is a drop-in replacement for the
built-in pdb, so you can use it in the same way you would use pdb. pdbpp
implements feature enhancements such as syntax highlighting, tab
completion, and smart command parsing. As of this writing though its main
branch is only compatible with python 3.10.
1 root@e08d854dfbfe:~# python -m pip install pdbpp
2 Collecting pdbpp
3 Downloading pdbpp-0.10.3-py2.py3-none-any.whl (23 kB)
4 ...
5
6 root@ab2771e522c8:~# python -m pdb script.py
7 [2] > /root/script.py(3)<module>()
8 -> def my_closure(value):
9 (Pdb++)
cProfile
One of the most commonly used tools for profiling Python code is the
cProfile module. It is a built-in library that generates statistics on the
number of calls and the time spent in each function. This information can
be used to identify which parts of the code are taking the most time to
execute, and make adjustments accordingly.
To use cProfile, you can run your script with the command python -m
cProfile ./script.py and it will output the statistics of the script's
execution. You can pass an optional -s argument to control how the
output is sorted; by default the output is sorted by standard name (file,
line, and function name), but it can be set to cumulative to sort by
cumulative time, ncalls to sort by the call count, etc. You can also pass
-o ./file.prof to dump the results to a file, though -s only applies when
-o is not supplied.
import time


def slow_mult(a, b):
    time.sleep(1.1)
    return a * b


def fast_mult(a, b):
    time.sleep(0.1)
    return a * b


def run_mult(a, b):
    x = slow_mult(a, b)
    y = fast_mult(a, b)
    _abs = abs(x - y)
    return _abs < 0.001


def main():
    a, b = 1, 2
    run_mult(a, b)


if __name__ == "__main__":
    main()
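Once a profile has been dumped with -o, the statistics can also be collected and inspected programmatically with the standard-library pstats module. A small sketch (the filename out.prof is arbitrary):

```python
import cProfile
import pstats


def work():
    # something cheap to profile
    return sum(i * i for i in range(1000))


# collect a profile of the call and dump the raw statistics to a file,
# comparable to running: python -m cProfile -o ./script.prof ./script.py
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
profiler.dump_stats("out.prof")

# load the dump and print the five most expensive entries by cumulative time
stats = pstats.Stats("out.prof")
stats.sort_stats("cumulative").print_stats(5)
```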
flameprof
The data dump of a cProfile run can be used to generate what’s called a
flame graph. Flame graphs are visual representations of how much time is
spent within the scope of a given function call. Each bar in a flame graph
represents a function and its subfunctions, with the width of the bar
representing the amount of time spent in that function. Functions that take
up more time are represented by wider bars, and functions that take up less
time are represented by narrower bars. The functions are stacked vertically,
with the main function at the bottom and the subfunctions at the top.
The python library flameprof can be used to generate flame graphs from
the output of cProfile. To generate one, first run cProfile with the -o
argument to dump results to a file. Next, use flameprof to ingest the dump
file. flameprof will use that profile to generate an svg file. You can open
this file in a web browser to see the results.
root@3668136f44b5:/code# python -m cProfile -o ./script.prof ./script.py

root@3668136f44b5:/code# python -m pip install flameprof
Collecting flameprof
  Downloading flameprof-0.4.tar.gz (7.9 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: flameprof
  Building wheel for flameprof (setup.py) ... done
  Created wheel for flameprof: filename=flameprof-0.4-py3-none-any.whl size=8009 sha256=487bbd51bcf377fa2f9e0795db03b46896d1b41adf719272ea8e187abbd85bb5
  Stored in directory: /root/.cache/pip/wheels/18/93/7e/afc52a495a87307d7b93f5e03ee364585b0edf120fb98eff99
Successfully built flameprof
Installing collected packages: flameprof
Successfully installed flameprof-0.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

root@3668136f44b5:/code# python -m flameprof ./script.prof > script.svg

root@3668136f44b5:/code# ls
script.prof  script.py  script.svg
flamegraph
snakeviz
memory_profiler
Hello World
To start, we’re going to create a new python package with the following
folder structure:
my_package/
    __init__.py
    hello_world.c
setup.py
The __init__.py file is an empty file designating my_package as a python
package. The hello_world.c file contains our extension code. The
setup.py file contains build instructions and metadata for our package.
Finally, the hello_world.c file defines a set of static constructs which
abstract C functions into objects with which the python interpreter knows
how to interact.
hello_world.c
setup.py
Now that we have our C extension written to a file, we need to tell python
how to compile this extension into a dynamically linked shared object
library. The most straightforward way to do this is to use setuptools to
handle compilation. This is done using the setup.py file, in which we
create a collection of Extension objects.
Each Extension object takes, at minimum, a name which python uses for
import purposes, and a list of C source files to be compiled into the
extension. Since python package imports are relative to the root of the
package, it’s typically convenient to have the name mimic the import path
of the extension file, i.e. ./my_package/hello_world.c maps to the
my_package.hello_world import statement. While not strictly necessary, I
find this convention makes the build easier to reason about. The filepaths to these C
files are relative to the setup.py file. The collection is then passed to the
setup() function under the keyword argument ext_modules.
# ./setup.py

import os.path
from setuptools import setup, Extension

extensions = [
    Extension(
        'my_package.hello_world',
        [os.path.join('my_package', 'hello_world.c')]
    )
]

setup(
    name="my_package",
    ext_modules=extensions
)
Finally, to use our extension, install the package and call the module
function in Python.
root@edc7d7fa9220:/code# ls
my_package  setup.py

root@edc7d7fa9220:/code# python -m pip install -e .
Obtaining file:///code
  Preparing metadata (setup.py) ... done
Installing collected packages: my-package
  Running setup.py develop for my-package
Successfully installed my-package-0.0.0

[notice] A new release of pip available: 22.3.1 -> 23.0
[notice] To update, run: pip install --upgrade pip

root@edc7d7fa9220:/code# python
Python 3.11.1 (main, Jan 23 2023, 21:04:06) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_package import hello_world
>>> hello_world.__doc__
"module documentation for 'Hello World'"
>>> hello_world.hello_world.__doc__
"prints 'Hello World!' to the screen, from C."
>>> hello_world.hello_world()
Hello World!
>>>
In order to pass python objects into C functions, we need to set a flag in the
function signature to one of METH_O (indicating a single PyObject),
METH_VARARGS (indicating positional arguments only), or
METH_VARARGS|METH_KEYWORDS (indicating both positional and keyword
arguments).
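At the python level, the three conventions correspond roughly to the following call signatures. These are plain python stand-ins for illustration only, not actual C code:

```python
# what each flag delivers to the C function, mirrored as python signatures

def meth_o(self, arg):
    # METH_O: exactly one positional object, passed directly
    return arg


def meth_varargs(self, *args):
    # METH_VARARGS: all positional arguments packed into a tuple
    return args


def meth_varargs_keywords(self, *args, **kwargs):
    # METH_VARARGS|METH_KEYWORDS: a tuple plus a dict of keyword arguments
    return args, kwargs
```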
All objects in Python are PyObject C structs. In order to make use of our
python objects in C, the data from python needs to be unwrapped from their
PyObject shells and then cast into a corresponding C data type. The
primitive python data types have corresponding Py<Type>_As<Type> and
Py<Type>_From<Type> functions for performing these conversions.
#include <Python.h>

static PyObject* sum(PyObject* self, PyObject* args) {
    PyObject* iter = PyObject_GetIter(args);
    PyObject* item;

    long res_i = 0;
    double res_f = 0;

    while ((item = PyIter_Next(iter))) {
        if (PyLong_Check(item)) {
            long val_i = PyLong_AsLong(item);
            res_i += val_i;
        }
        else if (PyFloat_Check(item)) {
            double val_f = PyFloat_AsDouble(item);
            res_f += val_f;
        }
        Py_DECREF(item);
    }
    Py_DECREF(iter);

    if (res_f) {
        double result = res_f + res_i;
        return PyFloat_FromDouble(result);
    }
    return PyLong_FromLong(res_i);
}

static PyMethodDef MethodTable[] = {
    {
        "sum",
        (PyCFunction) sum,
        METH_VARARGS,
        "returns the sum of a series of numeric types"
    },
    {NULL, }
};


static struct PyModuleDef MyMathModule = {
    PyModuleDef_HEAD_INIT,
    "math",
    "my math module",
    -1,
    MethodTable,
};

PyMODINIT_FUNC PyInit_math() {
    return PyModule_Create(&MyMathModule);
}
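To make the control flow of sum() easier to follow, here is a pure-python mirror of the C function's logic (a reference model only, not the extension itself):

```python
def c_style_sum(*args):
    # the C function keeps integers and floats in separate accumulators
    res_i = 0
    res_f = 0.0
    for item in args:
        if isinstance(item, int):      # PyLong_Check / PyLong_AsLong
            res_i += item
        elif isinstance(item, float):  # PyFloat_Check / PyFloat_AsDouble
            res_f += item
    # a float is returned only when the float accumulator is non-zero,
    # mirroring the C code's `if (res_f)` branch
    if res_f:
        return res_f + res_i
    return res_i
```

Note one quirk faithfully mirrored here: when the floats in the series cancel to exactly 0.0, the `if (res_f)` branch is skipped and an integer is returned.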
Memory Management
Parsing Arguments
Creating PyObjects
In the previous section we looked at how we can take PyObjects and parse
them into C types for computation. We also need to know how to do the
inverse; given a collection of C types, create PyObjects which we can pass
back to the interpreter. As mentioned previously, primitive data types can be
packaged into PyObjects using their corresponding Py<Type>_From<Type>
functions. These functions create owned references, which can be
returned directly from our C functions.
Primitive data types can also be packaged into collections. Each collection
type has a corresponding constructor, Py<Type>_New(), which creates a
new object of the specified type. Each datatype also has corresponding
functions to operate on their respective PyObjects; these functions mostly
correspond to the methods of a given type available at the python level -
PyList_Append() vs. list.append(), PySet_Add() vs. set.add(), etc.
It should be noted that in this example each object is only assigned
once. If you wish to assign the same object to multiple collections, for
example appending the _float to the _list as well as adding it to the
_set, you'll need to increase its reference count.
Importing Modules
To create a new type, you first need to define a struct that represents the
type and initialize it using the PyTypeObject struct. The PyTypeObject
struct contains fields such as: tp_name, which is the name of the type;
tp_basicsize, which is the size of an instance in bytes; tp_new, tp_init, and
tp_dealloc, which are functions that are called when an instance of the
type is being allocated, initialized, and destroyed respectively; and both
Member and Method tables for defining the attributes and methods which
are exposed by our object. Once you have defined the struct, you can create
the new type by calling the PyType_Ready() function and passing it a
reference to the struct. This function initializes the type and makes it
available to the interpreter. Finally, to export the type from the module,
you define a module initialization function using the PyMODINIT_FUNC
macro and call the PyModule_Create() function to create a new module.
On this module you use the PyModule_AddObject() function to insert the
custom PyTypeObject.
Next, we define a Person struct. This struct represents the new type in
Python that can be used to model a person. The struct is initialized using the
PyObject_HEAD macro for including the standard prefixes for all Python
objects in C, and two custom fields, first_name and last_name, which are
pointers to the Python Objects which will represent the first and last name
of our person.
static void Person_Destruct(Person* self) {
    Py_XDECREF(self->first_name);
    Py_XDECREF(self->last_name);
    Py_TYPE(self)->tp_free((PyObject*)self);
}
Next, we define two structs which hold definitions of the attributes and
methods of the Person type. PersonMembers is an array of PyMemberDef
structs, where each struct of the array defines an attribute of the Person
type, including their name, type, offset, read/write access, and description.
PersonMethods is an array of PyMethodDef structs, each of which defines a
method's name, function pointer, flags, and description. The
Person_FullName function is included in this array as a method named
"full_name". Both arrays are NULL terminated.
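Put together, the attribute and method tables describe a type that behaves at the python level like the following pure-python class (the exact full_name formatting is an assumption, since the Person_FullName listing isn't reproduced here):

```python
class Person:
    """Python-level reference model of the C Person type."""

    def __init__(self, first_name, last_name):
        # exposed for read/write access through the PersonMembers table
        self.first_name = first_name
        self.last_name = last_name

    def full_name(self):
        # registered in the PersonMethods table as "full_name"
        return f"{self.first_name} {self.last_name}"
```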
static PyTypeObject PersonType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "person.Person",                /* tp_name */
    sizeof(Person),                 /* tp_basicsize */
    0,                              /* tp_itemsize */
    (destructor)Person_Destruct,    /* tp_dealloc */
    0,                              /* tp_print */
    0,                              /* tp_getattr */
    0,                              /* tp_setattr */
    0,                              /* tp_reserved */
    0,                              /* tp_repr */
    0,                              /* tp_as_number */
    0,                              /* tp_as_sequence */
    0,                              /* tp_as_mapping */
    0,                              /* tp_hash */
    0,                              /* tp_call */
    0,                              /* tp_str */
    0,                              /* tp_getattro */
    0,                              /* tp_setattro */
    0,                              /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT
        | Py_TPFLAGS_BASETYPE,      /* tp_flags */
    "Person objects",               /* tp_doc */
    0,                              /* tp_traverse */
    0,                              /* tp_clear */
    0,                              /* tp_richcompare */
    0,                              /* tp_weaklistoffset */
    0,                              /* tp_iter */
    0,                              /* tp_iternext */
    PersonMethods,                  /* tp_methods */
    PersonMembers,                  /* tp_members */
    0,                              /* tp_getset */
    0,                              /* tp_base */
    0,                              /* tp_dict */
    0,                              /* tp_descr_get */
    0,                              /* tp_descr_set */
    0,                              /* tp_dictoffset */
    (initproc)Person_Init,          /* tp_init */
    0,                              /* tp_alloc */
    PyType_GenericNew,              /* tp_new */
};
Stack type
It should be noted that this example reallocates the _data buffer on each
item write; for the sake of brevity, this is left unoptimized.
Since this object is responsible for its own data collection, it's worth
taking a moment to look at its implementation in finer detail. Specifically,
the push() and pop() methods of the struct, as well as the Stack_Destruct
implementation that is called at garbage collection.
When an item is popped from the Stack, we can simply return it without
decreasing the reference count, because we are returning an owned
reference to the caller.
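The ownership rules just described can be modeled with a plain python class, letting the built-in list do the INCREF/DECREF bookkeeping that the C struct performs explicitly (a sketch, not the extension code):

```python
class Stack:
    """Python-level reference model of the C Stack type."""

    def __init__(self):
        self._data = []

    def push(self, item):
        # Stack_Push calls Py_INCREF(item) before storing it;
        # list.append takes its own strong reference for us
        self._data.append(item)

    def pop(self):
        # Stack_Pop returns without a Py_DECREF because ownership of
        # the reference transfers to the caller; list.pop is analogous
        return self._data.pop()


stack = Stack()
stack.push("a")
stack.push("b")
print(stack.pop())  # prints b
```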
Debugging C extensions
Similar to pdb, it's possible to stop the execution of the python interpreter
inside C extensions using a C debugger like gdb. gdb allows you to find and
fix errors in C code by providing you with information about the state of the
extension while it is running.
Once the C extension is compiled, use gdb to start the python interpreter. It
should be noted that gdb requires a binary executable, so modules (for
example pytest) and scripts need to be invoked through the python
executable itself, as either a module or a script.
root@be45641b03f3:/code/extensions# gdb --args python script.py
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
(gdb)
Once in the gdb shell, we can do things such as set breakpoints, inspect the
call stack, observe variables, and run our program.
(gdb) b Stack_Push
Function "Stack_Push" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (Stack_Push) pending.
(gdb) run
Starting program: /usr/local/bin/python script.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, Stack_Push (self=0x7f0d84d2bde0, item=0) at my_package/stack.c:13
13          size_t len = self->length + 1;
(gdb) b 17
Breakpoint 2 at 0x7f0d84adb25c: file my_package/stack.c, line 17.
(gdb) c
Continuing.

Breakpoint 2, Stack_Push (self=0x7f0d84d2bde0, item=0) at my_package/stack.c:17
17          self->length = len;
(gdb) p len
$1 = 1
(gdb) l
12      static PyObject* Stack_Push(Stack* self, PyObject* item) {
13          size_t len = self->length + 1;
14          self->_data = realloc(self->_data, len*sizeof(PyObject*));
15          Py_INCREF(item);
16          self->_data[self->length] = item;
17          self->length = len;
18          Py_RETURN_NONE;
19      }
20
21      static PyObject* Stack_Pop(Stack* self) {
(gdb) p *self
$2 = {ob_base = {ob_refcnt = 2, ob_type = 0x7f0d84ade140 <StackType>}, length = 0,
  _data = 0x563ee09d7000, push = 0x0, pop = 0x0}
(gdb) p *item
$3 = {ob_refcnt = 1000000155, ob_type = 0x7f0d8564b760 <PyLong_Type>}
(gdb) p (long)PyLong_AsLong(item)
$4 = 0
(gdb)