ANNOTATED ALGORITHMS IN PYTHON
WITH APPLICATIONS IN PHYSICS, BIOLOGY, AND FINANCE (2ND ED)
EXPERTS4SOLUTIONS
Copyright 2013 by Massimo Di Pierro. All rights reserved.
THE CONTENT OF THIS BOOK IS PROVIDED UNDER THE TERMS OF THE CRE-
ATIVE COMMONS PUBLIC LICENSE BY-NC-ND 3.0.
http://creativecommons.org/licenses/by-nc-nd/3.0/legalcode
ISBN: 978-0-9911604-0-2
Build Date: June 6, 2017
to my parents
Contents
1 Introduction 15
1.1 Main Ideas 16
1.2 About Python 19
1.3 Book Structure 19
1.4 Book Software 21
2.3.6 lambda 46
2.4 Classes 47
2.4.1 Special methods and operator overloading 49
2.4.2 class FinancialTransaction 50
2.5 File input/output 51
2.6 How to import modules 52
2.6.1 math and cmath 53
2.6.2 os 53
2.6.3 sys 54
2.6.4 datetime 54
2.6.5 time 55
2.6.6 urllib and json 55
2.6.7 pickle 58
2.6.8 sqlite 59
2.6.9 numpy 64
2.6.10 matplotlib 66
2.6.11 ocl 72
3 Theory of Algorithms 75
3.1 Order of growth of algorithms 76
3.1.1 Best and worst running times 79
3.2 Recurrence relations 83
3.2.1 Reducible recurrence relations 85
3.3 Types of algorithms 88
3.3.1 Memoization 90
3.4 Timing algorithms 93
3.5 Data structures 94
3.5.1 Arrays 94
3.5.2 List 94
3.5.3 Stack 95
3.5.4 Queue 95
3.5.5 Sorting 96
3.6 Tree algorithms 98
3.6.1 Heapsort and priority queues 98
3.6.2 Binary search trees 102
9 Appendices 367
9.1 Appendix A: Math Review and Notation 367
9.1.1 Symbols 367
9.1.2 Set theory 368
9.1.3 Logarithms 371
9.1.4 Finite sums 372
9.1.5 Limits (n → ∞) 373
Index 379
Bibliography 383
1
Introduction
This book is assembled from lectures given by the author over a period of
10 years at the School of Computing of DePaul University. The lectures
cover multiple classes, including Analysis and Design of Algorithms, Sci-
entific Computing, Monte Carlo Simulations, and Parallel Algorithms.
These lectures teach the core knowledge required by any scientist inter-
ested in numerical algorithms and by students interested in computa-
tional finance.
The notes are not comprehensive, yet they try to identify and describe
the most important concepts taught in those courses using a few common
tools and unified notation.
In particular, these notes do not include proofs; instead, they provide
definitions and annotated code. The code is built in a modular way and
is reused as much as possible throughout the book so that no step of the
computations is left to the imagination. Each function defined in the code
is accompanied by one or more examples of practical applications.
We take an interdisciplinary approach by providing examples in finance,
physics, biology, and computer science. This is to emphasize that, al-
though we often compartmentalize knowledge, there are very few ideas
and methodologies that constitute the foundations of them all. Ultimately,
this book is about problem solving using computers.

1.1 Main Ideas
Even if we cover many different algorithms and examples, there are a few
central ideas in this book that we try to emphasize over and over.
The first idea is that we can simplify the solution of a problem by using
an approximation and then systematically improve our approximation by
iterating and computing corrections.
The divide-and-conquer methodology can be seen as an example of this
approach. We do this with the insertion sort when we sort the first two
numbers, then we sort the first three, then we sort the first four, and so
on. We do it with merge sort when we sort each set of two numbers,
then each set of four, then each set of eight, and so on. We do it with the
Prim, Kruskal, and Dijkstra algorithms when we iterate over the nodes of
a graph, and as we acquire knowledge about them, we use it to update
the information about the shortest paths.
We use this approach in almost all our numerical algorithms because any
differentiable function can be approximated with a linear function:

f(x̄ + δx) ≃ f(x̄) + f′(x̄)δx (1.1)

This approximation is at the core of many of the solvers and optimizers discussed in
the book. In all these algorithms, we start with a random guess for the
solution, and we iteratively find a better one until convergence.
The second idea of the book is that certain quantities are random, but even
random numbers have patterns that we can capture using instruments
like distributions and correlations. The presence of these patterns helps
us model those systems that may have a random output (e.g., nuclear
reactions, financial systems) and also helps us in computations. In fact,
we can use random numbers to compute quantities that are not random
(Monte Carlo methods). The most common approximation that we make
in different parts of the book is that when a random variable x is localized
at a point with a given uncertainty, δx, then its distribution is Gaussian.
Thanks to the properties of Gaussian random numbers, we conclude the
following:
• Using the linear approximation (our first big idea), if z = f(x), the
uncertainty in the output is

δz = f′(x)δx (1.2)
We use this over and over, for example, when relating the volatility
over different time intervals (daily, yearly).
• If we compute an average of N independent and identically distributed
Gaussian random variables, z = (1/N) ∑ᵢ xᵢ, the uncertainty in the average is

δz = δx/√N (1.5)
We use this to estimate the error on the average in a Monte Carlo computation.
In that case, we write it as δµ = σ/√N, where σ is the standard
deviation of {xᵢ}.
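As a quick sanity check of these two formulas, here is a minimal sketch (not from the book; the function names are illustrative) that compares the linear error estimate δz = f′(x)δx for f(x) = x² with the spread measured on simulated Gaussian samples:

import math
import random

def linear_error(f_prime, x, dx):
    # eq. (1.2): delta z = |f'(x)| * delta x
    return abs(f_prime(x)) * dx

f = lambda x: x * x
x, dx = 2.0, 0.01
estimate = linear_error(lambda x: 2 * x, x, dx)
# draw many Gaussian samples of x and measure the spread of f(x)
samples = [f(random.gauss(x, dx)) for k in xrange(100000)]
mean = sum(samples) / len(samples)
sigma = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print estimate, sigma  # both values should be close to 0.04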
The third idea is that the time it takes to run an iterative algorithm is pro-
portional to the number of iterations. It is therefore our goal to minimize
the number of iterations required to reach a target precision. We develop
a language to compare algorithms based on their running time and clas-
sify algorithms into categories. This is useful to choose the best algorithm
based on the problem at hand.
In the chapter on parallel algorithms, we learn how to distribute those
iterations over multiple parallel processes and how to break individual
iterations into independent steps that can be executed concurrently on
parallel processes, to reduce the total time required to obtain a solution
within a given target precision. In the parallel case, the running time ac-
quires an overhead that depends on the communication patterns between
the parallel processes, the communication latency, and bandwidth.
In the ultimate analysis, we can even try to understand ourselves as a par-
allel machine that models the input from the world by approximations.
The brain is a graph that can be modeled by a neural network. The learn-
ing process is an ongoing optimization process in which the brain adjusts
its synapses to produce better and better responses. The decision process
mimics a search tree. We solve problems by searching for the most simi-
lar problems that we have encountered before, then we refine the solution.
Our DNA is a code that evolved to efficiently compress the information
necessary to grow us from a single cell into a complex being. We evolved
according to evolutionary mechanisms that can be modeled using genetic
algorithms. We can find our similarities with other organisms using the
longest common subsequence algorithm. We can reconstruct our evolu-
tionary tree using shortest-path algorithms and find out how we came to
be.
1.2 About Python
The programming language used in this book is Python [1] version 2.7.
This is because Python algorithms are very similar to the corresponding
pseudo-code, and therefore this language is easy to read and understand
compared to other languages such as C++ or Java. Moreover, Python
is a popular language in many Universities and Companies (including
Google).
The goal of the book is to explain the algorithms by building them from
scratch. It is not our goal to teach the user about existing libraries that may
be (and often are) faster than our implementation. Two notable examples
are NumPy [2] and SciPy [3]. These libraries provide a Python interface
to the BLAS and LAPACK libraries for linear algebra and applications.
Although we wholeheartedly recommend using them when developing
production code, we believe they are not appropriate for teaching the
algorithms themselves because those algorithms are written in C, FOR-
TRAN, and assembly languages and are not easy to read.
1.3 Book Structure

• In chapter 3, we review basic data structures such as arrays, stacks,
queues, trees, and graphs. We also review the classification of basic algorithms such as divide-and-conquer, dynamic pro-
gramming, and greedy algorithms. In the examples, we peek into com-
plex algorithms such as Shannon–Fano compression, a maze solver, a
clustering algorithm, and a neural network.
• In chapter 4, we talk about traditional numerical algorithms, in particu-
lar, linear algebra, solvers, optimizers, integrators, and Fourier–Laplace
transformations. We start by reviewing the concept of Taylor series and
their convergence to understand approximations, sources of error, and
convergence. We then use those concepts to build more complex algo-
rithms by systematically improving their first-order (linear) approxima-
tion. Linear algebra serves us as a tool to approximate and implement
functions of many variables.
• In chapter 5, we provide a review of probability and statistics and im-
plement basic Python functions to perform statistical analysis of ran-
dom variables.
• In chapter 6, we discuss algorithms to generate random numbers from
many distributions. Python already has a built-in module to generate
random numbers, and in subsequent chapters, we utilize it, yet in this
chapter, we discuss in detail how pseudo random number generators
work and their pitfalls.
• In chapter 7, we write about Monte Carlo simulations. This is a numer-
ical technique that utilizes random numbers to solve otherwise deter-
ministic problems. For example, in chapter 4, we talk about numerical
integration in one dimension. Those algorithms can be extended to
perform numerical integration in a few (two, three, sometimes four)
dimensions, but they fail for very large numbers of dimensions. That
is where Monte Carlo integration comes to our rescue, as it increasingly
becomes the integration method of choice as the number of variables
increases. We present applications of Monte Carlo simulations.
• In chapter 8, we discuss parallel algorithms. There are many paradigms
for parallel programming these days, and the tendency is toward
inhomogeneous architectures.
Acknowledgments
Many thanks to Alan Etkins, Brian Fox, Dan Bowker, Ethan Sudman,
Holly Monteith, Konstantinos Moutselos, Michael Gheith, Paula Mikrut,
Sean Neilan, and John Plamondon for reviewing different editions of this
book. We also thank all the students of our classes for their useful com-
ments and suggestions. Finally, we thank Wikipedia, from which we bor-
rowed a few ideas and examples.
2
Overview of the Python Language
               Java/C++                         Python
assignment     a = b;                           a = b
comparison     if (a == b)                      if a == b:
loops          for(a = 0; a < n; a++)           for a in range(0, n):
block          braces {...}                     indentation
function       float f(float a) {...}           def f(a):
function call  f(a)                             f(a)
arrays/lists   a[i]                             a[i]
member         a.member                         a.member
nothing        null / void*                     None
As in Java, variables that are primitive types (bool, int, float) are passed by
copy, but more complex types, unlike C++, are passed by reference. This
means when we pass an object to a function, in Python, we do not make
a copy of the object, we simply define an alternate name for referencing
the object in the function.
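For example (a minimal sketch, not from the book): passing a list to a function and mutating it in place changes the caller's object, while rebinding an integer argument does not:

def append_one(items):
    items.append(1)   # mutates the caller's list: lists are passed by reference

def increment(n):
    n = n + 1         # rebinds a local name only: the caller's int is unchanged

a, b = [0], 0
append_one(a)
increment(b)
print a, b            # prints [0, 1] 0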
1 >>> help(1)
2 Help on int object:
3
4 class int(object)
5 | int(x[, base]) -> integer
6 |
7 | Convert a string or number to an integer, if possible. A floating point
8 | argument will be truncated towards zero (this does not include a string
9 | representation of a floating point number!) When converting a string, use
10 | the optional base. It is an error to supply a base when converting a
11 | non-string. If the argument is outside the integer range a long object
12 | will be returned instead.
13 |
14 | Methods defined here:
15 |
16 | __abs__(...)
17 | x.__abs__() <==> abs(x)
18 ...
and because “1” is an integer, we get a description about the int class and
all its methods. Here the output has been truncated because it is very
long and detailed.
Similarly, we can obtain a list of object attributes (including methods) for
any object using the command dir. For example:
1 >>> dir(1)
2 ['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__',
3 '__delattr__', '__div__', '__divmod__', '__doc__', '__float__',
4 '__floordiv__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__',
5 '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__',
6 '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__',
7 '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__',
8 '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__',
9 '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__',
10 '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__',
11 '__str__', '__sub__', '__truediv__', '__xor__']
Python variables are not statically typed; values, on the other hand, do have a type. You can query a variable for the type of value
it contains:
1 >>> a = 3
2 >>> print type(a)
3 <type 'int'>
4 >>> a = 3.14
5 >>> print type(a)
6 <type 'float'>
7 >>> a = 'hello python'
8 >>> print type(a)
9 <type 'str'>
Python also includes, natively, data structures such as lists and dictionar-
ies.
There are two types representing integer numbers: int and long. The dif-
ference is that int corresponds to the microprocessor’s native bit length.
Typically, this is 32 bits and can hold signed integers in range [−231 , +231 ),
whereas the long type can hold almost any arbitrary integer. It is impor-
tant that Python automatically converts one into the other as necessary,
and you can mix and match the two types in computations. Here is an
example:
1 >>> a = 1024
2 >>> type(a)
3 <type 'int'>
4 >>> b = a**128
5 >>> print b
6 20815864389328798163850480654728171077230524494533409610638224700807216119346720
7 59602447888346464836968484322790856201558276713249664692981627981321135464152584
8 82590187784406915463666993231671009459188410953796224233873542950969577339250027
9 68876520583464697770622321657076833170056511209332449663781837603694136444406281
10 042053396870977465916057756101739472373801429441421111406337458176
11 >>> print type(b)
12 <type 'long'>
def int2binary(n, nbits=8):  # function name and default nbits are assumed
    bits = [0]*nbits
    for i in range(nbits):
        n, bits[i] = divmod(n, 2)
    if n: raise OverflowError
    return bits
For n < 0, the representation used is called two's complement and is defined as the value
obtained by subtracting the number from the largest power of 2 (2³² for
32 bits). Just by looking at the most significant bit, one can determine the
sign of the binary number (1 for negative and 0 for zero or positive).
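For example (a quick sketch, not from the book), the 8-bit two's complement of −5 is 2⁸ − 5 = 251, whose most significant bit is 1:

>>> print 2**8 - 5
251
>>> print bin(2**8 - 5)
0b11111011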
There are two ways to represent decimal numbers in Python: using the
native double precision (64 bits) representation, float, or using the decimal
module.
Most numerical problems are dealt with simply using float:
1 >>> pi = 3.141592653589793
2 >>> two_pi = 2.0 * pi
x = ±m × 2ᵉ (2.1)
Because the exponent is stored in a fixed number of bits (11 for a 64-bit
floating point number), exponents smaller than −1022 and larger than
1023 cannot be represented. An arithmetic operation that returns a number
smaller than 2⁻¹⁰²² ≈ 10⁻³⁰⁸ cannot be represented and results in
an underflow error. An operation that returns a number larger than
2¹⁰²³ ≈ 10³⁰⁸ also cannot be represented and results in an overflow error.
Here is an example of overflow:
1 >>> a = 10.0**200
2 >>> a*a
3 inf
To add 262.0 to 3.0, the exponents must be the same. The exponent of the
lesser number is increased to the exponent of the greater number. In this
case, 3’s exponent must be increased by 7. Increasing the exponent by 7
means the mantissa must be shifted seven binary digits to the right:
1 0 10000111 00000110000000000000000
2 0 10000111 00000011000000000000000 (The implied ``1'' is also pushed seven
places to the right)
3 ------------------------------------
4 0 10000111 00001001000000000000000 which is the IEEE 754 format for 265.0
In the case of two numbers in which the exponent is greater than the
number of digits in the mantissa, the smaller number is shifted right off
the end. The effect is a zero added to the larger number.
In some cases, only some of the bits of the smaller number’s mantissa are
lost if a partial addition occurs.
This precision issue is always present but not always obvious. It may
consist of a small discrepancy between the true value and the computed
value. This difference may increase during the computation, in particular,
in iterative algorithms, and may be sizable in the result of a complex
algorithm.
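A minimal illustration of this loss of precision with ordinary floats (not from the book):

>>> print 0.1 + 0.2 == 0.3            # 0.1 has no exact binary representation
False
>>> print 2.0**53 + 1.0 == 2.0**53    # the 1.0 is shifted off the end of the mantissa
True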
Python also has a module for decimal floating point arithmetic that al-
lows decimal numbers to be represented exactly. The class Decimal incor-
porates a notion of significant places (unlike the hardware-based binary
floating point, the decimal module has a user-alterable precision):
1 >>> from decimal import Decimal, getcontext
2 >>> getcontext().prec = 28 # set precision
3 >>> Decimal(1) / Decimal(7)
4 Decimal('0.1428571428571428571428571429')
>>> a = Decimal(10)**300
>>> a*a
Decimal('1.000000000000000000000000000E+600')
2.2.3 complex
Python has native support for complex numbers. The imaginary unit is
represented by the character j:
1 >>> c = 1+2j
2 >>> print c
3 (1+2j)
4 >>> print c.real
5 1.0
6 >>> print c.imag
7 2.0
8 >>> print abs(c)
9 2.2360679775
The real and imaginary parts of a complex number are stored as 64-bit
floating point numbers.
Normal arithmetic operations are supported. The cmath module contains
trigonometric and other functions for complex numbers. For example,
1 >>> phi = 1j
2 >>> import cmath
3 >>> print cmath.exp(phi)
4 (0.540302305868+0.841470984808j)
2.2.4 str
Python supports the use of two different types of strings: ASCII strings
and Unicode strings. ASCII strings are delimited by ’...’, "...", ”’...”’,
or """...""". Triple quotes delimit multiline strings. Unicode strings start
with a u, followed by the string containing Unicode characters. A Unicode
string can be converted into an ASCII string by choosing an encoding (e.g.,
UTF8):
1 >>> a = 'this is an ASCII string'
2 >>> b = u'This is a Unicode string'
3 >>> a = b.encode('utf8')
The final notation is more explicit and less error prone and is to be pre-
ferred.
Many Python objects, for example, numbers, can be serialized into strings
using str or repr. These two commands are very similar but produce
slightly different output. For example,
1 >>> for i in [3, 'hello']:
2 ... print str(i), repr(i)
3 3 3
4 hello 'hello'
For user-defined classes, str and repr can be defined and redefined using
the special operators __str__ and __repr__. These are briefly described
later in this chapter. For more information on the topic, refer to the official
Python documentation [8].
Another important characteristic of a Python string is that it is an iterable
object, similar to a list:
1 >>> for i in 'hello':
2 ... print i
3 h
4 e
5 l
6 l
7 o
The main methods of Python lists are append, insert, and delete. Other
useful methods include count, index, reverse, and sort:
1 >>> b = [1, 2, 3]
2 >>> print type(b)
3 <type 'list'>
4 >>> b.append(8)
5 >>> b.insert(2, 7) # insert 7 at index 2 (3rd element)
6 >>> del b[0]
7 >>> print b
8 [2, 7, 3, 8]
9 >>> print len(b)
10 4
11 >>> b.append(3)
12 >>> b.reverse()
13 >>> print b, " 3 appears ", b.count(3), " times. The number 7 appears at index ", b.index(7)
14 [3, 8, 3, 7, 2] 3 appears 2 times. The number 7 appears at index 3
and concatenated/joined:
1 >>> a = [2, 7, 3, 8]
2 >>> a = [2, 3]
3 >>> b = [5, 6]
4 >>> print a + b
5 [2, 3, 5, 6]
This code clearly processes a list of items, selects and modifies a subset
of the input list, and creates a new result list. This code can be entirely
replaced with the following list comprehension:
1 >>> a = [1,2,3,4,5]
2 >>> b = [x * 3 for x in a if x % 2 == 0]
3 >>> print b
4 [6, 12]
An array object can be used in the same way as a list, but its elements
must all be of the same type, specified by the first argument of the con-
structor (“d” for double, “l” for signed long, “f” for float, and “c” for
character). For a complete list of available options, refer to the official
Python documentation.
Using “array” over “list” can be faster, but more important, the “array”
storage is more compact for large arrays.
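For example (a minimal sketch, not from the book):

>>> from array import array
>>> a = array('d', [1.0, 2.0, 3.14])
>>> a.append(4.0)
>>> print a[1], len(a)
2.0 4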
2.2.6 tuple
A tuple is similar to a list, but its size and elements are immutable. If a
tuple element is an object, the object itself is mutable, but the reference to
the object is fixed. A tuple is defined by elements separated by a comma
The round brackets are required for a tuple of zero elements such as
1 >>> a = () # this is an empty tuple
A trailing comma is required for a one-element tuple but not for two or
more elements:
1 >>> a = (1) # not a tuple
2 >>> a = (1,) # this is a tuple of one element
3 >>> b = (1,2) # this is a tuple of two elements
Tuples are very useful for efficient packing of objects because of their
immutability. The brackets are often optional. You may easily get each
element of a tuple by assigning multiple variables to a tuple at one time:
1 >>> a = (2, 3, 'hello')
2 >>> (x, y, z) = a
3 >>> print x
4 2
5 >>> print z
6 hello
7 >>> a = 'alpha', 35, 'sigma' # notice the round brackets are optional
8 >>> p, r, q = a
9 >>> print r
10 35
2.2.7 dict
You will notice that the format to define a dictionary is the same as the
JavaScript Object Notation [JSON]. Dictionaries may be nested:
1 >>> a = {'x':3, 'y':54, 'z':{'a':1,'b':2}}
2 >>> print a['z']
3 {'a': 1, 'b': 2}
4 >>> print a['z']['a']
5 1
Keys can be of any hashable type (int, string, or any object whose class
implements the __hash__ method). Values can be of any type. Different
keys and values in the same dictionary do not have to be of the same type.
If the keys are alphanumeric characters, a dictionary can also be declared
with the alternative syntax:
1 >>> a = dict(k='v', h2=3)
2 >>> print a['k']
3 v
4 >>> print a
5 {'h2': 3, 'k': 'v'}
The items method produces a list of tuples, each containing a key and its
associated value.
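For example (the ordering of the items is an implementation detail):

>>> a = dict(k='v', h2=3)
>>> print a.items()
[('h2', 3), ('k', 'v')]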
Dictionary elements and list elements can be deleted with the command
del:
1 >>> a = [1, 2, 3]
2 >>> del a[1]
3 >>> print a
4 [1, 3]
5 >>> a = dict(k='v', h2=3)
6 >>> del a['h2']
7 >>> print a
8 {'k': 'v'}
Internally, Python uses the hash operator to convert objects into integers
and uses that integer to determine where to store the value. Using a key
that is not hashable will cause an un-hashable type error:
1 >>> hash("hello world")
2 -1500746465
3 >>> k = [1,2,3]
4 >>> a = {k:'4'}
5 Traceback (most recent call last):
6 File "<stdin>", line 1, in <module>
7 TypeError: unhashable type: 'list'
2.2.8 set

1 >>> s = set([1,2,3,4,5])
2 >>> print s
3 set([1, 2, 3, 4, 5])
4 >>> s = set((1,2,3,4,5))
5 >>> print s
6 set([1, 2, 3, 4, 5])
7 >>> s = set(i for i in range(1,6))
8 >>> print s
9 set([1, 2, 3, 4, 5])
Sets are not ordered lists; therefore, appending to the end is not applicable.
Instead of append, add elements to a set using the add method:
1 >>> s = set()
2 >>> s.add(2)
3 >>> s.add(3)
4 >>> s.add(2)
5 >>> print s
6 set([2, 3])
Notice that the same element cannot be added twice (2 in the example).
There is no exception or error thrown when trying to add the same ele-
ment more than once.
Because sets are not ordered, the order in which you add items is not
necessarily the order in which they will be returned:
1 >>> s = set([6,'b','beta',-3.4,'a',3,5.3])
2 >>> print (s)
3 set(['a', 3, 6, 5.3, 'beta', 'b', -3.4])
The set object supports normal set operations like union, intersection, and
difference:
1 >>> a = set([1,2,3])
2 >>> b = set([2,3,4])
3 >>> c = set([2,3])
4 >>> print a.union(b)
5 set([1, 2, 3, 4])
6 >>> print a.intersection(b)
7 set([2, 3])
8 >>> print a.difference(b)
9 set([1])
10 >>> if len(c) == len(a.intersection(c)):
11 ... print "c is a subset of a"
12 ... else:
13 ... print "c is not a subset of a"
14 ...
15 c is a subset of a
2.3.1 for...in
In the preceding example, you will notice that the loop index “i” takes on
the values of each element in the list [0, 1, ’hello’, ’python’] sequentially.
The Python range function creates a list of integers automatically that may
be used in a "for" loop without manually creating a long list of numbers.
1 >>> a = range(0,5)
2 >>> print a
3 [0, 1, 2, 3, 4]
4 >>> for i in a:
5 ... print i
6 0
7 1
8 2
9 3
10 4
The parameters for range(a,b,c) are as follows: the first parameter is the
starting value of the list; the second parameter is the upper bound, which is
excluded from the list (it would be the next value if the list contained one
more element); the third parameter is the increment value.
The function range can also be called with one parameter, which is matched
to "b", with the first parameter defaulting to 0 and the third to 1:
1 >>> print range(5)
2 [0, 1, 2, 3, 4]
3 >>> print range(53,57)
4 [53, 54, 55, 56]
5 >>> print range(102,200,10)
6 [102, 112, 122, 132, 142, 152, 162, 172, 182, 192]
7 >>> print range(0,-10,-1)
8 [0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
The function range is very convenient for creating a list of numbers; however,
as the list grows in length, so does the memory required to store it.
A more efficient option is xrange, which generates an iterable range
instead of the entire list of elements.
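For example, the following sums one million integers without ever building the full list in memory (a small sketch, not from the book):

>>> total = 0
>>> for i in xrange(10**6):
...     total = total + i
>>> print total
499999500000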
This is equivalent to the C/C++/C#/Java syntax:
1 for(int i=0; i<4; i=i+1) { ... }
You can jump to the next loop iteration without executing the entire code
block with continue:
1 >>> for i in [1, 2, 3]:
2 ... print i
3 ... continue
4 ... print 'test'
5 1
6 2
7 3
Python also supports list comprehensions, and you can build lists using
the following syntax:
1 >>> a = [i*i for i in [0, 1, 2, 3]]
2 >>> print a
3 [0, 1, 4, 9]
Sometimes you may need a counter to “count” the elements of a list while
looping:
1 >>> a = [e*(i+1) for (i,e) in enumerate(['a','b','c','d'])]
2 >>> print a
3 ['a', 'bb', 'ccc', 'dddd']
2.3.2 while
2.3.3 if...elif...else
The elif means “else if.” Both elif and else clauses are optional. There
can be more than one elif but only one else statement. Complex condi-
tions can be created using the not, and, and or logical operators:
1 >>> for i in range(3):
2 ... if i == 0 or (i == 1 and i + 1 == 2):
3 ... print '0 or 1'
2.3.4 try...except...else...finally
The finally clause is guaranteed to be executed while the except and else
are not. In the following example, the function returns within a try block.
This is bad practice, but it shows that the finally will execute regardless
of the reason the try block is exited:
1 >>> def f(x):
2 ... try:
3 ... r = x*x
4 ... return r # bad practice
5 ... except Exception as e:
6 ... print "exception occurred %s" % e
7 ... else:
8 ... print "nothing else to do"
9 ... finally:
10 ... print "Finally we get here"
11 ...
12 >>> y = f(3)
13 Finally we get here
14 >>> print "result is ", y
15 result is 9
For every try, you must have either an except or a finally, while the else
is optional.
Here is a list of built-in Python exceptions:
1 BaseException
2 +-- SystemExit
3 +-- KeyboardInterrupt
4 +-- Exception
5 +-- GeneratorExit
6 +-- StopIteration
7 +-- StandardError
8 | +-- ArithmeticError
9 | | +-- FloatingPointError
10 | | +-- OverflowError
11 | | +-- ZeroDivisionError
12 | +-- AssertionError
13 | +-- AttributeError
14 | +-- EnvironmentError
15 | | +-- IOError
16 | | +-- OSError
17 | | +-- WindowsError (Windows)
18 | | +-- VMSError (VMS)
19 | +-- EOFError
20 | +-- ImportError
21 | +-- LookupError
22 | | +-- IndexError
23 | | +-- KeyError
24 | +-- MemoryError
25 | +-- NameError
26 | | +-- UnboundLocalError
27 | +-- ReferenceError
28 | +-- RuntimeError
29 | | +-- NotImplementedError
30 | +-- SyntaxError
31 | | +-- IndentationError
32 | | +-- TabError
33 | +-- SystemError
34 | +-- TypeError
35 | +-- ValueError
36 | | +-- UnicodeError
37 | | +-- UnicodeDecodeError
38 | | +-- UnicodeEncodeError
39 | | +-- UnicodeTranslateError
40 +-- Warning
41 +-- DeprecationWarning
42 +-- PendingDeprecationWarning
43 +-- RuntimeWarning
44 +-- SyntaxWarning
45 +-- UserWarning
46 +-- FutureWarning
47 +-- ImportWarning
48 +-- UnicodeWarning
2.3.5 def...return
A function is defined with the def keyword, followed by the function name,
a parenthesized list of arguments, and an indented body that may return value(s).
Functions are the first code syntax feature described in this chapter to
introduce the concept of scope, or namespace. In the preceding example,
the identifiers a and b are undefined outside of the scope of function f:
1 >>> def f(a):
2 ... return a + 1
3 >>> print f(1)
4 2
5 >>> print a
6 Traceback (most recent call last):
7 File "<pyshell#22>", line 1, in <module>
8 print a
9 NameError: name 'a' is not defined
Identifiers defined outside of the function scope are accessible within the
function; observe how the identifier a is handled in the following code:
1 >>> a = 1
2 >>> def f(b):
3 ... return a + b
4 >>> print f(1)
5 2
6 >>> a = 2
7 >>> print f(1) # new value of a is used
8 3
9 >>> a = 1 # reset a
10 >>> def g(b):
11 ... a = 2 # creates a new local a
12 ... return a + b
13 >>> print g(2)
14 4
15 >>> print a # global a is unchanged
16 1
If a is modified, subsequent function calls will use the new value of the
global a because the function definition binds the storage location of the
identifier a, not the value of a itself at the time of function declaration;
however, if a is assigned-to inside function g, the global a is unaffected be-
cause the new local a hides the global value. The external-scope reference
can be used in the creation of closures:
1 >>> def f(x):
2 ... def g(y):
3 ... return x * y
4 ... return g
5 >>> doubler = f(2) # doubler is a new function
6 >>> tripler = f(3) # tripler is a new function
7 >>> quadrupler = f(4) # quadrupler is a new function
8 >>> print doubler(5)
9 10
10 >>> print tripler(5)
11 15
12 >>> print quadrupler(5)
13 20
Function f creates new functions; note that the scope of the name g is
entirely internal to f. Closures are extremely powerful.
Function arguments can have default values and can return multiple re-
sults as a tuple (notice the parentheses are optional and are omitted in the
example):
1 >>> def f(a, b=2):
2 ... return a + b, a - b
3 >>> x, y = f(5)
4 >>> print x
5 7
6 >>> print y
7 3
Here the first two parameters (1 and 2) are matched with the parameters
a and b, while the tuple 5, 6 is placed into extra and the remaining items
(which are in a dictionary format) are placed into extraNamed.
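The listing this paragraph refers to is not shown above; a minimal sketch consistent with the description (the names extra and extraNamed match the text) is:

>>> def f(a, b, *extra, **extraNamed):
...     print a, b, extra, extraNamed
>>> f(1, 2, 5, 6, x=3)
1 2 (5, 6) {'x': 3}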
In the opposite case, a list or tuple can be passed to a function that re-
quires individual positional arguments by unpacking them:
1 >>> def f(a, b):
2 ... return a + b
3 >>> c = (1, 2)
4 >>> print f(*c)
5 3
2.3.6 lambda
The only benefit of lambda is brevity; however, brevity can be very conve-
nient in certain situations. Consider a function called map that applies a
function to all items in a list, creating a new list:
1 >>> a = [1, 7, 2, 5, 4, 8]
2 >>> map(lambda x: x + 2, a)
3 [3, 9, 4, 7, 6, 10]
This code would have doubled in size had def been used instead of lambda.
The main drawback of lambda is that (in the Python implementation) the
syntax allows only for a single expression; however, for longer functions,
def can be used, and the extra cost of providing a function name decreases
as the length of the function grows.
Just like def, lambda can be used to curry functions: new functions can be
created by wrapping existing functions such that the new function carries
a different set of arguments:
1 >>> def f(a, b): return a + b
2 >>> g = lambda a: f(a, 3)
3 >>> g(2)
4 5
2.4 Classes
1 >>> class MyClass(object): pass
2 >>> myinstance = MyClass()
3 >>> myinstance.myvariable = 3
4 >>> print myinstance.myvariable
5 3
Functions declared inside the class are methods. Some methods have
special reserved names. For example, __init__ is the constructor. In the
example, we created a class to store the real and the imag part of a complex
number. The constructor takes these two variables and stores them into
self (not a keyword but a variable that plays the same role as this in Java
and (*this) in C++; this syntax is necessary to avoid ambiguity when
declaring nested classes, such as a class that is local to a method inside
another class, something Python allows but Java and C++ do not).
The self variable is defined by the first argument of each method. They
all must have it, but they can use another variable name. Even if we use
another name, the first argument of a method always refers to the object
calling the method. It plays the same role as the this keyword in Java and
C++.
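The class discussed above is not shown here; a minimal sketch consistent with the description (the class name is assumed) might be:

class Complex(object):
    def __init__(self, real=0.0, imag=0.0):
        # self refers to the object being constructed
        self.real = real
        self.imag = imag
    def __add__(self, other):
        # invoked by the + operator: c1 + c2
        return Complex(self.real + other.real, self.imag + other.imag)

c = Complex(1.0, 2.0) + Complex(3.0, 4.0)
print c.real, c.imag   # prints 4.0 6.0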
Method __add__ is also a special method (all special methods start and
end with a double underscore). Among the special methods are:
• __getitem__
• __setitem__
They can be used, for example, to create a container object that acts like a
list:
1 >>> class MyList(object):
2 >>> def __init__(self, *a): self.a = list(a)
3 >>> def __len__(self): return len(self.a)
4 >>> def __getitem__(self, key): return self.a[key]
5 >>> def __setitem__(self, key, value): self.a[key] = value
6 >>> b = MyList(3, 4, 5)
7 >>> print b[1]
8 4
9 >>> b.a[1] = 7
10 >>> print b.a
11 [3, 7, 5]
For more information about the use of these operators, we refer the reader
to the chapter on linear algebra.
Here we assume t and t0 are datetime.date objects that store a date. The
date constructor takes the year, the month, and the day separated by a
comma. The expression (t0-t).days computes the distance in days be-
tween t0 and t.
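The FinancialTransaction class itself is not shown here; a minimal sketch consistent with the text and with the CashFlow class below (the exponential discount formula and the default r_free are assumptions) is:

from datetime import date
from math import exp

r_free = 0.05 / 365.0  # assumed daily risk-free rate

class FinancialTransaction(object):
    def __init__(self, t, amount, description=''):
        self.t = t
        self.amount = amount
        self.description = description
    def pv(self, t0, r=r_free):
        # discount the amount from the payment date t back to t0
        return self.amount * exp(r * (t0 - self.t).days)
    def __str__(self):
        return '%.2f dated %s (%s)' % (self.amount, self.t, self.description)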
Similarly, we can implement a Cash Flow class to store a list of transactions,
with the add method to add a new transaction to the list. The present value
of a cash flow is the sum of the present values of each transaction:
1 class CashFlow(object):
2 def __init__(self):
3 self.transactions = []
4 def add(self,transaction):
5 self.transactions.append(transaction)
6 def pv(self, t0, r=r_free):
7 return sum(x.pv(t0,r) for x in self.transactions)
8 def __str__(self):
9 return '\n'.join(str(x) for x in self.transactions)
What is the net present value at the beginning of 2012 for a bond that
pays $1000 the 20th of each month for the following 24 months (assuming
a fixed interest rate of 5% per year)?
1 >>> bond = CashFlow()
2 >>> today = date(2012,1,1)
3 >>> for year in range(2012,2014):
4 ... for month in range(1,13):
5 ... coupon = FinancialTransaction(date(year,month,20), 1000)
6 ... bond.add(coupon)
7 >>> print round(bond.pv(today,r=0.05/365),0)
8 22826
Alternatively, you can read in binary mode with “rb,” write in binary
mode with “wb,” and open the file in append mode “a” using standard
C notation.
The read command takes an optional argument, which is the number of
bytes. You can also jump to any location in a file using seek and read back
from the file with read:
1 >>> file.seek(6)
2 >>> print file.read()
3 world
The real power of Python is in its library modules. They provide a large
and consistent set of application programming interfaces (APIs) to many
system libraries (often in a way independent of the operating system).
For example, if you need to use a random number generator, you can do
the following:
1 >>> import random
2 >>> print random.randint(0, 9)
3 5
This prints a random integer in the range [0, 9], 5 in the example. The
function randint is defined in the module random. It is also possible to
import an object from a module into the current namespace:
1 >>> from random import randint
2 >>> print randint(0, 9)
In the rest of this book, we will mainly use objects defined in modules
math, cmath, os, sys, datetime, time, and cPickle. We will also use the random
module, but we will describe it in a later chapter.
In the following subsections, we consider those modules that are most
useful.
Here is a sampling of some of the methods available in the math and cmath
packages:
• math.isinf(x) returns true if the floating point number x is positive or
negative infinity
• math.isnan(x) returns true if the floating point number x is NaN; see
Python documentation or IEEE 754 standards for more information
• math.exp(x) returns e**x
• math.log(x[, base] returns the logarithm of x to the optional base; if
base is not supplied, e is assumed
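For example (a quick interactive check):

>>> import math
>>> print math.exp(1)
2.71828182846
>>> print math.log(8, 2)
3.0
>>> print math.isnan(float('nan'))
True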
2.6.2 os
Some of the os functions, such as chdir, are not thread safe, for example,
they should not be used in a multithreaded environment.
os.path.joinis very useful; it allows the concatenation of paths in an OS-
independent way:
1 >>> import os
2 >>> a = os.path.join('path', 'sub_path')
3 >>> print a
4 path/sub_path
2.6.3 sys
The sys module contains many variables and functions, but used the most
is sys.path. It contains a list of paths where Python searches for modules.
When we try to import a module, Python searches the folders listed in
sys.path. If you install additional modules in some location and want
Python to find them, you need to append the path to that location to
sys.path:
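For example (the path below is illustrative):

>>> import sys
>>> sys.path.append('path/to/my/modules')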
2.6.4 datetime
Occasionally you may need to time stamp data based on the UTC time as
opposed to local time. In this case, you can use the following function:
1 >>> import datetime
2 >>> print datetime.datetime.utcnow()
3 2008-07-04 14:03:50
1 >>> s = '2011-12-31'
2 >>> a = datetime.datetime.strptime(s,'%Y-%m-%d')
3 >>> print a.year, a.day, a.month
4 2011 31 12
Notice that “%Y” matches the four-digit year, “%m” matches the month as
a number (1–12), “%d” matches the day (1–31), “%H” matches the hour,
“%M” matches the minute, and “%S” matches the seconds. Check the
Python documentation for more options.
2.6.5 time
The time module differs from date and datetime because it represents time
as seconds from the epoch (beginning of 1970):
1 >>> import time
2 >>> t = time.time()
3 >>> print t
4 1215138737.571
1 class YStock:
2 """
3 Class that downloads and stores data from Yahoo Finance
4 Examples:
5 >>> google = YStock('GOOG')
6 >>> current = google.current()
7 >>> price = current['price']
8 >>> market_cap = current['market_cap']
9 >>> h = google.historical()
10 >>> last_adjusted_close = h[-1]['adjusted_close']
11 >>> last_log_return = h[-1]['log_return']
12 previous versions of this code used Yahoo! for historical data,
13 but Yahoo! changed its API and blocked them, so it moved to Google Finance.
14 """
15 URL_CURRENT = 'http://finance.yahoo.com/d/quotes.csv?s=%(symbol)s&f=%(columns)s'
16 URL_HISTORICAL = 'https://www.google.com/finance/historical?output=csv&q=%(symbol)s'
17
18 def __init__(self,symbol):
19 self.symbol = symbol.upper()
20
21 def current(self):
22 import urllib
23 FIELDS = (('price', 'l1'),
24 ('change', 'c1'),
25 ('volume', 'v'),
26 ('average_daily_volume', 'a2'),
27 ('stock_exchange', 'x'),
28 ('market_cap', 'j1'),
29 ('book_value', 'b4'),
30 ('ebitda', 'j4'),
31 ('dividend_per_share', 'd'),
32 ('dividend_yield', 'y'),
33 ('earnings_per_share', 'e'),
34 ('52_week_high', 'k'),
35 ('52_week_low', 'j'),
36 ('50_days_moving_average', 'm3'),
37 ('200_days_moving_average', 'm4'),
38 ('price_earnings_ratio', 'r'),
39 ('price_earnings_growth_ratio', 'r5'),
40 ('price_sales_ratio', 'p5'),
41 ('price_book_ratio', 'p6'),
42 ('short_ratio', 's7'))
43 columns = ''.join([row[1] for row in FIELDS])
44 url = self.URL_CURRENT % dict(symbol=self.symbol, columns=columns)
45 raw_data = urllib.urlopen(url).read().strip().strip('"').split(',')
46 current = dict()
47 for i,row in enumerate(FIELDS):
48 try:
49 current[row[0]] = float(raw_data[i])
50 except:
51 current[row[0]] = raw_data[i]
52 return current
53
94 log_return = log_return))
95 return series
96
97 @staticmethod
98 def download(symbol='goog',what='adjusted_close',start=None,stop=None):
99 return [d[what] for d in YStock(symbol).historical(start, stop)]
Many web services return data in JSON format. JSON is slowly replacing
XML as a favorite protocol for data transfer on the web. It is lighter,
simpler to use, and more human readable. JSON can be thought of as
serialized JavaScript. The JSON data can be converted to a Python object
using a library called json:
1 >>> import json
2 >>> a = [1,2,3]
3 >>> b = json.dumps(a)
4 >>> print type(b)
5 <type 'str'>
6 >>> c = json.loads(b)
7 >>> a == c
8 True
The module json has loads and dumps methods which work very much as
cPickle’s methods, but they serialize the objects into a string using JSON
instead of the pickle protocol.
2.6.7 pickle
and now:
1 >>> import cPickle as pickle
2 >>> b = pickle.dumps(a)
3 >>> c = pickle.loads(b)
2.6.8 sqlite
The Python dictionary type is very useful, but it lacks persistence because
it is stored in RAM (it is lost if a program ends) and cannot be shared by
more than one process running concurrently. Moreover, it is not transac-
tion safe. This means that it is not possible to group operations together
so that they succeed or fail as one.
Think for example of using the dictionary to store a bank account. The
key is the account number and the value is a list of transactions. We
want the dictionary to be safely stored on file. We want it to be accessible
by multiple processes and applications. We want transaction safety: it
should not be possible for an application to fail during a money transfer,
resulting in the disappearance of money.
Python provides a module called shelve with the same interface as dict,
which is stored on disk instead of in RAM. One problem with this module
is that the file is not locked when accessed. If two processes try to access
it concurrently, the data becomes corrupted. This module also does not
provide transactional safety.
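For example (a minimal sketch of the shelve interface):

>>> import shelve
>>> s = shelve.open('test.shelve')
>>> s['key'] = {'a': 1, 'b': 2}
>>> s.close()
>>> s = shelve.open('test.shelve')
>>> print s['key']
{'a': 1, 'b': 2}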
The proper alternative consists of using a database. There are two types
of databases: relational databases (which normally use SQL syntax) and
non-relational databases (often referred to as NoSQL). Key-value persis-
tent storage databases usually fall under the latter category. Relational
databases excel at storing structured data (in the form of tables), estab-
lishing relations between rows of those tables, and searches involving
multiple tables linked by references. NoSQL databases excel at storing
and retrieving schemaless data and replication of data (redundancy for
fail safety).
Python comes with an embedded SQL database called SQLite [9]. All data
in the database are stored in one single file. It supports the SQL query syntax.
7 class PersistentDictionary(object):
8 """
9 A sqlite based key,value storage.
10 The value can be any pickleable object.
11 Similar interface to Python dict
12 Supports the GLOB syntax in methods keys(),items(), __delitem__()
13
14 Usage Example:
15 >>> p = PersistentDictionary(path='test.sqlite')
16 >>> key = 'test/' + p.uuid()
17 >>> p[key] = {'a': 1, 'b': 2}
18 >>> print p[key]
19 {'a': 1, 'b': 2}
20 >>> print len(p.keys('test/*'))
21 1
22 >>> del p[key]
23 """
24
25 CREATE_TABLE = "CREATE TABLE persistence (pkey, pvalue)"
26 SELECT_KEYS = "SELECT pkey FROM persistence WHERE pkey GLOB ?"
27 SELECT_VALUE = "SELECT pvalue FROM persistence WHERE pkey GLOB ?"
28 INSERT_KEY_VALUE = "INSERT INTO persistence(pkey, pvalue) VALUES (?,?)"
29 UPDATE_KEY_VALUE = "UPDATE persistence SET pvalue = ? WHERE pkey = ?"
30 DELETE_KEY_VALUE = "DELETE FROM persistence WHERE pkey LIKE ?"
31 SELECT_KEY_VALUE = "SELECT pkey,pvalue FROM persistence WHERE pkey GLOB ?"
32
33 def __init__(self,
34 path='persistence.sqlite',
35 autocommit=True,
36 serializer=pickle):
37 self.path = path
38 self.autocommit = autocommit
39 self.serializer = serializer
40 create_table = not os.path.exists(path)
41 self.connection = sqlite3.connect(path)
42 self.connection.text_factory = str # do not use unicode
43 self.cursor = self.connection.cursor()
44 if create_table:
45 self.cursor.execute(self.CREATE_TABLE)
46 self.connection.commit()
47
48 def uuid(self):
49 return str(uuid.uuid4())
50
51 def keys(self,pattern='*'):
52 "returns a list of keys filtered by a pattern, * is the wildcard"
53 self.cursor.execute(self.SELECT_KEYS,(pattern,))
54 return [row[0] for row in self.cursor.fetchall()]
55
56 def __contains__(self,key):
57 return self.get(key) is not None
58
59 def __iter__(self):
60 for key in self.keys():
61 yield key
62
63 def __setitem__(self,key, value):
64 if key in self:
65 if value is None:
66 del self[key]
67 else:
68 svalue = self.serializer.dumps(value)
69 self.cursor.execute(self.UPDATE_KEY_VALUE, (svalue, key))
70 else:
71 svalue = self.serializer.dumps(value)
72 self.cursor.execute(self.INSERT_KEY_VALUE, (key, svalue))
73 if self.autocommit: self.connection.commit()
74
75 def get(self,key):
76 self.cursor.execute(self.SELECT_VALUE, (key,))
77 row = self.cursor.fetchone()
78 return self.serializer.loads(row[0]) if row else None
79
95 def dumps(self,pattern='*'):
96 self.cursor.execute(self.SELECT_KEY_VALUE, (pattern,))
97 rows = self.cursor.fetchall()
98 return self.serializer.dumps(dict((row[0], self.serializer.loads(row[1])) for row in rows))
100
101 def loads(self, raw):
102 data = self.serializer.loads(raw)
103 for key, value in data.iteritems():
104 self[key] = value
We will now use our persistence storage to download 2011 financial data
from the SP100 stocks. This will allow us to later perform various analysis
tasks on these stocks:
6 ... 'JNJ', 'JPM', 'KFT', 'KO', 'LMT', 'LOW', 'MA', 'MCD', 'MDT', 'MET',
7 ... 'MMM', 'MO', 'MON', 'MRK', 'MS', 'MSFT', 'NKE', 'NOV', 'NSC', 'NWSA',
8 ... 'NYX', 'ORCL', 'OXY', 'PEP', 'PFE', 'PG', 'PM', 'QCOM', 'RF', 'RTN', 'S',
9 ... 'SLB', 'SLE', 'SO', 'T', 'TGT', 'TWX', 'TXN', 'UNH', 'UPS', 'USB',
10 ... 'UTX', 'VZ', 'WAG', 'WFC', 'WMB', 'WMT', 'WY', 'XOM', 'XRX']
11 >>> from datetime import date
12 >>> storage = PersistentDictionary('sp100.sqlite')
13 >>> for symbol in SP100:
14 ... key = symbol+'/2011'
15 ... if not key in storage:
16 ... storage[key] = YStock(symbol).historical(start=date(2011,1,1),
17 ... stop=date(2011,12,31))
Notice that while storing one item in a database may be slower than storing
it in an individual file, accessing the file system becomes progressively
slower as the number of files increases. Storing data in a database is, long
term, a winning strategy, as it scales better and it is easier to search for
and extract data than it is with multiple flat files. Which type of database
is most appropriate depends on the type of data and the type of queries
we need to perform on the data.
2.6.9 numpy
The library numpy [2] is the Python library for efficient arrays, multidimen-
sional arrays, and their manipulation. numpy does not ship with Python
and must be installed separately.
On most platforms, this is as easy as typing in the Bash Shell:
1 pip install numpy
The class ndarray is more efficient than Python's list. It takes much less
space because its elements have a fixed given type (e.g., float64). Other
popular available types are: int8, int16, int32, int64, uint8, uint16, uint32,
and uint64.
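The numpy examples are mostly omitted here; a minimal sketch of creating and manipulating an ndarray:

>>> import numpy
>>> a = numpy.array([1, 2, 3], dtype='float64')
>>> print a * 2 + 1
[ 3.  5.  7.]
>>> print a.shape, a.dtype
(3,) float64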
2.6.10 matplotlib
Library matplotlib [10] is the de facto standard plotting library for Python.
It is one of the best and most versatile plotting libraries available. It has
two modes of operation. One mode of operation, called pylab, follows
a MATLAB-like syntax. The other mode follows a more Python-style
syntax. Here we use the latter.
You can install matplotlib with
1 pip install matplotlib
Now we define a helper that can plot lines, points with error bars, his-
tograms, and scatter plots on a single canvas:
10 class Canvas(object):
11
12 def __init__(self, title='', xlab='x', ylab='y', xrange=None, yrange=None):
13 self.fig = Figure()
14 self.fig.set_facecolor('white')
15 self.ax = self.fig.add_subplot(111)
16 self.ax.set_title(title)
17 self.ax.set_xlabel(xlab)
18 self.ax.set_ylabel(ylab)
19 if xrange:
20 self.ax.set_xlim(xrange)
21 if yrange:
22 self.ax.set_ylim(yrange)
23 self.legend = []
24
35 return s.getvalue()
36
37 def binary(self):
38 return self.save(None)
39
Figure 2.1: Example of a line plot. Adjusted closing price for the AAPL stock in 2011
(source: Yahoo! Finance).
Here is a scatter plot showing the return and variance of the S&P100
stocks:
Figure 2.2: Example of a histogram plot. Distribution of daily arithmetic returns for the
AAPL stock in 2011 (source: Yahoo! Finance).
Notice the daily log returns have been multiplied by the number of days
in one year to obtain the annual return. Similarly, the daily volatility has
been multiplied by the square root of the number of days in one year to
obtain the annual volatility (risk). The reason for this procedure will be
explained in a later chapter.
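In formulas, assuming roughly 250 trading days in one year (the exact day-count convention is discussed later):

µ_annual ≈ 250 µ_daily,    σ_annual ≈ √250 σ_daily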
The class Canvas is both in nlib.py and in the Python module canvas [11].
2.6.11 ocl
One of the best features of Python is that it can introspect itself, and this
can be used to just-in-time compile Python code into other languages. For
example, the Cython [12] and the ocl libraries allow decorating Python
code and converting it to C code. This makes the decorated functions
much faster. Cython is more powerful, and it supports a richer subset
of the Python syntax; ocl instead supports only a subset of the Python
syntax, which can be directly mapped into the C equivalent, but it is
easier to use. Moreover, ocl can convert Python code to JavaScript and to
OpenCL (this is discussed in our last chapter).
Here is a simple example that implements the factorial function:
Figure 2.4: Example of a scatter plot. Risk-return plot for the S&P100 stocks in 2011
(source: Yahoo! Finance).
4 @c99.define(n='int')
5 def factorial(n):
6 output = 1
7 for k in xrange(1, n + 1):
8 output = output * k
9 return output
10 compiled = c99.compile()
11 print compiled.factorial(10)
12 assert compiled.factorial(10) == factorial(10)
2. Amount of work done, for example, running time (independent of hardware and programming language)
3. Amount of space used: here we mean the amount of extra space (sys-
tem resources) beyond the size of the input (independent of hardware
and programming language); we will say that an algorithm is in place
if the amount of extra space is constant with respect to input size
4. Simplicity, clarity: unfortunately, the simplest is not always the best in
other ways
5. Optimality: can we prove that it does as well as or better than any
other algorithm?
Here is an example:
1 >>> import random
2 >>> a=[random.randint(0,100) for k in xrange(20)]
3 >>> insertion_sort(a)
4 >>> print a
5 [6, 8, 9, 17, 30, 31, 45, 48, 49, 56, 56, 57, 65, 66, 75, 75, 82, 89, 90, 99]
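The insertion_sort function itself is not shown here; a minimal sketch consistent with the analysis that follows (an inner loop that breaks as soon as the prefix is sorted) is:

def insertion_sort(A):
    for i in xrange(1, len(A)):
        for j in xrange(i, 0, -1):
            if A[j] < A[j-1]:
                A[j], A[j-1] = A[j-1], A[j]   # swap out-of-order neighbors
            else:
                break   # the prefix is already sorted: exit the inner loop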
One important question is, how long does this algorithm take to run?
How does its running time scale with the input size?
Given any algorithm, we can define three characteristic functions:
• Tworst (n): the running time in the worst case
• Tbest (n): the running time in the best case
• Taverage (n): the running time in the average case
The best case for an insertion sort is realized when the input is already
sorted. In this case, the inner for loop always exits (breaks) at the first
iteration, so only the outermost loop is important, and this is proportional
to n; therefore Tbest(n) ∝ n. The worst case for the insertion sort is
realized when the input is sorted in reverse order. In this case, we can
prove, and we do so subsequently, that Tworst(n) ∝ n². For this algorithm,
a statistical analysis shows that the worst case is also the average case.
Often we cannot determine exactly the running time function, but we may
be able to set bounds to the running time.
We define the following sets:
• O( g(n)): the set of functions that grow no faster than g(n) when n → ∞
• Ω( g(n)): the set of functions that grow no slower than g(n) when
n→∞
• Θ( g(n)): the set of functions that grow at the same rate as g(n) when
n→∞
• o ( g(n)): the set of functions that grow slower than g(n) when n → ∞
• ω ( g(n)): the set of functions that grow faster than g(n) when n → ∞
We can rewrite the preceding definitions in a more formal way:
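Formally, for positive functions f and g:

f(n) ∈ O(g(n)) ⟺ there exist c > 0 and n₀ such that f(n) ≤ c g(n) for all n > n₀
f(n) ∈ Ω(g(n)) ⟺ there exist c > 0 and n₀ such that f(n) ≥ c g(n) for all n > n₀
f(n) ∈ Θ(g(n)) ⟺ f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n))
f(n) ∈ o(g(n)) ⟺ lim n→∞ f(n)/g(n) = 0
f(n) ∈ ω(g(n)) ⟺ lim n→∞ g(n)/f(n) = 0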
We still have not solved the problem of computing the best, average, and
worst running times.
The procedure for computing the worst and best running times is simi-
lar. It is simple in theory but difficult in practice because it requires an
understanding of the algorithm’s inner workings.
Consider the following algorithm, which finds the minimum of an array
or list A:
1 def find_minimum(A):
2 minimum = A[0]
3 for element in A:
4 if element < minimum:
5 minimum = element
6 return minimum
To compute the running time in the worst case, we assume that the max-
imum number of computations is performed. That happens when the if
statements are always True. To compute the best running time, we assume
that the minimum number of computations is performed. That happens
when the if statement is always False. Under each of the two scenarios, we
compute the running time by counting how many times the most nested
operation is performed.
In the preceding algorithm, the most nested operation is the evaluation of
the if statement, and that is executed for each element in A; for example,
assuming A has n elements, the if statement will be executed n times.
Therefore both the best and worst running times are proportional to n.
T(n) = ∑_{i=0}^{i<n} 1 = n ∈ Θ(n) ⇒ loop0 ∈ Θ(n)   (3.12)

T(n) = ∑_{i=0}^{i<n²} 1 = n² ∈ Θ(n²) ⇒ loop1 ∈ Θ(n²)   (3.13)
Here the time for the inner loop is directly determined by n and does not
depend on the outer loop's counter; therefore

T(n) = ∑_{i=0}^{i<n} ∑_{j=0}^{j<n} 1 = n² ∈ Θ(n²)   (3.14)

This is not always the case. In the following code, the inner loop does
depend on the value of the outer loop:
1 def loop3(n):
2 for i in xrange(0,n):
3 for j in xrange(0,i):
4 print i,j
Therefore, when we write its running time in terms of a sum, care must
be taken that the upper limit of the inner sum depends on the index of the
outer sum:

T(n) = ∑_{i=0}^{i<n} ∑_{j=0}^{j<i} 1 = ∑_{i=0}^{i<n} i = (1/2) n(n−1) ∈ Θ(n²)   (3.15)
The appendix of this book provides examples of typical sums that come
up in these types of formulas and their solutions.
Here is one more example falling in the same category, although the inner
loop depends quadratically on the index of the outer loop:
Example: loop4
1 def loop4(n):
2 for i in xrange(0,n):
3 for j in xrange(0,i*i):
4 print i,j
T(n) = ∑_{i=0}^{i<n} ∑_{j=0}^{j<i²} 1 = ∑_{i=0}^{i<n} i² = (1/6) n(n−1)(2n−1) ∈ Θ(n³)   (3.16)
If the algorithm does not contain nested loops, then we need to compute
the running time of each loop and take the maximum:
Example: concatenate0
1 def concatenate0(n):
2 for i in xrange(n*n):
3 print i
4 for j in xrange(n*n*n):
5 print j

In this case, the first loop runs in Θ(n²) and the second in Θ(n³); therefore T(n) ∈ Θ(max(n², n³)) = Θ(n³).
The merge sort [13] is another sorting algorithm. It is faster than the insertion
sort. It was invented by John von Neumann, the physicist also credited
with inventing modern computer architecture and game theory.
The merge sort works as follows.
If the input array has length 0 or 1, then it is already sorted, and the
algorithm does not perform any other operation.
If the input array has a length greater than 1, it divides the array into two
subsets of about half the size. Each subarray is sorted by applying the
merge sort recursively (it calls itself!). It then merges the two subarrays
back into one sorted array (this step is called merge).
Consider the following Python implementation of the merge sort:
1 def mergesort(A, p=0, r=None):
2 if r is None: r = len(A)
3 if p<r-1:
4 q = int((p+r)/2)
5 mergesort(A,p,q)
6 mergesort(A,q,r)
7 merge(A,p,q,r)
8
9 def merge(A,p,q,r):
10 B,i,j = [],p,q
11 while True:
12 if A[i]<=A[j]:
13 B.append(A[i])
14 i=i+1
15 else:
16 B.append(A[j])
17 j=j+1
18 if i==q:
19 while j<r:
20 B.append(A[j])
21 j=j+1
22 break
23 if j==r:
24 while i<q:
25 B.append(A[i])
26 i=i+1
27 break
28 A[p:r]=B
We cannot compute the running time of the mergesort function using the
same direct analysis, but we can assume its running time is T (n), where
n = r − p and n is the size of the input data to be sorted and also the dif-
ference between its two arguments p and r. We can express this running
time in terms of its components:
• It calls itself twice on half of the input data, 2T (n/2)
• It calls the merge once on the entire data, Θ(n)
We can summarize this into

T(n) = 2T(n/2) + Θ(n)

and the master theorem gives T(n) ∈ Θ(n log n) for the merge sort.
Some recurrence relations do not match the master theorem directly but can be reduced to it. Consider, for example, T(n) = 2T(√n) + log n. With the substitution k = log₂ n and S(k) ≡ T(2ᵏ), it becomes S(k) = 2S(k/2) + k, so that we can now apply the master theorem to S. We obtain that S(k) ∈ Θ(k log k). Once we have the order of growth of S, we can determine the order of growth of T(n) by substitution:

T(n) = S(log₂ n) ∈ Θ(log n log log n)
Note that there are recurrence relations that cannot be solved with any of
the methods described.
Here are some examples of recursive algorithms and their corresponding
recurrence relations with solution:
1 def factorial1(n):
2 if n==0:
3 return 1
4 else:
5 return n*factorial1(n-1)
Its recurrence relation is T(n) = T(n − 1) + Θ(1), with solution T(n) ∈ Θ(n).
1 def recursive0(n):
2 if n==0:
3 return 1
4 else:
5 loop3(n)
6 return n*n*recursive0(n-1)
Because loop3(n) runs in Θ(n²) time, the recurrence relation is T(n) = T(n − 1) + Θ(n²), with solution T(n) ∈ Θ(n³).
1 def recursive1(n):
2 if n==0:
3 return 1
4 else:
5 loop3(n)
6 return n*recursive1(n-1)*recursive1(n-1)
Here the function calls itself twice at each step, so the recurrence relation is T(n) = 2T(n − 1) + Θ(n²), with solution T(n) ∈ Θ(2ⁿ).
1 def recursive2(n):
2 if n==0:
3 return 1
4 else:
5 a=factorial0(n)
6 return a*recursive2(n/2)*recursive1(n/2)
The running time of this last one is dominated by the exponential cost of the call to recursive1(n/2).
Consider also the binary search, which checks whether an element is in a sorted array by repeatedly halving the portion of the array still to be searched. This algorithm does not appear to be recursive, but in practice it is, because of the apparently infinite while loop. The content of the while loop runs in constant time and then loops again on a problem of half of the original size; its recurrence relation is therefore T(n) = T(n/2) + Θ(1), with solution T(n) ∈ Θ(log n).
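A minimal sketch of such a binary search (the exact listing was not reproduced here, so the details are an assumption):

def binary_search(A, element):
    # A must be sorted in ascending order
    left, right = 0, len(A)-1
    while left <= right:
        center = (left+right)/2      # halve the problem size
        if A[center] == element:
            return center            # found
        elif A[center] < element:
            left = center+1
        else:
            right = center-1
    return -1                        # not found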
The idea of the binary_search is used in the bisection method for solving
nonlinear equations.
Do not confuse T notation with Θ notation: T(n) is the actual running time of the algorithm, a function of the input size, whereas Θ(g(n)) is a set of functions; the statement T(n) ∈ Θ(g(n)) says that the running time grows like g(n).
The theta notation can also be used to describe the memory used by an
algorithm as a function of the input, Tmemory , as well as its running time.
The merge sort can also be implemented without recursion; the nonrecursive version has the same running time as the original mergesort because, although it is not recursive, it performs the same operations.

Greedy algorithms are sometimes used to generate approximate answers, rather than using the more complicated algorithms generally required to generate an exact answer.
Even for problems that can be solved exactly by a greedy algorithm, es-
tablishing the correctness of the method may be a nontrivial process.
For example, computing change for a purchase in a store is a good example of a greedy algorithm. Assume you need to give change back for a purchase. You would have three choices:
• Give the smallest denomination repeatedly until the correct amount is returned
• Give a random denomination repeatedly until you reach the correct amount. If a random choice exceeds the total, then pick another denomination until the correct amount is returned
• Give the largest denomination less than or equal to the amount still to return, repeatedly, until the correct amount is returned
In this case, the third choice is the greedy one, and for standard coin denominations it also returns the fewest coins, as the sketch below illustrates.
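A small sketch of the greedy strategy (the list of denominations is an assumption made for the example):

def greedy_change(amount, denominations=[25, 10, 5, 1]):
    # repeatedly give the largest denomination not exceeding what is left
    change = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            change.append(coin)
            amount -= coin
    return change

For instance, greedy_change(67) returns [25, 25, 10, 5, 1, 1].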
Other types of algorithms do not fit into any of the preceding categories.
One is, for example, backtracking. Backtracking is not covered in this
course.
3.3.1 Memoization
One very general case of a top-down approach that falls under the umbrella of dynamic programming is called memoization. With memoization, algorithms are written using a naive divide-and-conquer approach, but every function that may be called more than once with the same arguments is modified so that its output is cached; if the function is called again with the same initial state, the output is retrieved from the cache and returned without any computation.
Consider, for example, Fibonacci numbers:
Fib(0) = 0 (3.50)
Fib(1) = 1 (3.51)
Fib(n) = Fib(n − 1) + Fib(n − 2) for n > 1 (3.52)
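A naive recursive implementation recomputes the same values an exponential number of times; wrapping it with a caching decorator makes it linear. The in-memory decorator below is a sketch modeled on the persistent variant that follows:

class memoize(object):
    def __init__(self, f):
        self.f = f
        self.storage = {}                    # in-memory cache
    def __call__(self, *args, **kwargs):
        key = str((self.f.__name__, args, kwargs))
        if key in self.storage:              # cache hit: no recomputation
            value = self.storage[key]
        else:                                # cache miss: compute and store
            value = self.f(*args, **kwargs)
            self.storage[key] = value
        return value

@memoize
def fib(n):
    return n if n < 2 else fib(n-1)+fib(n-2)

A variant of the same decorator can make the cache persistent by storing it on disk: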
1 class memoize_persistent(object):
2 STORAGE = 'memoize.sqlite'
3 def __init__ (self, f):
4 self.f = f
5 self.storage = PersistentDictionary(memoize_persistent.STORAGE)
6 def __call__ (self, *args, **kwargs):
7 key = str((self.f.__name__, args, kwargs))
8 if key in self.storage:
9 value = self.storage[key]
10 else:
11 value = self.f(*args, **kwargs)
12 self.storage[key] = value
13 return value
We can use it as we did before, but we can now start and stop the program
or run concurrent parallel programs, and as long as they have access to
the “memoize.sqlite” file, they will share the cache.
Since f1 is Θ(n) and f2 is Θ(n²), we may be led to conclude that the latter is slower, but the constant factors matter: if T_f1(n) = c₁n and T_f2(n) = c₂n² with c₁ = 10⁶c₂, then T_f1(n) > T_f2(n) for all n < 10⁶.
To time functions in Python, we can use this simple algorithm:
1 def timef(f, ns=1000, dt = 60):
2 import time
3 t = t0 = time.time()
4 for k in xrange(1,ns):
5 f()
6 t = time.time()
7 if t-t0>dt: break
8 return (t-t0)/k
This function averages the running time of f() over up to ns=1000 calls, stopping early once dt=60 seconds have elapsed.
It is now easy, for example, to time the fib function without memoize:
1 >>> def fib(n):
2 ... return n if n<2 else fib(n-1)+fib(n-2)
3 >>> for k in range(15,20):
4 ... print k,timef(lambda:fib(k))
5 15 0.000315684575338
6 16 0.000576375363706
7 17 0.000936052104732
8 18 0.00135168084153
9 19 0.00217730337912
3.5.1 Arrays
An array is a data structure in which a series of elements is stored contiguously in memory. This makes it possible to access any element, for read or write, in constant time, but inserting or removing an element in the middle requires shifting all the elements that follow it.
3.5.2 List
A list is a data structure in which data are not stored contiguously, and
each element has knowledge of the location of the next element (and per-
haps of the previous element, in a doubly linked list). This means that
accessing any element (for read or write) requires finding the element first and therefore looping over the list. In the worst case, the time to find an element is
proportional to the size of the list. Once an element has been found, any
operation on the element, including read, write, delete, and insert, before
or after can be done in constant time.
Lists are the appropriate choice when the number of elements varies often and when their elements are usually accessed sequentially via iteration.
In Python, what is called a list is actually an array of pointers to the
elements.
3.5.3 Stack
A stack is a data structure in which the last element added is the first to be retrieved, a policy known as last-in, first-out, or LIFO. Python lists can be used as stacks: append pushes an element on top, and pop removes and returns the most recently added one, both in constant time. For example:
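1 >>> stk = []
2 >>> stk.append("One")
3 >>> stk.append("Two")
4 >>> print stk.pop()
5 Two
6 >>> print stk.pop()
7 One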
3.5.4 Queue
A queue data structure is similar to a stack but, whereas the stack returns
the most recent item added, a queue returns the oldest item in the list.
This is commonly called first-in, first-out, or FIFO. To use Python lists to
implement a queue, insert the element to add in the first position of the
list as follows:
1 >>> que = []
2 >>> que.insert(0,"One")
3 >>> que.insert(0,"Two")
4 >>> print que.pop()
5 One
6 >>> que.insert(0,"Three")
7 >>> print que.pop()
8 Two
9 >>> print que.pop()
10 Three
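Notice that que.insert(0, element) takes time proportional to the current length of the list. When this matters, the standard library's collections.deque offers constant-time appends and pops at both ends:

1 >>> from collections import deque
2 >>> que = deque()
3 >>> que.append("One")
4 >>> que.append("Two")
5 >>> print que.popleft()
6 One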
3.5.5 Sorting
In the previous sections, we have seen the insertion sort and the merge sort.
Here we consider, as examples, other sorting algorithms: the quicksort [13],
the randomized quicksort, and the counting sort:
1 def quicksort(A,p=0,r=-1):
2 if r == -1:
3 r=len(A)
4 if p<r-1:
5 q=partition(A,p,r)
6 quicksort(A,p,q)
7 quicksort(A,q+1,r)
8
9 def partition(A,i,j):
10 x=A[i]
11 h=i
12 for k in xrange(i+1,j):
13 if A[k]<x:
14 h=h+1
15 A[h],A[k] = A[k],A[h]
16 A[h],A[i] = A[i],A[h]
17 return h
The quicksort can also be randomized by picking the pivot at random and swapping it into the first position before partitioning:
1 def quicksort(A,p=0,r=-1):
2 if r == -1:
3 r=len(A)
4 if p<r-1:
5 q = random.randint(p,r-1)
6 A[p], A[q] = A[q], A[p]
7 q=partition(A,p,r)
8 quicksort(A,p,q)
9 quicksort(A,q+1,r)
In this case, the best and the worst running times do not change, but the expected running time improves when the input is already sorted or almost sorted, since a sorted input is the worst case for the deterministic choice of the first element as pivot.
The counting sort algorithm is special because it only works for arrays of nonnegative integers. This extra requirement allows it to run faster than other sorting algorithms, under some conditions. In fact, this algorithm is linear in the range spanned by the elements of the input array.
Here is a possible implementation:
1 def countingsort(A):
2 if min(A)<0:
3 raise ValueError('countingsort requires nonnegative integers')
4 i, n, k = 0, len(A), max(A)+1
5 C = [0]*k
6 for j in xrange(n):
7 C[A[j]] = C[A[j]]+1
8 for j in xrange(k):
9 while C[j]>0:
10 (A[i], C[j], i) = (j, C[j]-1, i+1)
Notice that here we can also compute T_memory, that is, the order of growth of the memory (not of the time) as a function of the input size. In fact, this algorithm differs from the previous ones because it requires a temporary array C of size k = max(A) + 1, hence T_memory ∈ Θ(k).
Figure 3.1: Example of a heap data structure. The numbers represent not the data in the heap but the numbering of the nodes.
The tree starts from the top node, called the root. Each node has zero, one, or
two children. It is called complete because nodes have been added from
top to bottom and left to right, filling available slots. We can think of each
level of the tree as a generation, where the older generation consists of one
node, the next generation of two, the next of four, and so on. We can also
number nodes from top to bottom and left to right, as in the image. This
allows us to map the elements of a complete binary tree into the elements
of an array.
We can implement a complete binary tree using a list, and the child–
parent relations are given by the following formulas:
1 def heap_parent(i):
2 return int((i-1)/2)
3
4 def heap_left_child(i):
5 return 2*i+1
6
7 def heap_right_child(i):
8 return 2*i+2
We can store data (e.g., numbers) in the nodes (or in the corresponding array). If the data are stored in such a way that the value at each node is always greater than or equal to the values at its children, the array is called a heap, and it can serve as a priority queue.
First of all, we need an algorithm to convert a list into a heap:
1 def heapify(A):
2 for i in xrange(int(len(A)/2)-1,-1,-1):
3 heapify_one(A,i)
4
5 def heapify_one(A,i,heapsize=None):
6 if heapsize is None:
7 heapsize = len(A)
8 left = 2*i+1
9 right = 2*i+2
10 if left<heapsize and A[left]>A[i]:
11 largest = left
12 else:
13 largest = i
14 if right<heapsize and A[right]>A[largest]:
15 largest = right
16 if largest!=i:
17 (A[i], A[largest]) = (A[largest], A[i])
18 heapify_one(A,largest,heapsize)
Now we can call heapify on any array or list and turn it into a heap. Because the first element is by definition the largest, we can use the heap to sort numbers in three steps:
• We turn the array into a heap
• We extract the largest element
• We apply recursion by sorting the remaining elements
Instead of using the preceding divide-and-conquer approach, it is better to use a dynamic programming approach. When we extract the largest element, we swap it with the last element of the array and make the heap one element shorter. The new, shorter heap does not need a full heapify step because the only element out of order is the root node. We can fix this with a single call to heapify_one.
This is a possible implementation for the heapsort [15]:
1 def heapsort(A):
2 heapify(A)
3 n = len(A)
4 for i in xrange(n-1,0,-1):
5 (A[0],A[i]) = (A[i],A[0])
6 heapify_one(A,0,i)
In the average case, it runs as fast as the quicksort; in the worst case, unlike the quicksort, it remains Θ(n log n); and in the best case (all the elements equal), it is linear.
A heap can also be used as a priority queue: heap_push adds an element and heap_pop extracts the largest one.
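The listing of heap_pop was lost here; a reconstruction consistent with the heap_push listing below is:

1 def heap_pop(A):
2     if len(A)<1:
3         raise RuntimeError('Heap Underflow')
4     value = A[0]
5     A[0] = A[len(A)-1]        # move the last element to the root
6     del A[len(A)-1]
7     if A: heapify_one(A,0)    # fix the only out-of-order element
8     return value
9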
10 def heap_push(A,value):
11 A.append(value)
12 i = len(A)-1
13 while i>0:
14 j = heap_parent(i)
15 if A[j]<A[i]:
16 (A[i],A[j],i) = (A[j],A[i],j)
17 else:
18 break
The running times for heap_pop and heap_push are the same, Θ(log n), because each traverses at most one path between the root and a leaf.
Here is an example:
1 >>> a = [6,2,7,9,3]
2 >>> heap = []
3 >>> for element in a: heap_push(heap,element)
4 >>> while heap: print heap_pop(heap)
5 9
6 7
7 6
8 3
9 2
Python provides its own heap functions in the standard library module heapq. Notice that heapq implements a min-heap, so heappop returns the minimum, not the maximum:
1 >>> from heapq import heappop, heappush
2 >>> a = [6,2,7,9,3]
3 >>> heap = []
4 >>> for element in a: heappush(heap,element)
5 >>> while heap: print heappop(heap)
6 2
7 3
8 6
9 7
10 9
A binary tree is a tree in which each node has at most two children (left
and right). A binary tree is called a binary search tree if the value of a node
is always greater than or equal to the value of its left child and less than
or equal to the value of its right child.
A binary search tree is a kind of storage that can efficiently be used for
searching if a particular value is in the storage. In fact, if the value for
which we are looking is less than the value of the root node, we only have
to search the left branch of the tree, and if the value is greater, we only
have to search the right branch. Using divide-and-conquer, searching each
branch of the tree is even simpler than searching the entire tree because it
is also a tree, but smaller.
This means that we can search simply by traversing the tree from top to
bottom along some path down the tree. We choose the path by moving
down and turning left or right at each node, until we find the element for
which we are looking or we find the end of the tree. We can search T (d),
where d is the depth of the tree. We will see later that it is possible to
build binary trees where d = log n.
To implement it, we need to have a class to represent a binary tree:
1 class BinarySearchTree(object):
2 def __init__(self):
3 self.left = self.right = None
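The rest of this listing and its example session were garbled in this copy; a minimal self-contained sketch of what the class may look like (the method names, and the choice of storing a key and a value per node, are assumptions) is:

class BinarySearchTree(object):
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.left = self.right = None
    def insert(self, key, value=None):
        # descend left or right, as in a search, and add a new leaf
        if self.key is None:
            self.key, self.value = key, value
        elif key < self.key:
            if self.left is None: self.left = BinarySearchTree(key, value)
            else: self.left.insert(key, value)
        else:
            if self.right is None: self.right = BinarySearchTree(key, value)
            else: self.right.insert(key, value)
    def max(self):
        # the maximum is the right-most node
        node = self
        while node.right is not None:
            node = node.right
        return node.key, node.value

For example (reconstructing the garbled session, which stored strings under integer keys):

>>> root = BinarySearchTree()
>>> for key, value in [(5,'aaa'), (3,'bbb'), (8,'ccc')]:
...     root.insert(key, value)
>>> print root.max()
(8, 'ccc')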
Until now, we have considered binary trees (each node has at most two children and stores one value). We can generalize this to k-trees, in which each node has up to k children and stores more than one value.
B-trees are a type of k-tree optimized to read and write large blocks of
data. They are normally used to implement database indices and are
designed to minimize the amount of data to move when the tree is rebal-
anced.
A graph G is a set V of vertices and a set E of links (also called edges) connecting those vertices. Each link connects one vertex to another.
Vertices are stored in a list or array and so are links. Each link is a tuple
containing the ID of the source vertex, the ID of the target vertex, and
perhaps optional parameters. Optional parameters are discussed later, but
for now, they may include link details such as length, speed, reliability, or
billing rate.
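For example, a small directed graph in this representation (the data are made up for illustration):

1 >>> vertices = range(4)                    # vertex IDs 0, 1, 2, 3
2 >>> links = [(0,1), (0,2), (1,2), (2,3)]   # (source, target) tuples
3 >>> graph = (vertices, links)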
The breadth-first search (BFS) algorithm visits all the vertices of a graph reachable from a starting vertex. Using the analogy of cities connected by roads, it is a travel strategy to make sure we visit every city reachable by roads, once and only once.
The algorithm begins at one vertex, the origin, and expands out, eventu-
ally visiting each node in the graph that is somehow connected to the ori-
gin vertex. Its main feature is that it explores the neighbors of the current
vertex before moving on to explore remote vertices and their neighbors.
It visits other vertices in the same order in which they are discovered.
The algorithm starts by building a table of neighbors so that for each
vertex, it knows which other vertices it is connected to. It then maintains
two lists: a list of blacknodes (defined as vertices that have been visited) and a list of graynodes (defined as vertices that have been discovered because one of their neighbors has been visited). It returns the list of blacknodes in the order in which they have been visited.
Here is the algorithm:
1 def breadth_first_search(graph,start):
2 vertices, links = graph
3 blacknodes = []
4 graynodes = [start]
5 neighbors = [[] for vertex in vertices]
6 for link in links:
7 neighbors[link[0]].append(link[1])
8 while graynodes:
9 current = graynodes.pop()
10 for neighbor in neighbors[current]:
11 if not neighbor in blacknodes+graynodes:
12 graynodes.insert(0,neighbor)
13 blacknodes.append(current)
14 return blacknodes
The depth-first search [17] (DFS) algorithm is very similar to the BFS, but
it takes the opposite approach and explores as far as possible along each
branch before backtracking.
In the cities analogy, if the BFS was exploring cities in the neighborhood
before moving farther away, the DFS does the opposite and brings us first
to distant places before visiting other nearby cities.
Here is a possible implementation:
1 def depth_first_search(graph,start):
2 vertices, links = graph
3 blacknodes = []
4 graynodes = [start]
5 neighbors = [[] for vertex in vertices]
6 for link in links:
7 neighbors[link[0]].append(link[1])
8 while graynodes:
9 current = graynodes.pop()
10 for neighbor in neighbors[current]:
11 if not neighbor in blacknodes+graynodes:
12 graynodes.append(neighbor)
13 blacknodes.append(current)
14 return blacknodes
Notice that the BFS and the DFS differ by a single line, which determines whether graynodes is a queue (BFS) or a stack (DFS). When graynodes is a queue, the first vertex discovered is the first visited. When it is a stack, the last vertex discovered is the first visited.
Disjoint sets (also known as union–find) are a data structure that can be used to store a set of sets and implements the join operation between sets efficiently. Each element of a set
is identified by a representative element. The algorithm starts by placing
each element in a set of its own, so there are n initial disjoint sets. Each
is represented by itself. When two sets are joined, the representative el-
ement of the latter is made to point to the representative element of the
former. The set of sets is stored as an array of integers. If at position i the
array stores a negative number, this number is interpreted as being the
representative element of its own set. If the number stored at position i is
instead a nonnegative number j, it means that it belongs to a set that was
joined with the set containing j.
Here is the implementation:
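The listing itself was lost in this copy; a minimal reconstruction, consistent with the description above and with how the class is used below (join reports whether two previously disjoint sets were merged, and len counts the remaining sets), is:

class DisjointSets(object):
    def __init__(self, n):
        self.sets = [-1]*n        # negative: the element represents its own set
        self.counter = n          # current number of disjoint sets
    def representative(self, i):
        while self.sets[i] >= 0:  # follow the chain up to the representative
            i = self.sets[i]
        return i
    def join(self, i, j):
        i, j = self.representative(i), self.representative(j)
        if i == j:
            return False          # already in the same set
        self.sets[j] = i          # the latter points to the former
        self.counter -= 1         # one set fewer each time two are joined
        return True
    def __len__(self):
        return self.counter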
Notice that join decreases a counter each time two sets are merged. We also override the __len__ operator so that we can check the value of the counter using the len function on a DisjointSets object.
As an example of an application, here is a code that builds an n^d maze (n cells per side, in d dimensions). It may be easier to picture it with d = 2, a two-dimensional maze. The algorithm works by assuming there is a wall between every couple of adjacent cells. It labels the cells using an integer index. It puts all the cells into a DisjointSets data structure and then keeps tearing down walls at random. Two cells of the maze belong to the same set if they are connected, that is, if there is a path that connects them. At the beginning, each cell is its own set because it is isolated by walls. A wall is torn down, that is, removed from the list walls, only if it was separating two disjoint sets of cells. Walls are torn down until all cells belong to the same set, that is, until there is a path connecting any cell to any other cell:
1 def make_maze(n,d):
2 walls = [(i,i+n**j) for i in xrange(n**d) for j in xrange(d) if (i/n**j)%n+1<n]
3 torn_down_walls = []
4 ds = DisjointSets(n**d)
5 random.shuffle(walls)
6 for i,wall in enumerate(walls):
7 if ds.join(wall[0],wall[1]):
8 torn_down_walls.append(wall)
9 if len(ds)==1:
10 break
11 walls = [wall for wall in walls if not wall in torn_down_walls]
12 return walls, torn_down_walls
Here is an example of how to use it. This example also draws the walls
and the border of the maze:
1 >>> walls, torn_down_walls = make_maze(n=20,d=2)
A minimum spanning tree of a connected graph is a subset of the links that connects all the vertices, contains no loops, and has minimal total length (weight). The absence of loops implies that there is only one path connecting each couple of vertices.
Figure 3.3: Example of a minimum spanning tree subgraph of a larger graph. The
numbers on the links indicate their weight or length.
The Kruskal algorithm computes a minimum spanning tree by scanning the links in order of increasing length and using DisjointSets to accept only those links that join two previously disconnected components:
1 def Kruskal(graph):
2 vertices, links = graph
3 A = []
4 S = DisjointSets(len(vertices))
5 links.sort(cmp=lambda a,b: cmp(a[2],b[2]))
6 for source,dest,length in links:
7 if S.join(source,dest):
8 A.append((source,dest,length))
9 return A
The Prim [19] algorithm solves the same problem as the Kruskal algo-
rithm, but the Prim algorithm works on a directed graph. It works by
placing all vertices in a minimum priority queue where the queue met-
ric for each vertex is the length, or weighted value, of a link connecting
the vertex to the closest known neighbor vertex. At each iteration, the
algorithm pops a vertex from the priority queue, loops over its neighbors
(adjacent links), and, if it finds that one of its neighbors is already in the
queue and it is possible to connect it to the current vertex using a shorter
link than the one connecting the neighbor to its current closest vertex, the
neighbor information is then updated. The algorithm loops until there
are no vertices in the priority queue.
The Prim algorithm also differs from the Kruskal algorithm because the
former needs a starting vertex, whereas the latter does not. The result
when interpreted as a subgraph does not depend on the starting vertex:
1 class PrimVertex(object):
2     INFINITY = 1e100             # lines 1-4 were lost and are reconstructed
3     def __init__(self,id,links):
4         self.id = id
5         self.closest = None
6 self.closest_dist = PrimVertex.INFINITY
7 self.neighbors = [link[1:] for link in links if link[0]==id]
8 def __cmp__(self,other):
9 return cmp(self.closest_dist, other.closest_dist)
10
The Prim algorithm, when using a priority queue for Q, goes as follows:
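The listing did not survive here; a sketch of a Prim implementation consistent with the PrimVertex class above (it rebuilds the heap after each round of updates, a simple if not optimal policy) is:

from heapq import heapify, heappop

def Prim(graph, start):
    vertices, links = graph
    P = [PrimVertex(i, links) for i in vertices]
    P[start].closest_dist = 0
    Q, in_queue = list(P), set(vertices)
    while Q:
        heapify(Q)                          # restore heap order after updates
        vertex = heappop(Q)
        in_queue.remove(vertex.id)
        for neighbor_id, length in vertex.neighbors:
            neighbor = P[neighbor_id]
            if neighbor_id in in_queue and length < neighbor.closest_dist:
                neighbor.closest = vertex   # found a shorter connecting link
                neighbor.closest_dist = length
    return [(v.id, v.closest.id, v.closest_dist) for v in P if v.closest]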
One can select a pool of animals and, for each pair of them, compute the similarity of the DNA of their hemoglobin genes using the lcs algorithm discussed later. One can then link each pair of animals by a metric that represents how similar the two animals are. We can then run the Prim or the Kruskal algorithm to find the minimum spanning tree. The tree represents the most likely evolutionary tree connecting those animal species.
Actually, three genes are responsible for hemoglobin (HBA1, HBA2, and
HBB). By performing the analysis on different genes and comparing the
results, it is possible to establish a consistency check of the results. [20]
Similar studies are performed routinely in evolutionary biology. They
can also be applied to viruses to understand how viruses evolved over
time. [21]
The Dijkstra [22] algorithm solves a similar problem to the Kruskal and
Prim algorithms. Given a graph, it computes, for each vertex, the shortest
path connecting the vertex to a starting (or source, or root) vertex. The
collection of links on all the paths defines the single-source shortest paths.
It works, like Prim, by placing all vertices in a min priority queue where
the queue metric for each vertex is the length of the path connecting the
vertex to the source. At each iteration, the algorithm pops a vertex from
the priority queue, loops over its neighbors (adjacent links), and, if it
finds that one of its neighbors is already in the queue and it is possible
to connect it to the current vertex using a link that makes the path to the
source shorter, the neighbor information is updated. The algorithm loops
until there are no more vertices in the priority queue.
The implementation of this algorithm is almost identical to the Prim algo-
rithm, except for two lines:
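That listing was also lost; adapting the Prim sketch above, the two changed lines are the ones marked in the comments:

def Dijkstra(graph, start):
    vertices, links = graph
    P = [PrimVertex(i, links) for i in vertices]
    P[start].closest_dist = 0
    Q, in_queue = list(P), set(vertices)
    while Q:
        heapify(Q)
        vertex = heappop(Q)
        in_queue.remove(vertex.id)
        for neighbor_id, length in vertex.neighbors:
            neighbor = P[neighbor_id]
            d = vertex.closest_dist + length          # changed line 1
            if neighbor_id in in_queue and d < neighbor.closest_dist:  # changed line 2
                neighbor.closest = vertex
                neighbor.closest_dist = d
    return [(v.id, v.closest.id, v.closest_dist) for v in P if v.closest]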
Figure 3.4: The result shows an application of the Dijkstra algorithm for the single
source shortest path applied to solve a maze.
Huffman encoding compresses a string by representing its more frequent symbols with shorter binary codes. The head of this listing was lost; a reconstruction of lines 1–11 (the function name is an assumption) is:
1 def huffman_encode(input):
2     """
3     Builds the Huffman code for the symbols of the input string and
4     returns a map from each symbol to its binary code, together with
5     the encoded input. (Reconstructed: the original first lines of
6     this listing were lost.)
7     """
8     from heapq import heappush, heappop
9
10     # count the frequency of each symbol in the input
11
12 symbols = {}
13 for symbol in input:
14 symbols[symbol] = symbols.get(symbol,0)+1
15 heap = []
16 for (k,f) in symbols.items():
17 heappush(heap,(f,k))
18 while len(heap)>1:
19 (f1,k1) = heappop(heap)
20 (f2,k2) = heappop(heap)
21 heappush(heap,(f1+f2,((f1,k1),(f2,k2))))
22 symbol_map = {}
23 inorder_tree_walk(heap[0],'',symbol_map)
24 encoded = ''.join(symbol_map[symbol] for symbol in input)
25 return symbol_map, encoded
26
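The helper inorder_tree_walk, which walks the Huffman tree and assigns to each leaf symbol the binary string spelled by the path that reaches it, was not preserved; a consistent sketch is:

def inorder_tree_walk(t, key, keys):
    (f, k) = t
    if isinstance(k, tuple):                    # internal node: two subtrees
        inorder_tree_walk(k[0], key+'0', keys)  # left branch appends '0'
        inorder_tree_walk(k[1], key+'1', keys)  # right branch appends '1'
    else:
        keys[k] = key                           # leaf: store the symbol's code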
The efficiency of an encoding can be compared with the entropy of the input,

E = − ∑ᵢ wᵢ log₂ wᵢ    (3.85)

where wᵢ is the relative frequency of each symbol. In our case, this is easy to compute:
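A sketch of that computation (the function name is an assumption):

import math

def entropy(input):
    symbols, n = {}, float(len(input))
    for symbol in input:                 # count the occurrences
        symbols[symbol] = symbols.get(symbol, 0)+1
    # add the -w*log2(w) contribution of each symbol
    return -sum(f/n*math.log(f/n, 2) for f in symbols.values())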
How could we have done better? Notice for example that the Huffman
encoding does not take into account the order in which symbols appear.
The original string contains the triple “is” twice, and we could have taken
advantage of that pattern, but we did not.
Our choice of using characters as symbols is arbitrary. We could have
used a couple of characters as symbols or triplets or any other subse-
quences of bytes of the original input. We could also have used symbols
of different lengths for different parts of the input (we could have used a
single symbol for “is”). A different choice would have given a different
compression ratio, perhaps better, perhaps worse.
From this we can observe the following simple fact: if the two strings start
with the same letter, it’s always safe to choose that starting letter as the
first character of the subsequence. This is because, if you have some other
subsequence, represented as a collection of lines as drawn here, you can
“push” the leftmost line to the start of the two strings without causing any
other crossings and get a representation of an equally long subsequence
that does start this way.
Conversely, suppose that, as in the preceding example, the two first
characters differ. Then it is not possible for both of them to be part of a
common subsequence. There are three possible choices: remove the first
letter from either one of the strings or remove the letter from both strings.
Finally, observe that once we’ve decided what to do with the first char-
acters of the strings, the remaining subproblem is again a LCS problem
on two shorter strings. Therefore we can solve it recursively. However,
because we don’t know which choice of the three to take, we will take
them all and see which choice returns the best result.
Rather than finding the subsequence itself, it turns out to be more efficient
to find the length of the longest subsequence. Then, in the case where
the first characters differ, we can determine which subproblem gives the
correct solution by solving both and taking the max of the resulting sub-
sequence lengths. Once we turn this into a dynamic programming algo-
rithm, we get the following:
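A bottom-up sketch of such an algorithm, computing only the length of the longest common subsequence (the function name lcs is an assumption):

def lcs(a, b):
    # z[i][j] = length of the LCS of the prefixes a[:i] and b[:j]
    z = [[0]*(len(b)+1) for i in xrange(len(a)+1)]
    for i in xrange(1, len(a)+1):
        for j in xrange(1, len(b)+1):
            if a[i-1] == b[j-1]:
                z[i][j] = z[i-1][j-1]+1   # safe to match the first characters
            else:
                # drop a character from either string and take the max
                z[i][j] = max(z[i-1][j], z[i][j-1])
    return z[len(a)][len(b)]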
Here is an example:
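>>> print lcs('ATGC','AGTC')
3

Both 'ATC' and 'AGC' are common subsequences of length 3 (the two strings are made up for the example).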
3.8.3 Needleman–Wunsch
The Needleman–Wunsch algorithm is a variation of the preceding one. Instead of alternating two rows (c and d) for storing the temporary results, we store all temporary results in an array z; when two matching symbols are found and they are not consecutive, we apply a penalty equal to p^m, where m is the distance between the two matches and is also the size of the gap in the matching subsequence:
1 def needleman_wunsch(a,b,p=0.97):
2 z=[]
3 for i,r in enumerate(a):
4 z.append([])
5 for j,c in enumerate(b):
6 if r==c:
7 e = z[i-1][j-1]+1 if i*j>0 else 1
8 else:
9 e = p*max(z[i-1][j] if i>0 else 0,
10 z[i][j-1] if j>0 else 0)
11 z[-1].append(e)
12 return z
Figure 3.5: A Needleman and Wunsch plot sequence alignment. The arrow-like patterns
indicate the point in the two sequences (represented by the X- and Y-coordinates) where
the two sequences are more likely to align.
Assume you want to fill your knapsack such that you will maximize the
value of its contents [28]. However, you are limited by the volume your
knapsack can hold. In the continuous knapsack, the amount of each prod-
uct can vary continuously. In the discrete one, each product has a finite size, and you either carry it or not.
The continuous knapsack problem can be formulated as the problem of
maximizing
f(x) = a₀x₀ + a₁x₁ + · · · + aₙxₙ    (3.86)
given the constraint
b₀x₀ + b₁x₁ + · · · + bₙxₙ ≤ c    (3.87)
which means that investment opportunities for each firm are sorted ac-
cording to their cost.
k    f[0, k]
0    0
1    5
2*   6*
3    6
4    6
5    6        (3.91)
There are many algorithms available to cluster data [29]. They are all based on empirical principles because the clusters themselves are defined by the algorithm used to identify them. Normally we distinguish three categories:
• Hierarchical clustering: These algorithms start by considering each point
a cluster of its own. At each iteration, the two clusters closest to each
other are joined together, forming a larger cluster. Hierarchical cluster-
ing algorithms differ from each other about the rule used to determine
the distance between clusters. The algorithm returns a tree represent-
ing the clusters that are joined, called a dendrogram.
• Centroid-based clustering: These algorithms require that each point be
represented by a vector and each cluster also be represented by a vector
(centroid of the cluster). With each iteration, a better estimation for
the centroids is given. An example of centroid-based clustering is k-
means clustering. These algorithms require an a priori knowledge of
the number of clusters and return the position of the centroids as well
the set of points belonging to each cluster.
1 class Cluster(object):  # reconstructed: the first line of this listing was lost; the class name is an assumption
2 def __init__(self,points,metric,weights=None):
3 self.points, self.metric = points, metric
4 self.k = len(points)
5 self.w = weights or [1.0]*self.k
6 self.q = dict((i,[i]) for i,e in enumerate(points))
7 self.d = []
8 for i in xrange(self.k):
9 for j in xrange(i+1,self.k):
10 m = metric(points[i],points[j])
11 if not m is None:
12 self.d.append((m,i,j))
13 self.d.sort()
14 self.dd = []
15 def parent(self,i):
16 while isinstance(i,int): (parent, i) = (i, self.q[i])
17 return parent, i
18 def step(self):
19 if self.k>1:
20 # find new clusters to join
21 (self.r,i,j),self.d = self.d[0],self.d[1:]
22 # join them
23 i,x = self.parent(i) # find members of cluster i
24 j,y = self.parent(j) # find members of cluster j
25 x += y # join members
26 self.q[j] = i # make j cluster point to i
27 self.k -= 1 # decrease cluster count
28 # update all distances to new joined cluster
29 new_d = [] # links not related to joined clusters
30 old_d = {} # old links related to joined clusters
31 for (r,h,k) in self.d:
32 if h in (i,j):
33 a,b = old_d.get(k,(0.0,0.0))
34 old_d[k] = a+self.w[k]*r,b+self.w[k]
35 elif k in (i,j):
36 a,b = old_d.get(h,(0.0,0.0))
37 old_d[h] = a+self.w[h]*r,b+self.w[h]
38 else:
39 new_d.append((r,h,k))
40 new_d += [(a/b,i,k) for k,(a,b) in old_d.items()]
41 new_d.sort()
42 self.d = new_d
43 # update weight of new cluster
44 self.w[i] = self.w[i]+self.w[j]
45 # get new list of cluster members
46 self.v = [s for s in self.q.values() if isinstance(s,list)]
47 self.dd.append((self.r,len(self.v)))
48 return self.r, self.v
49
50 def find(self,k):
51     # the rest of this listing was lost; a body consistent with the
52     # description below would step until only k clusters remain:
53     while self.k>k: self.step()
54     return self.r, self.v
Given a set of points, we can determine the most likely number of clusters
representing the data, and we can make a plot of the number of clusters
versus distance and look for a plateau in the plot. In correspondence with
the plateau, we can read from the y-coordinate the number of clusters.
This is done by the function cluster in the preceding algorithm, which
returns the average distance between clusters and a list of clusters.
In a neural network, neurons are organized in layers: one input layer of neurons connected only with the input and with the next layer; one output layer of neurons connected only with the output and the previous layer; and one or more hidden layers of neurons connected only with other neurons. Each neuron is char-
acterized by input links and output links. Each output of a neuron is a
function of its inputs. The exact shape of that function depends on the
network and on parameters that can be adjusted. Usually this function is
chosen to be a monotonic increasing function on the sum of the inputs,
where both the inputs and the outputs take values in the [0,1] range. The
inputs can be thought as electrical signals reaching the neuron. The out-
put is the electrical signal emitted by the neuron. Each neuron is defined
by a set of parameters a, which determine the relative weights of the input signals. A common choice for this characteristic function (reconstructed here from the tanh used in the code below) is

y_{ij} = tanh( ∑ₖ a_{ijk} x_{ik} )

where i labels the neuron, j labels the output, k labels the input, and a_{ijk}
Figure 3.8: Visual representation of the clusters where the points coordinates are pro-
jected in 2D.
1 import math, random
2
3 class NeuralNetwork:
4     """
5     Back-propagation neural network. (Reconstructed: the first lines
6     of this listing were lost; the class name and this header are
7     assumptions based on the code that follows.)
8     """
9
10     @staticmethod
11     def rand(a, b):
12         """ calculate a random number where: a <= rand < b """
13         return (b-a)*random.random() + a
14
15 @staticmethod
16 def sigmoid(x):
17 """ our sigmoid function, tanh is a little nicer than the standard 1/(1+
e^-x) """
18 return math.tanh(x)
19
20 @staticmethod
21 def dsigmoid(y):
22 """ # derivative of our sigmoid function, in terms of the output """
23 return 1.0 - y**2
24
25 def __init__(self, ni, nh, no):
26 # number of input, hidden, and output nodes
27 self.ni = ni + 1 # +1 for bias node
28 self.nh = nh
29 self.no = no
30
31 # activations for nodes
32 self.ai = [1.0]*self.ni
33 self.ah = [1.0]*self.nh
34 self.ao = [1.0]*self.no
35
36 # create weights
37 self.wi = Matrix(self.ni, self.nh, fill=lambda r,c: self.rand(-0.2, 0.2))
38 self.wo = Matrix(self.nh, self.no, fill=lambda r,c: self.rand(-2.0, 2.0))
39
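40
41     def update(self, inputs):
42         # (reconstructed lines 40-47: the method header was lost and is
43         #  inferred from the input/hidden/output activation code below)
44         if len(inputs) != self.ni-1:
45             raise ValueError('wrong number of inputs')
46
47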
48 # input activations
49 for i in xrange(self.ni-1):
50 self.ai[i] = inputs[i]
51
52 # hidden activations
53 for j in xrange(self.nh):
54 s = sum(self.ai[i] * self.wi[i,j] for i in xrange(self.ni))
55 self.ah[j] = self.sigmoid(s)
56
57 # output activations
58 for k in xrange(self.no):
59 s = sum(self.ah[j] * self.wo[j,k] for j in xrange(self.nh))
60 self.ao[k] = self.sigmoid(s)
61 return self.ao[:]
62
In the following example, we teach the network the XOR function, and
we create a network with two inputs, two intermediate neurons, and one
output. We train it and check what it learned:
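The example code itself was lost; it may have looked like the following sketch (the train and test methods belong to the portion of the listing not reproduced above, so their names and signatures are assumptions):

>>> pat = [[[0,0],[0]], [[0,1],[1]], [[1,0],[1]], [[1,1],[0]]]
>>> n = NeuralNetwork(2, 2, 1)   # two inputs, two hidden neurons, one output
>>> n.train(pat)                 # hypothetical training call
>>> n.test(pat)                  # hypothetical check of what was learned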
Now, we use our neural network to learn patterns in stock prices and
predict the next day return. We then check what it has learned, comparing
the sign of the prediction with the sign of the actual return for the same
days used to train the network:
Here is an example of a genetic algorithm that evolves a population of random DNA sequences toward a fixed target sequence. The first lines of the listing were lost; they presumably contained the imports:
1 from random import choice, randint
2
3 class Chromosome:
4 alphabet = 'ATGC'
5 size = 32
6 mutations = 2
7 def __init__(self,father=None,mother=None):
8 if not father or not mother:
9 self.dna = [choice(self.alphabet) for i in xrange(self.size)]
10 else:
11 self.dna = father.dna[:self.size/2]+mother.dna[self.size/2:]
12 for mutation in xrange(self.mutations):
13 self.dna[randint(0,self.size-1)] = choice(self.alphabet)
14 def fitness(self,target):
15 return sum(1 for i,c in enumerate(self.dna) if c==target.dna[i])
16
17 def top(population,target,n=10):
18 table = [(chromo.fitness(target), chromo) for chromo in population]
19 table.sort(reverse = True)
20 return [row[1] for row in table][:n]
21
22 def oneof(population):
23 return population[randint(0, len(population)-1)]
24
25 def main():
26 GENERATIONS = 10000
27 OFFSPRING = 20
28 SEEDS = 20
29 TARGET = Chromosome()
30
30
31     # (reconstructed lines 30-39: the main loop was lost; this is a
32     #  plausible structure inferred from the surviving lines)
33     population = [Chromosome() for i in xrange(SEEDS)]
34     for generation in xrange(GENERATIONS):
35         fittest = top(population, TARGET)
36         if fittest[0].fitness(TARGET) == Chromosome.size:
37             break
38         # the fittest chromosomes breed the next generation
39
40         population = [Chromosome(father=oneof(fittest), mother=oneof(fittest))
41                       for i in xrange(OFFSPRING)]
42
43 if __name__=='__main__': main()
There are a number of open problems about the relations among these
sets. Is the set co-NP equivalent to NP? Or perhaps is the intersection
between co-NP and NP equal to P? Are NP and NPC the same set? These
questions are very important in computer science because if, for example, one NPC problem were shown to admit a polynomial time algorithm, then every problem in NP would admit one. Conversely, if one could prove that P is not equivalent to NP, we would know that a polynomial time solution to NPC problems does not exist [32].
Cantor proved that the real numbers in any interval (e.g., in [0, 1)) are
more than the integer numbers, therefore real numbers are uncount-
able [33]. The proof proceeds as follows:
1. Consider the real numbers in the interval [0, 1) not including 1.
2. Assume that these real numbers are countable. Therefore it is possible
to associate each of them to an integer
1 ←→ 0.x₁₁ x₁₂ x₁₃ x₁₄ . . .
2 ←→ 0.x₂₁ x₂₂ x₂₃ x₂₄ . . .
3 ←→ 0.x₃₁ x₃₂ x₃₃ x₃₄ . . .    (3.99)
4 ←→ 0.x₄₁ x₄₂ x₄₃ x₄₄ . . .
. . .
3. Construct a new number y = 0.y₁y₂y₃y₄ . . . whose ith digit yᵢ is chosen to differ from the diagonal digit xᵢᵢ of the ith number in the list (3.100).
4. The number y belongs to [0, 1), yet it differs from every number in the list in at least one digit; this contradicts the assumption that the list enumerates all the real numbers in [0, 1). Therefore they cannot be counted.
Gödel used a similar diagonal argument to prove that there are as many
problems (or theorems) as real numbers and as many algorithms (or
proofs) as natural numbers [33]. Because there is more of the former
than the latter, it follows that there are problems for which there is no
corresponding solving algorithm. Another interpretation of Gödel’s the-
orem is that, in any formal language, for example, mathematics, there are
theorems that cannot be proved.
Another consequence of Gödel’s theorem is the following: it is impossible
to write a computer program to test if a given algorithm stops or enters
into an infinite loop.
Consider the following code:

def next(i):
    # looks for the next i (stepping by 2) whose square has at most
    # two distinct digits
    while len(set(str(i*i))) > 2:
        i = i + 2
    print i

next(81621)

This code searches for a number equal to or greater than 81621 whose square is comprised of only two distinct digits. Nobody knows whether such a number exists, therefore nobody knows whether this code stops.
Although one day this problem may be solved, there are many other prob-
lems that are still unsolved; actually, there are an infinite number of them.
4
Numerical Algorithms
Consider, as an example of an ill-posed problem, a ball resting on the top of a hill: an arbitrarily small push to the right or to the left can make the ball roll to the right or to the left, respectively. The solution does not depend continuously on the input data; therefore this is not a well-posed problem.
A problem is said to be stable if the solution is not just continuous but also
weakly sensitive to input data. This means that the change of the output
(in percent) is smaller than the change in the input (in percent).
Numerical algorithms work best with stable problems.
We can quantify this as follows. Let x be an input and y be the output of
a function:
y = f (x) (4.1)
cond(f, x) ≡ |dy/y| / |dx/x| = | x f′(x)/f(x) |    (4.2)
(the latter equality only holds if f is differentiable in x).
A problem with a low condition number is said to be well-conditioned,
while a problem with a high condition number is said to be ill-
conditioned.
We say that a problem characterized by a function f is well conditioned
in a domain D if the condition number is less than 1 for every input in
the domain. We also say that a problem is stable if it is well conditioned.
In this book, we are mostly concerned with stable (well-conditioned) problems. If a problem is well conditioned for all inputs in a domain, it is also stable.
δy = | f(x̄) − fₙ(x̄) | + | fₙ′(x̄) | δx    (4.3)
The first term is the computational error (we use f n instead of the true
f ), and the second term is the propagated data error (δx, the uncer-
tainty in x, propagates through f n ).
When a variable x has a finite Gaussian uncertainty δx, how does the
uncertainty propagate through a function f ? Assuming the uncertainty is
small, we can always expand using a Taylor series:
We have used this formula before for the propagated data error. For
functions of two variables z = f ( x, y) and assuming the uncertainties in x
and y are independent,
δz = √( (∂f(x,y)/∂x)² δx² + (∂f(x,y)/∂y)² δy² )    (4.6)
4.2.2 buckingham
Here are some strategies that are normally employed in numerical algo-
rithms:
• Approximate a continuous system with a discrete system
• Replace integrals with sums
• Replace derivatives with finite differences
• Replace nonlinear with linear + corrections
• Transform a problem into a different one
• Approach the true result by iterations
Here are some examples of each of the strategies.
For example, the derivative of a function can be defined via the right derivative

df⁺(x)/dx = lim_{h→0} ( f(x + h) − f(x) ) / h    (4.7)

the left derivative

df⁻(x)/dx = lim_{h→0} ( f(x) − f(x − h) ) / h    (4.8)

and the average of the two

df(x)/dx = (1/2) ( df⁺(x)/dx + df⁻(x)/dx ) = lim_{h→0} ( f(x + h) − f(x − h) ) / (2h)    (4.9)

Replacing the limit with a small finite h yields the approximation

df(x)/dx = ( f(x + h) − f(x − h) ) / (2h) + O(h)    (4.10)
The three definitions are equivalent for functions that are differentiable in
x, and the latter is preferable because it is more symmetric.
Notice that even more definitions are possible as long as they agree in the
limit h → 0. Definitions that converge faster as h goes to zero are referred
to as “improvement.”
We can easily implement the concept of a numerical derivative in code
by creating a functional D that takes a function f and returns the function
d f (x)
dx (a functional is a function that returns another function):
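The listing was lost here; a one-line sketch consistent with how D is used later in this chapter (condition_number calls D(f, h)(x)) is:

def D(f, h=1e-6):
    # returns a function approximating f'(x) with the symmetric formula (4.10)
    return lambda x, f=f, h=h: (f(x+h)-f(x-h))/(2.0*h)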
d²f(x)/dx² = ( f(x + h) − 2f(x) + f(x − h) ) / h² + O(h)    (4.11)
Here is an example:
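A sketch of the corresponding second-derivative functional and of the comparison the text describes (the names and the sample point are illustrative):

def DD(f, h=1e-6):
    # second derivative via the symmetric second difference (4.11)
    return lambda x, f=f, h=h: (f(x+h)-2.0*f(x)+f(x-h))/(h*h)

>>> from math import sin, pi
>>> print D(D(sin, h=1e-4), h=1e-4)(pi/4)   # composing D twice: about -0.7071
>>> print DD(sin, h=1e-4)(pi/4)             # direct second derivative: about -0.7071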
Notice how composing the first derivative twice or computing the second
derivative directly yields a similar result.
We could easily derive formulas for higher-order derivatives and implement them in the same way.
Here the first column is the value of x, the second column is the corre-
sponding sin( x ), and the third column is the relative difference (in per-
cent) between x and sin( x ). The difference is always less than 20%; there-
fore, if we are happy with this precision, then we can replace sin( x ) with
x.
This works because any function f ( x ) can be expanded using a Taylor
series. The first order of the Taylor expansion is linear. For values of
x sufficiently close to the expansion point, the function can therefore be
approximated with its Taylor expansion.
Expanding on the previous example, consider the following code:
1 >>> from math import sin
2 >>> points = [0.01*i for i in xrange(1,11)]
3 >>> for x in points:
4 ...     s = x - x*x*x/6
5 ...     print x, sin(x), s, "%.6f" % (abs(s-sin(x))/sin(x)*100)
6 0.01 0.009999833... 0.009999... 0.000000
7 0.02 0.019998666... 0.019998... 0.000000
8 0.03 0.029995500... 0.029995... 0.000001
9 0.04 0.039989334... 0.039989... 0.000002
sin(x) = √( 1 − sin(π/2 − x)² )    (4.16)

to reduce the domain to x ∈ [0, π/4), where the latter is a subset of [0, 1).
The approximations sin( x ) ' x and sin( x ) ' x − x3 /6 came from lin-
earizing the function sin( x ) and adding a correction to the previous ap-
proximation, respectively. In general, we can iterate the process of finding
corrections and approximating the true result.
Here is an example of a general iterative algorithm:
1 result=guess
2 loop:
3 compute correction
4 result=result+correction
5 if result sufficiently close to true result:
6 return result
For a function f that is differentiable k + 1 times in a neighborhood of x̄, Taylor's theorem states that

f(x) = f(x̄) + f⁽¹⁾(x̄)(x − x̄) + · · · + ( f⁽ᵏ⁾(x̄) / k! ) (x − x̄)ᵏ + Rₖ    (4.17)

where the remainder

Rₖ = ( f⁽ᵏ⁺¹⁾(ξ) / (k + 1)! ) (x − x̄)ᵏ⁺¹    (4.18)

holds for some ξ between x̄ and x and can be bounded by

|Rₖ| ≤ max_{ξ∈D} | f⁽ᵏ⁺¹⁾(ξ) | |x − x̄|ᵏ⁺¹ / (k + 1)!    (4.19)

For example, for the exponential around x̄ = 0:
f(x) = eˣ    (4.20)
f⁽¹⁾(x) = eˣ    (4.21)
. . . = . . .    (4.22)
f⁽ᵏ⁾(x) = eˣ    (4.23)

eˣ = 1 + x + (1/2)x² + · · · + (1/k!)xᵏ + . . .    (4.24)
Similarly, for sin around x̄ = 0:

f(x) = sin(x)    (4.25)
f⁽¹⁾(x) = cos(x)    (4.26)
f⁽²⁾(x) = −sin(x)    (4.27)
f⁽³⁾(x) = −cos(x)    (4.28)
. . . = . . .    (4.29)

sin(x) = x − (1/3!) x³ + · · · + ( (−1)ᵏ / (2k + 1)! ) x^(2k+1) + . . .    (4.30)
Notice that we can very well expand in Taylor around any other point, for example, x̄ = π/2, and we get

sin(x) = 1 − (1/2)(x − π/2)² + · · · + ( (−1)ᵏ / (2k)! ) (x − π/2)^(2k) + . . .    (4.31)
Figure 4.1: The figure shows the sin function and its approximation using the Taylor
expansion around x = 0 at different orders.
Figure 4.2: The figure shows the sin function and its approximation using the Taylor
expansion around x = π/2 at different orders.
f(x) = cos(x)    (4.32)
f⁽¹⁾(x) = −sin(x)    (4.33)
f⁽²⁾(x) = −cos(x)    (4.34)
f⁽³⁾(x) = sin(x)    (4.35)
. . . = . . .    (4.36)

cos(x) = 1 − (1/2)x² + · · · + ( (−1)ᵏ / (2k)! ) x^(2k) + . . .    (4.37)
which will be useful when we talk about Fourier and Laplace transforms.
Now let's consider the kth term in the Taylor expansion of eˣ. It can be rearranged in terms of the preceding term:

Tₖ(x) = (1/k!) xᵏ = (x/k) (1/(k−1)!) x^(k−1) = (x/k) Tₖ₋₁(x)    (4.39)
For x < 0, the terms in the sum have alternating signs and are decreasing in magnitude; therefore the remainder is bounded by the first omitted term, |Rₖ| ≤ |Tₖ₊₁(x)|. This allows for an easy implementation of the Taylor expansion and its stopping condition:
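The listing was lost here; a sketch that matches the features listed below (the name myexp is an assumption):

def myexp(x, ap=1e-6, rp=1e-4, ns=200):
    if x == 0:
        return 1.0                        # special case e^0 = 1
    elif x > 0:
        return 1.0/myexp(-x, ap, rp, ns)  # reduce to x<0 via e^x = 1/e^(-x)
    t = s = 1.0                           # T_0 and the running sum
    for k in xrange(1, ns):
        t = t*x/k                         # T_k = (x/k) T_(k-1), eq. (4.39)
        s = s + t
        if abs(t) < max(ap, abs(s)*rp):   # stopping condition
            return s
    raise ArithmeticError('no convergence')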
This code presents all the features of many of the algorithms we see later
in the chapter:
• It deals with the special case e0 = 1.
• It reduces difficult problems to easier problems (exponential of a posi-
tive number to the exponential of a negative number via e x = 1/e− x ).
• It approximates the “true” solution by iterations.
• The max number of iterations is limited.
• There is a stopping condition.
• It detects failure to converge.
Here is a test of its convergence:
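For instance (the test values are made up):

>>> import math
>>> for x in (1, 10, -10):
...     print x, myexp(x), math.exp(x)   # the two columns should agree within the target precision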
A similar algorithm can be written for the sine; its convergence is easy to control because the derivatives of sin are only ever ±sin and ±cos, whose image is always within [−1, 1].
Also notice that the stopping condition is only guaranteed when 0 ≤ x < 1. Therefore, for other values of x, we must use trigonometric relations to reduce the problem to a domain where the Taylor series converges. Hence we write:
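The listing was lost; a sketch of the domain reduction the text describes (the name mysin is an assumption):

import math

def mysin(x, ap=1e-6, rp=1e-4, ns=100):
    if x == 0:
        return 0.0
    elif x < 0:
        return -mysin(-x)                    # sin(-x) = -sin(x)
    elif x > 2.0*math.pi:
        return mysin(x % (2.0*math.pi))      # periodicity
    elif x > math.pi:
        return -mysin(2.0*math.pi - x)       # sin(x) = -sin(2*pi - x)
    elif x > math.pi/2:
        return mysin(math.pi - x)            # sin(x) = sin(pi - x)
    elif x > math.pi/4:
        return math.sqrt(1.0 - mysin(math.pi/2 - x)**2)   # eq. (4.16)
    # now 0 <= x <= pi/4 < 1 and the series converges quickly
    t = s = x
    for k in xrange(1, ns):
        t = t*(-1.0)*x*x/(2*k)/(2*k+1)       # T_k = -x^2/((2k)(2k+1)) T_(k-1)
        s = s + t
        if abs(t) < max(ap, abs(s)*rp):      # stopping condition
            return s
    raise ArithmeticError('no convergence')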
In the code, we use the computed result as an estimate of the true value and the computed correction as an estimate of the absolute error. The target absolute precision is an input value that we use as an
upper limit for the absolute error. The target relative precision is an input
value we use as an upper limit for the relative error. When absolute error
falls below the target absolute precision or the relative error falls below
the target relative precision, we stop looping and assume the result is
sufficiently precise:
1 def generic_looping_function(guess, ap, rp, ns):
2 result = guess
3 for k in xrange(ns):
4 correction = ...
5 result = result+correction
6 remainder = ...
7 if norm(remainder) < max(ap,norm(result)*rp): return result
8 raise ArithmeticError('no convergence')
A system is said to be linear if its response is additive, f(x + y) = f(x) + f(y), and if it is homogeneous,

f(αx) = α f(x)    (4.45)
In simpler words, we can say that the output is proportional to the input.
As discussed in the introduction to this chapter, one of the simplest tech-
niques for approximating any unknown system consists of approximating
it with a linear system (and this approximation will be correct for some
system and not for others).
When we try to model a new system, approximating the system with a
linear system is often the first step in describing it in a quantitative way,
even if it may turn out that this is not a good approximation.
This is the same as approximating the function f describing the system with its first-order Taylor expansion, f(x + h) − f(x) ≃ f′(x)h.
For a multidimensional system with input x (now a vector) and output y (also a vector, not necessarily of the same size as x), we can still approximate y = f(x) with f(x + h) − y ≃ Ah, yet we need to clarify what this latter equation means.
Given

x ≡ (x₀, x₁, . . . , xₙ₋₁)ᵗ    and    y ≡ (y₀, y₁, . . . , yₘ₋₁)ᵗ    (4.46)

and the m × n matrix

A ≡ [ a₀₀,      a₀₁,      . . . , a₀,ₙ₋₁  ;
      a₁₀,      a₁₁,      . . . , a₁,ₙ₋₁  ;
      . . .                               ;
      aₘ₋₁,₀,  aₘ₋₁,₁,  . . . , aₘ₋₁,ₙ₋₁ ]    (4.47)
which means that each output component is a weighted sum of the input components, yᵢ = ∑ⱼ aᵢⱼ xⱼ. In Python, the natural candidate to represent such a matrix is a list of lists of numbers.
But such an object (a list of lists) does not have the mathematical proper-
ties we want, so we have to define them.
First, we define a class representing a matrix:
1 class Matrix(object):
2 def __init__(self,rows,cols=1,fill=0.0):
3 """
4 Constructs a matrix (filled with zeros by default)
5 Examples:
6 A = Matrix([[1,2],[3,4]])
7 A = Matrix([1,2,3,4])
8 A = Matrix(10,20)
9 A = Matrix(10,20,fill=0.0)
10 A = Matrix(10,20,fill=lambda r,c: 1.0 if r==c else 0.0)
11 """
12 if isinstance(rows,list):
13 if isinstance(rows[0],list):
14 self.rows = [[e for e in row] for row in rows]
15 else:
16 self.rows = [[e] for e in rows]
17 elif isinstance(rows,int) and isinstance(cols,int):
18 xrows, xcols = xrange(rows), xrange(cols)
19 if callable(fill):
20 self.rows = [[fill(r,c) for c in xcols] for r in xrows]
21 else:
22 self.rows = [[fill for c in xcols] for r in xrows]
23 else:
24 raise RuntimeError("Unable to build matrix from %s" % repr(rows))
25 self.nrows = len(self.rows)
26 self.ncols = len(self.rows[0])
Notice that the constructor takes the number of rows and columns (cols)
of the matrix but also a fill value, which can be used to initialize the
matrix elements and defaults to zero. It can be callable in case we need to
initialize the matrix with row,col dependent values.
The actual matrix elements are stored as a list or array into the data mem-
ber variable. If optimize=True, the data are stored in an array of double
precision floating point numbers (“d”). This optimization will prevent
you from building matrices of complex numbers or matrices of arbitrary
precision decimal numbers.
Now we define a getter method, a setter method, and a string representa-
tion for the matrix elements:
1 def __getitem__(A,coords):
2 " x = A[0,1]"
3 i,j = coords
4 return A.rows[i][j]
5
6 def __setitem__(A,coords,value):
7 " A[0,1] = 3.0 "
8 i,j = coords
9 A.rows[i][j] = value
10
11 def tolist(A):
12 " assert(Matrix([[1,2],[3,4]]).tolist() == [[1,2],[3,4]]) "
13 return A.rows
14
15 def __str__(A):
16 return str(A.rows)
17
18 def flatten(A):
19 " assert(Matrix([[1,2],[3,4]]).flatten() == [1,2,3,4]) "
20 return [A[r,c] for r in xrange(A.nrows) for c in xrange(A.ncols)]
21
22 def reshape(A,n,m):
23 " assert(Matrix([[1,2],[3,4]]).reshape(1,4).tolist() == [[1,2,3,4]]) "
24 if n*m != A.nrows*A.ncols:
25 raise RuntimeError("Impossible reshape")
26 flat = A.flatten()
27 return Matrix(n,m,fill=lambda r,c,m=m,flat=flat: flat[r*m+c])
28
29 def swap_rows(A,i,j):
30 " assert(Matrix([[1,2],[3,4]]).swap_rows(1,0).tolist() == [[3,4],[1,2]])
"
31 A.rows[i], A.rows[j] = A.rows[j], A.rows[i]
1 @staticmethod
2 def identity(rows=1,e=1.0):
3 return Matrix(rows,rows,lambda r,c,e=e: e if r==c else 0.0)
4
5 @staticmethod
6 def diagonal(d):
7 return Matrix(len(d),len(d),lambda r,c,d=d:d[r] if r==c else 0.0)
1 def __add__(A,B):
2 """
3 Adds A and B element by element, A and B must have the same size
4 Example
5 >>> A = Matrix([[4,3.0], [2,1.0]])
6 >>> B = Matrix([[1,2.0], [3,4.0]])
7 >>> C = A + B
8 >>> print C
9 [[5, 5.0], [5, 5.0]]
10 """
11 n, m = A.nrows, A.ncols
12 if not isinstance(B,Matrix):
13 if n==m:
14 B = Matrix.identity(n,B)
15 elif n==1 or m==1:
16 B = Matrix([[B for c in xrange(m)] for r in xrange(n)])
17 if B.nrows!=n or B.ncols!=m:
18 raise ArithmeticError('incompatible dimensions')
19 C = Matrix(n,m)
20 for r in xrange(n):
21 for c in xrange(m):
22 C[r,c] = A[r,c]+B[r,c]
23 return C
24
25 def __sub__(A,B):
26 """
27 Subtracts B from A element by element; A and B must have the same size
28 Example
29 >>> A = Matrix([[4.0,3.0], [2.0,1.0]])
30 >>> B = Matrix([[1.0,2.0], [3.0,4.0]])
31 >>> C = A - B
32 >>> print C
33 [[3.0, 1.0], [-1.0, -3.0]]
34 """
35 n, m = A.nrows, A.ncols
36 if not isinstance(B,Matrix):
37 if n==m:
38 B = Matrix.identity(n,B)
39 elif n==1 or m==1:
40 B = Matrix(n,m,fill=B)
41 if B.nrows!=n or B.ncols!=m:
42 raise ArithmeticError('Incompatible dimensions')
43 C = Matrix(n,m)
44 for r in xrange(n):
45 for c in xrange(m):
46 C[r,c] = A[r,c]-B[r,c]
47 return C
1 def __rmul__(A,x):
2     # reconstructed header (the original first lines were lost):
3     # multiplies the matrix A by the scalar x (requires import copy)
4     M = copy.deepcopy(A)
5 for r in xrange(M.nrows):
6 for c in xrange(M.ncols):
7 M[r,c] *= x
8 return M
9
10 def __mul__(A,B):
11     "multiplies the matrix A by a scalar, a list, or another matrix B"
12     if isinstance(B,(list,tuple)):
13         return (A*Matrix(len(B),1,fill=lambda r,c:B[r])).flatten()
14 elif not isinstance(B,Matrix):
15 return B*A
16 elif A.ncols == 1 and B.ncols==1 and A.nrows == B.nrows:
17 # try a scalar product ;-)
18 return sum(A[r,0]*B[r,0] for r in xrange(A.nrows))
19 elif A.ncols!=B.nrows:
20 raise ArithmeticError('Incompatible dimension')
21 M = Matrix(A.nrows,B.ncols)
22 for r in xrange(A.nrows):
23 for c in xrange(B.ncols):
24 for k in xrange(A.ncols):
25 M[r,c] += A[r,k]*B[k,c]
26 return M
Figure 4.3: Example of the effect of different linear transformations on the same set of
points. From left to right, top to bottom, they show stretching along both the X- and
Y-axes, scaling across both axes, a rotation, and a generic transformation that does not
preserve angles.
Consider the linear system

Ax = b    (4.54)

The Gauss–Jordan elimination applies a sequence of elementary row operations S₀, S₁, . . . , Sₙ₋₁ to both sides (equivalently, to A and to a matrix B initially equal to the identity) until A is reduced to the identity, so that

x = Sₙ₋₁ . . . S₁S₀ A x = Sₙ₋₁ . . . S₁S₀ B b    (4.56)

Because the preceding expression must be true for every b and because x is the solution of Ax = b, by definition, Sₙ₋₁ . . . S₁S₀ B ≡ A⁻¹.
The Gauss-Jordan algorithm works exactly this way: given A, it computes
A −1 :
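The listing was lost here; the book's library implements the inversion as division of a scalar by a matrix (the __rdiv__ method of the Matrix class), with partial pivoting. A sketch along those lines:

def __rdiv__(A, x):
    "computes x/A using Gauss-Jordan elimination, where x is a scalar"
    import copy
    n = A.ncols
    if A.nrows != n:
        raise ArithmeticError('matrix not squared')
    A = copy.deepcopy(A)
    B = Matrix.identity(n, x)
    for c in xrange(n):
        for r in xrange(c+1, n):           # partial pivoting
            if abs(A[r,c]) > abs(A[c,c]):
                A.swap_rows(r, c)
                B.swap_rows(r, c)
        p = 0.0 + A[c,c]                   # force float division
        for k in xrange(n):
            A[c,k] = A[c,k]/p
            B[c,k] = B[c,k]/p
        for r in range(0,c)+range(c+1,n):  # eliminate column c elsewhere
            p = 0.0 + A[r,c]
            for k in xrange(n):
                A[r,k] -= A[c,k]*p
                B[r,k] -= B[c,k]*p
    return B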
26
27 def __div__(A,B):
28 if isinstance(B,Matrix):
29 return A*(1.0/B) # matrix/matrix
30 else:
31 return (1.0/B)*A # matrix/scalar
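The transposition listing was also lost; in the book's library it is a property of the Matrix class, along these lines:

@property
def T(A):
    "transposed of A"
    return Matrix(A.ncols, A.nrows, fill=lambda r,c: A[c,r])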
Notice the new matrix is defined with the number of rows and columns
switched from matrix A. Notice that in Python, a property is a method
that is called like an attribute, therefore without the () notation. This can
be used as follows:
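For example:

>>> A = Matrix([[1,2],[3,4]])
>>> print A.T
[[1, 3], [2, 4]]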
For later use, we define two functions to check whether a matrix is sym-
metrical or zero within a given precision.
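Their listings were lost; sketches consistent with how they are used elsewhere (the signatures are assumptions):

def is_almost_symmetric(A, ap=1e-6, rp=1e-4):
    if A.nrows != A.ncols:
        return False
    for r in xrange(A.nrows):
        for c in xrange(r):
            delta = abs(A[r,c]-A[c,r])
            # symmetric within absolute or relative precision
            if delta > ap and delta > max(abs(A[r,c]),abs(A[c,r]))*rp:
                return False
    return True

def is_almost_zero(A, ap=1e-6):
    # True if every element is smaller than ap in absolute value
    return all(abs(A[r,c]) < ap for r in xrange(A.nrows)
                                for c in xrange(A.ncols))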
Another typical transformation for matrices of complex numbers is the
Hermitian operation, which is a transposition combined with complex
conjugation of the elements:
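A one-line sketch of it, in the same style as the T property above:

@property
def H(A):
    "Hermitian (conjugate) transposed of A"
    return Matrix(A.ncols, A.nrows, fill=lambda r,c: A[c,r].conjugate())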
As an exercise, consider the system

Ax = b    (4.61)

where

A = [[1, 2, 2], [4, 4, 2], [4, 6, 4]]    and    b = (3, 6, 10)ᵗ    (4.62)

Its solution is

x = A⁻¹ b    (4.63)

In code, b must be built as a column vector, Matrix([[3],[6],[10]]), but not

1 >>> b = Matrix([[3,6,10]]) # wrong
A norm maps a vector into a number; the p-norm is defined as

‖x‖ₚ ≡ ( ∑ᵢ |xᵢ|ᵖ )^(1/p)    (4.65)
We can extend the notation of a norm to any function that maps a vector
into a vector, as follows:
The 2-norm is difficult to compute for a matrix, but the 1-norm provides
an approximation. It is computed by adding up the magnitude of the
elements per each column and finding the maximum sum.
This allows us to define a generic function to compute the norm of lists,
matrices/vectors, and scalars:
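The listing itself was lost; a sketch consistent with its use in condition_number below:

def norm(A, p=1):
    if isinstance(A, (list, tuple)):
        return sum(abs(x)**p for x in A)**(1.0/p)
    elif isinstance(A, Matrix):
        if A.nrows == 1 or A.ncols == 1:       # vector: ordinary p-norm
            return sum(abs(A[r,c])**p
                       for r in xrange(A.nrows)
                       for c in xrange(A.ncols))**(1.0/p)
        elif p == 1:                           # matrix: maximum column sum
            return max(sum(abs(A[r,c]) for r in xrange(A.nrows))
                       for c in xrange(A.ncols))
        raise NotImplementedError
    else:
        return abs(A)                          # scalar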
1 def condition_number(f,x=None,h=1e-6):
2 if callable(f) and not x is None:
3 return D(f,h)(x)*x/f(x)
4 elif isinstance(f,Matrix): # if is the Matrix
5 return norm(f)*norm(1/f)
6 else:
7 raise NotImplementedError
Having the norm for matrices also allows us to extend the definition of
convergence of a Taylor series to a series of matrices:
1 def exp(x,ap=1e-6,rp=1e-4,ns=40):
2 if isinstance(x,Matrix):
3 t = s = Matrix.identity(x.ncols)
4 for k in xrange(1,ns):
5 t = t*x/k # next term
6 s = s + t # add next term
7 if norm(t)<max(ap,norm(s)*rp): return s
8 raise ArithmeticError('no convergence')
9 elif type(x)==type(1j):
10 return cmath.exp(x)
11 else:
12 return math.exp(x)
1 >>> A = Matrix([[1,2],[3,4]])
2 >>> print exp(A)
3 [[51.96..., 74.73...], [112.10..., 164.07...]]
The Cholesky algorithm fails if and only if the input matrix is not sym-
metric or not positive definite, therefore it can be used to check whether
a symmetric matrix is positive definite.
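The Cholesky listing did not survive here; a sketch of the usual algorithm, which computes a lower triangular L with A = LLᵗ (it reuses the is_almost_symmetric check defined earlier):

def Cholesky(A):
    import copy, math
    if not is_almost_symmetric(A):
        raise ArithmeticError('not symmetric')
    L = copy.deepcopy(A)
    for k in xrange(L.ncols):
        if L[k,k] <= 0:
            raise ArithmeticError('not positive definite')
        p = L[k,k] = math.sqrt(L[k,k])
        for i in xrange(k+1, L.nrows):
            L[i,k] /= p
        for j in xrange(k+1, L.nrows):     # update the trailing submatrix
            p = float(L[j,k])
            for i in xrange(k+1, L.nrows):
                L[i,j] -= p*L[i,k]
    for i in xrange(L.nrows):              # zero the upper triangle
        for j in xrange(i+1, L.ncols):
            L[i,j] = 0
    return L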
Consider, for example, a generic covariance matrix A. It is supposed to be symmetric and positive definite, and its Cholesky decomposition lets us generate Gaussian random vectors with probability density

p(x) ∝ exp( −(1/2) xᵗ A⁻¹ x )    (4.69)

where A is a symmetric and positive definite matrix. In fact, if A = LLᵗ, then

p(x) ∝ exp( −(1/2) (L⁻¹x)ᵗ (L⁻¹x) )    (4.70)

and with a change of variable u = L⁻¹x, we obtain

p(x) ∝ exp( −(1/2) uᵗu )    (4.71)

and

p(x) ∝ e^(−u₀²/2) e^(−u₁²/2) e^(−u₂²/2) . . .    (4.72)

Hence the components of u are independent standard Gaussian variables, and x = Lu has the desired distribution.
The RandomList is a generator. You can iterate over it. The yield keyword
is used like return, except the function will return a generator.
The Sharpe ratio of an investment is defined as its average return, in excess of the risk-free return, divided by its risk: Sharpe ≡ (r − r̄)/σ. Here r̄ is the current risk-free investment rate. Usually the risk is measured as the standard deviation of its daily (or monthly or yearly) return.
Consider the stock price pit of stock i at time t and its arithmetic daily
return rit = ( pi,t+1 − pit )/pit given a risk-free interest equal to r̄.
For each stock, we can compute the average return and average risk (vari-
ance of daily returns) and display it in a risk-return plot as we did in
chapter 2.
We can try to build arbitrary portfolios by investing in multiple stocks at
the same time. Modern portfolio theory states that there is a maximum
Sharpe ratio and there is one portfolio that corresponds to it. It is called
the tangency portfolio.
A portfolio is identified by fractions of $1 invested in each stock in the
portfolio. Our goal is to determine the tangent portfolio.
If we assume that daily returns for the stocks are Gaussian, then the solv-
ing algorithm is simple.
All we need is to compute the average return for each stock and the covariance matrix A of the returns; the tangency portfolio x is then proportional to A⁻¹(r − r̄1), normalized so that its components add up to one.
The inputs of the formula are the covariance matrix (A), a vector of arith-
metic returns for the assets in the portfolio (r), the risk free rate (r̄). The
output is a vector (x) whose elements are the percentages to be invested
in each asset to obtain a tangency portfolio. Notice that some elements of
x can be negative and this corresponds to short position (sell, not buy, the
asset).
Here is the algorithm:
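The listing was lost; a sketch consistent with the usage below (the names and the return values are inferred from the session):

from math import sqrt

def Markowitz(mu, A, r_free):
    # compute the tangency portfolio, its return, and its risk
    x = (1.0/A)*(mu - r_free)             # solve A x = mu - r_free
    x = x/sum(x[r,0] for r in xrange(x.nrows))
    portfolio = [x[r,0] for r in xrange(x.nrows)]
    portfolio_return = mu*x               # scalar product of column vectors
    portfolio_risk = sqrt(x*(A*x))        # sqrt of x^t A x
    return portfolio, portfolio_return, portfolio_risk

Given, for example, a 5% risk-free rate and the covariance matrix below (the diagonal entries are fixed by the individual Sharpe ratios printed in the session; the off-diagonal values are assumptions):

1 >>> r_free = 0.05
2 >>> cov = Matrix([[0.04, 0.006, 0.02],
3 ...               [0.006, 0.09, 0.06],
4 ...               [0.02, 0.06, 0.16]])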
and the following expected returns (arithmetic returns, not log returns,
because the former are additive, whereas the latter are not):
1 >>> mu = Matrix([[0.10],[0.12],[0.15]])
we compute the tangent portfolio (highest Sharpe ratio), its return, and
its risk with one function call:
1 >>> x, ret, risk = Markowitz(mu, cov, r_free)
2 >>> print x
3 [0.5566343042071198, 0.27508090614886727, 0.16828478964401297]
4 >>> print ret, risk
5 0.113915857605 0.186747095412
6 >>> print (ret-r_free)/risk
7 0.34225891152
8 >>> for r in xrange(3): print (mu[r,0]-r_free)/sqrt(cov[r,r])
9 0.25
10 0.233333333333
11 0.25
Investing 55% in asset 0, 27% in asset 1, and 16% in asset 2, the resulting portfolio has an expected return of 11.39% and a risk of 18.67%, which corresponds to a Sharpe ratio of 0.34, much higher than 0.25, 0.23, and 0.25 for the individual assets.
Notice that the tangency portfolio is not the only one with the highest Sharpe ratio (return per unit of risk). In fact, any linear combination of the tangency portfolio with a risk-free asset (putting money in the bank) has the same Sharpe ratio. For any target risk, one can find a linear combination of the risk-free asset and the tangency portfolio that has a better Sharpe ratio than any other possible portfolio comprising the same assets.
If we call α the fraction of the money to invest in the tangency portfolio and 1 − α the fraction to keep in the bank at the risk-free rate, the resulting combined portfolio has return

αx·r + (1 − α)r̄   (4.77)

and risk

α √(xᵗAx)   (4.78)
We want to find the {ci } that minimizes the sum of the squared distances
between the actual “observed” data o j and the predicted “expected” data
e j , in units of do j . This metric is called χ2 in general [35]. An algorithm
that minimizes the χ2 and is linear in the ci coefficients (our case here) is
called linear least squares or linear regression.
χ² = ∑ⱼ ((eⱼ − oⱼ)/doⱼ)²   (4.83)
where A is the matrix with elements A_ij = f_j(t_i)/do_i and b is the column vector with elements b_i = o_i/do_i   (4.84)
Its solution is
c = (AᵗA)⁻¹(Aᵗb)   (4.90)
def fit_least_squares(points, f):   # opening lines reconstructed
    """
    Computes the coefficients c_j of the best linear fit
    y[i] +/- dy[i] = sum_j c_j * f[j](x[i])

    parameters:
    - a list of fitting functions
    - a list with points (x,y,dy)

    returns:
    - column vector with fitting coefficients
    - the chi2 for the fit
    - the fitting function as a lambda x: ....
    """
    def eval_fitting_function(f,c,x):
        if len(f)==1: return c*f[0](x)
        else: return sum(func(x)*c[i,0] for i,func in enumerate(f))
    A = Matrix(len(points),len(f))
    b = Matrix(len(points))
    for i in xrange(A.nrows):
        weight = 1.0/points[i][2] if len(points[i])>2 else 1.0
        b[i,0] = weight*float(points[i][1])
        for j in xrange(A.ncols):
            A[i,j] = weight*f[j](float(points[i][0]))
    c = (1.0/(A.T*A))*(A.T*b)
    chi = A*c-b
    chi2 = norm(chi,2)**2
    fitting_f = lambda x, c=c, f=f, q=eval_fitting_function: q(f,c,x)
    if isinstance(c,Matrix): return c.flatten(), chi2, fitting_f
    else: return c, chi2, fitting_f

# examples of fitting functions
def POLYNOMIAL(n):
    return [(lambda x, p=p: x**p) for p in xrange(n+1)]
CONSTANT  = POLYNOMIAL(0)
LINEAR    = POLYNOMIAL(1)
QUADRATIC = POLYNOMIAL(2)
CUBIC     = POLYNOMIAL(3)
QUARTIC   = POLYNOMIAL(4)
Here is how we can generate some random points and solve the problem
for a polynomial of degree 2 (or quadratic fit):
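A minimal sketch of such an example, assuming fit_least_squares and QUADRATIC as defined above (the data-generating formula is illustrative):

>>> import math
>>> points = [(k, 5+0.8*k+0.3*k*k+math.sin(k), 2) for k in xrange(100)]
>>> c, chi2, fitting_f = fit_least_squares(points, QUADRATIC)
>>> for p in points[:10]:
...     print p[0], round(p[1],2), round(fitting_f(p[0]),2)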
Figure 4.4 shows a plot of the first 10 points compared with the best fit:
Figure 4.4: Random data with their error bars and the polynomial best fit.
    def simulate(self,data,cash=1000.0,shares=0.0,daily_rate=0.03/360):
        "simulate the strategy over historical data and return the final net worth"
        for t in xrange(1,len(data)):   # start at 1 to avoid peeking at the future
            suggestion = self.strategy(data[:t])
            today_close = data[t-1]['adjusted_close']
            # and we buy or sell based on our strategy
            if cash>0 and suggestion=='buy':
                # we keep track of finances
                shares_bought = int(cash/today_close)
                shares += shares_bought
                cash -= shares_bought*today_close
            elif shares>0 and suggestion=='sell':
                cash += shares*today_close
                shares = 0.0
            # we assume money in the bank also gains an interest
            cash*=math.exp(daily_rate)
        # we return the net worth
        return cash+shares*data[-1]['adjusted_close']
Now we back test the strategy using financial data for AAPL for the year
2011:
Our strategy did considerably better than the risk-free return of 3% but
not as well as investing and holding AAPL shares over the same period.
Of course, we can always engineer a strategy based on historical data that
will outperform holding the stock, but past performance is never a guarantee
of future performance.
According to the definition from investopedia.com, “technical analysts be-
lieve that the historical performance of stocks and markets is an indication
of future performance.”
The efficacy of both technical and fundamental analysis is disputed by the efficient-market hypothesis, which states that stock market prices already reflect all available information and are therefore essentially unpredictable.

An eigenvalue eᵢ and a corresponding eigenvector xᵢ of a matrix A are defined by the relation

Axᵢ = eᵢxᵢ   (4.95)

For example:

A = [[1, −2], [1, 4]]  and  xᵢ = [[−1], [1]]   (4.96)

[[1, −2], [1, 4]] · [[−1], [1]] = 3 · [[−1], [1]]   (4.97)

so xᵢ is an eigenvector of A with eigenvalue eᵢ = 3. If A is symmetric, its eigenvectors can be chosen orthonormal and A can be factored as

A = U D Uᵗ   (4.98)

where U is the matrix whose columns are the eigenvectors and D is the diagonal matrix of the corresponding eigenvalues.
def sqrt(x):
    try:
        return math.sqrt(x)
    except ValueError:
        return cmath.sqrt(x)

def Jacobi_eigenvalues(A,checkpoint=False):
    """Returns U and e so that A=U*diagonal(e)*transposed(U)
       where the i-column of U contains the eigenvector corresponding to
       the eigenvalue e[i] of A.

       from http://en.wikipedia.org/wiki/Jacobi_eigenvalue_algorithm
    """
    def maxind(M,k):
        j=k+1
        for i in xrange(k+2,M.ncols):
            if abs(M[k,i])>abs(M[k,j]):
                j=i
        return j
    n = A.nrows
    if n!=A.ncols:
        raise ArithmeticError('matrix not squared')
    indexes = xrange(n)
    S = Matrix(n,n, fill=lambda r,c: float(A[r,c]))
    E = Matrix.identity(n)
    state = n
    ind = [maxind(S,k) for k in indexes]
    e = [S[k,k] for k in indexes]
    changed = [True for k in indexes]
    iteration = 0
    while state:
        if checkpoint: checkpoint('rotating vectors (%i) ...' % iteration)
        m=0
        for k in xrange(1,n-1):
            if abs(S[k,ind[k]])>abs(S[m,ind[m]]): m=k
        k,h = m,ind[m]
        p = S[k,h]
        y = (e[h]-e[k])/2
        t = abs(y)+sqrt(p*p+y*y)
        s = sqrt(p*p+t*t)
        c = t/s
        s = p/s
        t = p*p/t
        if y<0: s,t = -s,-t
        S[k,h] = 0
        y = e[k]
        e[k] = y-t
        if changed[k] and y==e[k]:
            changed[k],state = False,state-1
        elif (not changed[k]) and y!=e[k]:
            changed[k],state = True,state+1
        y = e[h]
        e[h] = y+t
        if changed[h] and y==e[h]:
            changed[h],state = False,state-1
        elif (not changed[h]) and y!=e[h]:
            changed[h],state = True,state+1
        for i in xrange(k):
            S[i,k],S[i,h] = c*S[i,k]-s*S[i,h],s*S[i,k]+c*S[i,h]
        for i in xrange(k+1,h):
            S[k,i],S[i,h] = c*S[k,i]-s*S[i,h],s*S[k,i]+c*S[i,h]
        for i in xrange(h+1,n):
            S[k,i],S[h,i] = c*S[k,i]-s*S[h,i],s*S[k,i]+c*S[h,i]
        for i in indexes:
            E[k,i],E[h,i] = c*E[k,i]-s*E[h,i],s*E[k,i]+c*E[h,i]
        ind[k],ind[h]=maxind(S,k),maxind(S,h)
        iteration+=1
    # sort vectors
    for i in xrange(1,n):
        j=i
        while j>0 and e[j-1]>e[j]:
            e[j],e[j-1] = e[j-1],e[j]
            E.swap_rows(j,j-1)
            j-=1
    # normalize vectors
    U = Matrix(n,n)
    for i in indexes:
        norm = sqrt(sum(E[i,j]**2 for j in indexes))
        for j in indexes: U[j,i] = E[i,j]/norm
    return U,e
Here is an example that shows, for a particular case, the relation between the input A and the outputs U, e of the Jacobi algorithm:
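A minimal check, assuming the Matrix class from earlier in the chapter (the random symmetric test matrix is illustrative):

>>> import random
>>> n = 4
>>> A = Matrix(n,n)
>>> for r in xrange(n):
...     for c in xrange(r+1):
...         A[r,c] = A[c,r] = random.gauss(10,10)
>>> U, e = Jacobi_eigenvalues(A)
>>> D = Matrix(n,n, fill=lambda r,c: e[r] if r==c else 0.0)
>>> print norm(U*D*U.T - A) < 1e-8
True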
Eigenvalues can be used to filter noise out of data and find hidden depen-
dencies in data. Following are some examples.
Figure 4.5 shows that one eigenvalue, the last one, is much larger than the others. It tells us that the data series have something in common. In fact, the arithmetic returns for stock j at time t can be written as

p_t = ∑ⱼ U_{n−1,j} r_{jt}   (4.100)

β_i = ∑ₜ r_{it} p_t   (4.101)

α_{it} = r_{it} − β_i p_t   (4.102)

Figure 4.5: Eigenvalues of the correlation matrix for 20 of the S&P100 stocks, sorted by their magnitude.
Sometimes we have to invert matrices that are very large, and the Gauss-Jordan algorithm fails. Yet, if the matrix is sparse, in the sense that most of its elements are zeros, then two algorithms come to our rescue: the minimum residual and the biconjugate gradient (for which we consider a variant called the stabilized biconjugate gradient).

We will also assume that the matrix to be inverted is given in some implicit algorithmic form, as a function y = f(x), because this is always the case for sparse matrices: there is no point in storing all the elements because most of them are zero.
xᵢ = y + α₁q₁ + α₂q₂ + ··· + αᵢqᵢ ∈ K(f, y, i + 1)   (4.104)

r = f(xᵢ) − y   (4.105)
def invert_minimum_residual(f,x,ap=1e-4,rp=1e-4,ns=200):
    import copy
    y = copy.copy(x)
    r = x-1.0*f(x)
    for k in xrange(ns):
        q = f(r)
        alpha = (q*r)/(q*q)
        y = y + alpha*r
        r = r - alpha*q
        residue = sqrt((r*r)/r.nrows)
        if residue<max(ap,norm(y)*rp): return y
    raise ArithmeticError('no convergence')
Notice that the minimum residual and the stabilized biconjugate gradient,
if they converge, converge to the same value.
As an example, consider the following. We take a picture using a camera, but the picture comes out of focus. The image is represented by a set of m² pixels. The defocusing operation can be modeled, to a first approximation, as a linear operator acting on the "true" image, x, and turning it into an "out of focus" image, y. We can store the pixels in a one-dimensional vector (both for x and y), as opposed to a matrix, by mapping the pixel (r, c) into the vector component i using the relation (r, c) = (i/m, i%m).
y = Ax (4.106)
Here the linear operator A represents the effects of the lens, which trans-
forms one set of pixels into another.
We can model the lens as a sequence of β smearing operators:
A = S^β   (4.107)
Figure 4.6: An out-of-focus image (left) and the original image (right) computed from the out-of-focus one using sparse matrix inversion.
When the Hubble telescope was first put into orbit, its mirror was not
installed properly and caused the telescope to take pictures out of focus.
Until the defect was physically corrected, scientists were able to fix the
images using a similar algorithm.
We now turn to the problem of solving a nonlinear equation

f(x) = 0   (4.109)

One general strategy (the fixed-point method) is to rewrite the equation in the form x = g(x) and iterate the map, starting from an initial guess x₀:
x₁ = g(x₀)   (4.110)
x₂ = g(x₁)   (4.111)
x₃ = g(x₂)   (4.112)
... = ...   (4.113)

If the iteration converges to a fixed point x = g(x), the error shrinks by a factor |g′| at each step:

|xᵢ − x| = |g(xᵢ₋₁) − g(x)|   (4.114)
        = |g(x) + g′(ξ)(xᵢ₋₁ − x) − g(x)|   (4.115)
        = |g′(ξ)| |xᵢ₋₁ − x|   (4.116)
The goal of the bisection method [39] is to solve f(x) = 0 when the function is continuous and known to change sign between x = a and x = b. The bisection method is the continuous equivalent of the binary search algorithm seen in chapter 3. The algorithm iteratively finds the middle point of the domain x = (a + b)/2, evaluates the function there, and decides whether the solution is on the left or the right, thus reducing the domain from (a, b) to (a, x) or (x, b), respectively:
f(x) ≃ f(x₀) + f′(x₀)(x − x₀)   (4.118)

f(x) = 0  →  x = x₀ − f(x₀)/f′(x₀)   (4.119)
thus finding a new and better estimate for the solution. The algorithm
iterates the preceding equation, and when it converges, it approximates
the exact solution better and better:
The secant method is very similar to Newton's method, except that f′(x) is replaced by a numerical estimate computed using the current point xᵢ and the previous point visited by the algorithm:

f′(xᵢ) ≃ (f(xᵢ) − f(xᵢ₋₁)) / (xᵢ − xᵢ₋₁)   (4.120)

xᵢ₊₁ = xᵢ − f(xᵢ)/f′(xᵢ)   (4.121)
Here is an example:
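A minimal sketch of the secant iteration under the assumptions above (the name solve_secant and the stopping criteria mirror the other solvers in this chapter):

def solve_secant(f, x, ap=1e-6, rp=1e-4, ns=100):
    fx = f(x)
    Dfx = (f(x+1e-4)-f(x-1e-4))/2e-4   # bootstrap the first derivative estimate
    for k in xrange(ns):
        if abs(Dfx) < ap:
            raise ArithmeticError('unstable solution')
        x_old, fx_old, x = x, fx, x-fx/Dfx
        if x == x_old:                  # exact root found
            return x
        fx = f(x)
        Dfx = (fx-fx_old)/(x-x_old)     # secant estimate of the derivative
        if k > 2 and abs(x-x_old) < max(ap, abs(x)*rp):
            return x
    raise ArithmeticError('no convergence')

>>> print round(solve_secant(lambda x: (x-2)*(x-5), 1.0), 4)
2.0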
For this reason, if the function is differentiable twice, we can simply re-
name all previous solvers and replace f ( x ) with f 0 ( x ) and f 0 ( x ) with
f 00 ( x ).
Here is an example:
As in the Newton case, the secant method can also be used to find ex-
trema, by replacing f with f 0 :
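A minimal sketch of an optimizer built this way, using central-difference estimates of f′ and f″ (the function name is illustrative):

def optimize_newton(f, x, ap=1e-6, rp=1e-4, ns=20, h=1e-4):
    # find an extremum of f by solving f'(x)=0 with Newton's method
    for k in xrange(ns):
        Df  = (f(x+h)-f(x-h))/(2*h)          # numerical f'(x)
        DDf = (f(x+h)-2*f(x)+f(x-h))/(h*h)   # numerical f''(x)
        if abs(DDf) < ap:
            raise ArithmeticError('unstable solution')
        x_old, x = x, x - Df/DDf
        if k > 2 and abs(x-x_old) < max(ap, abs(x)*rp):
            return x
    raise ArithmeticError('no convergence')

>>> print round(optimize_newton(lambda x: (x-3)**2+1, 0.0), 4)
3.0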
def optimize_golden_search(f,a,b,ap=1e-6,rp=1e-4,ns=100):   # signature reconstructed
    a,b=float(a),float(b)
    tau = (sqrt(5.0)-1.0)/2.0
    x1, x2 = a+(1.0-tau)*(b-a), a+tau*(b-a)
    fa, f1, f2, fb = f(a), f(x1), f(x2), f(b)
    for k in xrange(ns):
        if f1 > f2:
            a, fa, x1, f1 = x1, f1, x2, f2
            x2 = a+tau*(b-a)
            f2 = f(x2)
        else:
            b, fb, x2, f2 = x2, f2, x1, f1
            x1 = a+(1.0-tau)*(b-a)
            f1 = f(x1)
        if k>2 and norm(b-a)<max(ap,norm(b)*rp): return b
    raise ArithmeticError('no convergence')
Here is an example:
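For instance (the target function is illustrative):

>>> print optimize_golden_search(lambda x: (x-3)**2, 0, 10)
3.000...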
Figure 4.7: Pictorial representation of the golden search method. If the function is convex (f″(x) > 0), then knowledge of the function at the four points (a, x₁, x₂, b) permits us to determine whether a minimum is in [a, x₂] or in [x₁, b].
∂f(x)/∂xᵢ = lim_{h→0} (f(x + h eᵢ) − f(x − h eᵢ)) / (2h)   (4.122)

where eᵢ is the unit vector in direction i. A function of many variables admits the Taylor expansion

f(x₀, x₁, x₂, ...) = f(x̄₀, x̄₁, x̄₂, ...) + ∑ᵢ (∂f(x̄)/∂xᵢ)(xᵢ − x̄ᵢ)
                   + ½ ∑ᵢⱼ (∂²f(x̄)/∂xᵢ∂xⱼ)(xᵢ − x̄ᵢ)(xⱼ − x̄ⱼ) + ...   (4.123)

or, in vector notation,

f(x) = f(x̄) + ∇f(x̄)(x − x̄) + ½ (x − x̄)ᵗ H_f(x̄)(x − x̄) + ...   (4.124)

where the gradient is the column vector

∇f(x) ≡ (∂f(x)/∂x₀, ∂f(x)/∂x₁, ∂f(x)/∂x₂, ...)ᵗ   (4.125)
Here is an example:
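A minimal sketch of numerical partial derivatives and of the gradient along these lines (helper names are illustrative; the book's library provides similar utilities, and Matrix(rows, fill=...) is assumed to build a column vector as elsewhere in the chapter):

def partial(f, i, h=1e-4):
    # returns a function that computes the i-th partial derivative of f
    def df(x, f=f, i=i, h=h):
        x = list(x)              # work on a copy of the input point
        x[i] += h
        f_plus = f(x)
        x[i] -= 2*h
        f_minus = f(x)
        return (f_plus-f_minus)/(2*h)
    return df

def gradient(f, x, h=1e-4):
    # column vector of all partial derivatives of f at x
    return Matrix(len(x), fill=lambda r,c: partial(f,r,h)(x))

>>> def f(x): return 2.0*x[0]+3.0*x[1]+5.0*x[1]*x[2]
>>> print round(partial(f,1)([1,1,1]), 4)
8.0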
f(x) = (f₀(x), f₁(x), f₂(x), ...)   (4.127)

Each component can be Taylor-expanded around x̄:

fᵢ(x) = fᵢ(x̄) + ∇fᵢ(x̄)·(x − x̄) + ...   (4.128)
(J_f)ᵢⱼ ≡ ∂fᵢ(x)/∂xⱼ   (4.130)
Here is an example:
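A minimal sketch of a numerical Jacobian built on the partial helper above (the name is illustrative):

def jacobian(f, x, h=1e-4):
    # entry (r,c) is the partial derivative of component r of f
    # with respect to variable c
    n = len(f(x))
    return Matrix(n, len(x),
                  fill=lambda r,c: partial(lambda x,r=r: f(x)[r], c, h)(x))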
We can now solve eq. 4.129 iteratively as we did for the one-dimensional
Newton solver with only one change—the first derivative of f is replaced
by the Jacobian:
Here is an example:
As for the one-dimensional case, we can approximate f (x) with its Taylor
expansion at the first order,
f(x) = f(x̄) + ∇f(x̄)(x − x̄) + ½ (x − x̄)ᵗ H_f(x̄)(x − x̄)   (4.131)

set its derivative to zero, and solve for x, thus obtaining

x = x̄ − H_f⁻¹ ∇f   (4.132)
def optimize_newton_multi_improved(f, x, ap=1e-6, rp=1e-4, ns=20, h=10.0):
    """ (signature reconstructed for this listing)
    Computes the extremum of the multidimensional function f near point x.

    Parameters
    f is a function that takes a list and returns a scalar
    x is a list

    Returns x, which maximizes or minimizes f(x), as a list
    """
    x = Matrix(list(x))
    fx = f(x.flatten())
    for k in xrange(ns):
        (grad,H) = (gradient(f,x.flatten()), hessian(f,x.flatten()))
        if norm(H) < ap:
            raise ArithmeticError('unstable solution')
        (fx_old, x_old, x) = (fx, x, x-(1.0/H)*grad)
        fx = f(x.flatten())
        while fx>fx_old: # revert to steepest descent
            (fx, x) = (fx_old, x_old)
            norm_grad = norm(grad)
            (x_old, x) = (x, x - grad/norm_grad*h)
            (fx_old, fx) = (fx, f(x.flatten()))
            h = h/2
        h = norm(x-x_old)*2
        if k>2 and h/2<max(ap,norm(x)*rp): return x.flatten()
    raise ArithmeticError('no convergence')
defined as

χ²(a, b) = ∑ᵢ ((yᵢ − f(xᵢ, a, b)) / δyᵢ)²   (4.133)

f(x, a, b) = ∑ⱼ aⱼ fⱼ(x, b)   (4.134)
Here is an example:
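For instance, the example at the end of this section fits data with

f(x, a, b) = a₀ x + a₁ x² + a₂ / (x + b₀)

which is linear in the coefficients a₀, a₁, a₂ but nonlinear in the parameter b₀.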
z = (y₀/δy₀, y₁/δy₁, y₂/δy₂, ...)ᵗ   (4.136)

and A(b) is the matrix with elements

A(b)ᵢⱼ = fⱼ(xᵢ, b)/δyᵢ   (4.137)

We can minimize this function in a using the linear least squares algorithm, exactly:

a(b) = (A(b)ᵗA(b))⁻¹ (A(b)ᵗ z)   (4.139)
Here is an example:
>>> data = [(i, i+2.0*i**2+300.0/(i+10), 2.0) for i in xrange(1,10)]
>>> fs = [(lambda b,x: x), (lambda b,x: x*x), (lambda b,x: 1.0/(x+b[0]))]
>>> ab, chi2 = fit(data,fs,[5])
>>> print ab, chi2
[0.999..., 2.000..., 300.000..., 10.000...] ...
4.10 Integration
Consider the integral

I = ∫ₐᵇ f(x) dx   (4.141)

which measures the area under the curve y = f(x), delimited on the left by x = a and on the right by x = b.
Figure 4.8: Visual representation of the concept of an integral as the area under a curve.
Iₙ ≃ ∑_{i=0}^{i<n} (h/2)(fᵢ + fᵢ₊₁)   (4.142)

lim_{n→∞} Iₙ → I   (4.143)
Figure 4.9: Visual representation of the trapezoid method for numerical integration.
And here we implement the limit by doubling the number of points until
convergence is achieved:
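A minimal sketch of both steps follows; it assumes nothing beyond the trapezoid formula above (function names are illustrative):

def integrate_naive(f, a, b, n=20):
    # trapezoid rule with n intervals
    a, b = float(a), float(b)
    h = (b-a)/n
    return h*(f(a)+f(b))/2 + h*sum(f(a+h*i) for i in xrange(1,n))

def integrate(f, a, b, ap=1e-6, rp=1e-4, ns=20):
    # doubles the number of intervals until the estimate converges
    I = integrate_naive(f, a, b, 1)
    for k in xrange(1, ns):
        I_old, I = I, integrate_naive(f, a, b, 2**k)
        if k > 2 and abs(I-I_old) < max(ap, abs(I)*rp):
            return I
    raise ArithmeticError('no convergence')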
4.10.1 Quadrature
∫₀ʰ x dx = h²/2 = ∑ᵢ cᵢ (i/n)¹   (4.145)
... = ... = ...   (4.146)
∫₀ʰ x^{n−1} dx = hⁿ/n = ∑ᵢ cᵢ (i/n)^{n−1}   (4.147)
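The listing below relies on a QuadratureIntegrator class that is missing from this excerpt; a minimal sketch consistent with the conditions above, assuming the Matrix class from earlier in the chapter, is:

class QuadratureIntegrator:
    """Integrates f on [a,b] with fixed-order quadrature weights obtained
       by solving the linear system above (minimal sketch)."""
    def __init__(self, order=4):
        h = 1.0/(order-1)
        A = Matrix(order, order, fill=lambda r,c: (c*h)**r)
        s = Matrix(order, 1, fill=lambda r,c: 1.0/(r+1))
        self.w = (1/A)*s   # quadrature weights
    def integrate(self, f, a, b):
        w = self.w
        order = w.nrows
        h = float(b-a)/(order-1)
        return (b-a)*sum(w[i,0]*f(a+i*h) for i in xrange(order))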
def integrate_quadrature_naive(f,a,b,n=20,order=4):
    a,b = float(a),float(b)
    h = float(b-a)/n
    q = QuadratureIntegrator(order=order)
    return sum(q.integrate(f,a+i*h,a+i*h+h) for i in xrange(n))
u_f ≡ {c f(x₀), c f(x₁), c f(x₂), ..., c f(x_N)}   (4.148)

where c is an arbitrary constant that we choose to be c = √((b − a)/N). This choice simplifies our later algebra. Summarizing, we define

u_{fk} ≡ √((b − a)/N) f(xₖ)   (4.149)
Given any two functions, we can define their scalar product as the limit
of N → ∞ of the scalar product between their corresponding vectors:
f · g ≡ lim_{N→∞} u_f · u_g = lim_{N→∞} ((b − a)/N) ∑ₖ f(xₖ)g(xₖ)   (4.150)

f · g = ∫ₐᵇ f(x)g(x) dx   (4.151)
uᵢ = ∑ⱼ vⱼ bⱼᵢ   (4.153)

vⱼ = ∑ᵢ uᵢ bⱼᵢ   (4.154)

In fact, substituting the first relation into the second,

vⱼ = ∑ₖ uₖ bⱼₖ   (4.155)
   = ∑ₖ (∑ᵢ vᵢ bᵢₖ) bⱼₖ   (4.156)
   = ∑ᵢ vᵢ (∑ₖ bᵢₖ bⱼₖ)   (4.157)
   = ∑ᵢ vᵢ δᵢⱼ   (4.158)
   = vⱼ   (4.159)
In other words, once we have a basis of vectors, the vector u can be rep-
resented in terms of the vector v of v j coefficients and, conversely, v can
be computed from u; u and v contain the same information.
def transform(u,b):   # reconstructed companion of antitransform, per eq. (4.154)
    return [sum(u[k]*bi[k] for k in xrange(len(u))) for bi in b]

def antitransform(v,b):
    return [sum(v[i]*bi[k] for i,bi in enumerate(b)) for k in xrange(len(v))]
bⱼᵢ = (1/√(2π)) e^{2πIij/N}   (4.161)

where I is the imaginary unit. With this choice, the T⁺ and T⁻ functions become

(FT⁺)  vⱼ = N^{−1/2} ∑ᵢ uᵢ e^{2πIij/N}   (4.162)

(FT⁻)  uᵢ = N^{−1/2} ∑ⱼ vⱼ e^{−2πIij/N}   (4.163)
and they take the names of Fourier transform and anti-transform [42],
respectively; we can implement them as follows:
Here 1j is the Python notation for I and cexp is the exponential function
for complex numbers.
Notice how the transformation works even when u is a vector of complex
numbers.
Something special happens when u is real: the transformed coefficients satisfy v_{N−j} = v̄ⱼ (complex conjugation), so half of them are redundant.
We can speed up the code even more using recursion and by observing that if N is a power of 2, then

vⱼ = N^{−1/2} ∑ᵢ u₂ᵢ e^{2πI(2i)j/N} + N^{−1/2} ∑ᵢ u₂ᵢ₊₁ e^{2πI(2i+1)j/N}   (4.166)
   = 2^{−1/2} (vⱼ^even + e^{2πIj/N} vⱼ^odd)   (4.167)

where vⱼ^even is the Fourier transform of the even terms and vⱼ^odd is the Fourier transform of the odd terms.
The preceding recursive expression can be implemented using dynamic
programming, thus obtaining
from math import pi, sqrt
from cmath import exp as cexp

def fast_fourier(u, sign=1):
    N, sqrtN, D = len(u), sqrt(len(u)), xrange(len(u))
    v = [ui/sqrtN for ui in u]
    k = N/2
    while k:
        omega = cexp(2.0*pi*1j*sign*k/N)   # sign selects transform vs anti-transform
        for i in D:
            j = i ^ k
            if i < k:
                ik, jk = int(i/k), int(j/k)
                v[i], v[j] = v[i]+(omega**ik)*v[j], v[i]+(omega**jk)*v[j]
        k/=2
    return v

def fast_anti_fourier(v):
    return fast_fourier(v, sign=-1)
a₀f(x) + a₁f′(x) + a₂f″(x) + ··· = g(x)   (4.168)

for example,

f″(x) − 4f′(x) + f(x) = sin(x)   (4.169)

or, more compactly,

∑ₖ aₖ f^{(k)}(x) = g(x)   (4.170)
we obtain
f̃(y) = g̃(y) / (∑ₖ aₖ(iy)ᵏ)   (4.172)
This is fine and useful when the Fourier transformations are easy to com-
pute.
A more practical numerical solution is the following. We define

yᵢ(x) ≡ f^{(i)}(x)   (4.174)

and rewrite the differential equation as a system of first-order equations:

y₀′ = y₁   (4.175)
y₁′ = y₂   (4.176)
y₂′ = y₃   (4.177)
... = ...   (4.178)
y′_{N−1} = y_N = (g(x) − ∑_{k<N} aₖ yₖ(x)) / a_N(x)   (4.179)
or equivalently

y′ = F(y)   (4.180)

where

F(y) = (y₁, y₂, y₃, ..., (g(x) − ∑_{k<N} aₖ(x)yₖ(x))/a_N(x))ᵗ   (4.181)
y( x + h) = y( x ) + hF (y, x ) (4.182)
k1 = F (y, x ) (4.184)
k2 = F (y + hk1 /2, x + h/2) (4.185)
k3 = F (y + hk2 /2, x + h/2) (4.186)
k4 = F (y + hk3 , x + h) (4.187)
5
Probability and Statistics
5.1 Probability
Probability derives from the Latin probare (to prove or to test). The word probable means roughly "likely to occur" in the case of possible future occurrences, or "likely to be true" in the case of inferences from evidence.
What mathematicians call probability is the mathematical theory we use
to describe and quantify uncertainty. In a larger context, the word prob-
ability is used with other concerns in mind. Uncertainty can be due to
our ignorance, deliberate mixing or shuffling, or due to the essential ran-
domness of Nature. In any case, we measure the uncertainty of events on
a scale from zero (impossible events) to one (certain events or no uncer-
tainty).
There are three standard ways to define probability:
• (frequentist) Given an experiment and a set of possible outcomes S, the
probability of an event A ⊂ S is computed by repeating the experiment
N times, counting how many times the event A is realized, NA , then
taking the limit
Prob(A) ≡ lim_{N→∞} N_A / N   (5.1)
• (a priori) If the outcomes in S are equally likely, the probability of an event A ⊂ S is the ratio between the number of favorable outcomes, c(A), and the number of possible outcomes, c(S):

Prob(A) ≡ c(A)/c(S)   (5.2)
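A minimal sketch of the frequentist computation described next, for a single die (the helper name Prob is illustrative):

>>> import random
>>> def Prob(A, S=[1,2,3,4,5,6], N=1000):
...     NA = sum(1 for k in xrange(N) if random.choice(S) in A)
...     return float(NA)/N
>>> print Prob(A=[6])
0.16...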
Here Prob(A) computes the probability of the event A using N=1000 simulated experiments. The random.choice function picks one of the choices at random with equal probability.
We can compute the same quantity using the a priori definition:
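For the same event, the a priori computation is just a ratio of counts:

>>> S = [1,2,3,4,5,6]
>>> A = [6]
>>> print float(len(A))/len(S)
0.166666666667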
As stated before, the latter is more precise because it produces results for an "ideal" die, while the frequentist approach produces results for a real die (in our case, a simulated one).
Two events A and B are independent if Prob(A ∩ B) = Prob(A)Prob(B); therefore the probability that two independent events occur is the product of the probabilities that each individual event occurs.

We can experiment with conditional probability using Python. Let's consider two dice, X and Y. The space of all possible outcomes is S² = S × S, and we are interested in the probability of the second die giving a 6, given that the first die is also a 6:
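A minimal simulation sketch:

>>> import random
>>> def conditional(N=100000):
...     first = both = 0
...     for k in xrange(N):
...         X, Y = random.randint(1,6), random.randint(1,6)
...         if X == 6:
...             first += 1
...             if Y == 6: both += 1
...     return float(both)/first
>>> print conditional()
0.16...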
p( x ) ≡ Prob( X = x ) (5.4)
and
E[(X − 3.5)²] = ∑ᵢ (xᵢ − 3.5)² p(xᵢ) = (1/6) ∑_{xᵢ∈{1,2,3,4,5,6}} (xᵢ − 3.5)² = 2.9167   (5.7)

We call E[X] the mean of X and usually denote it with µ_X. We call E[(X − µ_X)²] the variance of X and denote it with σ²_X. Note that

σ²_X = E[X²] − E[X]²   (5.8)
As another example, let’s consider a simple bet on a dice. We roll the dice
once and win $20 if the dice returns 6; we lose $5 otherwise:
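The expected winnings per roll follow directly from the definition of expectation:

E[win] = 20 × (1/6) + (−5) × (5/6) = (20 − 25)/6 = −5/6 ≃ −$0.83

so, on average, this bet loses about 83 cents per roll.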
F ( x ) ≡ Prob( X ≤ x ) (5.9)
p(x) ≡ dF(x)/dx   (5.10)
and the probability that X falls into an interval [a, b] can be computed as

Prob(a ≤ X ≤ b) = ∫ₐᵇ p(x) dx   (5.11)
and

E[(X − ½)²] = ∫_{−∞}^{∞} (x − ½)² p(x) dx = ∫₀¹ (x² − x + ¼) dx = 1/12   (5.14)

σ²_X = E[X²] − E[X]²   (5.15)
By definition,

F(∞) ≡ Prob(X ≤ ∞) = 1   (5.16)

therefore

Prob(−∞ ≤ X ≤ ∞) = ∫_{−∞}^{∞} p(x) dx = 1   (5.17)

The distribution p is always normalized to 1.
Moreover,

E[aX + b] = ∫_{−∞}^{∞} (ax + b) p(x) dx   (5.18)
          = a ∫_{−∞}^{∞} x p(x) dx + b ∫_{−∞}^{∞} p(x) dx   (5.19)
          = aE[X] + b   (5.20)
E[f] = ∫_{−∞}^{∞} f(x) p(x) dx = (1/(b − a)) ∫ₐᵇ f(x) dx   (5.21)

We can also compute the same integral by using the definition of expectation value for a discrete distribution:

E[f] = ∑_{xᵢ} f(xᵢ) p(xᵢ) = (1/N) ∑_{xᵢ} f(xᵢ)   (5.22)
lim_{N→∞} (1/N) ∑_{xᵢ} f(xᵢ) = ∫_{−∞}^{∞} f(x) p(x) dx = (1/(b − a)) ∫ₐᵇ f(x) dx   (5.23)
This is the simplest case of Monte Carlo integration, which is the subject
of a following chapter.
Given two random variables, X and Y, we define the covariance (cov) and
the correlation (corr) between them as
For independent variables, E[XY] = E[X]E[Y]   (5.29); therefore the covariance of two independent variables is zero. Moreover,
E[∑ᵢ aᵢXᵢ] = ∑ᵢ aᵢE[Xᵢ]   (5.36)

and, for independent variables with zero mean,

E[(∑ᵢ Xᵢ)²] = ∑ᵢ E[Xᵢ²]   (5.37)
Here is an example:
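A minimal sketch verifying the linearity of the expectation for two dice (E[X + Y] = E[X] + E[Y] = 7):

>>> import random
>>> N = 100000
>>> print sum(random.randint(1,6)+random.randint(1,6) for k in xrange(N))/float(N)
6.9...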
lim_{n→∞} (X₁ + X₂ + ··· + Xₙ)/n = µ   (5.38)
This theorem means that “the average of the results obtained from a large
number of trials should be close to the expected value, and will tend to
become closer as more trials are performed.” The name of this law is due
to Poisson [43].
Y = lim_{N→∞} (1/N) ∑_{i=0}^{i<N} Xᵢ   (5.39)
We can numerically verify this for the simple case in which Xi are uniform
random variables with mean equal to 0:
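A minimal sketch, with each Xᵢ uniform in [−1, 1] so that µ = 0; the averages shrink toward zero as n grows:

>>> import random
>>> for n in (10, 100, 1000, 10000):
...     print n, sum(random.uniform(-1,1) for i in xrange(n))/n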
Figure 5.1: Example of distributions for sums of 1, 2, 4, and 8 uniform random variables.
The more random variables are added, the better the result approximates a Gaussian
distribution.
p(x) ~ 1/|x|^{1+α} as x → ∞, with 0 < α < 2   (5.42)

where σ² = E[(X − µ)²] = (1/N) ∑ᵢ (Xᵢ − µ)².
combinations.
• Case 2, n ≤ m. All plugs have to be used. We consider the first plug, and we can select any of the m sockets (m combinations). We consider the second plug, and we can select any of the remaining m − 1 sockets (m − 1 combinations), and so on, until we are left with no spare plugs and m − n free sockets; therefore there are m(m − 1)···(m − n + 1) = m!/(m − n)! combinations.
We have 52 cards, 26 black and 26 red. We shuffle the cards and pick
three.
• What is the probability that they are all red?
Prob(3 red) = (26/52) × (25/51) × (24/50) = 2/17   (5.48)

• What is the probability that they are all black?

Prob(3 black) = Prob(3 red) = 2/17   (5.49)
• What is the probability that they are not all black or all red? By the two results above, Prob = 1 − 2 × (2/17) = 13/17 ≃ 0.765.

Here is an example of how we can simulate the deck of cards using Python to compute an answer to the last question:
>>> import random
>>> def pick_three_cards():   # reconstructed helper
...     deck = ['red']*26+['black']*26
...     return random.sample(deck,3)
>>> def simulate_cards(n=100000):
...     counter = 0
...     for k in xrange(n):
...         c = pick_three_cards()
...         if not (c[0]==c[1] and c[1]==c[2]): counter += 1
...     return float(counter)/n
>>> print simulate_cards()
0.76...
Prob(red) = 24/(24 + 26) = 12/25   (5.54)
6
Random Numbers and Distributions
In the previous chapters, we have seen how using the Python random mod-
ule, we can generate uniform random numbers. This module can also
generate random numbers following other distributions. The point of this
chapter is to understand how random numbers are generated.
Before we proceed further, there are four important concepts that should
be defined because of their implications:
• Randomness is the characteristic of a process whose outcome is un-
predictable (e.g., at the moment I am writing this sentence, I cannot
predict the exact time and date when you will be reading it).
• Determinism is the characteristic of a process whose outcome can be
predicted from the initial conditions of the system (e.g., if I throw a ball
from a known position, at a known velocity and in a known direction,
I can predict—calculate—its entire future trajectory).
• Chaos is the emergence of randomness from order [46] (e.g., if I am on
the top of a hill and I throw the ball in a vertical direction, I cannot
predict on which side of the hill it is going to end up). Even if the
equations that describe a phenomenon are known and are deterministic, it may happen that a small variation in the initial conditions causes a large difference in the possible deterministic evolution of the system.
Therefore the outcome of a process may depend on a tiny variation
of the initial parameters. These variations may not be measurable in
practice, thus making the process unpredictable and chaotic. Chaos is
generally regarded as a characteristic of some differential equations.
• Order is the opposite of chaos. It is the emergence of regular and
reproducible patterns from a process that, in itself, may be random or
chaotic (e.g., if I keep throwing my ball in a vertical direction from
the top of a hill and I record the final location of the ball, I eventually
find a regular pattern, a probability distribution associated with my
experiment, which depends on the direction of the wind, the shape of
the hill, my bias in throwing the ball, etc.).
These four concepts are closely related, and they do not necessarily come
in opposite pairs as one would expect.
A deterministic process may cause chaos. We can use chaos to gener-
ate randomness (we will see examples when covering random number
generation). We can study randomness and extract its ordered properties
(probability distributions), and we can use randomness to solve determin-
istic problems (Monte Carlo) such as computing integrals and simulating
a system.
Note that randomness does not necessarily come from chaos. Random-
ness exists in nature [47][48]. For example, a radioactive atom “decays”
into a different atom at some random point in time. For example, an
atom of carbon 14 decays into nitrogen 14 by emitting an electron and a
neutrino
¹⁴₆C −→ ¹⁴₇N + e⁻ + ν̄ₑ   (6.1)
at some random time t; t is unpredictable. It can be proven that the
randomness in the nuclear decay time is not due to any underlying deter-
ministic process. In fact, constituents of matter are described by quantum
t0 , t1 , t2 , t3 , t4 , t5 . . . , (6.2)
One could study the probability distribution of the ti and find that it
follows an exponential probability distribution like
Note that the procedure can be applied to map any random sequence into
a Bernoulli sequence even if the numbers in the original sequence do not
follow an exponential distribution, as long as ti is independent of t j for
any j < i (memoryless distribution).
01011110110101010111011010 (6.5)
01011110-11010101-01110110-..... (6.6)
The Linux/Unix operating system provides its own entropy source acces-
sible via “/dev/urandom.” This data source is available in Python via
os.urandom().
Here we define a class that can access this entropy source and use it to generate uniform random numbers. It follows the same process outlined for the radioactive decays:
class URANDOM(object):
    def __init__(self, data=None):
        if data: open('/dev/urandom','wb').write(str(data))
    def random(self):
        import os
        n = 16
        random_bytes = os.urandom(n)
        random_integer = sum(ord(random_bytes[k])*256**k for k in xrange(n))
        random_float = float(random_integer)/256**n
        return random_float   # missing in the original listing
6.4 Pseudo-randomness
The output numbers “look” random but are not truly random. Running
the same code with the same seed generates the same output. Notice the
following:
• PRNGs are typically implemented as a recursive expression that, given
xi−1 , produces xi .
• PRNGs have to start from an initial value, x0 , called the seed. A typical
choice is to set the seed equal to the number of seconds from the con-
ventional date and time “Thu Jan 01 01:00:00 1970.” This is not always
a good choice.
• PRNGs are periodic. They generate numbers in a finite set and then
they repeat themselves. It is desirable to have this set as large as possi-
ble.
• PRNGs depend on some parameters (e.g., a and m). Some parameter
choices lead to trivial random number generators. In general, some
choices are better than others, and a few are optimal. In particular,
the values of a and m determine the period of the random number
generator. An optimal choice is the one with the longest period.
For a linear congruential generator, because of the mod operation, the period is always less than or equal to m. When c is nonzero, the period is equal to m only if c and m are relatively prime, a − 1 is divisible by all prime factors of m, and a − 1 is a multiple of 4 when m is a multiple of 4. In the case of RANDU, the period is m/4. A better choice is a = 7⁵ and m = 2³¹ − 1 (a Mersenne prime) because it can be proven that m − 1 is in fact the period of the generator:
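A minimal sketch of such a generator (the class name MCG is illustrative):

class MCG(object):
    # multiplicative linear congruential generator, a=7**5, m=2**31-1
    def __init__(self, seed, a=7**5, m=2**31-1):
        self.x, self.a, self.m = seed, a, m
    def next(self):
        self.x = (self.a*self.x) % self.m
        return self.x
    def random(self):
        return float(self.next())/self.m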
Figure 6.1: In this plot, each three consecutive random numbers (from RANDU) are
interpreted as ( x, y, z) coordinates of a random point. The image clearly shows the
points are not distributed at random. Image from ref. [51].
vᵢ = (xᵢ, xᵢ₊₁, ..., xᵢ₊ₖ)   (6.16)

If the coordinates of the point vᵢ are close to each other, then the coordinates of the point vᵢ₊ₖ are also close to each other. This indicates that there is an unwanted correlation between the points xᵢ, xᵢ₊₁, ..., xᵢ₊ₖ.
Lüscher observed [50] that Marsaglia's subtract-and-borrow generator is equivalent to a chaotic discrete dynamical system and that the preceding correlation dies off for points that are more than k apart. Therefore he proposed to modify the generator as follows: instead of taking all xᵢ numbers, read k successive elements of the sequence, discard p − k numbers, read k more numbers, and so on. The number p has to be chosen to be larger than k. When p = k, the original Marsaglia generator is recovered.
xᵢ = f(xᵢ₋₁, xᵢ₋₂, ..., xᵢ₋ₖ)   (6.19)

that is, the next random number depends on the past k numbers.
Requirements for CSPRNGs used in cryptography are that
• Given xi−1 , xi−2 , . . . , xi−k , xi can be computed in polynomial time,
while
• Given xᵢ, xᵢ₋₂, ..., xᵢ₋ₖ, the missing value xᵢ₋₁ must not be computable in polynomial time.
The first requirement means that the PRNG must be fast. The second
requirement means that if a malicious agent discovers a random number
used as a key, he or she cannot easily compute all previous keys generated
using the same PRNG.
One of the best PRNG algorithms (because of its long period, uniform distribution, and speed) is the Mersenne twister, which produces a 53-bit random number and has a period of 2¹⁹⁹³⁷ − 1 (this number is 6002 digits long!). The Python random module uses the Mersenne twister. Although discussing the inner workings of this algorithm is beyond the scope of these notes, we provide a pure Python implementation of the Mersenne twister:
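Only the generation step of the listing survives in this excerpt; a minimal sketch of the missing state setup, using the standard MT19937 constants and the variable names of the listing below, might look like:

N, M = 624, 397                   # standard MT19937 state sizes
U, L = 0x80000000, 0x7fffffff     # upper and lower bit masks
w = [seed & 0xffffffff]           # state vector, filled from the seed
for i in xrange(1, N):
    w.append((1812433253*(w[i-1]^(w[i-1]>>30))+i) & 0xffffffff)
wi = N                            # index into the state vector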
# generation step (continues the function whose opening lines are missing)
K = [0x0, 0x9908b0df]
y = 0
if wi >= N:
    for kk in xrange(N-M):            # fixed loop bound (was (N-M)+1)
        y = (w[kk]&U)|(w[kk+1]&L)
        w[kk] = w[kk+M] ^ (y >> 1) ^ K[y & 0x1]
    for kk in xrange(N-M, N-1):       # fixed loop bound (was xrange(kk, N))
        y = (w[kk]&U)|(w[kk+1]&L)
        w[kk] = w[kk+(M-N)] ^ (y >> 1) ^ K[y & 0x1]
    y = (w[N-1]&U)|(w[0]&L)
    w[N-1] = w[M-1] ^ (y >> 1) ^ K[y & 0x1]
    wi = 0
y = w[wi]
wi += 1
y ^= (y >> 11)
y ^= (y << 7) & 0x9d2c5680
y ^= (y << 15) & 0xefc60000
y ^= (y >> 18)
return (float(y)/0xffffffff)
Note that the second sequence is the same as the first but shifted by two
lines.
Three standard techniques for generating independent sequences are non-
overlapping blocks, leapfrogging, and Lehmer trees.
x₀, x₁, ..., x_{k−1}   (6.23)
xₖ, x_{k+1}, ..., x_{2k−1}   (6.24)
x_{2k}, x_{2k+1}, ..., x_{3k−1}   (6.25)
...   (6.26)

The seed of the nth subsequence is x_{nk}. This is particularly convenient for parallel computers, where one computer generates the seeds for the subsequences and the processing nodes independently generate the subsequences.
6.5.2 Leapfrogging
The seeds x₁, x₂, ..., x_{k−1} are generated from x₀, and the k independent sequences can then be advanced independently using the formula

xᵢ₊ₖ = aᵏ xᵢ mod m   (6.34)

Therefore leapfrogging is a viable technique for parallel random number generators.
Here is an example of a usage of leapfrog:
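A minimal sketch, assuming the MCG class sketched earlier (the function name is illustrative):

def leapfrog(seed, k, a=7**5, m=2**31-1):
    # stream i yields x_i, x_{i+k}, x_{i+2k}, ... of the base sequence
    ak = pow(a, k, m)             # the leapfrog multiplier a**k mod m
    streams = []
    x = seed
    for i in xrange(k):
        streams.append(MCG(seed=x, a=ak, m=m))
        x = (a*x) % m             # advance the base sequence by one step
    return streams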
Lehmer trees are binary trees, generated recursively, where each node contains a random number. We start from the root containing the seed, x₀, and we append two children containing, respectively, x_L = (a_L x + c_L) mod m and x_R = (a_R x + c_R) mod m, for suitable constants.
there is the most room for the darts to land where the curve is highest and thus the probability density is greatest.

The g distribution is nothing but a shape such that all the darts we throw fall below it. There are two particular cases. In one case, g = p: we only throw darts below the p that we want, and therefore we accept them all. This is the most efficient case, but it is not of practical interest, because it means the accept-reject step is not doing anything, as we would already know how to generate numbers according to p. The other case is g(x) = constant. This means we generate the x uniformly before the accept-reject step; it is equivalent to throwing the darts everywhere on the square board, without even trying to stay below the curve p.
The inversion method instead is more efficient but requires some math. It
states that if F ( x ) is a cumulative distribution function and u is a uniform
random number between 0 and 1, then x = F −1 (u) is a random number
with distribution p( x ) = F 0 ( x ). For those distributions where F can be
expressed in analytical terms and inverted, the inversion method is the
best way to generate random numbers. An example is the exponential
distribution.
We will create a new class RandomSource that includes methods to generate random numbers according to various distributions:

class RandomSource(object):   # class line reconstructed
    def __init__(self,generator=None):
        if not generator:
            import random as generator
        self.generator = generator
    def random(self):
        return self.generator.random()
    def randint(self,a,b):
        return int(a+(b-a+1)*self.random())
p(k) ≡ { p if k = 1;  1 − p if k = 0;  0 if k ∉ {0, 1} }   (6.37)
Let’s say we want a random number generator that can only produce the
outcomes 0, 1 or 2 with known probabilities:
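A minimal sketch of such a generator, together with the loop that produced the frequencies printed below (the probabilities 0.60, 0.14, and 0.26 are read off that output):

import random

def discrete(table):
    # table is a list of (outcome, probability) pairs
    u = random.random()
    for outcome, probability in table:
        if u < probability: return outcome
        u -= probability
    raise ArithmeticError('invalid probability')

>>> table = [(0,0.60), (1,0.14), (2,0.26)]
>>> n, counts = 1000000, {0:0, 1:0, 2:0}
>>> for k in xrange(n):
...     counts[discrete(table)] += 1
>>> for i in (0,1,2):
...     print 'frequency[%i]=%f' % (i, float(counts[i])/n)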
frequency[0]=0.600000
frequency[1]=0.140000
frequency[2]=0.260000
The Fishman-Yarberry algorithm uses an auxiliary ragged array a, where row 0 stores the individual probabilities and each subsequent row stores sums of adjacent pairs from the previous row:

a = ( a₀₀  a₀₁  a₀₂  ...  a₀,ₙ₋₁
      ...
      a_{t−2,0}  a_{t−2,1}  a_{t−2,2}  a_{t−2,3}
      a_{t−1,0}  a_{t−1,1} )   (6.41)

In other words, we can say that

• a₀ⱼ represents the probability Prob(X = xⱼ)   (6.42)
• more generally, aᵢⱼ represents the probability that X falls in {xₖ | 2ⁱj ≤ k < 2ⁱ(j + 1)}

This algorithm works like the binary search: at each step, it compares the uniform random number u with aᵢⱼ and decides whether u falls in the range {xₖ | 2ⁱj ≤ k < 2ⁱ(j + 1)} or in the complementary range {xₖ | 2ⁱ(j + 1) ≤ k < 2ⁱ(j + 2)}, and decreases i.
Here is the algorithm implemented as a class member function. The con-
structor of the class creates an array a once and for all. The method
discrete_map maps a uniform random number u into the desired discrete
integer:
from math import log

class FishmanYarberry(object):
    def __init__(self,table=[[0,0.2], [1,0.5], [2,0.3]]):
        t=log(len(table),2)
        while t!=int(t):
            table.append([0,0.0])
            t=log(len(table),2)
        t=int(t)
        a=[]
        for i in xrange(t):
            a.append([])
            if i==0:
                for j in xrange(2**t):
                    a[i].append(table[j][1])   # fixed: was table[j,1]
            else:
                for j in xrange(2**(t-i)):
                    a[i].append(a[i-1][2*j]+a[i-1][2*j+1])
        self.table=table
        self.t=t
        self.a=a

    def discrete_map(self, u):
        i=int(self.t)-1
        j=0
        b=0
        while i>0:
            if u>b+self.a[i][j]:
                b=b+self.a[i][j]
                j=2*j+2
            else:
                j=2*j
            i=i-1
        if u>b+self.a[i][j]:
            j=j+1
        return self.table[j][0]
p(k) = Prob(X = k) ≡ C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ   (6.46)

for k = 0, 1, 2, ..., n, where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient.

The formula can be understood as follows: we want k successes (pᵏ) and n − k failures ((1 − p)ⁿ⁻ᵏ). However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials.

The mean is µ_X = np, and the variance is σ²_X = np(1 − p).
If X and Y are independent binomial variables with the same p, based on n_X and n_Y trials respectively, then X + Y is again a binomial variable, based on n_X + n_Y trials; its distribution is

p(k) = Prob(X + Y = k) = C(n_X + n_Y, k) pᵏ (1 − p)^{n_X + n_Y − k}   (6.47)
For large n, it may be convenient to avoid storing the table and instead use the formula directly to compute its elements on a need-to-know basis. Moreover, because the table is accessed sequentially by the table lookup algorithm, one may notice that the following recursive relation holds:

Prob(X = 0) = (1 − p)ⁿ   (6.49)
Prob(X = k + 1) = ((n − k)/(k + 1)) (p/(1 − p)) Prob(X = k)   (6.50)
def binomial(self,n,p,epsilon=1e-6):   # signature reconstructed
    u = self.random()
    q = (1.0-p)**n
    for k in xrange(n+1):
        if u<q+epsilon:
            return k
        u = u - q
        q = q*(n-k)/(k+1)*p/(1.0-p)
    raise ArithmeticError('invalid probability')
p(n) = Prob(X = n) = C(n − 1, k − 1) pᵏ (1 − p)ⁿ⁻ᵏ   (6.51)
Here is an example:
John, a kid, is required to sell candy bars in his neighborhood to raise
money for a field trip. There are thirty homes in his neighborhood, and
he is told not to return home until he has sold five candy bars. So the
boy goes door to door, selling candy bars. At each home he visits, he has
a 0.4 probability of selling one candy bar and a 0.6 probability of selling
nothing.
• What's the probability of selling the last candy bar at the nth house?

p(n) = C(n − 1, 4) 0.4⁵ 0.6ⁿ⁻⁵   (6.52)
Prob(X = k) = pᵏ   (6.56)

Prob(X = n + 1) = (n/(n − k + 1)) (1 − p) Prob(X = n)   (6.57)
def negative_binomial(self,k,p,epsilon=1e-6):
    u = self.random()
    n = k
    q = p**k
    while True:
        if u<q+epsilon:
            return n
        u = u - q
        q = q*n/(n-k+1)*(1-p)
        n = n + 1
Notice once again that, unlike the binomial case, here k is fixed, not n,
and the random variable has a minimum value of k but no upper bound.
p(k) = Prob(X = k) = e^{−λ} λᵏ / k!   (6.58)

It arises as the n → ∞ limit of a binomial distribution with p = λ/n:

(n!/((n − k)! k!)) (λ/n)ᵏ (1 − λ/n)ⁿ⁻ᵏ ≃ e^{−λ} λᵏ / k! + O(1/n)   (6.59)

p(k) = lim_{n→∞} pₙ(k)   (6.60)
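A minimal sketch of a table-lookup sampler based on the recursion Prob(X = k + 1) = (λ/(k + 1)) Prob(X = k), written as a RandomSource-style method (the name is illustrative; it matches the discussion that follows):

from math import exp

def poisson(self, lamb, epsilon=1e-6):
    u = self.random()
    q = exp(-lamb)        # Prob(X = 0)
    k = 0
    while True:
        if u < q+epsilon:
            return k
        u = u - q
        k += 1
        q = q*lamb/k      # Prob(X = k)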
Note how this algorithm may take an arbitrary amount of time to generate
a Poisson distributed random number, but eventually it stops. If u is very
close to 1, it is possible that errors due to finite machine precision cause
the algorithm to enter into an infinite loop. The +ε term can be used to
correct this unwanted behavior by choosing ε relatively small compared
with the precision required in the computation, but larger than machine
precision.
def uniform(self,a,b):
    return a+(b-a)*self.random()
p( x ) = λe−λx (6.64)
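By the inversion method, F(x) = 1 − e^{−λx} gives x = −log(1 − u)/λ; since 1 − u is also uniform, a minimal sketch is:

from math import log

def exponential(self, lamb):
    # inversion method (self.random() must not return exactly 0)
    return -log(self.random())/lamb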
p(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}   (6.69)

The polar method generates Gaussian numbers in pairs; only the last lines of the listing survive here, so the rest is reconstructed (marked below):

def gauss(self,mu=0.0,sigma=1.0):           # reconstructed signature
    if hasattr(self,'other') and self.other:
        this, self.other = self.other, None # use the stored number
    else:
        while True:                         # reconstructed rejection loop
            v1 = self.uniform(-1,1)
            v2 = self.uniform(-1,1)
            r = v1*v1+v2*v2
            if 0 < r < 1: break
        this = sqrt(-2.0*log(r)/r)*v2
        self.other = sqrt(-2.0*log(r)/r)*v1
    return mu+sigma*this
Note how the first time the method next is called, it generates two Gaus-
sian numbers (this and other), stores other, and returns this. Every other
time, the method next is called if other is stored, and it returns a number;
otherwise it recomputes this and other again.
To map a random Gaussian number y with mean 0 and standard deviation
1 into another Gaussian number y0 with mean µ and standard deviation
σ,
y0 = µ + yσ (6.70)
We used this relation in the last line of the code.
This is also an important distribution, and Python has a function for it:
1 random.gauss(mu,sigma)
c = ∫_{µ−aσ}^{µ+aσ} p(x) dx   (6.71)
F(x) ≡ Prob(X < x) = 1 − (x_m / x)^α   (6.72)
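Inverting F gives x = x_m (1 − u)^{−1/α}; a minimal sketch:

def pareto(self, alpha, xm):
    # inversion method for the Pareto distribution
    u = self.random()
    return xm*(1.0-u)**(-1.0/alpha)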
x = cos(2πu) (6.73)
y = sin(2πu) (6.74)
def point_in_sphere(self,radius=1.0):
    while True:
        x = self.uniform(-radius,radius)
        y = self.uniform(-radius,radius)
        z = self.uniform(-radius,radius)
        if x*x+y*y+z*z < radius*radius:   # fixed: was x*x+y*y*z*z
            return x,y,z
6.8 Resampling
def resample(S,size=None):
    return [random.choice(S) for i in xrange(size or len(S))]
6.9 Binning
Note that these frequencies differ from 0.1 for less than 3%, whereas some
of the preceding numbers differ from 0.11 for more than 20%.
7
Monte Carlo Simulations
7.1 Introduction
Monte Carlo methods are a class of algorithms that rely on repeated ran-
dom sampling to compute their results, which are otherwise determinis-
tic.
7.1.1 Computing π
π = 4 arctan 1   (7.1)

The arctangent admits the Taylor expansion¹

arctan x = ∑_{i=0}^{∞} (−1)ⁱ x^{2i+1}/(2i + 1)   (7.8)

¹ Taylor expansion: f(x) = f(0) + f′(0)x + (1/2!)f″(0)x² + ··· + (1/i!)f⁽ⁱ⁾(0)xⁱ + ...   (7.2)

which, evaluated at x = 1, gives

π = ∑_{i=0}^{∞} (−1)ⁱ 4/(2i + 1)   (7.9)

A much faster-converging series is the Bailey-Borwein-Plouffe formula:

π = ∑_{i=0}^{∞} (1/16ⁱ) (4/(8i + 1) − 2/(8i + 4) − 1/(8i + 5) − 1/(8i + 6))   (7.10)
def pi_Plauffe(n=1000):               # signature reconstructed
    from decimal import Decimal, getcontext
    getcontext().prec = n             # work with n digits of precision
    pi=Decimal(0)
    a,b,c,d = Decimal(4),Decimal(2),Decimal(1),Decimal(1)/Decimal(16)
    for i in xrange(n):
        i8 = Decimal(8)*i
        pi=pi+(d**i)*(a/(i8+1)-b/(i8+4)-c/(i8+5)-c/(i8+6))
    return pi

>>> pi_Plauffe(1000)
The preceding formula works and converges very fast; already after 100 iterations it produces

π = 3.1415926535897932384626433...   (7.11)
There is a different approach based on the fact that π is also the area of a circle of radius 1. We can draw a square of area 1 containing a quarter of a circle of radius 1, randomly generate points (x, y) with uniform distribution inside the square, and check whether the points fall inside the circle. The ratio between the number of points that fall in the circle and the total number of points is proportional to the area of the quarter circle (π/4) divided by the area of the square (1).
Here is a program that implements this strategy:
from random import *

def pi_mc(n):
    pi=0
    counter=0
    for i in xrange(n):
        x=random()
        y=random()
        if x**2 + y**2 < 1:
            counter=counter+1
        pi=4.0*float(counter)/(i+1)
        print i,pi

pi_mc(1000)
Consider now the following inventory problem. Every day, a random number of customers (in the simulation below, roughly Gaussian with mean 976 and standard deviation 352) visit a merchant. The merchant sells only one type of item, which costs $100 per unit. The merchant maintains N items in stock and pays $30 a day to store each unit. What is the optimal N that maximizes the merchant's average daily income?

This problem cannot easily be formulated analytically or reduced to the computation of an integral, but it can easily be simulated.
In particular, we simulate many days and, for each day i, we start with N
items in stock, and we loop over each simulated visitor. If the visitor finds
an item in stock, he buys it with a 5% probability (producing an income
of $70), whereas if the item is not in stock, he buys it with 2% probability
(producing an income of $100). At the end of each day, we pay $30 for
each item remaining in stock.
Here is a program that takes N (the number of items in stock) and ns (the maximum number of simulated days) and computes the average daily income:
def simulate_once(N):
    profit = 0
    loss = 30*N
    instock = N
    for j in xrange(int(gauss(976,352))):
        if instock>0:
            if random()<0.05:
                instock=instock-1
                profit = profit + 100
        else:
            if random()<0.02:
                profit = profit + 100
    return profit-loss

def simulate_many(N,ap=1,rp=0.01,ns=1000):
    s = 0.0
    for k in xrange(1,ns):
        x = simulate_once(N)
        s += x
        mu = s/k
        if k>10 and abs(mu-mu_old)<max(ap,rp*abs(mu)):   # fixed: absolute values
            return mu
        else:
            mu_old = mu
    raise ArithmeticError('no convergence')
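A minimal driver to scan candidate stock levels (the step and the range are illustrative):

>>> for N in xrange(0, 100, 10):
...     print N, simulate_many(N)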
From this we deduce that the optimal number of items to carry in stock is
about 50. We could increase the resolution and precision of the simulation
by increasing the number of simulated days and reducing the step of the
amount of items in stock.
Note that the statement gauss(976,352) generates a random floating point number with a Gaussian distribution centered at 976 and with standard deviation 352, whereas the statement

if random()<0.05:

evaluates to True 5% of the time.
A generic Monte Carlo engine needs:

• A function simulate_once that runs one random replication of the system and returns one value xᵢ
• A function to accumulate the results until they converge to the average µ = (1/N) ∑ xᵢ
• A function to estimate the accuracy of the result and determine when to stop the simulation, δµ < precision

The average and its statistical uncertainty are

µ = (1/N) ∑ᵢ xᵢ   (7.12)

δµ = σ/√N = √((1/N)((1/N) ∑ᵢ xᵢ² − µ²))   (7.13)

When the distribution of the xᵢ is very skewed, the error can instead be estimated with the bootstrap: for each resample k of the data,

µₖ = (1/N) ∑ᵢ xᵢ^[k]   (7.14)

where xᵢ^[k] is the ith element of resample k and µₖ is the average of that resample.
We can now combine everything we have seen so far into a generic pro-
gram that can be used to perform the most generic Monte Carlo compu-
tation/simulation:
Our engine finds that the value of π with 68% confidence level is between
2.18 and 3.63, with the most likely value of 2.90. Of course, this is incor-
rect, because it generates too few samples, but the bounds are correct, and
that is what matters.
• There is no correlation between the time when a loss event occurs and
the amount of the loss.
• The time interval between losses is given by the exponential distribu-
tion (this is a Poisson process).
• The distribution of the loss amount is a Pareto distribution (there is a
fat tail for large losses).
• The average number of losses is 10 per day.
• The minimum recorded loss is $5000. The average loss is $15,000.
Our goal is to simulate one year of losses and to determine
• The average total yearly loss
• How much to save to make sure that in 95% of the simulated scenarios,
the losses can be covered without going broke
From these assumptions, we determine that λ = 10 for the exponential distribution and x_m = 5000 for the Pareto distribution. The mean of the Pareto distribution is αx_m/(α − 1) = 15,000, from which we determine that α = 1.5.
We can answer the first question (the average total yearly loss) simply by multiplying the number of business days in one year, 52 × 5 = 260, by the average number of losses per day, 10, and by the average individual loss, $15,000, thus obtaining 260 × 10 × $15,000 = $39 million.
To answer the second question, we would need to study the width of the
distribution. The problem is that, for α = 1.5, the standard deviation of
the Pareto distribution is infinity, and analytical methods do not apply.
We can do it using a Monte Carlo simulation:
import random
# (the MCEngine base class is defined earlier in the chapter)

class RiskEngine(MCEngine):
    def __init__(self,lamb,xm,alpha):
        self.lamb = lamb
        self.xm = xm
        self.alpha = alpha
    def simulate_once(self):
        total_loss = 0.0
        t = 0.0
        while t<260:
            dt = random.expovariate(self.lamb)
            amount = self.xm*random.paretovariate(self.alpha)
            t += dt
            total_loss += amount
        return total_loss

def main():
    s = RiskEngine(lamb=10, xm=5000, alpha=1.5)
    print s.simulate_many(rp=1e-4,ns=1000)
    print s.var(95)

main()
successful path to reach stop. A path is successful if, for a given simulation, all links in the path succeed in carrying the packet.

The key trick in solving this problem is finding the proper representation for the network. Since we are not required to determine the exact path but only a proof of existence, we use the concept of equivalence classes. We say that two nodes are in the same equivalence class if and only if there is a successful path that connects them.

The optimal data structure to implement equivalence classes is DisjointSets, discussed in chapter 3.

To simulate the system, we create a class NetworkReliability that extends MCEngine. It has a simulate_once method that tries to send a packet from start to stop and simulates the network once. During the simulation, each link of the network may be up or down with a given probability. If there is a path connecting the start node to the stop node in which all links are up, then the packet transfer succeeds. We use the DisjointSets to represent sets of nodes connected together. If there is an up link connecting a node from one set to a node in another set, then the two sets are joined. If, in the end, the start and stop nodes are found to belong to the same set, then there is a path, and simulate_once returns 1; otherwise it returns 0.
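The class definition itself is missing from this excerpt; a minimal sketch consistent with the main() below (the DisjointSets method names join and joined are assumptions about the chapter 3 implementation):

class NetworkReliability(MCEngine):
    def __init__(self, n_nodes, start, stop):
        self.n_nodes = n_nodes
        self.start = start
        self.stop = stop
        self.links = []   # list of (i, j, probability that the link is up)
    def add_link(self, i, j, probability):
        self.links.append((i, j, probability))
    def simulate_once(self):
        sets = DisjointSets(self.n_nodes)
        for i, j, p_up in self.links:
            if random.random() < p_up:   # the link is up in this replication
                sets.join(i, j)
        return 1 if sets.joined(self.start, self.stop) else 0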
def main():
    s = NetworkReliability(100,start=0,stop=1)
    for k in range(300):
        s.add_link(random.randint(0,99),
                   random.randint(0,99),
                   random.random())
    print s.simulate_many()

main()
Figure 7.3: Example of chain reaction within a fissile material. If the mass is small, most
of the decay products escape (left, sub-criticality), whereas if the mass exceeds a certain
critical mass, there is a self-sustained chain reaction (right).
class NuclearReactor(MCEngine):   # class line reconstructed
    def __init__(self,radius,mean_free_path=1.91,threshold=200):
        self.radius = radius
        self.density = 1.0/mean_free_path
        self.threshold = threshold
    def point_on_sphere(self):
        while True:
            x,y,z = random.random(), random.random(), random.random()
            d = math.sqrt(x*x+y*y+z*z)
            if d<1: return (x/d,y/d,z/d) # project on surface
    def simulate_once(self):
        events = [(0,0,0)]
        while events:
            p = events.pop()   # fixed: the popped event is the new origin
            v = self.point_on_sphere()
            d1 = random.expovariate(self.density)
            d2 = random.expovariate(self.density)
            p1 = (p[0]+v[0]*d1,p[1]+v[1]*d1,p[2]+v[2]*d1)
            p2 = (p[0]-v[0]*d2,p[1]-v[1]*d2,p[2]-v[2]*d2)
            if p1[0]**2+p1[1]**2+p1[2]**2 < self.radius**2:   # fixed: radius squared
                events.append(p1)
            if p2[0]**2+p2[1]**2+p2[2]**2 < self.radius**2:   # fixed: radius squared
                events.append(p2)
            if len(events) > self.threshold:
                return 1.0
        return 0.0

def main():
    s = NuclearReactor(radius=0.01)   # fixed: pass the initial radius
    data = []
    while s.radius<21:
        r = s.simulate_many(ap=0.01,rp=0.01,ns=1000,nm=100)
        data.append((s.radius, r[1], (r[2]-r[0])/2))
        s.radius *= 1.2
    c = Canvas(title='Critical Mass',xlab='Radius',
               ylab='Probability Chain Reaction')
    c.plot(data).errorbar(data).save('nuclear.png')

main()
The resulting plot ('nuclear.png') shows the probability of a chain reaction as a function of the size of the uranium mass. We find a critical radius between 2 cm and 10 cm, which corresponds to a critical mass between 0.5 kg and 60 kg. The official number is 15 kg for uranium 233 and 60 kg for uranium 235. The lesson to learn here is that it is not safe to accumulate too much fissile material in one place.
and

∫_{−∞}^{+∞} p(x) dx = 1   (7.18)

and

g(x) = f(x)/p(x)   (7.19)

For a uniform p(x) = 1/(b − a) on [a, b], this reduces to g(x) ≡ (b − a) f(x).
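The MCIntegrator class used below is missing from this excerpt; a minimal sketch, assuming the MCEngine base class, is:

import math, random

class MCIntegrator(MCEngine):
    def __init__(self, f, a, b):
        self.f, self.a, self.b = f, float(a), float(b)
    def simulate_once(self):
        # one sample of g(x) = (b-a)*f(x) with x uniform in [a,b]
        x = random.uniform(self.a, self.b)
        return (self.b-self.a)*self.f(x)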
def main():
    s = MCIntegrator(f=lambda x: math.sin(x),a=0,b=1)
    print s.simulate_many()

main()
This technique is very general and can be extended to almost any integral
assuming the integrand is smooth enough on the integration domain.
The choice (7.21) is not always optimal because the integrand may be very
small in some regions of the integration domain and very large in other
regions. Clearly some regions contribute more than others to the average,
and one would like to generate points with a probability mass function
that is as close as possible to the original integrand. Therefore we should
choose a p(x) according to the following conditions:

• p(x) is as similar and proportional to f(x) as possible
• given F(x) = ∫_{−∞}^x p(t) dt, F⁻¹ can be computed analytically.

Any choice of p(x) that makes the integration algorithm converge faster with fewer calls to simulate_once is called a variance reduction technique.
p₀(x₀) = 0 or p₁(x₁) = 0 for x ∉ D   (7.27)

and

∫ p₀(x₀) p₁(x₁) dx₀ dx₁ = 1   (7.28)

and

g(x₀, x₁) = f(x₀, x₁) / (p₀(x₀) p₁(x₁))   (7.29)
Therefore

I = E[g(X₀, X₁)] ≃ (1/N) ∑_{i=0}^{i<N} g(x_{i0}, x_{i1})   (7.31)
p(x₀, ..., x_{n−1}) = 0 for x ∉ D   (7.33)

and

∫ p(x₀, ..., x_{n−1}) dx₀ ... dx_{n−1} = 1   (7.34)

and

g(x₀, ..., x_{n−1}) = f(x₀, ..., x_{n−1}) / p(x₀, ..., x_{n−1})   (7.35)

Therefore

I = E[g(X₀, ..., X_{n−1})] ≃ (1/N) ∑_{i=0}^{i<N} g(xᵢ)   (7.38)

where each point xᵢ is a tuple (x_{i0}, x_{i1}, ..., x_{i,n−1}).
As an example, we consider the integral

I = ∫₀¹ dx₀ ∫₀¹ dx₁ ∫₀¹ dx₂ ∫₀¹ dx₃ sin(x₀ + x₁ + x₂ + x₃)   (7.39)
X_{n+1} = Xₙ + µ + εₙ∆ₓ   (7.40)

A path with n₊ up-steps out of n occurs with probability

(n!/(n₊!(n − n₊)!)) p^{n₊} (1 − p)^{n−n₊}   (7.42)

and, defining

n = n₊ + n₋   (7.43)
k = n₊ − n₋   (7.44)

Prob(n, k) = (n!/(((n + k)/2)!((n − k)/2)!)) p^{(n+k)/2} (1 − p)^{(n−k)/2}   (7.45)
Let's assume an Ito process for our random walk: εₙ is normally (Gaussian) distributed. We consider discrete time intervals of equal length ∆ₜ, and at each time step εₙ = ε with probability density p(ε) ∝ e^{−ε²/2}. It turns out that eq. (7.40) gives

Xₙ = nµ + ∆ₓ ∑_{i=0}^{i<n} εᵢ   (7.46)

p(Xₙ) = (1/√(2πn∆ₓ²)) e^{−(Xₙ−nµ)²/(2n∆ₓ²)}   (7.47)
Notice how the mean and the variance of Xₙ are both proportional to n, whereas the standard deviation is proportional to √n.
Prob(a ≤ Xₙ ≤ b) = ∫ₐᵇ (1/√(2πn∆ₓ²)) e^{−(x−nµ)²/(2n∆ₓ²)} dx   (7.48)
                = (1/√(2π)) ∫_{(a−nµ)/(√n ∆ₓ)}^{(b−nµ)/(√n ∆ₓ)} e^{−x²/2} dx   (7.49)
                = erf((b − nµ)/(√n ∆ₓ)) − erf((a − nµ)/(√n ∆ₓ))   (7.50)
max(S_τ − A, 0)   (7.51)

Its value today is obtained by discounting at the risk-free rate:

C = e^{−rτ} max(S_τ − A, 0)   (7.52)

where r is the risk-free interest rate. This value corresponds to how much we would have to borrow today from a bank so that we can repay the bank at time τ with the profit from the option.
All our knowledge about the future spot price x = Sτ of the underlying
asset can be summarized into a probability mass function pτ ( x ). Under
the assumption that pτ ( x ) is known to both the buyer and the seller of
the option, it has to be that the averaged net present value of the option is
zero for any of the two parties to want to enter into the contract. Therefore
C_call = e^{−rτ} ∫_{−∞}^{+∞} max(x − A, 0) p_τ(x) dx   (7.53)
Similarly, we can perform the same computations for a put option. A put
option gives the buyer the option to sell the asset on a given day at a fixed
price. This is an insurance against the price going down instead of going
up. The value of this option at expiration is
max(A − S_τ, 0)   (7.54)

and its average discounted value is

C_put = e^{−rτ} ∫_{−∞}^{+∞} max(A − x, 0) p_τ(x) dx   (7.55)
Also notice that Ccall − C put = S0 − Ae−rτ . This relation is called the
call-put parity.
where u > 1 and 0 < d < 1 are measured from historical data. It follows that for τ = n∆ₜ, the probability that the spot price of the asset at expiration is S₀uⁱdⁿ⁻ⁱ is given by

Prob(S_τ = S₀uⁱdⁿ⁻ⁱ) = C(n, i) pⁱ(1 − p)ⁿ⁻ⁱ   (7.58)

and therefore

C_call = e^{−rτ} ∑_{i=0}^{i≤n} C(n, i) pⁱ(1 − p)ⁿ⁻ⁱ max(S₀uⁱdⁿ⁻ⁱ − A, 0)   (7.59)

and

C_put = e^{−rτ} ∑_{i=0}^{i≤n} C(n, i) pⁱ(1 − p)ⁿ⁻ⁱ max(A − S₀uⁱdⁿ⁻ⁱ, 0)   (7.60)
The parameters of this model are u, d, and p, and they must be determined from historical data. For example,

p = (e^{r∆ₜ} − d)/(u − d)   (7.61)

u = e^{σ√∆ₜ}   (7.62)

d = e^{−σ√∆ₜ}   (7.63)

where ∆ₜ is the length of the time interval, r is the risk-free rate, and σ is the volatility of the asset, that is, the standard deviation of its log returns.
Here is Python code to simulate an asset price using a binomial tree:

from random import random

def BinomialSimulation(S0,u,d,p,n):
    data=[]
    S=S0
    for i in xrange(n):
        data.append(S)
        if random()<p:   # fixed: the original called uniform() with no arguments
            S=u*S
        else:
            S=d*S
    return data
The function takes the present spot value, S0 , of the asset, the values of u, d
and p, and the number of simulation steps and returns a list containing
the simulated evolution of the stock price. Note that because of the exact
formulas, eqs.(7.59) and (7.60), one does not need to perform a simulation
unless the underlying asset is a stock that pays dividends or we want to
include some other variable in the model.
This method works fine for European call options, but it is not easy to generalize to other options whose value depends on the path of the asset (e.g., when the asset is a stock that pays dividends). Moreover, to increase precision, one has to decrease ∆ₜ and redo the computation from the beginning.
The Monte Carlo method that we see next is slower in the simple cases
but is more general and therefore more powerful.
If we are pricing a European call option, we are only interested in S_T and not in S_t for 0 < t < T; therefore we can choose ∆ₜ = T. In this case, we obtain

S_T = S₀ exp(r_T)   (7.65)

where r_T is the total log return over the life of the option, and

p(r_T) ∝ exp(−(r_T − µT)²/(2σ²T))   (7.66)
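The EuropeanCallOptionPricer class used below is missing from this excerpt; a minimal sketch consistent with eqs. (7.65) and (7.66), assuming the MCEngine base class, is:

import random
from math import exp, sqrt

class EuropeanCallOptionPricer(MCEngine):
    def simulate_once(self):
        T = self.time_to_expiration
        r_T = random.gauss(self.mu*T, self.sigma*sqrt(T))  # total log return
        S_T = self.spot_price*exp(r_T)
        payoff = max(S_T-self.strike, 0)
        daily_return = self.risk_free_rate/250
        return payoff*exp(-daily_return*T)   # discounted payoff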
def main():
    pricer = EuropeanCallOptionPricer()
    # parameters of the underlying
    pricer.spot_price = 100 # dollars
    pricer.mu = 0.12/250 # daily drift term
    pricer.sigma = 0.30/sqrt(250) # daily volatility
    # parameters of the option
    pricer.strike = 110 # dollars
    pricer.time_to_expiration = 90 # days
    # parameters of the market
    pricer.risk_free_rate = 0.05 # 5% annual return

    result = pricer.simulate_many(ap=0.01,rp=0.01) # precision: 1c or 1%
    print result

main()
An option is a contract, and one can write a contract with many different
clauses. Each of them can be implemented into an algorithm. Yet we can
group them into three different categories:
• Non-path-dependent: They depend on the price of the underlying asset
at expiration but not on the intermediate prices of the asset (path).
• Weakly path-dependent: They depend on the price of the underlying
asset and events that may happen to the price before expiration, but
they do not depend on when the events exactly happen.
• Strongly path-dependent: They depend on the details of the time vari-
ation of price of the underlying asset before expiration.
Because non-path-dependent options do not depend on details, it is often
possible to find approximate analytical formulas for pricing the option.
For weakly path-dependent options, the binomial tree of the previous section is usually preferable. The Monte Carlo approach applies to the general case, for example, that of strongly path-dependent options.
We will use our MCEngine to implement a generic option pricer.
First we need to recognize the following:
• The value of an option at expiration is defined by a payoff function f(x) of the spot price of the asset at the expiration date. The payoff f(x) = max(x − A, 0) is the convention that defines the European call option. A different type of option will have a different payoff function f(x).
• The more accurately we model the underlying asset, the more accu-
rate will be the computed value of the option. Some options are more
sensitive than others to our modeling details.
Note that one never models the option, only the underlying asset. The option payoff is given. We only choose the most efficient algorithm based on the model and the option:
# (methods of the GenericOptionPricer class; the earlier lines of the
#  listing, including the class statement, are elided)
def model(self,dt=1.0):
    return random.gauss(self.mu*dt, self.sigma*sqrt(dt))

def present_value(self,payoff):
    daily_return = self.risk_free_rate/250
    return payoff*exp(-daily_return*self.time_to_expiration)

def payoff_european_call(self, path):
    return max(path[-1]-self.strike,0)

def payoff_european_put(self, path):
    # a put pays when the final price is below the strike
    return max(self.strike-path[-1],0)

def payoff_exotic_call(self, path):
    # average of the last five simulated prices
    last_5_days = path[-5:]
    mean_last_5_days = sum(last_5_days)/len(last_5_days)
    return max(mean_last_5_days-self.strike,0)
28
def main():
    pricer = GenericOptionPricer()
    # parameters of the underlying
    pricer.spot_price = 100 # dollars
    pricer.mu = 0.12/250 # daily drift term
    pricer.sigma = 0.30/sqrt(250) # daily volatility
    # parameters of the option
    pricer.strike = 110 # dollars
    pricer.time_to_expiration = 90 # days
    pricer.payoff = pricer.payoff_european_call
    # parameters of the market
    pricer.risk_free_rate = 0.05 # 5% annual return

    result = pricer.simulate_many(ap=0.01,rp=0.01) # precision: 1 cent or 1%
    print result

main()
This code allows us to price any option simply by changing the payoff
function.
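For instance, a hypothetical digital call that pays one dollar when the option expires in the money requires only one more payoff method (a sketch, assuming the GenericOptionPricer class above):

class DigitalCallPricer(GenericOptionPricer):
    def payoff_digital_call(self, path):
        # hypothetical payoff: $1 if the final price exceeds the strike
        return 1.0 if path[-1] > self.strike else 0.0

One then sets pricer.payoff = pricer.payoff_digital_call before calling simulate_many.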
One can also change the model for the underlying using different assumptions. For example, one could include a model for market crashes: on random days, separated by intervals drawn from the exponential distribution, apply a negative jump that follows the Pareto distribution (similar to the losses in our previous risk model). Of course, a change of the model requires a recalibration of the parameters.
Figure 7.5: Price for a European call option for different spot prices and different values
of σ.
So far we have generated each random number independently of the others because all the random variables were uncorrelated. There are cases when we have the following problem: we have to generate x = x_0, x_1, ..., x_{n−1}, where x_0, x_1, ..., x_{n−1} are n correlated random variables whose joint probability mass function

    p(x) = p(x_0, x_1, ..., x_{n−1})    (7.67)

is known, possibly only up to a normalization constant.
import random

def P(x):
    return 6.0*(x[0]-x[1])**2

def Q(x):
    return [random.random(), random.random()]

for i, x in enumerate(metropolis(P,Q)):
    print x
    if i==100: break
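The metropolis generator itself is defined in a part of the listing not shown here. A minimal sketch consistent with this usage, assuming the proposal Q is symmetric so that the acceptance ratio reduces to P(x_new)/P(x_old):

import random

def metropolis(P, Q, x=None):
    # minimal sketch, not the book's exact listing
    x = x if x is not None else Q(None)   # draw an initial state
    while True:
        x_new = Q(x)                      # propose a new state
        # accept with probability min(1, P(x_new)/P(x))
        if P(x)*random.random() < P(x_new):
            x = x_new
        yield x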
    E(s) = −∑_i s_i h − ∑_{ij : dist_ij = 1} s_i s_j    (7.69)
where h is the external magnetic field, the first sum is over all spin sites, and the second is over all pairs of nearest-neighbor sites. In the absence of spin-spin interaction, only the first term contributes, and the energy is lower when the direction of the s_i (their sign) is the same as h. In the absence of h, only the second term contributes; the contribution of each pair of spins is positive if their signs are opposite, and negative otherwise.
In the absence of external forces, each system evolves toward the state
of minimum energy, and therefore, for a spin system, each spin tends to
align itself in the same direction as its neighbors and in the same direction
as the external field h.
Things change when we turn on heat. Feeding energy to the system makes the atoms vibrate and the spins randomly flip. The higher the temperature, the more likely it becomes to find the system in a state s of energy E(s) above the minimum; the probability of each state is given by the Boltzmann distribution

    p(s) ∝ exp(−E(s)/(KT))    (7.70)
where K is the Boltzmann constant.
We can now use the Metropolis algorithm to generate possible states of
the system s compatible with a given temperature T and measure the
effects on the average magnetization (the average spin) as a function of T
and possibly an external field h.
Also notice that, in the case of the Boltzmann distribution, the normalization constant is not known explicitly; this is not a problem, because the Metropolis algorithm only uses ratios of probabilities:
import math, random  # (imports elided in the original listing)

class Ising:
    def __init__(self, n):
        self.n = n
        self.s = [[[1 for x in xrange(n)] for y in xrange(n)]
                  for z in xrange(n)]
        self.magnetization = n**3

    def __getitem__(self,point):
        n = self.n
        x,y,z = point
        return self.s[(x+n)%n][(y+n)%n][(z+n)%n]

    def __setitem__(self,point,value):
        n = self.n
        x,y,z = point
        self.s[(x+n)%n][(y+n)%n][(z+n)%n] = value

    def step(self,t,h):
        n = self.n
        # pick a random spin site
        x = random.randint(0,n-1)
        y = random.randint(0,n-1)
        z = random.randint(0,n-1)
        neighbors = [(x-1,y,z),(x+1,y,z),(x,y-1,z),
                     (x,y+1,z),(x,y,z-1),(x,y,z+1)]
        dE = -2.0*self[x,y,z]*(h+sum(self[xn,yn,zn] for xn,yn,zn in neighbors))
        if dE > t*math.log(random.random()):
            self[x,y,z] = -self[x,y,z]
            self.magnetization += 2*self[x,y,z]
        return self.magnetization
def simulate(steps=100):
    ising = Ising(n=10)
    data = {}
    for h in range(0,11): # external magnetic field
        data[h] = []
        for t in range(1,11): # temperature, in units of K
            m = [ising.step(t=t,h=h) for k in range(steps)]
            mu = mean(m) # average magnetization
            sigma = sd(m)
            data[h].append((t,mu,sigma))
    return data

def main(name='ising.png'):
    # mean, sd, and Canvas come from the book's nlib and canvas modules
    data = simulate(steps = 10000)
    canvas = Canvas(xlab='temperature', ylab='magnetization')
    for h in data:
        color = '#%.2x0000' % (h*25)
        canvas.errorbar(data[h]).plot(data[h],color=color)
    canvas.save(name)

main()
Figures 7.6 and 7.7 show how the spins tend to align in the direction of the external magnetic field, but the larger the temperature (left to right), the more random they are, and the average magnetization tends to zero. The higher the external magnetic field (bottom to top curves), the longer it takes for the magnetization to decay as the temperature increases.
Figure 7.6: Average magnetization as a function of the temperature for a spin system.
Figure 7.7: Random Ising states (2D section of 3D) for different temperatures.
Figure 7.8: Schematic example of protein folding. The white circles are hydrophilic
amino-acids. The black ones are hydrophobic.
6 class Protein:
7
19 self.energy = self.compute_energy(self.folding)
20
21 def compute_folding(self,angles):
22 folding = {}
23 x,y,z = 0,0,0
24 k = 0
25 folding[x,y,z] = self.aminoacids[k]
26 for angle in angles:
27 k += 1
28 xn,yn,zn = self.moves[angle](x,y,z)
29 if (xn,yn,zn) in folding: return None # impossible folding
30 folding[xn,yn,zn] = self.aminoacids[k]
31 x,y,z = xn,yn,zn
32 return folding
33
34 def compute_energy(self,folding):
35 E = 0
36 for x,y,z in folding:
37 aminoacid = folding[x,y,z]
38 if aminoacid == 'H':
39 for face in range(6):
40 if not self.moves[face](x,y,z) in folding:
41 E = E + 1
42 return E
43
44 def fold(self,t):
45 while True:
46 new_angles = copy.copy(self.angles)
47 n = random.randint(1,len(self.aminoacids)-2)
48 new_angles[n] = random.randint(0,5)
49 new_folding = self.compute_folding(new_angles)
50 if new_folding: break # found a valid folding
51 new_energy = self.compute_energy(new_folding)
52 if (self.energy-new_energy) > t*math.log(random.random()):
53 self.angles = new_angles
54 self.folding = new_folding
55 self.energy = new_energy
56 return self.energy
57
58 def main():
59 aminoacids = ''.join(random.choice('HP') for k in range(20))
60 protein = Protein(aminoacids)
61 t = 10.0
62 while t>1e-5:
63 protein.fold(t = t)
64 print protein.energy, protein.angles
65 t = t*0.99 # cool
66
67 main()
The moves dictionary is a dictionary of functions: for each solid angle (0-5), moves[angle] is a function that maps x,y,z, the coordinates of an amino acid, to the coordinates of the adjacent cube at that solid angle. The annealing procedure is performed in the main function. The fold method performs one Metropolis accept-reject step. The purpose of the while loop in fold is to find a valid folding to propose: some foldings are invalid because they are not physical and would require two amino acids to occupy the same portion of space. When this happens, the compute_folding method returns None, indicating that one must try a different folding.
8
Parallel Algorithms
Consider, for example, a program that must compute two functions f(x) and g(x): if g(x) does not depend on the result of f(x), the two functions can be computed independently and in parallel.
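One way to exploit this independence in plain Python is the standard library multiprocessing module; a minimal sketch (with hypothetical f and g, independent of the PSim emulator used in the rest of this chapter):

from multiprocessing import Pool

def f(x): return x**2   # some expensive computation
def g(x): return x**3   # another computation, independent of f(x)

if __name__ == '__main__':
    pool = Pool(processes=2)
    rf = pool.apply_async(f, (10,))  # runs in one worker process
    rg = pool.apply_async(g, (10,))  # runs concurrently in another
    print rf.get() + rg.get()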
Often large problems can be divided into smaller computational prob-
lems, which can be solved concurrently (“in parallel”) using different pro-
cessing units (CPUs, cores). This is called parallel computing. Algorithms
designed to work in parallel are called parallel algorithms.
In this chapter, we will refer to a processing unit as a node and to the code running on a node as a process. A parallel program consists of many processes running on as many nodes. It is possible for multiple processes to run on one and the same computing unit (node) because of the multitasking capabilities of modern CPUs, but that is not true parallel computing. We will use an emulator, PSim, which works exactly in this way: it emulates many nodes with processes running on a single machine.
Programs can be parallelized at many levels: bit level, instruction level, data, and task parallelism. Bit-level parallelism is usually implemented in hardware. Instruction-level parallelism is also implemented in hardware in modern multi-pipeline CPUs. Data parallelism is usually referred to as SIMD (single instruction, multiple data), while task parallelism corresponds to the MIMD (multiple instruction, multiple data) case.
In the MIMD case, multiple copies of the same problem run concurrently
(on different data subsets and branching differently, thus performing dif-
ferent instructions) on different processing units, and they exchange in-
formation using a network. How fast they can communicate depends on
the network characteristics identified by the network topology and the
latency and bandwidth of the individual links of the network.
Normally we classify network topologies based on the following taxon-
omy:
• Completely connected: Each node is connected by a directed link to
each other node.
• Bus topology: All nodes are connected to the same single cable, and each computer communicates with every other computer over that one bus cable. The limitation of this approach is that the communication bandwidth is limited by the bandwidth of the cable. Most bus networks allow only two machines to communicate at any one time (with the exception of one-to-many broadcast messages); while two machines communicate, the others wait. The bus topology is the least expensive, but it is also slow, and the cable constitutes a single point of failure.
• Switch topology (star topology): In local area networks with a switch
topology, each computer is connected via a direct link to a central de-
vice, usually a switch, and it resembles a star. Two computers can
communicate using two links (to the switch and from the switch). The
central point of failure is the switch. The switch is usually intelligent
and can reroute the messages from any computer to any other com-
puter. If the switch has sufficient bandwidth, it can allow multiple
computers to talk to each other at the same time. For example, for a 10
Gbit/s links and an 80 Gbit/s switch, eight computers can talk to each
other (in pairs) at the same time.
• Mesh topology: In a mesh topology, computers are assembled into an array (1D, 2D, etc.), and each computer is connected via a direct link to the computers immediately adjacent to it (left, right, above, below, etc.). Next-neighbor communication is very fast because it involves a single link and therefore low latency. For two computers that are not physically close to communicate, it is necessary to reroute messages, and the latency is proportional to the distance in links between them. Some meshes do not support this kind of rerouting because the extra logic, even when unused, may cause extra latency. Meshes are ideal for numerical problems such as solving differential equations because such problems map naturally onto this kind of topology.
• Torus topology: Very similar to a mesh topology (1D, 2D, 3D, etc.), except that the network wraps around the edges. For example, in one dimension, node i is connected to node (i + 1)%p, where p is the total number of nodes. A one-dimensional torus is called a ring network.
• Tree network: The tree topology looks like a tree, where a computer may be associated with every tree node or only with every leaf. The tree edges are the communication links. In a binary tree, each computer talks only to its parent and its two children; the root node is special because it has no parent.
Tree networks are ideal for global operations such as broadcasting and
for sharing IO devices such as disks. If the IO device is connected to the
root node, every other computer can communicate with it using only
log p links (where p is the number of computers connected). Moreover, each subtree of a tree network is also a tree network, which makes it easy to distribute subtasks to different subsets of the same architecture.
• Hypercube: This network assumes 2d nodes, and each node corre-
sponds to a vertex of a hypercube. Nodes are connected by direct
links, which correspond to the edges of the hypercube. Its importance
is more academic than practical, although some ideas from hypercube
networks are implemented in some algorithms.
If we identify each node on the network with a unique integer number called its rank, we can write explicit code that determines, for each network topology, whether two nodes i and j are connected:
import math  # (import elided in the original listing)

def BUS(i,j):
    return True

def SWITCH(i,j):
    return True

def MESH1(p):
    return lambda i,j,p=p: (i-j)**2==1

def TORUS1(p):
    return lambda i,j,p=p: (i-j+p)%p==1 or (j-i+p)%p==1

def MESH2(p):
    q=int(math.sqrt(p)+0.1)
    return lambda i,j,q=q: ((i%q-j%q)**2,(i/q-j/q)**2) in [(1,0),(0,1)]

def TORUS2(p):
    q=int(math.sqrt(p)+0.1)
    return lambda i,j,q=q: ((i%q-j%q+q)%q,(i/q-j/q+q)%q) in [(0,1),(1,0)] or \
                           ((j%q-i%q+q)%q,(j/q-i/q+q)%q) in [(0,1),(1,0)]

def TREE(i,j):
    return i==int((j-1)/2) or j==int((i-1)/2)
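For example, in a four-node ring, node 0 is directly linked to nodes 1 and 3 but not to node 2; a quick check using the functions above:

connected = TORUS1(4)
print connected(0,1), connected(0,3), connected(0,2)  # True True False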
These topologies can be compared using the following metrics:
• Number of links: The total number of point-to-point links needed to build the network.
• Diameter: The maximum distance between any two nodes, measured as the minimum number of links connecting them. A smaller diameter means smaller latency: the diameter is proportional to the maximum time it takes for a message to travel from one node to another.
• Bisection width: The minimum number of links one has to cut to turn
the network into two disjoint networks. Higher bisection width means
higher reliability of the network.
• Arc connectivity: The number of different paths (non-overlapping and
of minimal length) connecting any two nodes. Higher connectivity
means higher bandwidth and higher reliability.
Here are the values of these metrics for each type of network (p is the number of nodes):

    Network                 Links                  Diameter          Bisection width
    completely connected    p(p−1)/2               1                 p²/4
    switch (star)           p                      2                 1
    1D mesh                 p−1                    p−1               1
    nD mesh                 n(p − p^((n−1)/n))     n(p^(1/n) − 1)    p^((n−1)/n)
    1D torus (ring)         p                      p/2               2
    nD torus                np                     n p^(1/n)/2       2 p^((n−1)/n)
    hypercube               (p/2) log₂ p           log₂ p            p/2
    tree                    p−1                    2 log₂ p          1
Most actual supercomputers combine several of these taxonomies and topologies. A modern supercomputer has many nodes; each node has many CPUs, each CPU has many cores, and each core implements SIMD instructions. Each core has its own cache, each CPU has its own cache, and each node has its own memory, shared by all threads running on that node. Nodes communicate with each other using multiple networks (typically a multidimensional mesh for point-to-point communications and a tree network for global communication and general disk IO).

This makes writing parallel programs very difficult: parallel programs must be optimized for each specific architecture.
The time it takes for a message of size m (in bytes) to travel over a wire can be broken into two components: a fixed overall time that does not depend on the size of the message, called latency (indicated with t_s), and a time proportional to the message size, whose coefficient is called the inverse bandwidth (indicated with t_w).

Think of a pipe of length L and section s through which you want to pump m liters of water at velocity v. From the moment you start pumping, it takes L/v seconds before the water starts arriving at the other end of the pipe. From that moment, it takes m/(sv) seconds for all the water to arrive at its destination. In this analogy, L/v is the latency t_s, sv is the bandwidth, and t_w = 1/(sv).
The total time to send the message (or the water) is
T (m) = ts + tw m (8.1)
From now on, we will use T_1(n) to refer to the nonparallel running time of an algorithm as a function of its input size n. We will use T_p(n) to refer to its running time with p parallel processes.
As a practical case, in the following example, we consider a generic algo-
rithm with the following parallel and nonparallel running times:
    T_1(n) = t_a n²    (8.2)

    T_p(n) = t_a n²/p + 2p(t_s + t_w n/p)    (8.3)

These formulas may come, for example, from the problem of multiplying a matrix by a vector.
Here t a is the time to perform one elementary instruction; ts and tw are
the latency and inverse bandwidth. The first term of Tp is nothing but
T1 /p, while the second term is an overhead due to communications.
Typically t_s ≫ t_w ≫ t_a. In the following plots, we will always assume t_a = 1, t_s = 0.2, and t_w = 0.1. With these assumptions, fig. 8.2.1 shows how T_p changes with input size and number of parallel processes. Notice that while for small p, T_p decreases ∝ 1/p, for large p the communication overhead dominates over computation. In our example, this overhead is dominated by the latency contribution, which grows with p.
8.2.2 Speedup

The speedup is the ratio of the nonparallel running time to the parallel running time:

    S_p(n) = T_1(n) / T_p(n)    (8.4)
8.2.3 Efficiency

The efficiency is the speedup per process:

    E_p(n) = S_p(n) / p = T_1(n) / (p T_p(n))    (8.5)
8.2.4 Isoefficiency

The isoefficiency condition asks how the problem size n must grow with p so that the efficiency remains constant at a target value E:

    E_p(n) = E    (8.6)

For our example T_p, we obtain

    E_p = 1 / (1 + 2p²(t_s + t_w n/p) / (n² t_a)) = E    (8.7)
Figure 8.5: Isoefficiency curves for different values of the target efficiency.
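As a concrete check, the following minimal sketch tabulates the running time, speedup, and efficiency of the example algorithm of eq. (8.3), using the same t_a, t_s, and t_w assumed in the plots:

ta, ts, tw = 1.0, 0.2, 0.1

def T(n,p): return ta*n**2/p + 2*p*(ts + tw*n/p)  # eq. (8.3)
def S(n,p): return ta*n**2 / T(n,p)               # speedup, eq. (8.4)
def E(n,p): return S(n,p)/p                       # efficiency, eq. (8.5)

for p in (1, 2, 4, 8, 16):
    print p, T(100,p), S(100,p), E(100,p)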
8.2.5 Cost
The cost of a computation is equal to the time it takes to run on each node,
multiplied by the number of nodes involved in the computation:
    C_p(n) = p T_p(n)    (8.8)

In particular, if a fraction α of the algorithm cannot be parallelized, so that T_p(n) = (α + (1 − α)/p) T_1(n), then

    dC_p(n)/dp = α T_1(n) > 0    (8.10)
This means that for a fixed problem size n, the more an algorithm is par-
allelized, the more it costs to run it (because it gets less and less efficient).
This running-time model also limits the achievable speedup; the resulting bound is known as Amdahl's law:

    S_p = 1 / (α + (1 − α)/p) < 1/α    (8.13)
The function fork creates a copy of the current process (a child). In the parent process, fork returns the ID of the child process; in the child process, it returns 0. Therefore the same if condition is true in one process and false in the other.
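A minimal sketch of this mechanism (Unix only; this is plain os.fork, not the psim listing that follows):

import os

pid = os.fork()              # duplicate the current process
if pid:
    # parent: fork returned the child's process id
    os.waitpid(pid, 0)       # wait for the child to terminate
    print 'parent: child', pid, 'done'
else:
    # child: fork returned 0
    print 'child: running'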
We have created a Python module called psim, and its source code is listed here; psim forks the parallel processes, creates sockets connecting them, and provides an API for communication. An example of psim usage will be given later.
9 def __init__(self,p,topology=SWITCH,logfilename=None):
10 """
28 def send(self,j,data):
29 """
30 sends data to process #j
31 """
32 if not self.topology(self.rank,j):
33 raise RuntimeError('topology violation')
34 self._send(j,data)
35
36 def _send(self,j,data):
37 """
38 sends data to process #j ignoring topology
39 """
40 if j<0 or j>=self.nprocs:
41 self.log("process %i: send(%i,...) failed!\n" % (self.rank,j))
42 raise Exception
43 self.log("process %i: send(%i,%s) starting...\n" % \
44 (self.rank,j,repr(data)))
45 s = pickle.dumps(data)
46 os.write(self.pipes[self.rank,j][1], string.zfill(str(len(s)),10))
47 os.write(self.pipes[self.rank,j][1], s)
48 self.log("process %i: send(%i,%s) success.\n" % \
49 (self.rank,j,repr(data)))
50
51 def recv(self,j):
52 """
53 returns the data received from process #j
54 """
55 if not self.topology(self.rank,j):
56 raise RuntimeError('topology violation')
57 return self._recv(j)
58
59 def _recv(self,j):
60 """
61 returns the data received from process #j ignoring topology
62 """
63 if j<0 or j>=self.nprocs:
64 self.log("process %i: recv(%i) failed!\n" % (self.rank,j))
65 raise RuntimeError
66 self.log("process %i: recv(%i) starting...\n" % (self.rank,j))
67 try:
68 size=int(os.read(self.pipes[j,self.rank][0],10))
69 s=os.read(self.pipes[j,self.rank][0],size)
70 except Exception, e:
71 self.log("process %i: COMMUNICATION ERROR!!!\n" % (self.rank))
72 raise e
73 data=pickle.loads(s)
74 self.log("process %i: recv(%i) done.\n" % (self.rank,j))
75 return data
from psim import PSim, SWITCH  # (imports elided in the original listing)

comm = PSim(2,SWITCH)
if comm.rank == 0:
    comm.send(1, "Hello World")
elif comm.rank == 1:
    message = comm.recv(0)
    print message
from psim import PSim, SWITCH  # (imports elided in the original listing)

p = 10

comm = PSim(p,SWITCH)
if comm.rank == 0:
    for other in range(1,p):
        comm.send(other, "Hello %s" % p)
else:
    message = comm.recv(0)
    print message
import random
from psim import PSim

def scalar_product_test1(n,p):
    comm = PSim(p)
    h = n/p
    if comm.rank==0:
        a = [random.random() for i in xrange(n)]
        b = [random.random() for i in xrange(n)]
        for k in xrange(1,p):
            comm.send(k, a[k*h:k*h+h])
            comm.send(k, b[k*h:k*h+h])
    else:
        a = comm.recv(0)
        b = comm.recv(0)
    scalar = sum(a[i]*b[i] for i in xrange(h))
    if comm.rank == 0:
        for k in xrange(1,p):
            scalar += comm.recv(k)
        print scalar
    else:
        comm.send(0,scalar)

scalar_product_test1(10,2)
Most parallel algorithms follow a similar pattern. One process has access
to IO. That process reads and scatters the data. The other processes per-
form their part of the computation; the results are reduced (aggregated)
and sent back to the root process. This pattern may be repeated by multi-
ple functions, perhaps in loops. Different functions may handle different
data structures and may have different communication patterns. The one thing that must remain constant throughout the run is the number of processes, because one wants to pair each process with one computing unit.
In the following, we implement a parallel version of mergesort. At each step, the code splits the problem into two smaller problems: half is solved by the process that performed the split, and the other half is assigned to a free process. When there are no more free processes, the code reverts to the nonparallel mergesort. The merge function here is the same as in the nonparallel mergesort of chapter 3.
import random
from psim import PSim

# (the nonparallel mergesort function is elided in the original listing)

def merge(A,x,y,z):
    B,i,j = [],x,y
    while True:
        if A[i]<=A[j]:
            B.append(A[i])
            i=i+1
        else:
            B.append(A[j])
            j=j+1
        if i==y:
            while j<z:
                B.append(A[j])
                j=j+1
            break
        if j==z:
            while i<y:
                B.append(A[i])
                i=i+1
            break
    A[x:z]=B

def mergesort_test(n,p):
    comm = PSim(p)
    if comm.rank==0:
        data = [random.random() for i in xrange(n)]
        comm.send(1, data[n/2:])
        mergesort(data,0,n/2)
        data[n/2:] = comm.recv(1)
        merge(data,0,n/2,n)
        print data
    else:
        data = comm.recv(0)
        mergesort(data)
        comm.send(0,data)

mergesort_test(20,2)
8.3.1 Broadcast
The simplest type of broadcast is the one-to-all broadcast, which consists of one process (the source) sending a message (a value) to every other process. A more complex pattern is the all-to-all broadcast, in which each process broadcasts a message simultaneously and each node receives the list of values ordered by the rank of the sender:
13
14 def all2all_broadcast(self, value):
15 self.log("process %i: BEGIN all2all_broadcast(%s)\n" % \
16 (self.rank, repr(value)))
17 vector=self.all2one_collect(0,value)
18 vector=self.one2all_broadcast(0,vector)
19 self.log("process %i: END all2all_broadcast(%s)\n" % \
20 (self.rank, repr(value)))
21 return vector
from psim import PSim, SWITCH  # (imports elided in the original listing)

p = 10

comm = PSim(p,SWITCH)
message = "Hello World" if comm.rank==0 else None
message = comm.one2all_broadcast(0, message)
print message
Notice how, before the broadcast, only the process with rank 0 has knowledge of the message; after the broadcast, all nodes are aware of it. Also notice that one2all_broadcast is a global communication function: all processes must call it. Its first argument is the rank of the broadcasting process (0), and the second argument is the message to be broadcast (only the value from node 0 is actually used).
4 if self.rank==source:
5 h, remainder = divmod(len(data),self.nprocs)
6 if remainder: h+=1
7 for i in xrange(self.nprocs):
8 self._send(i,data[i*h:i*h+h])
9 vector = self._recv(source)
10 self.log('process %i: END all2one_scatter(%i,%s)\n' % \
11 (self.rank,source,repr(data)))
12 return vector
13
14 def all2one_collect(self,destination,data):
15 self.log("process %i: BEGIN all2one_collect(%i,%s)\n" % \
16 (self.rank,destination,repr(data)))
17 self._send(destination,data)
18 if self.rank==destination:
19 vector = [self._recv(i) for i in xrange(self.nprocs)]
20 else:
21 vector = []
22 self.log("process %i: END all2one_collect(%i,%s)\n" % \
23 (self.rank,destination,repr(data)))
24 return vector
import random
from psim import PSim

def scalar_product_test2(n,p):
    comm = PSim(p)
    a = b = None
    if comm.rank==0:
        a = [random.random() for i in xrange(n)]
        b = [random.random() for i in xrange(n)]
    a = comm.one2all_scatter(0,a)
    b = comm.one2all_scatter(0,b)

    # (two lines elided in the original listing; a plausible version
    #  of the local partial product follows)
    scalar = sum(a[i]*b[i] for i in xrange(len(a)))

    scalar = comm.all2one_reduce(0,scalar)
    if comm.rank == 0:
        print scalar

scalar_product_test2(10,2)
8.3.3 Reduce

The all-to-one reduce pattern is very similar to collect, except that the destination does not receive the entire list of values but some aggregate information about them. The aggregation must be performed with a binary function f(x, y) that is commutative, f(x, y) = f(y, x), and associative, f(x, f(y, z)) = f(f(x, y), z). This guarantees that the reduction over the values can be performed in any order, so it can be optimized for different network topologies.
The all-to-all reduce is similar to reduce, but every process gets the result of the reduction, not just one destination node. This may be achieved with an all-to-one reduce followed by a one-to-all broadcast.
Here are some examples of reduce operations that can be passed to the op argument of the all2one_reduce and all2all_reduce methods (minimal sketches consistent with the description above):
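def op_sum(a,b): return a+b       # total of all the values
def op_max(a,b): return max(a,b)  # global maximum
def op_min(a,b): return min(a,b)  # global minimum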
Graph algorithms can also be parallelized, for example, Prim's algorithm. One way to do it is to represent the graph using an adjacency matrix, where the term i, j corresponds to the link between vertex i and vertex j (None if the link does not exist). Any graph algorithm, in some order, loops over the vertices and over their neighbors; this step can be parallelized by assigning different columns of the adjacency matrix to different processes, so that each process only loops over some of the neighbors of the vertex being processed. The following excerpt applies this idea to a parallel branch-and-bound search over a random adjacency matrix:
4 def random_adjacency_matrix(n):
5 A = []
6 for r in range(n):
7 A.append([0]*n)
8 for r in range(n):
9 for c in range(0,r):
10 A[r][c] = A[c][r] = random.randint(1,100)
11 return A
12
13 class Vertex(object):
14 def __init__(self,path=[0,1,2]):
15 self.path = path
16
20 def bb(adjacency,p=1):
21 n = len(adjacency)
22 comm = PSim(p)
23 Q = []
24 path = [0]
25 Q.append(Vertex(path))
26 bound = float('inf')
27 optimal = None
28 local_vertices = comm.one2all_scatter(0,range(n))
29 while True:
30 if comm.rank==0:
64 m = random_adjacency_matrix(5)
65 print bb(m,p=2)
8.3.4 Barrier

A barrier is a global synchronization point: no process can proceed past it until every process has reached it. In PSim it can be implemented with an all-to-all broadcast of a dummy value:

def barrier(self):
    self.log("process %i: BEGIN barrier()\n" % (self.rank))
    self.all2all_broadcast(0)
    self.log("process %i: END barrier()\n" % (self.rank))
    return
8.4 mpi4py
The PSim emulator does not provide any actual speedup unless you have multiple cores or processors to execute the forked processes. A better approach is to use mpi4py [66], which allows running different processes on different machines on a network. mpi4py is a Python interface to the Message Passing Interface (MPI), a standard protocol and API for interprocess communication. Its API maps almost one to one onto PSim's, although the functions have different names and signatures.
Here is an example of using mpi4py:
1 from mpi4py import MPI
2
3 comm = MPI.COMM_WORLD
4 rank = comm.Get_rank()
5
6 if rank == 0:
7 message = "Hello World"
8 comm.send(message, dest=1, tag=11)
9 elif rank == 1:
10 message = comm.recv(source=0, tag=11)
11 print message
The comm object of class MPI.COMM_WORLD plays a role similar to the PSim object of the previous section. The MPI send and recv functions are very similar to their PSim equivalents, except that they accept extra arguments, such as a communication tag. The tag allows node A to send multiple messages to B and allows B to receive them out of order. PSim does not support tags.
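For example, PSim's one2all_broadcast corresponds to mpi4py's bcast; a minimal sketch (to be launched with mpiexec on two or more processes):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

message = "Hello World" if rank == 0 else None
message = comm.bcast(message, root=0)  # every process receives the value
print message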
“Map” (implemented here in a function mapfn): The master node takes the input data, partitions it into smaller subproblems, and distributes individual pieces of the data to worker nodes. A worker node may do this again in turn, leading to a multilevel tree structure. Each worker node processes its smaller problem, computes a result, and passes that result back to its master node.

“Reduce” (implemented here in a function reducefn): The master node collects the partial results of all the subproblems and combines them to form the answer to the original problem.
Map-Reduce allows for distributed processing of the map and reduction
operations. Provided each mapping operation is independent of the oth-
ers, all maps can be performed in parallel—though in practice, it is limited
by the number of independent data sources and/or the number of CPUs
near each source. Similarly, a set of “reducers” can perform the reduction
phase, provided all outputs of the map operation that share the same key
are presented to the same reducer at the same time.
While this process can often appear inefficient compared to algorithms
that are more sequential, Map-Reduce can be applied to significantly
larger data sets than “commodity” servers can handle—a large server
farm can use Map-Reduce to sort a petabyte of data in only a few hours,
which would require much longer in a monolithic or single process sys-
tem.
Parallelism also offers some possibility of recovering from partial failure
of servers or storage during the operation: if one mapper or reducer fails,
the work can be rescheduled—assuming the input data are still available.
Map-Reduce consists of two main functions: mapfn and reducefn. mapfn takes a (key, value) pair with a type in one data domain and returns a list of (key, value) pairs in a different domain:

    mapfn(k1, v1) → list of (k2, v2)    (8.14)
The mapfn function is applied in parallel to every item in the input data
set. This produces a list of (k2,v2) pairs for each call. After that, the
Map-Reduce framework collects all pairs with the same key from all lists
and groups them together, thus creating one group for each one of the
different generated keys. The reducefn function is then applied in parallel
to each group, which in turn produces a collection of values in the same
domain:
reducefn(k2, [list of v2]) → (k2, v3) (8.15)
The values returned by reducefn are then collected into a single list. Each
call to reducefn can produce one, none, or multiple partial results. Thus
the Map-Reduce framework transforms a list of (key, value) pairs into a
list of values. It is necessary but not sufficient to have implementations of
the map and reduce abstractions to implement Map-Reduce. Distributed
implementations of Map-Reduce require a means of connecting the pro-
cesses performing the mapfn and reducefn phases.
Here is a nonparallel implementation that explains the data workflow
better:
1 def mapreduce(mapper,reducer,data):
2 """
3 >>> def mapfn(x): return x%2, 1
4 >>> def reducefn(key,values): return len(values)
5 >>> data = xrange(100)
6 >>> print mapreduce(mapfn,reducefn,data)
7 {0: 50, 1: 50}
8 """
9 partials = {}
10 results = {}
11 for item in data:
12 key,value = mapper(item)
13 if not key in partials:
14 partials[key]=[value]
15 else:
16 partials[key].append(value)
17 for key,values in partials.items():
18 results[key] = reducer(key,values)
19 return results
And here is an example we can use to find how many random DNA strings contain the subsequence "ACCA":

>>> from random import choice
>>> strings = [''.join(choice('ATGC') for i in xrange(10))
...            for j in xrange(100)]
>>> def mapfn(string): return ('ACCA' in string, 1)
>>> def reducefn(check, values): return len(values)
>>> results = mapreduce(mapfn, reducefn, strings)
The important thing about the preceding code is that there are two loops in Map-Reduce. Each loop consists of executing tasks (map tasks and reduce tasks) that are independent of each other: all the map tasks are independent, and all the reduce tasks are independent, but the reduce tasks depend on the map tasks. Because they are independent, they can be executed in parallel and by different processes.
A simple and small library that implements the map-reduce algorithm in Python is mincemeat [68]. The workers connect and authenticate to the server using a password and request tasks to execute. The server accepts connections and assigns the map and reduce tasks to the workers.
The communication is performed using asynchronous sockets, which
means neither workers nor the master is ever in a wait state. The code
is event based, and communication only happens when a socket connect-
ing the master to a worker is ready for a write (task assignment) or a read
(task completed).
The code is also failsafe because if a worker closes the connection prema-
turely, the task is reassigned to another worker.
The mincemeat module uses the Python libraries asyncore and asynchat to implement the communication patterns; for details, we refer to the Python documentation.
Here is an example of a mincemeat program:
import mincemeat
from random import choice

# (definitions elided in the original listing; plausible versions
#  consistent with the surrounding text)
strings = [''.join(choice('ATGC') for i in xrange(10)) for j in xrange(100)]

def mapfn(k, v):
    yield 'ACCA' in v, 1

def reducefn(k, values):
    return len(values)

s = mincemeat.Server()
s.mapfn = mapfn
s.reducefn = reducefn
s.datasource = dict(enumerate(strings))
results = s.run_server(password='changeme')
print results
Notice that in mincemeat the data source is a dictionary whose values are the items to be processed; the key is also passed to the mapfn function as its first argument. Moreover, the mapfn function can return more than one value using yield. This makes mincemeat more flexible.
You can run more than one worker, although for this example the server will terminate almost immediately. mincemeat works fine for many applications, but sometimes one wishes for a more powerful tool that provides faster communication, support for arbitrary languages, and better scalability and monitoring tools. An example in Python is disco. A standard tool, written in Java but supporting Python, is Hadoop.
8.6 pyOpenCL
Nvidia should be credited for bringing GPU computing to the mass mar-
ket. They have developed the CUDA [69] framework for GPU program-
ming. CUDA programs consist of two parts: a host and a kernel. The
host deploys the kernel on the available GPU cores, and multiple copies of the kernel run in parallel.
Nvidia, AMD, Intel, and ARM have created the Khronos Group, and together they have developed the Open Computing Language framework (OpenCL [70]), which borrows many ideas from CUDA and promises more portability. OpenCL supports Intel/AMD CPUs, Nvidia/ATI GPUs, ARM chips, and the LLVM virtual machine.

OpenCL is a C99 dialect. In OpenCL, as in CUDA, there is a host program and a kernel. Multiple copies of the kernel are queued and run in parallel on the available devices. Kernels running on the same device share access to the device's global memory.
Usually the kernel is written in C99, while the host is written in C++. It is
also possible to write the host code in other languages, including Python.
Here we will use the pyOpenCL [4] module for programming the host in Python. This produces no significant loss of performance compared to C++ because the actual computation is performed by the kernels, not by the host. It is also possible to write the kernels themselves in Python, using a library called Clyther [71] or one called ocl [5]. Here we will use the latter; ocl performs a one-time conversion of the Python code of a kernel to C99 code. This conversion is done line by line and therefore introduces no performance loss compared to writing native OpenCL kernels. It also provides an additional abstraction layer on top of pyOpenCL, which makes our examples more compact.
The following example performs these steps:

• It creates two numpy arrays, u and v, of size n, filled with random numbers.
• It creates another numpy array w of the same size, filled with zeros.
• It loops over all indices of w and adds, term by term, u and v, storing the result into w.
• It checks the result using the numpy linalg submodule.
Our goal is to parallelize the part of the computation performed in the
loop. Notice that our parallelization will not make the code faster because
this is a linear algorithm, and algorithms linear in the input are never
faster when parallelized because the communication has the same order
of growth as the algorithm itself:
from ocl import Device
import numpy as npy

n = 100000
u = npy.random.rand(n).astype(npy.float32)
v = npy.random.rand(n).astype(npy.float32)

device = Device()
u_buffer = device.buffer(source=u)
v_buffer = device.buffer(source=v)
w_buffer = device.buffer(size=v.nbytes)

kernels = device.compile("""
__kernel void sum(__global const float *u, /* u_buffer */
                  __global const float *v, /* v_buffer */
                  __global float *w) {     /* w_buffer */
      int i = get_global_id(0);            /* thread id */
      w[i] = u[i] + v[i];
}
""")

kernels.sum(device.queue,[n],None,u_buffer,v_buffer,w_buffer)
w = device.retrieve(w_buffer)

assert npy.linalg.norm(w - (u + v)) < 1e-3  # check the result
The device object encapsulates the kernel(s) and a queue for kernel submission. The line

kernels.sum(...,[n],...)

submits n instances of the sum kernel to the queue. Each kernel instance can retrieve its own ID using the function get_global_id(0). Notice that a kernel must be declared with the __kernel prefix, and arguments that are shared by all kernel instances must be declared __global.
The Device class is defined in the “ocl.py” file in terms of pyOpenCL API:
1 import numpy
2 import pyopencl as pcl
3
4 class Device(object):
5 flags = pcl.mem_flags
6 def __init__(self):
7 self.ctx = pcl.create_some_context()
8 self.queue = pcl.CommandQueue(self.ctx)
9 def buffer(self,source=None,size=0,mode=pcl.mem_flags.READ_WRITE):
10 if source is not None: mode = mode|pcl.mem_flags.COPY_HOST_PTR
11 buffer = pcl.Buffer(self.ctx,mode, size=size, hostbuf=source)
12 return buffer
13 def retrieve(self,buffer,shape=None,dtype=numpy.float32):
14 output = numpy.zeros(shape or buffer.size/4,dtype=dtype)
15 pcl.enqueue_copy(self.queue, output, buffer)
16 return output
17 def compile(self,kernel):
18 return pcl.Program(self.ctx,kernel).build()
Here self.ctx is the device context and self.queue is its command queue. The methods buffer, retrieve, and compile map onto the corresponding pyOpenCL functions Buffer, enqueue_copy, and Program but use a simpler syntax. For more details, we refer to the official pyOpenCL documentation.
Substitute them into eq. 8.16 and solve for u(x, y). We obtain

    u(x,y) = (1/4)(u(x−h,y) + u(x+h,y) + u(x,y−h) + u(x,y+h) − h²q(x,y))    (8.19)

We can therefore solve eq. 8.16 by iterating eq. 8.19 until convergence. The initial value of u does not affect the solution, but the closer we pick it to the actual solution, the faster the convergence.
The procedure we used here to transform a differential equation into an iterative procedure is a general one and applies to other differential equations as well. The iteration proceeds very much like the fixed point solver examined in chapter 3.
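For reference, here is a minimal serial numpy sketch (not the book's listing) that applies eq. (8.19) with h = 1 to all interior points of the grid at once:

import numpy

n = 300
q = numpy.zeros((n,n), dtype=numpy.float32)
u = numpy.zeros((n,n), dtype=numpy.float32)
q[n/2,n/2] = 1.0  # a single point charge

for k in xrange(1000):
    # eq. (8.19): each interior point becomes the average of its
    # four neighbors, minus the source term
    u[1:-1,1:-1] = (u[:-2,1:-1] + u[2:,1:-1] +
                    u[1:-1,:-2] + u[1:-1,2:] - q[1:-1,1:-1])/4.0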
Here is an implementation using ocl:
1 from ocl import Device
2 from canvas import Canvas
3 from random import randint, choice
4 import numpy
5
6 n = 300
7 q = numpy.zeros((n,n), dtype=numpy.float32)
8 u = numpy.zeros((n,n), dtype=numpy.float32)
9 w = numpy.zeros((n,n), dtype=numpy.float32)
10
11 for k in xrange(n):
12 q[randint(1, n-1),randint(1, n-1)] = choice((-1,+1))
13
14 device = Device()
15 q_buffer = device.buffer(source=q, mode=device.flags.READ_ONLY)
16 u_buffer = device.buffer(source=u)
17 w_buffer = device.buffer(source=w)
18
19
20 kernels = device.compile("""
21 __kernel void solve(__global float *w,
22 __global const float *u,
23 __global const float *q) {
24 int x = get_global_id(0);
25 int y = get_global_id(1);
26 int xy = y*WIDTH + x, up, down, left, right;
27 if(y!=0 && y!=WIDTH-1 && x!=0 && x!=WIDTH-1) {
28 up=xy+WIDTH; down=xy-WIDTH; left=xy-1; right=xy+1;
29 w[xy] = 1.0/4.0*(u[up]+u[down]+u[left]+u[right] - q[xy]);
30 }
31 }
32 """.replace('WIDTH',str(n)))
33
34 for k in xrange(1000):
35 kernels.solve(device.queue, [n,n], None, w_buffer, u_buffer, q_buffer)
36 (u_buffer, w_buffer) = (w_buffer, u_buffer)
37
38 u = device.retrieve(u_buffer,shape=(n,n))
39
40 Canvas().imshow(u).save(filename='plot.png')
We can now use the Python to C99 converter of ocl to write the kernel
using Python:
1 from ocl import Device
2 from canvas import Canvas
3 from random import randint, choice
4 import numpy
5
6 n = 300
7 q = numpy.zeros((n,n), dtype=numpy.float32)
8 u = numpy.zeros((n,n), dtype=numpy.float32)
9 w = numpy.zeros((n,n), dtype=numpy.float32)
10
11 for k in xrange(n):
12 q[randint(1, n-1),randint(1, n-1)] = choice((-1,+1))
13
14 device = Device()
15 q_buffer = device.buffer(source=q, mode=device.flags.READ_ONLY)
Figure 8.7: The image shows the output of the Laplace program and represents the
two-dimensional electrostatic potential for a random charge distribution.
16 u_buffer = device.buffer(source=u)
17 w_buffer = device.buffer(source=w)
18
19 @device.compiler.define_kernel(
20 w='global:ptr_float',
21 u='global:const:ptr_float',
22 q='global:const:ptr_float')
23 def solve(w,u,q):
24 x = new_int(get_global_id(0))
25 y = new_int(get_global_id(1))
26 xy = new_int(x*n+y)
27 if y!=0 and y!=n-1 and x!=0 and x!=n-1:
28 up = new_int(xy-n)
29 down = new_int(xy+n)
30 left = new_int(xy-1)
31 right = new_int(xy+1)
32 w[xy] = 1.0/4*(u[up]+u[down]+u[left]+u[right] - q[xy])
33
34 kernels = device.compile(constants=dict(n=n))
35
36 for k in xrange(1000):
37 kernels.solve(device.queue, [n,n], None, w_buffer, u_buffer, q_buffer)
38 (u_buffer, w_buffer) = (w_buffer, u_buffer)
39
40 u = device.retrieve(u_buffer,shape=(n,n))
41
42 Canvas().imshow(u).save(filename='plot.png')
In the decorator define_kernel, each argument is annotated with its type; for example, u='global:const:ptr_float' means that the corresponding argument is declared in the generated C99 code as:

global const float* u
Because in C, one must declare the type of each new variable, we must do
the same in ocl. This is done using the pseudo-casting operators new_int,
new_float, and so on. For example,
1 a = new_int(b+c)
is converted into
1 int a = b+c;
The converter also checks the types for consistency. The return type is
determined automatically from the type of the object that is returned.
Python objects that have no C99 equivalent like lists, tuples, dictionar-
ies, and sets are not supported. Other types are converted based on the
following table:
    ocl                          C99/OpenCL
    a = new_type(...)            type a = ...;
    a = new_ptr_type(...)        type *a = ...;
    a = new_ptr_ptr_type(...)    type **a = ...;
    None                         null
    ADDR(x)                      &x
    REFD(x)                      *x
    CAST(ptr_type,x)             (type*)x
    max_x (µ^T x − r_free) / √(x^T Σ x)    (8.20)

We use the symbol t for time (for daily data, t is a day); n is the number of stocks, and m is the number of trading days.
We use the canvas [11] library, based on the Python matplotlib library, to display one of the stock price series and the resulting correlation matrix. Following is the complete code; its output can be seen in fig. 8.8.
1 from ocl import Device
2 from canvas import Canvas
3 import random
4 import numpy
5 from math import exp
6
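# (definitions elided in the original listing; plausible versions
#  consistent with the code below: n stocks, m trading days)
n, m = 10, 250
p = numpy.zeros((n,m), dtype=numpy.float32)    # simulated prices
r = numpy.zeros((n,m), dtype=numpy.float32)    # daily returns
mu = numpy.zeros(n, dtype=numpy.float32)       # average returns
cov = numpy.zeros((n,n), dtype=numpy.float32)  # covariance matrix
cor = numpy.zeros((n,n), dtype=numpy.float32)  # correlation matrix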
15 for k in xrange(n):
16 p[k,0] = 100.0
17 for t in xrange(1,m):
18 c = 1.0 if k==0 else (p[k-1,t]/p[k-1,t-1])
19 p[k,t] = p[k,t-1]*exp(random.gauss(0.0001,0.10))*c
20
21 device = Device()
22 p_buffer = device.buffer(source=p, mode=device.flags.READ_ONLY)
23 r_buffer = device.buffer(source=r)
24 mu_buffer = device.buffer(source=mu)
25 cov_buffer = device.buffer(source=cov)
26 cor_buffer = device.buffer(source=cor)
27
28 @device.compiler.define_kernel(p='global:const:ptr_float',
29 r='global:ptr_float')
30 def compute_r(p, r):
31 i = new_int(get_global_id(0))
32 for t in xrange(0,m-1):
33 r[i*m+t] = p[i*m+t+1]/p[i*m+t] - 1.0
34
35 @device.compiler.define_kernel(r='global:ptr_float',
36 mu='global:ptr_float')
37 def compute_mu(r, mu):
38 i = new_int(get_global_id(0))
39 sum = new_float(0.0)
40 for t in xrange(0,m-1):
41 sum = sum + r[i*m+t]
42 mu[i] = sum/(m-1)
43
44 @device.compiler.define_kernel(r='global:ptr_float',
45 mu='global:ptr_float', cov='global:ptr_float')
46 def compute_cov(r, mu, cov):
47 i = new_int(get_global_id(0))
48 j = new_int(get_global_id(1))
49 sum = new_float(0.0)
50 for t in xrange(0,m-1):
51 sum = sum + r[i*m+t]*r[j*m+t]
52 cov[i*n+j] = sum/(m-1)-mu[i]*mu[j]
53
54 @device.compiler.define_kernel(cov='global:ptr_float',
55 cor='global:ptr_float')
56 def compute_cor(cov, cor):
57 i = new_int(get_global_id(0))
58 j = new_int(get_global_id(1))
59 cor[i*n+j] = cov[i*n+j] / sqrt(cov[i*n+i]*cov[j*n+j])
60
61 program = device.compile(constants=dict(n=n,m=m))
62
63 q = device.queue
64 program.compute_r(q, [n], None, p_buffer, r_buffer)
65 program.compute_mu(q, [n], None, r_buffer, mu_buffer)
66 program.compute_cov(q, [n,n], None, r_buffer, mu_buffer, cov_buffer)
67 program.compute_cor(q, [n,n], None, cov_buffer, cor_buffer)
68
69 r = device.retrieve(r_buffer,shape=(n,m))
70 mu = device.retrieve(mu_buffer,shape=(n,))
71 cov = device.retrieve(cov_buffer,shape=(n,n))
72 cor = device.retrieve(cor_buffer,shape=(n,n))
73
74 points = [(x,y) for (x,y) in enumerate(p[0])]
75 Canvas(title='Price').plot(points).save(filename='price.png')
76 Canvas(title='Correlations').imshow(cor).save(filename='cor.png')
77
Notice how the memory buffers are always one-dimensional; therefore the i,j indices have to be mapped into a one-dimensional index i*n+j. Also notice that while the kernels compute_r and compute_mu are called [n] times (once per stock), the kernels compute_cov and compute_cor are called [n,n] times, once per pair of stocks i,j. The values of i and j are retrieved with get_global_id(0) and get_global_id(1), respectively.

In this program, we have defined multiple kernels and compiled them all at once. We call one kernel at a time to make sure that the previous kernel has completed before running the next one.
Figure 8.8: The image on the left shows one of the randomly generated stock price
histories. The image on the right represents the computed correlation matrix. Rows and
columns correspond to stock returns, and the color at the intersection is their correlation
(red for high correlation and blue for no correlation). The resulting shape is an artifact
of the algorithm used to generate random data.
9
Appendices
9.1.1 Symbols

    ∞     infinity
    ∧     and
    ∨     or
    ∩     intersection
    ∪     union
    ∈     element of                  (9.1)
    ∀     for each
    ∃     exists
    ⇒     implies
    :     such that
    iff   if and only if
Important sets

    ∅     empty set
    N     natural numbers {0,1,2,3,...}
    N⁺    positive natural numbers {1,2,3,...}
    Z     all integers {...,−3,−2,−1,0,1,2,3,...}    (9.2)
    R     all real numbers
    R⁺    positive real numbers (not including 0)
    R0    positive real numbers including 0
Set operations
A, B and C are some generic sets.
• Intersection
A ∩ B ≡ { x : x ∈ A and x ∈ B} (9.3)
• Union
A ∪ B ≡ { x : x ∈ A or x ∈ B} (9.4)
• Difference

    A − B ≡ {x : x ∈ A and x ∉ B}    (9.5)
Set laws
• Empty set laws

    A ∪ ∅ = A    (9.6)

    A ∩ ∅ = ∅    (9.7)
• Idempotency laws
A∪A = A (9.8)
A∩A = A (9.9)
appendices 369
• Commutative laws

    A ∪ B = B ∪ A    (9.10)

    A ∩ B = B ∩ A    (9.11)
• Associative laws
A ∪ (B ∪ C) = (A ∪ B) ∪ C (9.12)
A ∩ (B ∩ C) = (A ∩ B) ∩ C (9.13)
• Distributive laws
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (9.14)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) (9.15)
• Absorption laws
A ∩ (A ∪ B) = A (9.16)
A ∪ (A ∩ B) = A (9.17)
• De Morgan’s laws
A − (B ∪ C) = (A − B) ∩ (A − C) (9.18)
A − (B ∩ C) = (A − B) ∪ (A − C) (9.19)
Relations
• A Cartesian Product is defined as
A × B = {( a, b) : a ∈ A and b ∈ B} (9.20)
9.1.3 Logarithms
Properties of logarithms:

    log_a x = log x / log a    (9.21)

    log(xy) = log x + log y    (9.22)

    log(x/y) = log x − log y    (9.23)

    log x^n = n log x    (9.24)
Definition

    ∑_{i=0}^{i<n} f(i) ≡ f(0) + f(1) + ··· + f(n−1)    (9.25)
Properties

• Linearity I

    ∑_{i=0}^{i≤n} f(i) = ∑_{i=0}^{i<n} f(i) + f(n)    (9.26)

    ∑_{i=a}^{i≤b} f(i) = ∑_{i=0}^{i≤b} f(i) − ∑_{i=0}^{i<a} f(i)    (9.27)

• Linearity II

    ∑_{i=0}^{i<n} (a f(i) + b g(i)) = a (∑_{i=0}^{i<n} f(i)) + b (∑_{i=0}^{i<n} g(i))    (9.28)
Proof:

    ∑_{i=0}^{i<n} (a f(i) + b g(i)) = (a f(0) + b g(0)) + ··· + (a f(n−1) + b g(n−1))
                                    = a f(0) + ··· + a f(n−1) + b g(0) + ··· + b g(n−1)
                                    = a (f(0) + ··· + f(n−1)) + b (g(0) + ··· + g(n−1))
                                    = a (∑_{i=0}^{i<n} f(i)) + b (∑_{i=0}^{i<n} g(i))    (9.29)
Examples:

    ∑_{i=0}^{i<n} c = cn    for any constant c    (9.30)

    ∑_{i=0}^{i<n} i = n(n−1)/2    (9.31)

    ∑_{i=0}^{i<n} i² = n(n−1)(2n−1)/6    (9.32)

    ∑_{i=0}^{i<n} i³ = n²(n−1)²/4    (9.33)

    ∑_{i=0}^{i<n} x^i = (x^n − 1)/(x − 1)    (geometric sum)    (9.34)

    ∑_{i=1}^{i<n} 1/(i(i+1)) = 1 − 1/n    (telescopic sum)    (9.35)
9.1.5 Limits (n → ∞)

Given two functions f(n) and g(n), we often need the limit

    lim_{n→∞} f(n)/g(n) = ?    (9.36)

Define

    a ≡ lim_{n→∞} f(n)    (9.37)

    b ≡ lim_{n→∞} g(n)    (9.38)

• If a ∈ R and b ∈ R⁺, then

    lim_{n→∞} f(n)/g(n) = a/b    (9.39)

• If a ∈ R and b = ∞, then

    lim_{n→∞} f(n)/g(n) = 0    (9.40)

• If (a ∈ R⁺ and b = 0) or (a = ∞ and b ∈ R), then

    lim_{n→∞} f(n)/g(n) = ∞    (9.41)

• If both limits are 0 or both are ∞, one can use de l'Hôpital's rule:

    lim_{n→∞} f(n)/g(n) = lim_{n→∞} f′(n)/g′(n)    (9.42)

• Finally,

    lim_{n→∞} f(n)/g(n) = a ⇒ lim_{n→∞} g(n)/f(n) = 1/a    (9.43)
Table of derivatives

    f(x)                  f′(x)
    c                     0
    ax^n                  anx^(n−1)
    log x                 1/x                      (9.44)
    e^x                   e^x
    a^x                   a^x log a
    x^n log x, n > 0      x^(n−1)(n log x + 1)

    d/dx (f(x) + g(x)) = f′(x) + g′(x)    (9.45)

    d/dx (f(x) − g(x)) = f′(x) − g′(x)    (9.46)

    d/dx (f(x) g(x)) = f′(x) g(x) + f(x) g′(x)    (9.47)

    d/dx (1/f(x)) = −f′(x)/f(x)²    (9.48)

    d/dx (f(x)/g(x)) = f′(x)/g(x) − f(x) g′(x)/g(x)²    (9.49)

    d/dx f(g(x)) = f′(g(x)) g′(x)    (9.50)
Bibliography
[1] http://www.python.org
[2] Travis Oliphant, “A Guide to NumPy”. Vol.1. USA: Trelgol Pub-
lishing (2006)
[3] http://www.scipy.org/
[4] Andreas Klöckner et al “Pycuda and pyopencl: A scripting-
based approach to gpu run-time code generation.” Parallel
Computing 38.3 (2012) pp 157-174
[5] https://github.com/mdipierro/ocl
[6] http://www.network-theory.co.uk/docs/pytut/
[7] http://oreilly.com/catalog/9780596158071
[8] http://www.python.org/doc/
[9] http://www.sqlite.org/
[10] http://matplotlib.sourceforge.net/
[11] https://github.com/mdipierro/canvas
[12] Stefan Behnel et al., “Cython: The best of both worlds.” Com-
puting in Science & Engineering 13.2 (2011) pp 31-39
[13] Donald Knuth, “The Art of Computer Programming, Volume
3”, Addison-Wesley, (1997). ISBN 0-201-89685-0
[14] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and