An Intro To Python and Algorithms
An Intro To Python and Algorithms
An Intro To Python and Algorithms
Summer, 2013
“There’s nothing to fear but the fear itself.
That’s called recursion, and that would lead you to
infinite fear.”
This book is distributed under a Creative Commons
Attribution-NonCommercial-ShareAlike 3.0 license.
That means you are free: to Share - to copy, distribute and transmit the
work; to Remix - to adapt the work; under the following conditions:
Attribution. You must attribute the work in the manner specified by the
author or licensor (but not in any way that suggests that they endorse you
or your use of the work).
Noncommercial. You may not use this work for commercial purposes.
Share Alike. If you alter, transform, or build upon this work, you may
distribute the resulting work only under the same or similar license to this
one.
For any reuse or distribution, you must make clear to others the license
terms of this work. The best way to do this is with a link to my github
repository.
5
6 CONTENTS
5 Object-Oriented Design 85
5.1 Classes and Objects . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 Principles of OOP . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.3 An Example of a Class . . . . . . . . . . . . . . . . . . . . . . 88
5.4 Python Design Patterns . . . . . . . . . . . . . . . . . . . . . 90
6 Advanced Topics 95
6.1 Multiprocessing and Threading . . . . . . . . . . . . . . . . . 95
6.2 Good Practices . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.3 Unit Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
9 Sorting 149
9.1 Quadratic Sort . . . . . . . . . . . . . . . . . . . . . . . . . . 149
9.2 Linear Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
9.3 Loglinear Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
9.4 Comparison Between Sorting Methods . . . . . . . . . . . . . 157
9.5 Additional Exercises . . . . . . . . . . . . . . . . . . . . . . . 157
CONTENTS 7
10 Searching 161
10.1 Unsorted Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . 161
10.1.1 Sequential Search . . . . . . . . . . . . . . . . . . . . . 161
10.1.2 Quick Select and Order Statistics . . . . . . . . . . . . 162
10.2 Sorted Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
10.2.1 Binary Search . . . . . . . . . . . . . . . . . . . . . . . 163
10.3 Additional Exercises . . . . . . . . . . . . . . . . . . . . . . . 166
9
Chapter 1
Oh Hay, Numbers!
1.1 Integers
Python represents integers using the immutable int type. For immutable
objects there is no difference between a variable and an object reference.
The size of Python’s integers is limited by the machine memory (the
range depends on the C or Java built-in compiler), being at least 32-bit long
(4 bytes)1 .
To see how many bytes is hold in an integer, the int.bit length()
method is available (starting in Python 3.):
>>> (999).bit_length()
10
11
12 CHAPTER 1. OH HAY, NUMBERS!
>>> s = ’11’
>>> d = int(s)
>>> print(d)
11
>>> b = int(s, 2)
>>> print(b)
3
1.2 Floats
Numbers with a fractional part are represented by the immutable type float.
When we use single precision, a 32-bit float is represented by: 1 bit for sign
(negative being 1, positive being 0), plus 23 bits for the significant digits
(or mantissa), plus 8 bits for the exponent.
Additionally, the exponent is represented using the biased notation, where
you add the number 127 to the original value2 .
Comparing Floats
Floats have a precision limit, which constantly causes them to be truncated
(that’s why we should never compare floats for equality!).
Equality tests should be done in terms of some predefined precision. A
simple example would be the following function:
>>> def a(x , y, places=7):
... return round(abs(x-y), places) == 0
def float_to_fractions(number):
return Fraction(number.as_integer_ratio())
3
Every snippet shown in this book is available at https://github.com/bt3gl/Python-
and-Algorithms-and-Data-Structures.
1.5. THE DECIMAL MODULE 15
While the math and cmath modules are not suitable for the decimal mod-
ule, it does have these functions built in, such as decimal.Decimal.exp(x).
[convert_to_decimal.py]
def test_convert_to_decimal():
number, base = 1001, 2
assert(convert_to_decimal(number, base) == 9)
def test_convert_from_decimal():
number, base = 9, 2
assert(convert_from_decimal(number, base) == 1001)
strings = "0123456789ABCDEFGHIJ"
result = ""
while number > 0:
digit = number%base
result = strings[digit] + result
number = number//base
return result
def test_convert_from_decimal_larger_bases():
number, base = 31, 16
assert(convert_from_decimal_larger_bases(number, base) == ’1F’)
a, b = b, a % b
return result
def test_finding_gcd():
number1 = 21
number2 = 12
assert(finding_gcd(number1, number2) == 3)
test_finding_gcd()
The follow snippet runs some tests on the Python’s random module:
[testing_random.py]
import random
def testing_random():
’’’ testing the module random’’’
values = [1, 2, 3, 4]
print(random.choice(values))
print(random.choice(values))
print(random.choice(values))
print(random.sample(values, 2))
print(random.sample(values, 3))
Fibonacci Sequences
The module below shows how to find the nth number in a Fibonacci sequence
in three different ways: (a) with a recursive O(2n ) runtime; (b) with an
iterative O(n2 ) runtime; and (c) using the generator propriety instead:
[fibonacci.py]
def fib_generator():
a, b = 0, 1
while True:
yield b
a, b = b, a+b
def fib(n):
’’’
>>> fib(2)
1
>>> fib(5)
5
>>> fib(7)
13
’’’
if n < 3:
return 1
a, b = 0, 1
count = 1
while count < n:
count += 1
a, b = b, a+b
return b
def fib_rec(n):
’’’
>>> fib_rec(2)
1
>>> fib_rec(5)
5
>>> fib_rec(7)
13
’’’
20 CHAPTER 1. OH HAY, NUMBERS!
if n < 3:
return 1
return fib_rec(n - 1) + fib_rec(n - 2)
Primes
The following snippet includes functions to find if a number is prime, to give
the number’s primes factors, and to generate primes:
[primes.py]
import math
import random
def find_prime_factors(n):
’’’
>>> find_prime_factors(14)
[2, 7]
>>> find_prime_factors(19)
[]
’’’
divisors = [d for d in range(2, n//2 + 1) if n % d == 0]
primes = [d for d in divisors if is_prime(d)]
return primes
def is_prime(n):
for j in range(2, int(math.sqrt(n))):
if (n % j) == 0:
return False
return True
def generate_prime(number=3):
while 1:
p = random.randint(pow(2, number-2), pow(2, number-1)-1)
p = 2 * p + 1
if find_prime_factors(p):
return p
1.8. THE NUMPY PACKAGE 21
import numpy as np
def testing_numpy():
ax = np.array([1,2,3])
ay = np.array([3,4,5])
print(ax)
print(ax*2)
print(ax+10)
print(np.sqrt(ax))
print(np.cos(ax))
print(ax-ay)
4
http://www.numpy.org
22 CHAPTER 1. OH HAY, NUMBERS!
NumPy arrays are also much more efficient than Python’s lists, as we can
see in the benchmark tests below:
[testing_numpy_speed.py]
import numpy
import time
def trad_version():
t1 = time.time()
X = range(10000000)
Y = range(10000000)
Z = []
for i in range(len(X)):
Z.append(X[i] + Y[i])
return time.time() - t1
def numpy_version():
t1 = time.time()
X = numpy.arange(10000000)
Y = numpy.arange(10000000)
Z = X + Y
return time.time() - t1
if __name__ == ’__main__’:
assert(trad_version() > 3.0)
assert(numpy_version() < 0.1)
Chapter 2
In this chapter we will learn how Python represents sequence data types.
A sequence type is defined by having the following properties:
Python has five built-in sequence types: strings, tuples, lists, byte
arrays, and bytes:1
>>> l = []
>>> type(l)
<type ’list’>
>>> s = ’’
>>> type(s)
<type ’str’>
>>> t = ()
>>> type(t)
<type ’tuple’>
>>> ba = bytearray(b’’)
1
A named tuple is also available in the standard library, under the collections
package.
23
24 CHAPTER 2. BUILT-IN SEQUENCE TYPES
>>> type(ba)
<type ’bytearray’>
>>> b = bytes([])
>>> type(byte)
<type ’type’>
Mutability
In Python, tuple, strings, and bytes are immutable, while lists and byte
arrays are mutable.
Immutable types are in general more efficient than mutable objects. In
addition, some collection data types2 can only be indexed by immutable data
types.
In Python, any variable is an object reference, so copying mutable objects
can be tricky. When you say a = b you are actually pointing a to where b
points to. For this reason, it’s important to understand the concept of deep
copying.
For instance, to make a deep copy of a list, we do:
>>> newList = myList[:]
>>> newList2 = list(myList2)
To make a deep copy of some other object, we use the copy module:
2
Collection data types, such as sets and dictionaries, are reviewed in the next chapter.
2.1. STRINGS 25
If we want to start counting from the right, we can represent the index
as negative:
>>> word = "Let us kill some vampires!"
>>> word[-1]
’!’
>>> word[-2]
’s’
>>> word[-2:]
’s!’
>>> word[:-2]
’Let us kill some vampire’
>>> word[-0]
’L’
2.1 Strings
In Python, every object has two output forms: while string forms are de-
signed to be human-readable, representational forms are designed to produce
an output that if fed to a Python interpreter, reproduces the represented ob-
ject.
Python represents strings, i.e. a sequence of characters, with the immutable
str type.
26 CHAPTER 2. BUILT-IN SEQUENCE TYPES
Unicode Strings
Python’s Unicode encoding is used to include special characters in strings
(for example, whitespaces). Additionally, starting from Python 3, all strings
are Unicode, not just plain bytes. The difference is that, while ASCII repre-
sentations are given by 8-bits, Unicode representations usually use 16-bits.
To create an Unicode string, we use the ‘u’ prefix:
>>> u’Goodbye\u0020World !’
’Goodbye World !’
In the example above, the escape sequence indicates the Unicode character
with the ordinal value 0x0020.
We can use split() to write our own method for erasing spaces from
strings:
>>> def erase_space_from_string(string):
... s1 = string.split(" ")
... s2 = "".join(s1)
... return s2
>>> slayers.strip("999")
’Buffy and Faith’
The program below uses strip() to list every word and the number of
the times they occur in alphabetical order for some file:3
[count_unique_words.py]
import string
import sys
def count_unique_word():
words = {} # create an empty dictionary
strip = string.whitespace + string.punctuation + string.digits
+ "\"’"
for filename in sys.argv[1:]:
with open(filename) as file:
for line in file:
for word in line.lower().split():
word = word.strip(strip)
if len(word) > 2:
words[word] = words.get(word,0) +1
for word in sorted(words):
print("’{0}’ occurs {1} times.".format(word, words[word]))
Similar methods are: lstrip(), which returns a copy of the string with
all whitespace at the beginning of the string stripped away; and rstrip(),
which returns a copy of the string with all whitespace at the end of the string
stripped away.
? capitalize(): returns a copy of the string with only the first character
in uppercase;
? lower(): returns a copy of the original string, but with all characters
in lowercase;
? upper(): returns a copy of the original string, but with all characters
in uppercase.
2
>>> slayer.count("Buffy")
3
2.2 Tuples
A tuple is an Python immutable sequence type consisting of values separated
by commas:
>>> t1 = 1234, ’hello!’
>>> t1[0]
1234
>>> t1
(12345, ’hello!’)
>>> t2 = t2, (1, 2, 3, 4, 5) # nested
>>> u
((1234, ’hello!’), (1, 2, 3, 4, 5))
0
>>> len(t1)
1
>>> t1
(’hello’,)
The count(x) method counts how many times x appears in the tuple:
>>> t = 1, 5, 7, 8, 9, 4, 1, 4
>>> t.count(4)
2
Tuple Unpacking
In Python, any iterable object can be unpacked using the sequence unpacking
operator, *. This operator can be used with two or more variables. In this
case, the items in the left-hand side of the assignment (which are preceded by
*) are assigned to the named variables, while the items left over are assigned
to the starred variable:
>>> x, *y = (1, 2, 3, 4)
>>> x
1
>>> y
[2, 3, 4]
2.3. LISTS 33
Named Tuples
Python’s package collections4 contains a sequence data type called named
tuple. These objects behave just like the built-in tuple, with the same
performance characteristics, but in addition they carry the ability to refer to
items by name. This allows the creation of aggregates of data items:
>>> import collections
>>> MonsterTuple = collections.namedtuple("Monsters","name age
power")
>>> MonsterTuple = (’Vampire’, 230, ’immortal’)
>>> MonsterTuple
(’Vampire’, 230, ’immortal’)
def namedtuple_example():
>>> namedtuple_example()
slayer
’’’
sunnydale = namedtuple(’name’, [’job’, ’age’])
buffy = sunnydale(’slayer’, ’17’)
print(buffy.job)
2.3 Lists
In Computer Science, arrays are a very simple data structure where elements
are sequentially stored in continued memory, and linked lists are structures
4
We are going to explore collections in the following chapters.
34 CHAPTER 2. BUILT-IN SEQUENCE TYPES
where several separated nodes link to each other. Iterating over the contents
is equally efficient for both kinds, but directly accessing an element at a
given index has O(1) (complexity) runtime5 in an array, while it has O(n)
in a linked list with n nodes (you would have to transverse the list from the
beginning).
Additionally, in a linked list, once you know where you want to insert
something, insertion is O(1), no matter how many elements the list has. For
arrays, an insertion would have to move all elements that are to the right
of the insertion point or moving all the elements to a larger array if needed,
being then O(n).
In Python, the closest object to an array is a list, which is a dynamic
resizing array and it does not have anything to do with the formal concept
of linked lists. Linked lists are a very important abstract data structure (we
will see more about them in a following chapter) and it is fundamental to
understand what makes them different from arrays (or Python’s lists).
Lists in Python are created by comma-separated values, between square
brackets. Items in lists do not need to have all the same data type. Unlike
strings which are immutable, it is possible to change individual elements of
a list (lists are mutable):
>>> q = [2, 3]
>>> p = [1, q, 4]
>>> p[1].append("buffy")
>>> p
[1, [2, 3, ’buffy’], 4]
>>> q
[2, 3, ’buffy’]
>>> q
[2, 3, ’buffy’]
To insert items, lists perform best (O(1)) when items are added or re-
moved at the end, using the methods append() and pop(), respectively.
The worst performance (O(n)) occurs with operations that need to search
for items in the list, for example, using remove() or index(), or using in
for membership testing.6
5
The Big-O notation is a key to understand algorithms. We will learn it in the following
chapters. For now, keep in mine that O(1) O(n) O(n2 ), etc...
6
This explains why append() is so much more efficient than insert().
2.3. LISTS 35
Inserts an item in a given position i: the first argument is the index of the
element before which to insert:
>>> people = ["Buffy", "Faith"]
>>> people.insert(1, "Xander")
>>> people
[’Buffy’, ’Xander’, ’Faith’]
Removes the first item from the list whose value is x. Raises a ValueError
exception if x is not found:
>>> people = ["Buffy", "Faith"]
>>> people.remove("Buffy")
>>> people
[’Faith’]
>>> people.remove("Buffy")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list
Removes the item at the given position in the list, and then returns it. If no
index is specified, pop() returns the last item in the list:
>>> people = ["Buffy", "Faith"]
>>> people.pop()
’Faith’
>>> people
[’Buffy’]
2.3. LISTS 37
The del:
Deletes the object reference, not the content, i.e., it is a way to remove an
item from a list given its index instead of its value. This can also be used to
remove slices from a list:
>>> a = [-1, 4, 5, 7, 10]
>>> del a[0]
>>> a
[4, 5, 7, 10]
>>> del a[2:3]
>>> a
[4, 5, 10]
>>> del a # also used to delete entire variable
When an object reference is deleted and no other object refers to its data,
Python schedules the item to be garbage-collected.7
Returns the index in the list of the first item whose value is x:
>>> people = ["Buffy", "Faith"]
>>> people.index("Buffy")
0
7
Garbage is a memory occupied by objects that are no longer referenced and garbage
collection is a form of automatic memory management, freeing the memory occupied by
the garbage.
38 CHAPTER 2. BUILT-IN SEQUENCE TYPES
List Unpacking
Uunpacking in lists is similar to tuples:
>>> first, *rest = [1,2,3,4,5]
>>> first
1
>>> rest
[2, 3, 4, 5]
Python also has a related concept named starred arguments, that can be
used as a passing argument for a function:
>>> def example_args(a, b, c):
... return a * b * c # here * is the multiplication operator
>>> L = [2, 3, 4]
>>> example_args(*L)
24
>>> example_args(2, *L[1:])
24
2.3. LISTS 39
List Comprehensions
A list comprehension is an expression and loop (with an optional condition)
enclosed in brackets:
[item for item in iterable]
[expression for item in iterable]
[expression for item in iterable if condition]
List comprehensions should only be used for simple cases, when each
portion fits in one line (no multiple for clauses or filter expressions):
[Good]
40 CHAPTER 2. BUILT-IN SEQUENCE TYPES
result = []
for x in range(10):
for y in range(5):
if x * y > 10:
result.append((x, y))
for x in range(5):
for y in range(5):
if x != y:
for z in range(5):
if y != z:
yield (x, y, z)
[Bad]
result = [(x, y) for x in range(10) for y in range(5) if x * y >
10]
return ((x, y, z)
for x in xrange(5)
for y in xrange(5)
if x != y
for z in xrange(5)
if y != z)
eter is a statement to set up the test. The timeit module will time how long
it takes to execute the statement some number of times (one million times
by default).
When the test is done, the function returns the time as a floating point
value representing the total number of seconds:
[runtime_lists_with_timeit_module.py]
def test1():
l = []
for i in range(1000):
l = l + [i]
def test2():
l = []
for i in range(1000):
l.append(i)
def test3():
l = [i for i in range(1000)]
def test4():
l = list(range(1000))
if __name__ == ’__main__’:
import timeit
t1 = timeit.Timer("test1()", "from __main__ import test1")
print("concat ",t1.timeit(number=1000), "milliseconds")
t2 = timeit.Timer("test2()", "from __main__ import test2")
print("append ",t2.timeit(number=1000), "milliseconds")
t3 = timeit.Timer("test3()", "from __main__ import test3")
print("comprehension ",t3.timeit(number=1000), "milliseconds")
t4 = timeit.Timer("test4()", "from __main__ import test4")
print("list range ",t4.timeit(number=1000), "milliseconds")
def revert(string):
’’’
>>> s = ’hello’
>>> revert(s)
’olleh’
>>> revert(’’)
’’
’’’
return string[::-1]
def reverse_string_inplace(s):
’’’
>>> s = ’hello’
>>> reverse_string_inplace(s)
’olleh’
>>> reverse_string_inplace(’’)
’’
’’’
if s:
s = s[-1] + reverse_string_inplace(s[:-1])
return s
the characters utilizing two pointers. The second loop will search for spaces
and then revert the words.
Further caveats of this problems is that we can represent space as ’ ’ or
as Unicodes (u0020). We also might have to pay attention in the last word,
since it does not end with a space. Finally, an extension of this problem is
to look for symbols such as !?;-.":
[reversing_words.py]
def reversing_words(word):
"""
>>> reversing_words(’buffy is awesome’)
’awesome is buffy’
"""
new_word = []
words = word.split(’ ’)
for word in words[::-1]:
new_word.append(word)
return " ".join(new_word)
def reversing_words2(s):
"""
>>> reversing_words2(’buffy is awesome’)
’awesome is buffy’
"""
words = s.split()
return ’ ’.join(reversed(words))
def reversing_words3(s):
"""
>>> reversing_words(’buffy is awesome’)
’awesome is buffy’
"""
words = s.split(’ ’)
words.reverse()
return ’ ’.join(words)
[simple_str_compression.py]
def str_comp(s):
’’’
>>> s1 = ’aabcccccaaa’
>>> str_comp(s1)
’a2b1c5a3’
>>> str_comp(’’)
’’
’’’
count, last = 1, ’’
list_aux = []
for i, c in enumerate(s):
if last == c:
count += 1
else:
if i != 0:
list_aux.append(str(count))
list_aux.append(c)
count = 1
last = c
list_aux.append(str(count))
return ’’.join(list_aux)
String Permutation
[permutation.py]
def perm(str1):
’’’
>>> perm(’123’)
[’123’, ’132’, ’231’, ’213’, ’312’, ’321’]
’’’
if len(str1) < 2:
return str1
res = []
46 CHAPTER 2. BUILT-IN SEQUENCE TYPES
for i, c in enumerate(str1):
for cc in perm(str1[i+1:] + str1[:i]):
res.append(c + cc)
return res
String Combination
[combination.py]
def combinations(s):
’’’
>>> combinations(’abc’)
[’abc’, ’acb’, ’bac’, ’bca’, ’cab’, ’cba’]
>>> combinations(’’)
’’
’’’
if len(s) < 2:
return s
res = []
for i, c in enumerate(s):
res.append(c)
for j in combinations(s[:i] + s[i+1:]):
res.append(c + j)
return res
Palindrome
[palindrome.py]
def is_palindrome(array):
’’’
>>> is_palindrome(’subi no onibus’)
True
>>> is_palindrome(’helllo there’)
False
2.5. FURTHER EXAMPLES 47
>>> is_palindrome(’h’)
True
>>> is_palindrome(’’)
True
’’’
array = array.strip(’ ’)
if len(array) < 2:
return True
if array[0] == array[-1]:
return is_palindrome(array[1:-1])
else:
return False
48 CHAPTER 2. BUILT-IN SEQUENCE TYPES
Chapter 3
Differently from sequence structures, where the data can be ordered or sliced,
collection structures are containers that aggregates data without relating
them. They have the following proprieties:
In Python, built-in collection types are given by sets and dicts. Addi-
tional collection types are found in the collections package, as we discuss
in end of this chapter.
3.1 Sets
A Set is an unordered collection data type that is iterable, mutable, and has
no duplicate elements. Sets are used for membership testing and eliminating
duplicate entries.
Since sets have O(1) insertion, the runtime of union is O(m + n). For
intersection, it is only necessary to transverse the smaller set, so the runtime
is O(n). 1
1
Python’s collection package has supporting for Ordered sets. This data type enforces
some predefined comparison for their members.
49
50 CHAPTER 3. COLLECTION DATA STRUCTURES
Frozen Sets
Frozen sets are immutable objects that only support methods and operators
that produce a result without affecting set to which they are applied.
def difference(l1):
""" return the list with duplicate elements removed """
return list(set(l1))
def test_sets_operations_with_lists():
l1 = [1,2,3,4,5,9,11,15]
52 CHAPTER 3. COLLECTION DATA STRUCTURES
l2 = [4,5,6,7,8]
l3 = []
assert(difference(l1) == [1, 2, 3, 4, 5, 9, 11, 15])
assert(difference(l2) == [8, 4, 5, 6, 7])
assert(intersection(l1, l2) == [4,5])
assert(union(l1, l2) == [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 15])
assert(difference(l3) == [])
assert(intersection(l3, l2) == l3)
assert(sorted(union(l3, l2)) == sorted(l2))
print(’Tests passed!’)
if __name__ == ’__main__’:
test_sets_operations_with_lists()
def set_operations_with_dict():
pairs = [(’a’, 1), (’b’,2), (’c’,3)]
d1 = OrderedDict(pairs)
print(d1) # (’a’, 1), (’b’, 2), (’c’, 3)
2
Sets properties can be used on the dict’s attributes items() and keys() attributes,
however values() do not support set operations.
3.2. DICTIONARIES 53
if __name__ == ’__main__’:
set_operations_with_dict()
3.2 Dictionaries
Dictionaries in Python are implemented using hash tables3 . Hashing func-
tions assign integer values to an arbitrary object in constant time, which can
be used as an index into an array:
>>> hash(42)
42
>>> hash("hello")
355070280260770553
def usual_dict(dict_data):
newdata = {}
for k, v in dict_data:
if k in newdata:
newdata[k].append(v)
else:
newdata[k] = [v]
return newdata
def setdefault_dict(dict_data):
newdata = {}
3.2. DICTIONARIES 55
for k, v in dict_data:
newdata.setdefault(k, []).append(v)
return newdata
import timeit
import random
for i in range(10000,1000001,20000):
t = timeit.Timer("random.randrange(%d) in x"%i, "from __main__
import random,x")
x = list(range(i))
lst_time = t.timeit(number=1000)
x = {j:None for j in range(i)}
d_time = t.timeit(number=1000)
print("%d,%10.3f,%10.3f" % (i, lst_time, d_time))
We clearly see that the membership operation for lists is O(n) and for
dictionaries is O(1)):
10000, 0.192, 0.002
30000, 0.600, 0.002
50000, 1.000, 0.002
3.3. PYTHON’S COLLECTION DATA TYPES 57
Default Dictionaries
Default dictionaries are an additional unordered mapping type provided
by Python’s collections.defaultdict. They have all the operators and
methods that a built-in dictionary has, but they also handle missing keys:
[example_defaultdict.py]
def defaultdict_example():
pairs = {(’a’, 1), (’b’,2), (’c’,3)}
d1 = {}
for key, value in pairs:
if key not in d1:
d1[key] = []
d1[key].append(value)
print(d1)
d2 = defaultdict(list)
for key, value in pairs:
d2[key].append(value)
print(d2)
Ordered Dictionaries
Ordered dictionaries are an ordered mapping type provided by Python’s
collections.OrderedDict. They have all the methods and properties of a
built-in dict, but in addition they store items in the insertion order:
[example_OrderedDict.py]
d1[key].append(value)
for key in d1:
print(key, d1[key])
d2 = OrderedDict(pairs)
for key in d2:
print(key, d2[key])
Counter Dictionaries
A specialized Counter type (subclass for counting objects) is provided by
Python’s collections.Counter:
[example_Counter.py]
def Counter_example():
seq1 = [1, 2, 3, 5, 1, 2, 5, 5, 2, 5, 1, 4]
seq_counts = Counter(seq1)
print(seq_counts)
60 CHAPTER 3. COLLECTION DATA STRUCTURES
seq3 = [1, 4, 3]
for key in seq3:
seq_counts[key] += 1
print(seq_counts)
import collections
import string
import sys
def count_unique_word():
words = collections.defaultdict(int)
strip = string.whitespace + string.punctuation + string.digits
+ "\"’"
for filename in sys.argv[1:]:
with open(filename) as file:
for line in file:
for word in line.lower().split():
word = word.strip(strip)
if len(word) > 2:
words[word] = +1
for word in sorted(words):
print("’{0}’ occurs {1} times.".format(word, words[word]))
Anagrams
The following program finds whether two words are anagrams. Since sets do
not count occurrence, and sorting a list is O(n log n), hash tables can be a
good solution for this type of problem:
[anagram.py]
62 CHAPTER 3. COLLECTION DATA STRUCTURES
Sums of Paths
The following program uses two different dictionary containers to determine
the number of ways two dices can sum to a certain value:
[find_dice_probabilities.py]
def delete_unique(str1):
’’’
>>> delete_unique("Trust no one")
’on’
>>> delete_unique("Mulder?")
’’
’’’
str_strip = ’’.join(str1.split())
repeat = Counter()
for c in str_strip:
repeat[c] += 1
result = ’’
for c, count in repeat.items():
if count > 1:
result += c
return result
def removing_duplicates_seq(str1):
’’’
>>> delete_unique("Trust no one")
64 CHAPTER 3. COLLECTION DATA STRUCTURES
’on’
>>> delete_unique("Mulder?")
’’
’’’
seq = str1.split()
result = set()
for item in seq:
if item not in result:
#yield item
result.add(item)
return result
Chapter 4
if
1
Note that colons are used with else, elif, and in any other place where a suite is to
follow.
65
66 CHAPTER 4. PYTHON’S STRUCTURE AND MODULES
for
Python’s for statement iterates over the items of any sequence (e.g., a list
or a string), in the order that they appear in the sequence:
>>> a = ["buffy", "willow", "xander", "giles"]
>>> for i in range(len(a)):
... print(a[i])
buffy
willow
xander
giles
? For sequences (strings, lists, tuples), use the fact that empty sequences
are False, so if not seq or if seq is preferable to if len(seq) or
if not len(seq).
? When handling integers, implicit False may involve more risk than
benefit, such as accidentally handling None as 0:
[Good]
if not users: print ’no users’
4.1. CONTROL FLOW 67
if foo == 0: self.handle_zero()
if i % 10 == 0: self.handle_multiple_of_ten()
[Bad]
if len(users) == 0: print ’no users’
if foo is not None and not foo: self.handle_zero()
if not i % 10: self.handle_multiple_of_ten()
if __name__ == ’__main__’:
68 CHAPTER 4. PYTHON’S STRUCTURE AND MODULES
fib = fib_generator()
print(next(fib))
print(next(fib))
print(next(fib))
print(next(fib))
Activation Records
An activation record happens every time we invoke a method: information
is put in the stack to support invocation. Activation records process in the
following order:
1. the actual parameters of the method are pushed onto the stack,
Packages
A package is a directory that contains a set of modules and a file called
init .py :
>>> import foldername.filemodulename
70 CHAPTER 4. PYTHON’S STRUCTURE AND MODULES
means importing every object in the module, except those whose names begin
with , or if the module has a global all variable, the list in it.
is executed.
The variables sys.ps1 and sys.ps2 define the strings used as primary
and secondary prompts. The variable sys.argv allows the recovering of the
arguments passed in the command line:
import sys
The built-in method dir() is used to find which names a module defines
and includes all types of names: variables, modules, functions:
>>> import sys
>>> dir(sys)
[ __name__ , argv , builtin_module_names , copyright ,
exit , maxint , modules , path , ps1 ,
ps2 , setprofile , settrace , stderr ,
stdin , stdout , version ]
import os
import sys
def read_data(filename):
72 CHAPTER 4. PYTHON’S STRUCTURE AND MODULES
lines = []
fh = None
try:
fh = open(filename)
for line in fh:
if line.strip():
lines.append(line)
except (IOError, OSError) as err:
print(err)
finally:
if fh is not None:
fh.close()
return lines
def remove_blank_lines():
if len(sys.argv) < 2:
print ("Usage: noblank.py infile1 [infile2...]")
for filename in sys.argv[1:]:
lines = read_data(filename)
if lines:
write_data(lines, filename)
>>> f.readlines()
[’This is the first line of the file.\n’, ’Second line of the
file\n’]
... try:
... i = int(input(msg))
... return i
... except ValueError as err:
... print(err)
>>> age = get_int("Enter your age: ")
import os
import sys
import shutil
def change_file_ext():
if len(sys.argv) < 2:
print("Usage: change_ext.py filename.old_ext ’new_ext’")
sys.exit()
name = os.path.splitext(sys.argv[1])[0] + "." + sys.argv[2]
print (name)
try:
shutil.copyfile(sys.argv[1], name)
except OSError as err:
print (err)
76 CHAPTER 4. PYTHON’S STRUCTURE AND MODULES
import pickle
finally:
4.3. FILE HANDLING 77
if fh is not None:
fh.close()
import pickle
def import_pickle(filename):
fh = None
try:
fh = open(filename, "rb")
mydict2 = pickle.load(fh)
return mydict2
except (EnvironmentError) as err:
print ("{0}: import error:
{0}".format(os.path.basename(sys.arg[0]), err))
return false
finally:
if fh is not None:
fh.close()
Handling Exceptions
When an exception is raised and not handled, Python outputs a traceback
along with the exception’s error message. A traceback is a list of all the calls
made from the point where the unhandled exception occurred back to the
top of the call stack.
We can handle predictable exceptions with the try-except-finally
paradigm:
try:
try_suite
except exception1 as variable1:
exception_suite1
...
except exceptionN as variableN:
exception_suiteN
If the statements in the try block’s suite are all executed without raising
an exception, the except blocks are skipped. If an exception is raised inside
the try block, control is immediately passed to the suite corresponding to the
first matching exception. This means that any statements in the suite that
follow the one that caused the exception will not be executed:
while 1:
try:
x = int(raw_input("Please enter a number: "))
4.4. ERROR HANDLING IN PYTHON 79
break
except ValueError:
print "Oops! That was no valid number. Try again..."
ception class, which should be inherit from the built-in Exception class.
The base exception for a module should be called Error.
class Error(Exception):
pass
[0, 3, 6, 9]
[grep_word_from_files.py]
import sys
def grep_word_from_files():
word = sys.argv[1]
for filename in sys.argv[2:]:
with open(filename) as file:
for lino, line in enumerate(file, start=1):
if word in line:
print("{0}:{1}:{2:.40}".format(filename, lino,
line.rstrip()))
if __name__ == ’__main__’:
if len(sys.argv) < 2:
print("Usage: grep_word_from_files.py word infile1
[infile2...]")
sys.exit()
else:
grep_word_from_files()
...
>>> area(5,4)
10.0
Object-Oriented Design
However, many things are missing here. First, there are no guarantees
that anyone who uses our circle data is not going to type an invalid input
value, such as a negative number for the radius. Second, how could we also
associate to our circle some operations that are proper from it, such as its
area or perimeter?
For the first problem, we can see that the inability to validate when cre-
ating an object is a really bad aspect of taking a purely procedural approach
in programming. Even if we decide to include many exceptions handling the
invalid inputs for our circles, we still would have a data container that is not
intrinsically made and validated for its real purpose. Imagine now if we had
chosen a list instead of the named tuple, how would we handle the fact that
lists have sorting properties?
It is clear that we need to create an object that has only the proprieties
that we expect it to have. In other words, we want to find a way to package
data and restrict its methods. That is what object-oriented programming
85
86 CHAPTER 5. OBJECT-ORIENTED DESIGN
Class Instantiation
Attributes
Objects have attributes (methods and data) from their Classes. Method
attributes are functions whose first argument is the instance on which it is
called to operate (conventionally called self).
References to names in modules are attribute references: in the expression
modname.funcname, modname is a module object and funcname is one of
its attribute. Attributes may be read-only or writable. Writable attributes
may be deleted with the del statement.
1
Multiple names (in multiple scopes) can be bound to the same object (also know as
aliasing).
5.2. PRINCIPLES OF OOP 87
Namespaces
A namespace is a mapping from names to objects. Most namespaces are
currently implemented as Python dictionaries.
Examples of namespaces are: the set of built-in names, the global names
in a module, and the local names in a function invocation. The statements
executed by the top-level invocation of the interpreter, either reading from a
script file or interactively, are considered part of a module called main .
Scope
A scope is a textual region of a Python program where a namespace is di-
rectly accessible. Although scopes are determined statically, they are used
dynamically.
The global scope of a function defined in a module is the module’s names-
pace. When a class definition is entered, a new namespace is created, and
used as the local scope.
Polymorphism
Functions in Python are virtual, so any method can be overridden in a sub-
class. Polymorphism is the principle where methods can be redefined inside
subclasses.
88 CHAPTER 5. OBJECT-ORIENTED DESIGN
Aggregation
Aggregation (or composition) defines the process where a class includes one of
more instance variables that are from other classes. It is a has-a relationship.
import math
class Point:
def __init__(self, x = 0, y = 0):
self.x = x # data attribute
self.y = y
def __repr__(self):
return "point ({0.x!r}, {0.y!r})".format(self)
def __str__(self):
return "({0.x!r}, {0.y!r})".format(self)
def edge_distance_from_origin(self):
return abs(self.distance_from_origin() - self.radius)
def area(self):
return math.pi*(self.radius**2)
def circumference(self):
return 2*math.pi*self.radius
def __repr__(self):
return "circle ({0.radius!r}, {0.x!r})".format(self)
def __str__(self):
return repr(self)
>>> str(c)
’circle (3, 2)’
>>> c.circumference()
18.84955592153876
>>> c. edge_distance_from_origin()
0.7639320225002102
Decorator Pattern
Decorators (used with a @ notation) are a tool to elegantly specify some
transformation on functions and methods. The decorator pattern allows us
to wrap an object that provides core functionality with other objects that
alter that functionality.
With decorators, the snippet below:
class NiceClass(object):
def method(self):
method = my_decorator(method)
Let’s look at a reals examples. For instance, a sum function that would
be otherwise written as:
[example_decorators.py]
def sum(func):
s = 0
for i in func():
5.4. PYTHON DESIGN PATTERNS 91
s += i
return s
def interate():
a = []
for i in range(10):
a.append(i)
return a
print sum(interate)
print interate
@logger
def foo(x, y):
return x+y
print foo(1, 2)
def benchmark(func):
import time
def wrapper(*args, **kwargs):
t = time.clock()
res = func(*args, **kwargs)
print("\t%s" % func.__name__, time.clock()-t)
return res
return wrapper
@benchmark
def random_tree(n):
temp = [n for n in range(n)]
for i in range(n+1):
temp[random.choice(temp)] = random.choice(temp)
return temp
Observer Pattern
The observer pattern is useful when we want to have a core object that
maintains certain values, and then having some observers to create serialized
copies of that object.
This can be implemented by using the @properties decorator, placed
before our functions. This will control attribute access, for example, to make
an attribute to be read-only.
@property
def radius(self):
return self.__radius
Singleton Pattern
A class follows the singleton pattern if it allows exactly one instance of a
certain object to exist.
Since Python does not have private constructors, we use the new class
method to ensure that only one instance is ever created. For example, in the
5.4. PYTHON DESIGN PATTERNS 93
snippet below, the two objects are the same object (they are equal and are
in the same address):
>>> class SinEx:
... _sing = None
... def __new__(self, *args, **kwargs):
... if not self._sing:
... self._sing = super(SinEx, self).__new__(self, *args,
**kwargs)
... return self._sing
>>> x = SinEx()
>>> x
<__main__.SinEx object at 0xb72d680c>
>>> y = SinEx()
>>> x == y
True
>>> y
<__main__.SinEx object at 0xb72d680c>
94 CHAPTER 5. OBJECT-ORIENTED DESIGN
Chapter 6
Advanced Topics
95
96 CHAPTER 6. ADVANCED TOPICS
with different work to do. Child processing allows to take advantage of mul-
ticore processor, leaving concurrency issues to be handled by the operational
system.
def worker(num):
"""thread worker function"""
print ’Worker: %s’ % num
return
threads = []
for i in range(5):
t = threading.Thread(target=worker, args=(i,))
threads.append(t)
t.start()
Additionally, we can use the queue.queue() class to handle all the locking
internally. This will serialize accesses, meaning that only one thread at time
has access to the data (FIFO).
Since the program will not terminate while there are running threads,
the ones that are finished will appear to be still running. The solution is
to transform threads into daemons. In this case, the program terminates as
soon as there is no daemon threads running.
were not using mutexes, the thread could try to modify the array as well, at
the same time.
Conceptually, a mutex is an integer that starts at 1. Whenever a thread
needs to alter the array, it “locks” the mutex. This causes the thread to wait
until the number is positive and then decreases it by one. When the thread
is done modifying the array, it “unlocks” the mutex, causing the number to
increase by 1.
Semaphores are more general than mutexes. A semaphore’s integer may
start at a number greater than 1. The number at which a semaphore starts
is the number of threads that may access the resource at once. Semaphores
support “wait” and “signal” operations, which are analogous to the “lock”
and “unlock” operations of mutexes.
Virtualenv
To create a virtual environment:
$ virtualenv venv
98 CHAPTER 6. ADVANCED TOPICS
If you are done working in the virtual environment for the moment, you
can deactivate it:
$ deactivate
Virtualenvwrapper
Virtualenvwrapper provides some addtional commands and places all virtual
environments in one place:
$ pip install virtualenvwrapper
To check if it is working:
$ which python
To delete it:
$ rmvirtualenv test
6.2. GOOD PRACTICES 99
Debugging
The Python debugger, pdb, can be found at http://pymotw.com/2/pdb/.
If you have some code in a source file and you want to debug it interac-
tively, you can run Python with the -i switch:
$ python -i example.py
To perform the inspection, type: s for step, p for point, and n for next
line, list to see the next 10 lines, and help for help.
Profiling
If a program runs very slowly or consumes far more memory than expected,
the problem is most often due to our choice of algorithms or data structures
or due to some inefficient implementation:
Test Nomenclature
Test fixtures: The code necessary to set up a test (for example, creating
an input file for testing and deleting afterwards).
Test suites: The collection of test cases, created by the subclass unittest.testcase,
where each method has a name beginning with “test”.
doctest
Good for tests inside the modules and functions’ docstrings. To use it, we
add the following lines in the module:
if __name__ = "__main__"
import doctest
doctest.testmod()
unittest
unittest is the standard testing package in Python. To use it, the test files
should start with test :
import unittest
class BasicsTestCase(unittest.TestCase):
def test_find_name(self):
self.assertTrue(1 == 1)
self.assertFalse(1 == 2)
if __name__ == ’__main__’:
unittest.main()
pytest
To install pytest, use:
$ pip install pytest
Running:
$ python -m pytest
Part II
103
Chapter 7
Asymptotic Analysis
105
106 CHAPTER 7. ASYMPTOTIC ANALYSIS
P
P is the complexity class of decision problems that can be solved on a deter-
ministic Turing machine in polynomial time (in the worst case). If we can
turn a problem into a decision problem, the result would belong to P.
NP
NP is the complexity class of decision problems that can be solved on a non-
deterministic Turing machine (NTM) in polynomial time. In other words, it
includes all decision problems whose yes instances can be solved in polyno-
mial time with the NTM.
A problem is called complete if all problems in the class are reduced to
it. The subclass called NP-complete (NPC) contains the hardest problems
in all of NP.
Any problem that is at least as hard (determined by polynomial-time
reduction) as any problem in NP, but that need not itself be in NP, is
called NP-hard. For example, finding the shortest route through a graph,
which is called the Travelling Salesman (or Salesrep) problem (TSP).
7.2. RECURSION 107
P=NP?
The class co-NP is the class of the complements of NP problems. For every
“yes” answer, we have the “no”, and vice versa. If NP is truly asymmetric,
then these two classes are different. Although, there is overlap between
them since all of P lies in their intersection. Moreover, both the yes and no
instances in P can be solved in polynomial time with an NTM.
What would happen if a NPC was found in a intersection of N and co-
NP? First, it would mean that all of NP would be inside co-NP, so we
would show NP = co-NP and the asymmetry would disappear. Second,
since all of P is in this intersection, P = NP. If P = NP, we could solve
any (decision) problem that had a practical (verifiable) solution.
However, it is (strongly) believed that NP and co-NP are different. For
instance, no polynomial solution to the problem of factoring numbers was
found, and this problem is in both NP and co-NP.
7.2 Recursion
In functions, the three laws of recursion are:
2. A recursive algorithm must change its state and move toward the base
case.
For every recursive call, the recursive function has to allocate memory on
the stack for arguments, return address, and local variables, costing time to
push and pop these data onto the stack. Recursive algorithms take at least
O(n) space where n is the depth of the recursive call.
Recursion is very costly when there are duplicated calculations and/or
there are overlap among subproblems. In some cases this can cause the stack
to overflow. For this reason, where subproblems overlap, iterative solutions
might be a better approach. For example, in the case of the Fibonacci
series, the iterative solution runs on O(n) while the recursive solution runs
on exponential runtime.
108 CHAPTER 7. ASYMPTOTIC ANALYSIS
Recursive Relations
To describe the running time of recursive functions, we use recursive relations:
T (n) = a · T (g(n)) + f (n),
where a represents the number of recursive calls, g(n) is the size of each
subproblem to be solved recursively, and f (n) is any extra work done in the
function. The following table shows examples of recursive relations:
T (n) = T (n − 1) + 1 O(n) Processing a sequence
T (n) = T (n − 1) + n O(n2 ) Handshake problem
T (n) = 2T (n − 1) + 1 O(2n ) Towers of Hanoi
T (n) = T (n/2) + 1 O(ln n) Binary search
T (n) = T (n/2) + n O(n) Randomized select
T (n) = 2T (n/2) + 1 O(n) Tree transversal
T (n) = 2T (n/2) + n O(n ln n) Sort by divide and conquer
[find_fibonacci_seq.py]
def find_fibonacci_seq_rec(n):
if n < 2: return n
return find_fibonacci_seq_rec(n - 1) +
find_fibonacci_seq_rec(n - 2)
In conclusion, the following table shows the runtime results for several
known algorithms:
8.1 Stacks
A stack is a linear data structure that can be accessed only at one of its ends
(which we refers as the top) for either storing or retrieving. You can think
of a stack as a huge pile of books on your desk.
Array access of elements in a stack is restricted since a stack is a last-
in-first-out (LIFO) structure. However, stacks have the following operations
running at O(1):
111
112 CHAPTER 8. ABSTRACT DATA STRUCTURES
Stacks in Python can be implemented with lists and the methods append()
and pop():
[stack.py]
class Stack(object):
def __init__(self):
self.content = []
self.min_array = []
self.min = float(’inf’)
def pop(self):
if self.content:
value = self.content.pop()
self.min_array.pop()
if self.min_array:
self.min = self.min_array[-1]
return value
else:
return ’Empty List. ’
def find_min(self):
if self.min_array:
return self.min_array[-1]
else:
return ’No min value for empty list.’
def size(self):
return len(self.content)
def isEmpty(self):
8.1. STACKS 113
def peek(self):
if self.content:
return self.content[-1]
else:
print(’Stack is empty.’)
def __repr__(self):
return ’{}’.format(self.content)
if __name__ == ’__main__’:
q = Stack()
for i in range(15,20):
q.push(i)
for i in range(10,5,-1):
q.push(i)
for i in range(1, 13):
print q.pop(), q.find_min()
class Node(object):
def __init__(self, value=None, pointer=None):
self.value = value
self.pointer = pointer
class Stack(object):
def __init__(self):
self.head = None
def isEmpty(self):
return not bool(self.head)
def pop(self):
if self.head:
node = self.head
self.head = node.pointer
return node.value
else:
print(’Stack is empty.’)
def peek(self):
if self.head:
return self.head.value
else:
print(’Stack is empty.’)
def size(self):
node = self.head
count = 0
while node:
count +=1
node = node.pointer
return count
def _printList(self):
node = self.head
while node:
print(node.value)
node = node.pointer
if __name__ == ’__main__’:
stack = Stack()
print("Is the stack empty? ", stack.isEmpty())
print("Adding 0 to 10 in the stack...")
for i in range(10):
stack.push(i)
stack._printList()
print("Stack size: ", stack.size())
print("Stack peek : ", stack.peek())
print("Pop...", stack.pop())
print("Stack peek: ", stack.peek())
print("Is the stack empty? ", stack.isEmpty())
8.2. QUEUES 115
stack._printList()
8.2 Queues
A queue, differently of a stack, is a structure where the first enqueued element
(at the back) will be the first one to be dequeued (when it is at the front),
being defined as a first-in-first-out (FIFO) structure. You can think of a
queue as a line of people waiting for a roller-coaster ride.
Array access of elements in queues is restricted and queues should have
the following operations running at O(1):
Again, we can write a queue class with Python’s lists. However, this will
be very inefficient (because it uses the method insert()):
[queue_inefficient.py]
class Queue(object):
def __init__(self):
self.items = []
def isEmpty(self):
return not bool(self.items)
def dequeue(self):
116 CHAPTER 8. ABSTRACT DATA STRUCTURES
return self.items.pop()
def size(self):
return len(self.items)
def peek(self):
return self.items[-1]
def __repr__(self):
return ’{}’.format(self.items)
if __name__ == ’__main__’:
queue = Queue()
print("Is the queue empty? ", queue.isEmpty())
print("Adding 0 to 10 in the queue...")
for i in range(10):
queue.enqueue(i)
print("Queue size: ", queue.size())
print("Queue peek : ", queue.peek())
print("Dequeue...", queue.dequeue())
print("Queue peek: ", queue.peek())
print("Is the queue empty? ", queue.isEmpty())
print(queue)
A better way would write a queue using two stacks (two lists) instead of
one:
[queue.py]
class Queue(object):
def __init__(self):
self.in_stack = []
self.out_stack = []
def _transfer(self):
while self.in_stack:
self.out_stack.append(self.in_stack.pop())
def dequeue(self):
if not self.out_stack:
self._transfer()
if self.out_stack:
return self.out_stack.pop()
else:
return "Queue empty!"
def size(self):
return len(self.in_stack) + len(self.out_stack)
def peek(self):
if not self.out_stack:
self._transfer()
if self.out_stack:
return self.out_stack[-1]
else:
return "Queue empty!"
def __repr__(self):
if not self.out_stack:
self._transfer()
if self.out_stack:
return ’{}’.format(self.out_stack)
else:
return "Queue empty!"
def isEmpty(self):
return not (bool(self.in_stack) or bool(self.out_stack))
if __name__ == ’__main__’:
queue = Queue()
print("Is the queue empty? ", queue.isEmpty())
print("Adding 0 to 10 in the queue...")
for i in range(10):
queue.enqueue(i)
print("Queue size: ", queue.size())
print("Queue peek : ", queue.peek())
print("Dequeue...", queue.dequeue())
118 CHAPTER 8. ABSTRACT DATA STRUCTURES
class Node(object):
def __init__(self, value=None, pointer=None):
self.value = value
self.pointer = None
class LinkedQueue(object):
def __init__(self):
self.head = None
self.tail = None
def isEmpty(self):
return not bool(self.head)
def dequeue(self):
if self.head:
value = self.head.value
self.head = self.head.pointer
return value
else:
print(’Queue is empty, cannot dequeue.’)
self.tail.pointer = node
self.tail = node
def size(self):
node = self.head
num_nodes = 0
while node:
num_nodes += 1
node = node.pointer
return num_nodes
def peek(self):
return self.head.value
def _print(self):
node = self.head
while node:
print(node.value)
node = node.pointer
if __name__ == ’__main__’:
queue = LinkedQueue()
print("Is the queue empty? ", queue.isEmpty())
print("Adding 0 to 10 in the queue...")
for i in range(10):
queue.enqueue(i)
print("Is the queue empty? ", queue.isEmpty())
queue._print()
print("Queue size: ", queue.size())
print("Queue peek : ", queue.peek())
print("Dequeue...", queue.dequeue())
print("Queue peek: ", queue.peek())
queue._print()
Deques
A deque is a double-ended queue, which can roughly be seen as an union of
a stack and a queue.
Python has an efficient deque implementation, with fast appending and
popping from both ends:
>>> from collections import deque
>>> q = deque(["buffy", "xander", "willow"])
>>> q
deque([’buffy’, ’xander’, ’willow’])
>>> q.append("giles")
>>> q
deque([’buffy’, ’xander’, ’willow’, ’giles’])
>>> q.popleft()
’buffy’
>>> q.pop()
’giles’
>>> q
deque([’xander’, ’willow’])
>>> q.appendleft(’angel’)
>>> q
deque([’angel’, ’xander’, ’willow’])
Heaps
Conceptually, a heap is a binary tree where each node is smaller (larger) than
its children. We will learn about trees in the next chapters but we should
already keep in mind that when modifications are made in a balanced tree,
we can repair its structure with O(logn) runtimes.
Heaps are generally useful for applications that repeatedly access the
smallest (largest) element in the list. A min-(max-)heap lets you to find the
smallest (largest) element in O(1) and to extract/add/replace it in O(ln n).
>>> l
[4, 6, 8]
class Heapify(object):
def __init__(self, data=None):
self.data = data or []
for i in range(len(data)//2, -1, -1):
self.__max_heapify__(i)
def __repr__(self):
8.3. PRIORITY QUEUES AND HEAPS 123
return ’{}’.format(self.data)
def extract_max(self):
n = len(self.data)
max_element = self.data[0]
self.data[0] = self.data[n - 1]
self.data = self.data[:n - 1]
self.__max_heapify__(0)
return max_element
def test_Heapify():
l1 = [3, 2, 5, 1, 7, 8, 2]
h = Heapify(l1)
assert(h.extract_max() == 8)
124 CHAPTER 8. ABSTRACT DATA STRUCTURES
To conclude this section, the following example shows how to use the heapq
package to implement a priority queue class:
[PriorityQueueClass.py]
import heapq
class PriorityQueue(object):
def __init__(self):
self._queue = []
self._index = 0 # comparying same priority level
def pop(self):
return heapq.heappop(self._queue)[-1]
class Item:
def __init__(self, name):
self.name = name
def __repr__(self):
return "Item({!r})".format(self.name)
def test_PriorityQueue():
’’’ push and pop are all O(logN) ’’’
q = PriorityQueue()
q.push(Item(’test1’), 1)
q.push(Item(’test2’), 4)
q.push(Item(’test3’), 3)
assert(str(q.pop()) == "Item(’test2’)")
8.4. LINKED LISTS 125
class Node(object):
def __init__(self, value=None, pointer=None):
self.value = value
self.pointer = pointer
def getData(self):
return self.value
def getNext(self):
return self.pointer
if __name__ == ’__main__’:
L = Node("a", Node("b", Node("c", Node("d"))))
assert(L.pointer.pointer.value==’c’)
print(L.getData())
print(L.getNext().getData())
L.setData(’aa’)
L.setNext(Node(’e’))
print(L.getData())
print(L.getNext().getData())
class LinkedListLIFO(object):
def __init__(self):
self.head = None
self.length = 0
prev = None
node = self.head
found = 0
while node and not found:
if node.value == value:
found = True
else:
prev = node
node = node.pointer
return node, prev, found
if __name__ == ’__main__’:
ll = LinkedListLIFO()
for i in range(1, 5):
ll._add(i)
print(’The list is:’)
ll._printList()
print(’The list after deleting node with index 2:’)
ll.deleteNode(2)
ll._printList()
print(’The list after deleting node with value 3:’)
ll.deleteNodeByValue(2)
ll._printList()
print(’The list after adding node with value 15’)
128 CHAPTER 8. ABSTRACT DATA STRUCTURES
ll._add(15)
ll._printList()
print("The list after deleting everything...")
for i in range(ll.length-1, -1, -1):
ll.deleteNode(i)
ll._printList()
class LinkedListFIFO(object):
def __init__(self):
self.head = None
self.length = 0
self.tail = None # this is different from ll lifo
prev.pointer = node.pointer
if not self.tail == node:
self.tail = prev
else:
print(’Node with index {} not found’.format(index))
if __name__ == ’__main__’:
ll = LinkedListFIFO()
for i in range(1, 5):
ll.addNode(i)
print(’The list is:’)
ll._printList()
print(’The list after deleting node with index 2:’)
ll.deleteNode(2)
ll._printList()
print(’The list after adding node with value 15’)
ll._add(15)
ll._printList()
print("The list after deleting everything...")
for i in range(ll.length-1, -1, -1):
ll.deleteNode(i)
ll._printList()
Linked lists have a dynamic size at runtime and they are good for when
you have an unknown number of items to store.
Insertion is O(1) but deletion and searching can be O(n) because locating
an element in a linked list is slow and is it done by a sequential search. A
good trick to delete a i node at O(1) is copying the data from i + 1 to i and
then to deleting the i + 1 node.
Traversing a linked list backward or sorting it are both O(n2 ).
buckets. For example, if a hash value is 42, and there are 5 buckets, the mod
function is used to decide that the destination bucket is 2 (42 mod 5).
When two objects have the same hash value, we have a collision. One
way to deal with this is to have a linked list for each bucket. In the worst-
case scenario, each key hashes to the same bucket, so insertion, removal, and
lookup will take O(n) time.
[Hash_Table.py]
class HashTable(object):
def __init__(self, slots=10):
self.slots = slots
self.table = []
self.create_table()
def create_table(self):
for i in range(self.slots):
self.table.append([])
def print_table(self):
for key in range(len(self.table)):
print "Key is %s, value is %s." %(key, self.table[key])
if __name__ == ’__main__’:
132 CHAPTER 8. ABSTRACT DATA STRUCTURES
dic = HashTable(5)
for i in range(1, 40, 2):
dic.add_item(i)
dic.print_table()
assert(dic.find_item(20) == False)
assert(dic.find_item(21) == True)
8.6. ADDITIONAL EXERCISES 133
def balance_par_str_with_stack(str1):
s = Stack()
balanced = True
index = 0
while index < len(str1) and balanced:
symbol = str1[index]
if symbol == "(":
s.push(symbol)
else:
if s.isEmpty():
balanced = False
else:
s.pop()
index += 1
if balanced and s.isEmpty():
return True
else:
return False
if __name__ == ’__main__’:
print(balance_par_str_with_stack(’((()))’))
print(balance_par_str_with_stack(’(()’))
Decimal to Binary
The following example uses a stack to transform a decimal number to binary
number:
134 CHAPTER 8. ABSTRACT DATA STRUCTURES
[dec2bin_with_stack.py]
def dec2bin_with_stack(decnum):
s = Stack()
str_aux = ’’
while decnum > 0:
dig = decnum % 2
decnum = decnum//2
s.push(dig)
while not s.isEmpty():
str_aux += str(s.pop())
return str_aux
if __name__ == ’__main__’:
decnum = 9
assert(dec2bin_with_stack(decnum) == ’1001’)
class NodeWithMin(object):
def __init__(self, value=None, minimum=None):
self.value = value
self.minimum = minimum
class StackMin(Stack):
def __init__(self):
self.items = []
self.minimum = None
def peek(self):
return self.items[-1].value
def peekMinimum(self):
return self.items[-1].minimum
def pop(self):
item = self.items.pop()
if item:
if item.value == self.minimum:
self.minimum = self.peekMinimum()
return item.value
else:
print("Stack is empty.")
def __repr__(self):
aux = []
for i in self.items:
aux.append(i.value)
return ’{}’.format(aux)
if __name__ == ’__main__’:
stack = StackMin()
print("Is the stack empty? ", stack.isEmpty())
print("Adding 0 to 10 in the stack...")
for i in range(10, 0, -1):
stack.push(i)
for i in range(1, 5):
stack.push(i)
print(stack)
print("Stack size: ", stack.size())
print("Stack peek and peekMinimum : ", stack.peek(),
stack.peekMinimum())
print("Pop...", stack.pop())
print("Stack peek and peekMinimum : ", stack.peek(),
stack.peekMinimum())
136 CHAPTER 8. ABSTRACT DATA STRUCTURES
Set of Stacks
The following example implements a set of stacks, creating a new stack when
the previous exceeds capacity. The push and pop methods are identical to a
single stack:
[set_of_stacks.py]
class SetOfStacks(Stack):
def __init__(self, capacity=4):
self.setofstacks = []
self.items = []
self.capacity = capacity
def pop(self):
value = self.items.pop()
if self.isEmpty() and self.setofstacks:
self.items = self.setofstacks.pop()
return value
def sizeStack(self):
return len(self.setofstacks)*self.capacity + self.size()
def __repr__(self):
aux = []
for s in self.setofstacks:
aux.extend(s)
8.6. ADDITIONAL EXERCISES 137
aux.extend(self.items)
return ’{}’.format(aux)
if __name__ == ’__main__’:
capacity = 5
stack = SetOfStacks(capacity)
print("Is the stack empty? ", stack.isEmpty())
print("Adding 0 to 10 in the stack...")
for i in range(10):
stack.push(i)
print(stack)
print("Stack size: ", stack.sizeStack())
print("Stack peek : ", stack.peek())
print("Pop...", stack.pop())
print("Stack peek: ", stack.peek())
print("Is the stack empty? ", stack.isEmpty())
print(stack)
Queues
Using a Queue to Model an Animal Shelter
[animal_shelter.py]
class Node(object):
def __init__(self, animalName=None, animalKind=None,
pointer=None):
self.animalName = animalName
self.animalKind = animalKind
self.pointer = pointer
self.timestamp = 0
class AnimalShelter(object):
def __init__(self):
self.headCat = None
self.headDog = None
self.tailCat = None
self.tailDog = None
138 CHAPTER 8. ABSTRACT DATA STRUCTURES
self.animalNumber = 0
if animalKind == ’cat’:
if not self.headCat:
self.headCat = newAnimal
if self.tailCat:
self.tailCat.pointer = newAnimal
self.tailCat = newAnimal
def dequeueDog(self):
if self.headDog:
newAnimal = self.headDog
self.headDog = newAnimal.pointer
return str(newAnimal.animalName)
else:
return ’No Dogs!’
def dequeueCat(self):
if self.headCat:
newAnimal = self.headCat
self.headCat = newAnimal.pointer
return str(newAnimal.animalName)
else:
return ’No Cats!’
def dequeueAny(self):
if self.headCat and not self.headDog:
return self.dequeueCat()
8.6. ADDITIONAL EXERCISES 139
def _print(self):
print("Cats:")
cats = self.headCat
while cats:
print(cats.animalName, cats.animalKind)
cats = cats.pointer
print("Dogs:")
dogs = self.headDog
while dogs:
print(dogs.animalName, dogs.animalKind)
dogs = dogs.pointer
if __name__ == ’__main__’:
qs = AnimalShelter()
qs.enqueue(’bob’, ’cat’)
qs.enqueue(’mia’, ’cat’)
qs.enqueue(’yoda’, ’dog’)
qs.enqueue(’wolf’, ’dog’)
qs._print()
print("Deque one dog and one cat...")
qs.dequeueDog()
qs.dequeueCat()
qs._print()
[find_N_largest_smallest_items_seq.py]
import heapq
def find_smallest_items_seq_heap(seq):
heapq.heapify(seq)
return heapq.heappop(seq)
def find_smallest_items_seq(seq):
return min(seq)
def test_find_N_largest_smallest_items_seq(module_name=’this
module’):
seq = [1, 3, 2, 8, 6, 10, 9]
N = 3
assert(find_N_largest_items_seq(seq, N) == [10, 9, 8])
assert(find_N_largest_items_seq_sorted(seq, N) == [8, 9, 10])
assert(find_N_smallest_items_seq(seq, N) == [1,2,3])
assert(find_N_smallest_items_seq_sorted(seq, N) == [1,2,3])
assert(find_smallest_items_seq(seq) == 1)
assert(find_smallest_items_seq_heap(seq) == 1)
The following example uses Python’s heapq package to merge a two sorted
sequences with little overhead:
[merge_sorted_seqs.py]
8.6. ADDITIONAL EXERCISES 141
import heapq
Linked List
Find the kth Element from the End of a Linked List
[find_kth_from_the_end.py]
class LinkedListFIFO_find_kth(LinkedListFIFO):
if __name__ == ’__main__’:
ll = LinkedListFIFO_find_kth()
for i in range(1, 11):
ll.addNode(i)
print(’The Linked List:’)
print(ll._printList())
k = 3
k_from_last = ll.find_kth_to_last(k)
print("The %dth element to the last of the LL of size %d is %d"
%(k, ll.length, k_from_last))
return less
if __name__ == ’__main__’:
ll = LinkedListFIFO()
l = [6, 7, 3, 4, 9, 5, 1, 2, 8]
for i in l:
ll.addNode(i)
print(’Before Part’)
ll._printList()
print(’After Part’)
newll = partList(ll, 6)
newll._printList()
class dNode(object):
def __init__(self, value=None, pointer=None, previous=None):
self.value = value
self.pointer = pointer
self.previous = previous
class dLinkList(LinkedListFIFO):
if __name__ == ’__main__’:
from collections import Counter
8.6. ADDITIONAL EXERCISES 145
ll = dLinkList()
for i in range(1, 5):
ll.addNode(i)
print(’Printing the list...’)
ll._printList()
print(’Now, printing the list inversely...’)
ll.printListInverse()
print(’The list after adding node with value 15’)
ll._add(15)
ll._printList()
print("The list after deleting everything...")
for i in range(ll.length-1, -1, -1):
ll.deleteNode(i)
ll._printList()
Supposing two linked lists representing numbers, such that in each of their
nodes they carry one digit. The function below sums the two numbers that
these two linked lists represent, returning a third list representing the sum:
[sum_linked_list.py]
class LinkedListFIFOYield(LinkedListFIFO):
# print each node’s value, starting from the head
def _printList(self):
node = self.head
while node:
yield(node.value)
node = node.pointer
if __name__ == ’__main__’:
l1 = LinkedListFIFOYield() # 2671
l1.addNode(1)
l1.addNode(7)
l1.addNode(6)
l1.addNode(2)
l2 = LinkedListFIFOYield() # 455
l2.addNode(5)
l2.addNode(5)
l2.addNode(4)
lsum = sumlls(l1, l2)
l = list(lsum._printList())
8.6. ADDITIONAL EXERCISES 147
for i in reversed(l):
print i
class cicularLinkedListFIFO(LinkedListFIFO):
def _add(self, value):
self.length += 1
node = Node(value, self.head)
if self.tail:
self.tail.pointer = node
self.tail = node
def isCircularll(ll):
p1 = ll.head
p2 = ll.head
while p2:
try:
p1 = p1.pointer
p2 = p2.pointer.pointer
except:
break
if p1 == p2:
return True
return False
if __name__ == ’__main__’:
ll = LinkedListFIFO()
for i in range(10):
ll.addNode(i)
ll._printList()
148 CHAPTER 8. ABSTRACT DATA STRUCTURES
print(isCircularll(ll))
lcirc = cicularLinkedListFIFO()
for i in range(10):
lcirc.addNode(i)
print(isCircularll(lcirc))
Chapter 9
Sorting
The simplest way to sort a group of items is to start by removing the smallest
item from the group, and putting it in the first position. Then removing the
next smallest item, and putting it as second, and so on.
This example is clearly an O(n2 ) algorithm. It is obvious that we need
better solutions! This is what this chapter is about. We are going to look at
many sorting algorithms and we analyze their features and runtimes.
But first some nomenclature. An in-place sort does not use any additional
memory to do the sorting (for example, swapping elements in an array). A
stable sort preserves the relative order of otherwise identical data elements
(for example, if two data elements have identical values, the one that was
ahead of the other stays ahead). In any comparison sort problem, a key is
the value (or values) that determines the sorting order. A comparison sort
requires only that there is a way to determine if a key is less than, equal
to, or greater than another key. Most sorting algorithms are comparison
sorts where the worst-case running time for such sorts can be no better than
O(n ln n).
149
150 CHAPTER 9. SORTING
def insertion_sort(seq):
for i in range(1, len(seq)):
j = i
while j > 0 and seq[j-1] > seq[j]:
seq[j-1], seq[j] = seq[j], seq[j-1]
j -= 1
return seq
Selection Sort
Selection sort is based on finding the smallest or largest element in a list
and exchanging it to the first, then finding the second, etc, until the end is
reached. Even when the list is sorted, it is O(n2 ) (and not stable):
[selection_sort.py]
def selection_sort(seq):
for i in range(len(seq) -1, 0, -1):
max_j = i
for j in range(max_j):
9.2. LINEAR SORT 151
Gnome Sort
Gnome sort works by moving forward to find a misplaced value and then
moving backward to place it in the right position:
[gnome_sort.py]
def gnome_sort(seq):
i = 0
while i < len(seq):
if i ==0 or seq[i-1] <= seq[i]:
i += 1
else:
seq[i], seq[i-1] = seq[i-1], seq[i]
i -= 1
return seq
[count_sort.py]
def count_sort_dict(a):
b, c = [], defaultdict(list)
for x in a:
c[x].append(x)
for k in range(min(c), max(c) + 1):
b.extend(c[k])
return b
If several values have the same key, they will have the original order with
respect with each other, so the algorithm is stable.
Merge Sort
Merge sort divides the list in half to create two unsorted lists. These two
unsorted lists are sorted and merged by continually calling the merge-sort
algorithm, until you get a list of size 1.
1
Timsort is a hybrid sorting algorithm, derived from merge sort and insertion sort, and
invented by Tim Peters for Python.
9.3. LOGLINEAR SORT 153
Merge sort is stable, as well as fast, for large data sets. However, since it
is not in-place, it requires much more memory than many other algorithms.
The space complexity is O(n) for arrays and O(ln n) for linked lists2 . The
best, average, and worst case times are all O(n ln n).
Merge sort is a good choice when the data set is too large to fit into the
memory. The subsets can be written to disk in separate files until they are
small enough to be sorted in memory. The merging is easy, as it involves
reading single elements at a time from each file and writing them to the final
file in the correct order:
[merge_sort.py]
def merge_sort(array):
’’’
>>> merge_sort([3 ,5, 1, 2, 10, 6])
[1, 2, 3, 5, 6, 10]
’’’
if len(array) < 2:
return array
mid = len(array)//2
left = merge_sort(array[:mid])
right = merge_sort(array[mid:])
res = []
i, j = 0, 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
res.append(left[i])
i += 1
else:
res.append(right[j])
j += 1
if left[i:]:
res.extend(left[i:])
if right[j:]:
res.extend(right[j:])
return res
2
Never ever consider to sort a linked list!
154 CHAPTER 9. SORTING
Quick Sort
Quick sort works by choosing a pivot and partitioning the array so that the
elements that are smaller than the pivot go to the left, and elements that are
larger go to the right. Then, it recursively sorts these left and right parts.
The choice of the pivot value is a key to the performance. It can be shown
that always choosing the value in the middle of the set is the best choice
for already-sorted data and no worse than most other choices for random
unsorted data.
The wort case runtime is O(n2 ) in the rare cases when partitioning keeps
producing a region of n − 1 elements (when the pivot is the minimum or the
maximum value). The best case produces two n/2-sized lists, being O(n ln n).
This algorithm is not stable.
Here is an example of quick sort in Python:
[quick_sort.py]
def quick_sort(seq):
if len(seq) < 2: return seq
ipivot = len(seq)//2
pivot = seq[ipivot]
before = [x for i,x in enumerate(seq) if x <= pivot and i !=
ipivot]
after = [x for i,x in enumerate(seq) if x > pivot and i !=
ipivot]
return qs(before) + [pivot] + qs(after)
def quick_sort_divided(seq):
if len(seq) < 2: return seq
lo, pi, hi = partition(seq)
9.3. LOGLINEAR SORT 155
Heap Sort
Heap sort is similar to a selection sort, except that the unsorted region is a
heap, so finding the largest element n times results in a loglinear runtime.
In a heap, for every node other than the root, it has at least (at most)
the value of its parent. Thus, the smallest (largest) element is stored at the
root and the subtrees rooted at a node contain only larger (smaller) values
than does the node itself.
Although the insertion is only O(1), the performance of validating (the
heap order) is O(ln n). Searching (traversing) is O(n).
In Python, a heap sort can be implemented by pushing all values onto a
heap and then popping off the smallest values one at a time:
[heap_sort1.py]
import heapq
def heap_sort1(seq):
h = []
for value in seq:
heapq.heappush(h, value)
return [heapq.heappop(h) for i in range(len(h))]
156 CHAPTER 9. SORTING
Quadratic Sort
The following program implements a bubble sort, a very inefficient sorting
algorithm:
[bubble_sort.py]
def bubble_sort(seq):
size = len(seq) -1
for num in range(size, 0, -1):
for i in range(num):
if seq[i] > seq[i+1]:
temp = seq[i]
seq[i] = seq[i+1]
seq[i+1] = temp
return seq
9.5. ADDITIONAL EXERCISES 157
Linear Sort
The example below shows a simple count sort for people ages:
[counting_sort.py]
def counting_sort_age(A):
oldestAge = 100
timesOfAge = [0]*oldestAge
ageCountSet = set()
B = []
for i in A:
timesOfAge[i] += 1
ageCountSet.add(i)
for j in ageCountSet:
count = timesOfAge[j]
while count > 0:
B.append(j)
count -= 1
return B
The example below uses quick sort to find the k largest elements in a
sequence:
[find_k_largest_seq_quicksort.py]
import random
Searching
The most common searching algorithms are the sequential search and the
binary search. If an input array is not sorted, or the input elements are
accommodated by dynamic containers (such as linked lists), the search is
usually sequential. If the input is a sorted array, the binary search algorithm
is the best choice. If we are allowed to use auxiliary memory, a hash table
might help the search, with which a value can be located in O(1) time (with
a key).
159
160 CHAPTER 10. SEARCHING
Now, if we sort the sequence first, we can improve the sequential search
in the case when the item is not present to have the same runtimes as when
the item is present:
[ordered_sequential_search.py]
In general, we can define the median as the the value that is bigger than
half of the array. This algorithm is important in the context of larger prob-
lems such as finding the nearest neighbor or the shortest path.
if hi < lo:
return False
mid = (hi + lo)//2
if item == array[mid]:
return mid
elif item < array[mid]:
return binary_search(array, item, hi=mid-1, lo=lo)
else:
return binary_search(array, item, hi=hi, lo=mid+1)
Note that the module returns the index after the key, which is where you
should place the new value. Other available functions are bisect right and
bisect left.
164 CHAPTER 10. SEARCHING
import numpy
lo = 0
hi = rows*cols
while lo < hi:
mid = (lo + hi)//2
row = mid//cols
col = mid%cols
v = m1[row][col]
if v == value:
return True
elif v > value:
hi = mid
else:
lo = mid+1
return False
Unimodal Arrays
An array is unimodal if it is consisted of an increasing sequence followed by
a decreasing sequence.
The example below shows how to find the “locally maximum” of an array
using binary search:
[find_max_unimodal_array.py]
def find_max_unimodal_array(A):
if len(A) <= 2 :
return None
left = 0
right = len(A)-1
while right > left +1:
mid = (left + right)//2
if A[mid] > A[mid-1] and A[mid] > A[mid+1]:
return A[mid]
elif A[mid] > A[mid-1] and A[mid] < A[mid+1]:
left = mid
else:
right = mid
return None
166 CHAPTER 10. SEARCHING
count +=1
else:
break
for i in range(index_some_k-1, -1, -1): # go down
if seq[i] == k:
count +=1
else:
break
return count
Intersection of Arrays
The snippet below shows three ways to perform the intersection of two sorted
arrays.
The simplest way is to use sets, however this will not preserve the order-
ing.
The second example uses an adaptation of the merge sort.
The third example is suitable when one of the arrays is much larger than
other. In this case, binary search is the best option:
[intersection_two_arrays.py]
res.reverse()
return res
Dynamic Programming
11.1 Memoization
Dynamically Solving the Fibonacci Series
High-level languages such as Python can implement the recursive formulation
directly, caching return values. Memoization is a method which for each call
made more than once (with the same arguments), the result is returned
directly from the cache.
For example, we can dynamically solve the exponential Fibonacci series.
For this, we use a memo function that uses nested scopes to give the wrapped
function memory:
[memo.py]
169
170 CHAPTER 11. DYNAMIC PROGRAMMING
def memo(func):
cache = {}
@wraps(func)
def wrap(*args):
if args not in cache:
cache[args] = func(*args)
return cache[args]
return wrap
def naive_longest_inc_subseq(seq):
for length in range(len(seq), 0, -1):
for sub in combinations(seq, length):
if list(sub) == sorted(sub):
return len(sub)
def memoized_longest_inc_subseq(seq):
@memo
def L(cur):
res = 1
for pre in range(cur):
if seq[pre] <= seq[cur]:
res = max(res, 1 + L(pre))
return res
return max(L(i) for i in range(len(seq)))
172 CHAPTER 11. DYNAMIC PROGRAMMING
Part III
173
Chapter 12
Introduction to Graphs
Direction of a Graph
Subgraphs
175
176 CHAPTER 12. INTRODUCTION TO GRAPHS
Completeness of a Graph
If all the nodes in a graph are pairwise adjacent, the graph is called complete.
Degree in a Node
The number of undirected edges incident on a node is called degree. Zero-
degree graphs are called isolated.
For directed graphs, we can split this number into in-degree (incoming
edges/parents) and out-degree/children (outgoing edges).
Length of a Path
The length of a path or walk is the value given by its edge count.
Weight of an Edge
Associating weights with each edge in G gives us a weighted graph. The
weight of a path or cycle is the sum of its edge weights. So, for unweighted
graphs, it is simply the number of edges.
Planar Graphs
A graph that can be drawn on the plane without crossing edges is called
planar. This graph has regions, which are areas bounded by the edges.
Interestingly, the Euler’s formula for connected planar graphs says that
V − E + F = 2, where V, E, F are the number of nodes, edges, and regions,
respectively.
12.2. THE NEIGHBORHOOD FUNCTION 177
Graph Traversal
Adjacent Lists
For each node in an adjacent list, we have access to a list (or set or container
or iterable) of its neighbor.
Supposing we have n nodes, each adjacent (or neighbor) list is just a list
of such numbers. We place the lists into a main list of size n, indexable by
the node numbers, where the order is usually arbitrary.
Deleting objects from the middle of a Python list is O(n), but deleting
from the end is only O(1). If the order of neighbors is not important, you
can delete an arbitrary neighbor in O(1) time by swapping it in to the last
item in the list and then calling pop().
Adjacent Matrices
In adjacent matrices, instead of listing all the neighbors for each node, we
have one row with one position for each possible neighbor, filled with True
and False values.
The simplest implementation of adjacent matrices is given by nested lists.
Note that the diagonal is always False:
180 CHAPTER 12. INTRODUCTION TO GRAPHS
Representing Trees
The simplest way of representing a tree is by a nested lists:
>>> T = [’a’, [’b’, [’d’, ’f’]], [’c’, [’e’, ’g’]]]
>>> T[0]
’a’
>>> T[1][0]
’b’
>>> T[1][1][0]
’d’
>>> T[1][1][1]
’f’
>>> T[2][0]
’c’
>>> T[2][1][1]
’g’
However, this becomes very hard to handle if we simply add a couple more
branches. The best solution is representing trees as a collection of nodes.
Let us start with a simple example, where we define a simple tree class
with an attribute for value, another for a children (or ‘next’), and a method
for printing the result:
[tree.py]
class SimpleTree(object):
def __init__(self, value=None, children = None):
182 CHAPTER 12. INTRODUCTION TO GRAPHS
self.value = value
self.children = children
if self.children == None:
self.children = []
def main():
"""
’a’
’b’
’d’
’e’
’c’
’h’
’g’
"""
st = SimpleTree(’a’, [SimpleTree(’b’, [SimpleTree(’d’),
SimpleTree(’e’)] ), SimpleTree(’c’, [SimpleTree(’h’),
SimpleTree(’g’)]) ])
print(st)
In the next chapter we will learn how to improve this class, including
many features and methods that a tree can hold.
Chapter 13
Binary Trees
Binary trees are tree data structures where each node has at most two child
nodes: the left and the right. Child nodes may contain references to their
parents. The root of a tree (the ancestor of all nodes) can exist either inside
or outside the tree.
The degree of every node is at most two. Supposing that an arbitrary
rooted tree has m internal nodes and each internal node has exactly two
children, if the tree has n leaves, the degree of the tree is n − 1:
2m = n + m − 1 → m = n − 1,
class Node(object):
def __init__(self, item=None,):
self.item = item
self.left = None
183
184 CHAPTER 13. BINARY TREES
Figure 13.1: The height (h) and width (number of leaves) of a (perfectly
balanced) binary tree.
self.right = None
def __repr__(self):
return ’{}’.format(self.item)
if self.right:
found = found or self.right._search(value)
return found
def _isLeaf(self):
return not self.right and not self.left
def printPreorder(self):
current = self.root
nodes, stack = [], []
while stack or current:
if current:
nodes.append(current.item)
stack.append(current)
current = current.left
else:
current = stack.pop()
current = current.right
print nodes
class Node(object):
(...)
def _add(self, value):
new_node = Node(value)
if not self.item:
self.item = new_node
else:
if value > self.item:
self.right = self.right and self.right._add(value)
or new_node
elif value < self.item:
self.left = self.left and self.left._add(value) or
new_node
else:
print("BSTs do not support repeated items.")
return self
AVL Trees
An AVL tree is a binary search tree with a self-balancing condition where the
difference between the height of the left and right subtrees cannot be more
than one.
To implement an AVL tree, we can start by adding a self-balancing
method to our BST classes, called every time we add a new node to the
tree. The method works by continuously checking the height of the tree,
which is added as a new attribute.
Red-black Trees
Red-black trees are an evolution of a binary search trees that aim to keep the
tree balanced without affecting the complexity of the primitive operations.
188 CHAPTER 13. BINARY TREES
This is done by coloring each node in the tree with either red or black
and preserving a set of properties that guarantees that the deepest path in
the tree is not longer than twice the shortest one.
Red-black trees have the following properties:
? All leaf (nil) nodes are colored with black; if a node’s child is missing
then we will assume that it has a nil child in that place and this nil
child is always colored black.
? Every path from a node n to a descendant leaf has the same number
of black nodes (not counting node n). We call this number the black
height of n.
Binary Heaps
Binary heaps are complete balanced binary trees. The heap property makes
it easier to maintain the structure, i.e., the balance of the tree.
In a heap, there is no need to modify the tree structure by splitting or
rotating nodes: the only operations needed is swapping (both parents and
child nodes).
In a binary heap, considering a node at index i, we have that:
i−1
? the parent index is 2
,
[check_largest_item.py]
def largest(node):
if node.right:
return largest(node.right)
return node.item
[check_if_balanced.py]
[check_if_bst.py]
Traversing Trees
Traversals are algorithms used to visit the objects (nodes) in some connected
structure, such as a tree or a graph. Traversal problems can be either visiting
every node or visiting only some specific nodes.
Figure 14.1: Binary tree traversals: preorder, inorder, postorder, and breath-
first search.
191
192 CHAPTER 14. TRAVERSING TREES
nodes + total number of outgoing edges from these nodes) = O(V + E).
DFS comes in three flavors, depending on when you print (or save) the
current node’s value:
Postorder: Visit a node after traversing all subtrees (left → right → root):
def postorder(root):
if root != 0:
postorder(root.left)
postorder(root.right)
yield root.value
Inorder: Visit a node after traversing its left subtree but before the right
subtree (left → root → right):
def inorder(root):
if root != 0:
inorder(root.left)
yield root.value
inorder(root.right)
Problems that employ BFSs usually are used to find the fewest number
of steps (or the shortest path) needed to reach a certain end point from the
starting point.
Traditionally, BFSs are implemented using a list to store the values of
the visited nodes and then a FIFO queue to store those nodes that have yet
to be visited. The total runtime is also O(V + E).
An example of a BFS algorithm is the following:
[transversal.py]
def BFT(tree):
current = tree.root
nodes = []
queue = deque()
queue.append(current)
while queue:
current = queue.popleft()
nodes.append(current.item)
if current.left:
queue.append(current.left)
if current.right:
queue.append(current.right)
return nodes
194 CHAPTER 14. TRAVERSING TREES
The example below finds the lowest level common ancestor of two nodes in
a binary search tree from an inoder list:
[check_ancestor.py]
while path:
current_value = path[0]
if __name__ == ’__main__’:
bst = BST()
l = [10, 5, 6, 3, 8, 2, 1, 11, 9, 4]
for i in l:
bst.add(i)
nodes = inorder(bst.root)
print ’Inorder: ’, nodes
print ’Ancestor for 3, 11:’, find_ancestor(nodes, 3, 11)
Index
-c, 67 collections
defaultdict, 56
add(), 48 namedtuple, 83
adjacent OrderedDict, 56
lists, 178 comprehensions, 37
matrices, 179 continue, 66
algorithms count(), 28, 30, 35
travelling salesman, 104 cProfile, 98
all, 68
append(), 33 daemons, 94
array debugging, 97
unimodal, 164 decimal, 13, 25
arrays, 31 decorators, 88, 169
as, 78 del, 35, 84
assert, 10 DFS, 113, 177, 191
asymptotic analysis, 103 dictionaries, 51, 179
difference(), 48
BFS, 117, 177, 192 discard(), 49
big O, 103 docstrings, 99
bin, 13 doctest, 99
binary search, 161, 162 dynamic programming, 169
bisect(), 162
break, 66 enumerate(), 79
bytearray, 40 exceptions, 10, 76
bytes, 40 EnvironmentError, 74, 75
KeyError, 49
cache, 169 PickingError, 74
class, 78, 85 StandardError, 78
clear(), 49, 54 unpickingError, 75
close(), 72 ValueError, 28, 34
cmath, 11 extend(), 33
195
196 INDEX
raise, 77 test
range(), 78 unit, 98
raw bytes, 40 unittest, 10
read(), 71 threading, 93, 94
readline(), 71 timeit, 38, 98
readlines(), 71 traceback, 76
recursive relations, 106 Trees
remove(), 34, 49 red-black, 187
replace(), 29 trees, 177
reverse(), 36 AVL, 187
rjust(), 24 binary, 183
BST, 186
seek(), 72 rotation, 187
series try, 72–75, 78
fibonacci, 169 tuples, 29
sets, 47, 178 named, 31
shutil, 73
singletons, 64 unicode, 24, 75
slicing, 23, 64 union, 47
sort(), 36 union(), 48
sorted(), 55, 150 unittest, 10
sorting update(), 48
gnome, 149 values(), 53
heap, 153
insertion, 147 write(), 72
quick, 152
quick, 160 yield, 55, 65
split(), 26, 27 zip(), 79
splitlines(), 26
stack, 105
stacks, 67, 76, 109
strings, 23, 27
strip(), 26, 79
struct, 75
subprocess, 93
sys, 68
tell(), 72
198 INDEX
Bibliography
Websites:
[Interactive Python] http://interactivepython.org
Books:
[Cracking the Coding Interview] Gayle Laakmann McDowell, 2013
199