Graph Theory
Graph Theory
Graph Theory
Seeking the shortest round trip connecting state capitals (lower 48 only).
Graph Theory
1
to include information about the mileage of each intercity road. In that case, we say we are using weighted
edges.
This simple framework is enough for us to model a number of interesting real world problems involving this
simplified geometry, to propose algorithms for solving them, and computer programs for efficiently finding
solutions.
An alternative way to describe the adjacency matrix is that A(i, j) is the number of paths from node i to
node j using a single edge. This may sound awfully pointless, but let’s compute the matrix A2 = A ∗ A:
1 1 0 0 1
1 2 0 0 2
2
A = 0 0 2 2 0
0 0 2 3 0
1 2 0 0 2
If you look at the diagram of the graph, you should convince yourself that the fact that there are indeed
exactly 2 distinct paths, involving two consecutive edges, to get from the node labeled 3 to the node labeled
4. Can you see what it means to say there are 3 distinct paths of length 2 to get from node 4 back to node
4?
2
We say that a graph is connected if for any starting node i there is always at least one path, of some length,
to any final node j. If the graph has n nodes, then this path must be of length n − 1 or less. Given the
adjacency matrix A for a graph, consider, for 0 ≤ ` < n, the matrices P (`) defined by:
k≤`
X
P (`) = Ak
k=0
P (`) records the number of paths of length ` or less, between any pair of nodes. For our example graph, we
have
5 6 12 17 6
6 12 23 29 11
P (4) = A0 + A1 + A2 + A3 + A4 =
12 23 11 12 23
17 29 12 17 29
6 11 23 29 12
For a given graph of n nodes, the path matrix P (n − 1) indicates the number of paths (of any length 0
through n − 1) available to reach any node i from any node j. For our example graph, why does P (4) by
itself tell you that the graph is connected?
3 Edge Distance
From a given node i in a graph, we may ask how far away another node j is; that is, the length of the
shortest path from i to j, if any. We may symbolize this edge distance as d(j). Of course, d(i) is 0, and if
there is no path from i to j, we may set d(j) = ∞.
If we see a plot of a small graph, we can easily work out the values of d from a given starting point. But
how could this be done computationally, even for a rather large and unseen graph?
A simple algorithm is as follows:
The graph G has n nodes
Initialize all n values of d to oo.
Set d to zero for the starting node.
found = 1
while ( 0 < found )
found = 0
for every node i
if d(i) < oo
for every node j
if A(i,j) == 1
if ( d(j) == oo )
d(j) = d(i) + 1
found = found + 1
Using the adjacency matrix A[,] and a numpy array d[], it is not too hard to implement this algorithm in
Python. The exercises will ask you to try to do this, and apply it to the next example.
3
From the previous algorithm, we can imagine starting at node Q with distance 0, then marking all the
neighbors of Q as having distance 1, then marking all the neighbors of neighbors of ttQ as having distance
2, and so on. You can see that the search algorithm proceeds by creating levels of nodes. A new level is
created from all the unvisited neighbors of the previous level.
d(node) node
0 Q
1 K P
2 E L O
3 D F R I
4 H J
5 B N
6 C M
7 G
8 A
After there are no more levels to consider, the distance array is
d = [8 ,5 ,6 ,3 ,2 ,3 ,7 ,4 ,3 ,4 ,1 ,2 ,6 ,5 ,2 ,1 ,0 ,3]
Note that, if the graph had actually been disconnected, then at the end of our search, some entries of d
would still be ∞, which is appropriate.
4
Since our museum nodes are labeled with letters, we could describe the museum graph using the following
dictionary:
museum = {
’A ’ : [ ’G ’ ],
’B ’ : [ ’C ’ , ’H ’ ],
’C ’ : [ ’B ’ ],
’D ’ : [ ’E ’ ],
’E ’ : [ ’D ’ , ’F ’ , ’K ’ ] ,
’F ’ : [ ’E ’ ],
’G ’ : [ ’A ’ , ’M’ ],
’H ’ : [ ’B ’ , ’I ’ , ’N ’ ] ,
’ I ’ : [ ’H ’ , ’J ’ , ’O ’ ] ,
’J ’ : [ ’ I ’ ],
’K ’ : [ ’E ’ , ’L ’ , ’Q ’ ] ,
’ L ’ : [ ’K ’ , ’R ’ ],
’M’ : [ ’G ’ , ’N ’ ],
’N ’ : [ ’H ’ , ’M’ ],
’O ’ : [ ’ I ’ , ’P ’ ],
’P ’ : [ ’O ’ , ’Q ’ ],
’Q ’ : [ ’K ’ , ’P ’ ],
’R ’ : [ ’ L ’ ]
}
One advantage of this format is that, if we have defined a dictionary G for our graph, then for any node s,
we can generate a for loop that examines every neighbor node of s using the command:
f o r n e i g h b o r in G[ s ] :
We will take advantage of the concise format of the dictionary representation, and the convenience of refer-
encing all the neighbor nodes, in our implementation of the breadth first search algorithm.
We actually need a second object, visited[], which we treat as a set. The set includes every node that we
have ever visited. We add a node using the command visited.add(node), and we check whether or not a
node has been visited with the formula if ( node not in visited ):.
5
The following code performs a breadth first search of a graph. It reports each node that it visits by printing
its label. In other situations, we might have more complicated things to do at each node that we visit.
def b f s ( G, node ) :
#
# C r e a t e empty v i s i t e d and queue o b j e c t s .
#
visited = []
queue = [ ]
#
# Add s t a r t i n g node t o b o t h l i s t s .
#
v i s i t e d . append ( node )
queue . append ( node )
#
# I s t h e r e a node we haven ’ t c h e c k e d ?
#
while next :
print ( ” ” )
return
6
print ( node , end = ’ ’ )
v i s i t e d . add ( node )
f o r n e i g h b o r in G[ node ] :
d f s ( G, n e i g h b o r , v i s i t e d )
return
We can invoke a search of the museum map by initializing the museum dictionary definition, and visited
and then calling as follows:
museum = [ . . . ]
v i s i t e d = set ( )
d f s ( museum , ’Q ’ , v i s i t e d )
Q K E D F L R P O I H B C NMG A J
Unlike the breadth first search, the depth first search has been implemented using recursion. Each recursive
step explores a new room. If that room has unexplored neighbors, we take another recursive step; if it does
not, then we back up one call; that is, we retreat to the previous room, and make the check again. Once all
nodes have been visited, the code terminates.
8 Solving a Maze
A maze is an arrangement of rooms, connected in a complicated way. A maze might have an entrance and
exit, or we might be required to go from one room to another.
A maze can be represented by a graph. The entrance, the exit, every dead-end, and every point where we
have to make a choice should each be a node. The paths between these points become edges. The “right
hand rule” suggests getting from one spot by walking with your right hand always touching the wall, turning
whenever you have to to keep your hand on the wall.
In the figure above, we have a simple maze and a suggestion of its corresponding graph. We can easily verify
that the “right hand rule” works for the yellow maze, by listing the sequence of nodes, starting at node 1,
that lead to node 2 following this rule;
7
We can create an adjacency matrix for the maze:
0 1 2 3 4 5 6 7
0 . . . 1 . . . .
1 . . 1 . . . . .
2 . 1 . 1 1 . . .
A= 3 1 . 1 . . 1 . .
4 . . 1 . . . 1 1
5 . . . 1 . . . .
6 . . . . 1 . . .
7 . . . . 1 . . .
Now we are going to use depth-first search on this maze, starting at node 0, and trying to reach a goal of
node 5. Think of this procedure as something like this:
• Mark all nodes as unvisited.
• Visit 0, our current node;
• If the current node has no unvisited neighbors, back up to previous node;
• Otherwise, replace current node by an unvisited neighbor, and ”remember” other nodes you didn’t
visit yet.
Here is how that procedure might be carried out for our simple maze:
node edges
---- --------------
0
3 0->3
2 3->2
1 2->1 Dead end, back up from 1 to 2
2 2<-1
4 2->4
6 4->6 Dead end, back up from 6 to 4
4 4<-6
7 4->7 Dead end, back up from 7 to 4
4 4<-7
2 2<-4 No unvisited nodes, back up from 4 to 2
3 3<-2 No unvisited nodes, back up from 2 to 3
5 3->5 Success, we found the goal.
The exercises ask you to implement a depth-first search algorithm that solves this maze.
8
9 Shortest Path
Earlier, we computed the distance between a starting node i and every other node j in a graph G. We
measured the distance as the number of edges we needed to travel across in our journey. But if our graph
represented, say, a highway map, then an edge is a trip from one city to a neighboring city, and it is important
to know the associated distance involved. It is natural to represent such a system using a “distance matrix”
D, in which Di,j is the distance between nodes i and j. If there is no connection between the nodes, we set
Di,j = ∞. In Python, this would be done by
D[ i , j ] = np . I n f
A graph in which each edge is assigned a value is called a weighted graph. These weights are usually required
to be positive. They might represent distances, but could instead represent other quantities, such as the
flow capacity of a pipe. We are interested in solving the node-to-node distance problem again, but now with
weights assigned to each edge. As an example, consider the following weighted graph, with 9 nodes and 15
edges.
9
(b) For every unvisited neighbor j of the current node, set s[j] = min ( s[j], s[current] +
D[current,j] ) ;
(c) Set visited[current]=True.
(d) If there are no more unvisited nodes, terminate.
Once the algorithm has completed, the s array contains the shortest distance from the starting node to each
node in the graph. If the graph is not connected, then some distances will be infinite.
0 1 2 3 4 5 6 7 8 base node
0 ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ initialize
0 4 ∞ ∞ ∞ ∞ 7 ∞ ∞ node 0, distance 0
0 4 13 ∞ ∞ ∞ 7 24 ∞ node 1, distance 4
0 4 13 ∞ ∞ ∞ 7 8 ∞ node 6, distance 7
0 4 13 ∞ 9 ∞ 7 8 11 node 7, distance 8
0 4 11 19 9 24 7 8 11 node 4, distance 9
0 4 11 17 9 24 7 8 11 node 2, distance 11
0 4 11 17 9 23 7 8 11 node 8, distance 11
0 4 11 17 9 22 7 8 11 node 3, distance 17
0 4 11 17 9 22 7 8 11 node 5, distance 22
4 9 6 5 12 5 1 1 7
0-->1-->2-->3-->5-->8-->4-->7-->6-->0
The exercises ask you to try to reproduce this calculation.
10
12 Traveling Salesperson Heuristics
Given that larger TSP problems are extremely difficult to solve, many suggestions have been made for coming
up with approximate solutions. In each case we start with a basic itinerary 0-->?-->0 and try to fill in the
blanks.
• Nearest neighbor: From your current location, add a trip to the nearest city that has not been visited
yet. Repeat until all cities have been reached, and go home.
• Greedy algorithm: Find the shortest unused city-to-city trip which does not create a “short circuit”
and which does not involve a city which already has a trip in and a trip out; Add this trip to your
itinerary. Keep doing this until your itinerary includes every city.
• Insertion: Choose an unvisited city k at random. For every pair of consecutive cities i and j already in
your itinerary, compute the length of the revised itinerary i− > k− > j. Choose to insert k between
the cities i and j which minimize this change in trip length.
• Transpose: Start with an itinerary chosen at random. Pick two distinct non-neighboring cities i and
j. Reverse the order of all the cities in the itinerary between i and j. If this reduces the total length,
keep the change. Keep trying.
13 Exercises
1. Consider the graph in the following diagram. Compute the appropriate path matrix that can check
whether the graph is connected or not. Explain what the path matrix says about this graph.
2. In the following puzzle, it is your task to interchange the black and white knights, using only knight
moves, remaining on the fragmentary board that is shown, using no captures. You may find it very
useful to construct the adjacency matrix in which two squares are adjacent if there is a legal knight
move from one to the other.
3. For the museum problem, write a Python program which computes the distance array d given any
starting room. If your starting room is Q you should match the results reported above.
11
4. The game of Risk involves a map of 42 countries. Squinting at the map as best you can, determine
the adjacency matrix. Then work out the distance to every country if you start from Mexico (country
#9). Since Risk numbers starting at 1, you will have to make the usual Python index adjustment.
What is the furthest country from Mexico?
5. Assume that you are given the blue maze in the figure below, and that you are to find a sequence of
nodes that takes you from node 1 to node 2. Verify that the “right hand rule” does not work for this
maze, by listing the sequence of nodes, starting at 1, until you come back to node 1. A depth first
search will correctly find a path from node 1 to node 2. Give the steps that such a search might use.
You can stop when you reach node 2, and you may allow your example to be ”lucky”. Your report will
indicate, at each step of the algorithm, where you are (what node), what nodes you have seen but not
explored, and what node you are about to explore next.
Step 0: I am at node 1, I see node 19, and I am about to move to node 19.
12
6. Repeat the depth-first search of the museum, but instead of using a set to store the list of visited nodes,
use a logical array visited. To do this, you will need to redefine the museum dictionary, replacing the
alphabetic node labels by numeric values.
7. Implement the depth-first search algorithm for the maze example; print out the sequence of nodes you
visit, and compare it to the results reported in the text.
8. Write a Python code that implements Dijkstra’s algorithm. Apply it to the 9-node example graph.
Verify that your results agree with the text.
9. Dijkstra’s algorithm doesn’t actually remember the shortest path it used to reach each node. Modify
the algorithm so that, every time that the distance dj is adjusted using an edge from node i to node
j, a corresponding value pj is set to i. Show that this is enough information to work your way back
from any node j to the starting node. In the 9-node example graph, compute the shortest path back
from node 5 to node 0.
10. Using brute force, enumerate every possible round trip starting at node 0 on the weighted graph. For
each round trip, evaluate the total distance. Report the length and itinerary of the shortest trip you
found, the number of possible itineraries, and the time required for your computation.
11. Try to solve the TSP for the following distance matrix, using any of the heuristics given above, or any
other method you can think up. The shortest trip I found had length 2090. The minimal tour length
is actually 2085.
0 633 257 91 412 150 80 134 259 505 353 324 70 211 268 246 121
633 0 390 661 227 488 572 530 555 289 282 638 567 466 420 745 518
257 390 0 228 169 112 196 154 372 262 110 437 191 74 53 472 142
91 661 228 0 383 120 77 105 175 476 324 240 27 182 239 237 84
412 227 169 383 0 267 351 309 338 196 61 421 346 243 199 528 297
150 488 112 120 267 0 63 34 264 360 208 329 83 105 123 364 35
80 572 196 77 351 63 0 29 232 444 292 297 47 150 207 332 29
134 530 154 105 309 34 29 0 249 402 250 314 68 108 165 349 36
259 555 372 175 338 264 232 249 0 495 352 95 189 326 383 202 236
505 289 262 476 196 360 444 402 495 0 154 578 439 336 240 685 390
353 282 110 324 61 208 292 250 352 154 0 435 287 184 140 542 238
324 638 437 240 421 329 297 314 95 578 435 0 254 391 448 157 301
70 567 191 27 346 83 47 68 189 439 287 254 0 145 202 289 55
211 466 74 182 243 105 150 108 326 336 184 391 145 0 57 426 96
268 420 53 239 199 123 207 165 383 240 140 448 202 57 0 483 153
246 745 472 237 528 364 332 349 202 685 542 157 289 426 483 0 336
121 518 142 84 297 35 29 36 236 390 238 301 55 96 153 336 0
13