Daa Combined

Download as pdf or txt
Download as pdf or txt
You are on page 1of 407

Objectives:

• Introduction of Design and Analysis of


Algorithms
• Course Blow-up
• Algorithm and Program
• Characteristics of Algorithm
• How to write and analyse an algorithm?
• Examples to write and analyse algorithm.
Text Books:
1. Cormen H. T., Leiserson E. C., Rivest L. R., and Stein C.,
Introduction to Algorithms, MIT Press (2009) 3rd ed.

2. Horwitz E., Sahni S., Rajasekaran S., Fundamentals of


Computers Algorithms, Universities Press (2008) 2nd ed.

Reference Books:
1. Levitin A., Introduction to the design and analysis of
algorithms, Pearson Education (2008) 2nd ed.

2. Aho A.V., Hopcraft J. E., Dulman J. D., The Design and


Analysis of Computer Algorithms, Addsion Wesley (1974)
1st ed.

3. Sedgewick R. and Wayne K., Algorithms, Addison-


Wesley Professional (2011), 4th ed
THANKS !
Objectives
• Recurrence Relation
• Deriving recurrence relation from an
algorithm
• Computing time complexity from recurrence
relation
• Substitution method
• Master Theorem
THANKS !
Objectives

• Recursive Tree Method


• Examples of solution of Recurrence
relation using Recursive Tree Method
• Master theorem for decreasing function
• Master theorem for dividing function
• Assignment (Page No. 88-98, CORMEN,
Introduction to Algorithms)
THANKS !
Objectives
• Divide and Conquer Strategy
• Design Merge Sort Algorithm
• Analyse time and space complexity of
Merge Sort
• Design Quick Sort Algorithm
• Analyse time and space complexity of
Quick Sort
• Binary Search using Iterative and Recursion
THANKS !
Greedy Method
By
Prof. Shaik Naseera
Department of CSE
JNTUA College of Engg., Kalikiri

1
Objectives
• Greedy Method
• Applications
– Optimal Storage on Tapes
– Job Sequencing with Deadlines
– Knapsack Problem
– Single Source Shortest Path
• Previous Gate Questions

2
Greedy Method
• The Greedy method is a most straight forward design
technique which can be applied to a wide variety of
problems.
• This algorithm works in steps. In each step it selects the
best available options until all options are finished.
• Most of the problems have n inputs and require us to
obtain a subset that satisfies some constraints.
• Any subset that satisfies these constraints is called as a
feasible solution.
• A feasible solution that either minimizes or maximizes
a given objective function is called as Optimal
Solution. 3
Greedy Algorithm
• The Greedy method suggest that one can devise an
algorithm that work in stages, considering one input at a
time.
• At each stage, a decision is made regarding whether a
particular input is an optimal solution or not.
• This is done by considering the inputs in an order
determined by some selection procedure.
• If the inclusion of the next input into the partially
constructed optimal solution results sub-
optimal/infeasible solution, then that input is not added to
the partial solution. Otherwise, it is added. The selection
procedure itself is based on some optimization measures.
4
Control Abstraction for Greedy Method

Select selects an input from a[] and removes it. The selected input’s value is
assigned to x.
Feasible is a Boolean-valued function that determines whether x can be included
into the solution vector or not.
Union combines x with the solution and updates the objective function.
5
Types of Greedy Problems
• Subset Paradigm
• To solve a problem (or possibly find the
optimal/best solution), greedy approach generate
subset by selecting one or more available choices.
Eg. includes Knapsack problem, job sequencing
with deadlines. In both of the problems greedy
create a subset of items or jobs which satisfies all
the constraints.
• Ordering Paradigm
• In this, greedy approach generate some
arrangement/order to get the best solution. Eg.
includes: Minimum Spanning tree
6
Applications
• Fractional knapsack algorithm
• Optimal Storage on tapes
• Job sequencing with deadline
• Single source shortest path
– Dijkstra's SSSP algorithm
• Activity Selection Problem
• Minimum Cost Spanning Tree
• …

7
Optimal Storage on Tapes

• Optimal Storage on Tapes is one of the application


of the Greedy Method.

• The objective is to find the Optimal retrieval time


for accessing programs that are stored on tape.

8
Description
• There are n programs that are to be stored on a computer
tape of length L.
• Associated with each program i is a length l;
• Clearly, all programs can be stored on the tape if and only if
the sum of the lengths of the programs is at most L.
• We shall assume that whenever a program is to be
retrieved from this tape, the tape is initially positioned at
the front.
• Hence' if the programs are stored in the order I=i1, i2’ i3 …. in
the time tj needed to retrieve program ij is proportional to
lik .
• If all programs are retrieved equally often then the
expected or mean retrieval time (MRT) is

9
Example

10
Method
• The greedy method simply requires us to store
the programs in non-decreasing order of their
lengths.
• This ordering (sorting) can be carried out in
O(n log n) time using an efficient sorting
algorithm

11
Algorithm for multiple tapes

Note: The programs are assumed to be in increasing order of their lengths 12


Knapsack Problem
• Let, we are given n objects and a Knapsack or
Bag.
• Object i has weight Wi and the Knapsack has a
capacity M.
• if a fraction Xi of object i is placed into
Knapsack, then a profit of PiXi is earned.
• The objective is to obtain a filling of Knapsack
that maximizes the total profit earned.

13
• Maximize (A)

• Subject to (B)

• And 0 ≤ Xi ≤ 1, 1 ≤i ≤n (C)
• The profit and weights are the positive numbers.
• Here, A feasible solution is any set (X1, X2, …,
Xn) satisfying above rules (B) and (C).
• And an optimal solution is feasible solution for
which rule (A) is maximized.

14
Here, N=3, M=20, (P1, P2, P3)=(25, 24, 15) and (W1, W2, W3)=(18, 15, 10)
Different feasible solutions are:
(X1, X2, X3)

1. (1/2, 1/3, ¼) 16.5 24.25


2. (1, 2/15, 0) 20 28.2
3. (0, 2/3, 1) 20 31
4. (0, 1, 1/2) 20 31.5
5. (1/2, 2/3, 1/ 10) 20 30
6. (1, 0, 2/10) 20 28
• Of these Six feasible solutions, solution 4 yields the maximum profit.
Therefore solution 4 is optimal for the given problem instance.
• Consideration 1 - In case the sum of all the weights is ≤ M, then Xi=1, 1 ≤ i ≤n
is an optimal solution.
• Consideration 2 - All optimal solutions will fill the knapsack exactly.

15
The knapsack algorithm
• The greedy algorithm:
Step 1: Sort pi/wi into nonincreasing order.
Step 2: Put the objects into the knapsack according
to the sorted sequence as possible as we can.
• e. g. p[i]/w[i]≥p[i+1]≥w[i+1]
n = 3, M = 20, (p1, p2, p3) = (25, 24, 15)
(w1, w2, w3) = (18, 15, 10)
Sol: p1/w1 = 25/18 = 1.39
p2/w2 = 24/15 = 1.6
p3/w3 = 15/10 = 1.5
Optimal solution: x1 = 0, x2 = 1, x3 = 1/2
total profit = 24 + 7.5 = 31.5
16
Algorithm
• Algorithm GreedyKnapsack(m,n)
//order the n objects such that p[i]/w[i]≥p[i+1]≥w[i+1]
{
for i:=1 to n do x[i]:=0.0;
U:=m;
for i:=1 to n do
{
if(w[i] > U) then break;
x[i]:=1.0; U:=U-w[i];
}
if( i ≤ n) then x[i]:=U/w[i];
}

17
Example problem
• N=7
• M=15(15-1=14-2=12-4=8-5=3-1=2)
• (p1,p2…p7)=(10,5,15,7,6,18,3)
• (w1,w2..w7)=(2,3,5,7,1,4,1)
• The solution vector is (1,2/3,1,0,1,1,1)
• P1/w1=5 p2/w2=1.66 p3/w3=3 p4/w4=1
p5/w5=6 p6/w6=4.5 p7/w7=3
• x5,x1,x6,x3,x7,x2,x4
• Weight is 1+2+4+5+1+2/3*3+0=13+2/3*3=15
• Profit is 6+10+18+15+3+2/3*5+0=55.34
(1,2/3,1,0,1,1,1) 18
Job Sequencing with Deadlines
• We are given a set of n jobs.
• Di is a deadline given to complete ithjob for profit Pi where Pi>0 &
Di ≥ 0.
• For any job i profit Pi is earned iff the job is completed within its
deadline.
• To complete a job, one has to process the job on a machine for one
unit of time.
• Only one machine is available for processing jobs.
• A feasible solution for this problem is a subset J of jobs such that
each job in this subset can be completed by its deadline.
• The value of a feasible solution J is the sum of the profits of the job
in J.
• An optimal solution is a feasible solution with maximum value.
19
• Example –Suppose on a single machine four jobs with
profit values (100, 10, 15 and 27) and their respective
deadline unit values (2, 1, 2, 1) are given. Calculate the
different feasible solutions to complete the jobs with
optimal solution.
• Solution:
• Number of Jobs (n)= 4
• Profit values of four jobs (P1, P2, P3, P4)=(100, 10, 15, 27)
• Process deadlines for respective jobs are
(D1, D2, D3, D4)=(2, 1, 2, 1)

20
(P1, P2, P3, P4)=(100, 10, 15, 27) (D1, D2, D3, D4) =(2, 1, 2, 1)
The feasible solutions and their values are –
Feasible sol. subset Processing Sequence Values
(1, 2) (2, 1) 110
(1, 3) (1, 3 or 3, 1) 115
(1, 4) (4, 1) 127
(3, 2) (2, 3) 25
(3, 4) (4, 3) 42
(1) (1) 100
(2) (2) 10
(3) (3) 15
(4) (4) 27
Here, Solution 3 is optimal solution. In this solution only job 1 and 4
are processed and the profit value is 127. These jobs must be
processed in the order Job 4 followed by job 1. Thus the processing
of job 4 begins at time zero and that of job 1 is completed at time 2.
21
Algorithm

• The algorithm constructs an optimal set J of jobs that


can be processed by their deadlines. 22
Single-Source Shortest Path
(Dijkstra's algorithm)

23
Single-Source Shortest Path Problem

Single-Source Shortest Path Problem - The


problem of finding shortest paths from a source
vertex v to all other vertices in the graph.

24
Dijkstra's algorithm
Dijkstra's algorithm - is a solution to the single-source
shortest path problem in graph theory.

Works on both directed and undirected graphs.


However, all edges must have nonnegative weights.

Approach: Greedy

Input: Weighted graph G={E,V} and source vertex v∈V,


such that all edge weights are nonnegative

Output: Lengths of shortest paths (or the shortest


paths themselves) from a given source vertex v∈V to
all other vertices
25
Dijkstra's algorithm - Pseudocode

dist[s] ←0 (distance to source vertex is zero)


for all v ∈ V–{s}
do dist[v] ←∞ (set all other distances to infinity)
S←∅ (S, the set of visited vertices is initially empty)
Q←V (Q, the queue initially contains all vertices)
while Q ≠∅ (while the queue is not empty)
do u ← mindistance(Q,dist) (select the element of Q with the min.
distance)
S←S∪{u}
Q=Q-{u} (add u to list of visited vertices)
for all v ∈ neighbors[u]
do if dist[v] > dist[u] + w(u, v) (if new shortest path found)
then d[v] ←d[u] + w(u, v) (set new value of shortest path)
(if desired, add traceback code)
return dist

26
Dijkstra Animated Example

27
Dijkstra Animated Example

28
Dijkstra Animated Example

29
Dijkstra Animated Example

30
Dijkstra Animated Example

31
Dijkstra Animated Example

32
Dijkstra Animated Example

33
Dijkstra Animated Example

34
Dijkstra Animated Example

35
Dijkstra Animated Example

36
Implementations and Running Times
The simplest implementation is to store vertices in an array
or linked list. This will produce a running time of
O(|V|^2 + |E|)
Where |E| is for search needed to find the minimum
distance node. //A node may be connected to a maximum
of E nodes
For sparse graphs, or graphs with very few edges and
many nodes, it can be implemented more efficiently storing
the graph in an adjacency list using a binary heap or
priority queue. This will produce a running time of
O((|E|+|V|) log |V|) //inner loop is replaced by a heap

37
DIJKSTRA'S ALGORITHM - WHY USE IT?

• As mentioned, Dijkstra’s algorithm calculates


the shortest path to every vertex.
• Therefore, any time we want to know the
optimal path to some other vertex from a
determined origin, we can use Dijkstra’s
algorithm.

38
Example

• Vertex 1 is the source

1 2 3 4 5 6 7
0 inf inf inf inf inf Inf
25 inf inf inf inf inf
35 39 inf inf inf
39 inf 51 Inf
inf 51 inf
93 65
93
39
Previous Gate Questions

40
Q. No. 1

Queue takes linear time

41
Q. No. 2

Answer is 16

16

60/10 =6
28/7 =4

20/4 =5

24/2 =12

X4, x1, x3, x2 24+20=44

Vopt= using dynamic programming is 60


42
Q. No. 3

43
44
Q. No. 4

Answer: None of the Above


45
Q. No. 5

46
Q. No. 6

47
• 7 Given below are some algorithms, and some algorithm design paradigms.
Q. No.
List-I
A. Dijkstra’s Shortest Path 3
B. Floyd-Warshall algorithm to compute all pairs shortest path 2
C. Binary search on a sorted array 1
D. Backtracking search on a graph 4
List-II
1. Divide and Conquer
2. Dynamic Programming
3. Greedy design
4. Depth-first search
5. Breadth-first search
Match the above algorithms on the left to the corresponding design paradigm they follow Codes:
ABCD
(a) 1 3 1 5
(b) 3 3 1 5
(c) 3 2 1 4
(d) 3 2 1 5
Options:
A) a
B) b
C) c
D) D Answer C 48
Q. No. 9

49
Q. No. 10

50
Graphs
Introduction
• Generalization of a tree.
• Collection of vertices (or nodes) and connections
between them.
• No restriction on
– The number of vertices.
– The number of connections between the two
vertices.
• Have several real life applications.
Definition
• A graph G = (V,E) consists of a
– Non-empty set V of vertices and
– Possibly empty set E of edges.
• |V| denotes number of vertices.
• |E| denotes number of edges.
• An edge (or arc) is a pair of vertices (vi,vj) from V.
– Simple or undirected graph (vi,vj) = (vj,vi).
– Digraph or directed graph (vi,vj) ≠ (vj,vi).
• An edge has an associated weight or cost as well.
Undirected
Contd… Graph
v1 v4
v1 v4 Directed v2
Graph
v2 v5
v3 v6
v5
v3 v6
v1 v2
v1 v4
v2 v3
v4
v5 Complete
v3
Multigraph Graph
Representation – I
• Adjacency matrix
– Adjacency matrix for a graph G = (V, E) is a two
dimensional matrix of size |V| x |V| such that
each entry of this matrix
a[i][j] = 1 (or weight), if an edge (vi,vj) exists.
0, otherwise.

– For an undirected graph, it is always a symmetric


matrix, as (vi, vj) = (vj, vi).
Adjacency matrix
Contd… (weighted)
Representation – II
• Adjacency list

– Uses an array of linked lists with size equals to |V|.

– An ith entry of an array points to a linked list of


vertices adjacent to vi.

– The weights of edges are stored in nodes of linked


lists to represent a weighted graph.
Adjacency List
• Undirected graph
Contd…
• Directed graph
Contd…
• Weighted directed graph
Pros and Cons
Adjacency Matrix Adjacency List
Memory Requirement O(|V|2) O(|V|+|E|)
Add Edge O(1) O(1)
Remove Edge O(1) O(|E|)
Find Edge Existence O(1) O(|V|)
Find # of Edges O(|V|2) O(|E|)
Add Vertex O(|V|2) O(|V|)
Remove Vertex O(|V|2) O(|E|)
Visit Adjacent Vertices O(|V|) O(Degree of that node)
Some Definitions
• Walk
– An alternating sequence of vertices and connecting
edges.
– Can end on the same vertex on which it began or on a
different vertex.
– Can travel over any edge and any vertex any number of
times.
• Path
– A walk that does not include any
vertex twice, except that its first
and last vertices might be the same.
Contd…
• Trail
– A walk that does not pass over the same edge twice.
– Might visit the same vertex twice, but only if it comes
and goes from a different edge each time.
• Cycle
– Path that begins and ends
on the same vertex.
• Circuit
–Trail that begins and ends
on the same vertex.
Euler Graphs
• An Eulerian trail (or Eulerian path) is a trail in a
graph which visits every edge exactly once.

• An Eulerian circuit (or Eulerian cycle) is an Eulerian


trail which starts and ends on the same vertex.

• A graph containing an Eulerian circuit is called Euler


graph.

• A connected graph G is an Euler graph if and only if


all vertices of G are of even degree.
Example

Non-Eulerian
graph as only
Eulerian graph as Euler path exists.
Euler circuit exists.
Fleury's algorithm
1. Start at a vertex of odd degree (Euler path), or, if
the graph has none, start with an arbitrarily
chosen vertex (Euler circuit).
2. Choose the next edge in the path to be one whose
deletion would not disconnect the graph. (Always
choose the non-bridge in between a bridge and a
non-bridge.)
3. Add that edge to the circuit, and delete it from the
graph.
4. Continue until the circuit is complete.
Example – 1
Vertex Degree
v1 2
v2 2
v3 4
v4 2
v5 2
v6 2

• As degree of each vertex is even, thus Euler circuit


exists.
Contd…
Graph Current Vertex Trail
v1 e1 v2 v4 v1 NULL
e3
e7 e2 e4

v6 e6 v3 e5 v5
v1 v2 v4 v2 v1v2
e3 or
e7 e2 e4
e1
v6 e6 v3 e5 v5
v1 v4 v3 v1v2v3
e3 or
e7 e4
e1e2
v6 e6 v3 e5 v5
Contd…
Graph Current Vertex Trail
v1 v4 v4 v1v2v3v4
v3v6 or e6 is a or
e7 e4
bridge and can’t e1e2e3
v6 e6 v3 e5 v5 be selected.
v1 v5 v1v2v3v4v5
or
e7
e1e2e3e4
v6 e6 v3 e5 v5
v1 v3 v1v2v3v4v5v3
or
e7
e1e2e3e4e5
v6 e6 v3
Contd…
Graph Current Vertex Trail
v1 v6 v1v2v3v4v5v3v6
or
e7
e1e2e3e4e5e6
v6

NULL v1 v1v2v3v4v5v3v6v1
or
e1e2e3e4e5e6e7

• Required Trail:
v1v2v3v4v5v3v6v1
or e1e2e3e4e5e6e7
Example – 2 Vertex Degree
v1 2
v2 2
v1 v2
v3 4
v4 3
v4 v5
v3 v5 2
v6 2
v6 v7 v8 v7 3
v8 2

• As degree of all the vertices is even except two (v4


or v7), thus Euler path exists.
• Start either from v4 or v7.
Contd…
Graph Current Vertex Trail
v4 NULL

v2 v4v2
v4v6 is a bridge
and can’t be
selected.

v1 v4v2v1
Contd…
Graph Current Vertex Trail
v3 v4v2v1v3

v4 v4v2v1v3v4

v6 v4v2v1v3v4v6

v7 v4v2v1v3v4v6v7
Contd…
Graph Current Vertex Trail
v5 v4v2v1v3v4v6v7v5

v8 v4v2v1v3v4v6v7v5v8

NULL v7 v4v2v1v3v4v6v7v5v8v7

v1 v2

• Required Trail:
v4 v5
v3
v4v2v1v3v4v6v7v5v8v7
v6 v7 v8
Example – 3
• Find Euler circuit using Fleury's algorithm. Start at
vertex A.

https://www.youtube.com/watch?v=vvP4Fg4r-Ns
Contd…
• In Fleury's algorithm, the graph traversal is linear in
the number of edges, i.e. O(|E|), excluding the
complexity of detecting bridges.

• With Tarjan's linear time bridge-finding algorithm,


the time complexity is O(|E|2).

• With Thorup's dynamic bridge-finding algorithm,


the time complexity gets improved to
O(|E|(log|E|)3log log |E|).
Hierholzer's algorithm
1. Choose any starting vertex v, and follow a trail of
edges from that vertex until returning to v. The
tour formed in this way is a closed tour, but may
not cover all the vertices and edges of the initial
graph.
2. As long as there exists a vertex u that belongs to
the current tour but that has adjacent edges not
part of the tour, start another trail from u,
following unused edges until returning to u, and
join the tour formed in this way to the previous
tour.
Example – 1
Graph Trail
v1 v2 v4 v1v2v3v6v1

v6 v3 v5
v4 v1v2v3v4v5v3v6v1

v3 v5

Null Required Trail:

v1v2v3v4v5v3v6v1
Example – 2
• Find Euler circuit using Hierholzer’s algorithm. Start
at vertex A.

• Solution:
– ABCDEA
– AGFABCDEA
– AGFABEBCDEA
Contd…
• If doubly linked list is used to maintain
– The set of unused edges incident to each vertex,
– The list of vertices on the current tour that have unused
edges, and
– The tour itself.
• The individual operations of finding
– Unused edges exiting each vertex,
– A new starting vertex for a tour, and
– Connecting two tours that share a vertex.
• May be performed in constant time, so the
algorithm takes linear time, O(|E|).
Hamiltonian Graphs
• A path passing through all the vertices of a graph is
called a Hamiltonian path.
– A graph containing a Hamiltonian path is said to be
traceable.
• A cycle passing through all the vertices of a graph is
called a Hamiltonian cycle (or Hamiltonian circuit).
• A graph containing a Hamiltonian cycle is called a
Hamiltonian graph.
• Hamiltonian path problem: Determine the
existence of Hamiltonian paths and cycles in graphs
(NP-complete).
Example
• Not Hamiltonian graphs.
Example


• Hamiltonian graph
– 128765431

Not a Hamiltonian graph.


Solution 1 – Brute Force Search Algorithm
• A Hamiltonian Path in a graph having N vertices is
nothing but a permutation of the vertices of the
graph
[v1, v2, v3, ......vN-1, vN]
such that there is an edge between vi and vi+1 where
1 ≤ i ≤ N-1.
• So it can be checked for all permutations of the
vertices whether any of them represents a
Hamiltonian Path or not.
Contd…
function check_all_permutations(adj[][], n)
for i = 0 to n
p[i]=i
while next permutation is possible
valid = true
for i = 0 to n-1
if adj[p[i]][p[i+1]] == false
valid = false
break
if valid == true
print_permutation(p)
return true
p = get_next_permutation(p)
return false
Example
Not a Hamiltonian graph as
path exists and not a circuit.

1. 0-1-2-3 7. 1-0-2-3 13. 2-0-1-3 19. 3-0-1-2


2. 0-1-3-2 8. 1-0-3-2 14. 2-0-3-1 20. 3-0-2-1
3. 0-2-1-3 9. 1-2-0-3 15. 2-1-0-3 21. 3-1-0-2
4. 0-2-3-1 10. 1-2-3-0 16. 2-1-3-0 22. 3-1-2-0
5. 0-3-1-2 11. 1-3-0-2 17. 2-3-1-0 23. 3-2-0-1
6. 0-3-2-1 12. 1-3-2-0 18. 2-3-0-1 24. 3-2-1-0
Solution 2 – Backtracking
1. Create an empty path array and add vertex 0 to it.
2. Add other vertices, starting from the vertex 1.
3. Before adding a vertex, check for whether it is
adjacent to the previously added vertex and not
already added.
4. If such a vertex is found, add that vertex as part of
the solution.
5. Otherwise, return false.
Contd…
1. bool hamCycle(bool graph[V][V])
2. { int *path = new int[V];
3. for (int i = 0; i < V; i++)
4. path[i] = -1;
5.
6. path[0] = 0;
7. if ( hamCycleUtil(graph, path, 1) == false )
8. { printf("\nSolution does not exist");
9. return false;
10. }
11. printSolution(path);
12. return true; }
Contd…
1. bool hamCycleUtil(bool graph[V][V], int path[], int pos)
2. { if (pos == V)
3. { if ( graph[ path[pos-1] ][ path[0] ] == 1 )
4. return true;
5. else
6. return false;
7. }
8. for (int v = 1; v < V; v++)
9. { if (isSafe(v, graph, path, pos))
10. { path[pos] = v;
11. if (hamCycleUtil (graph, path, pos+1) == true)
12. return true;
13. path[pos] = -1;
14. }
15. }
16. return false; }
Contd…
1. bool isSafe(int v, bool graph[V][V], int path[], int pos)
2. {
3. if (graph [ path[pos-1] ][ v ] == 0)
4. return false;
5.
6. for (int i = 0; i < pos; i++)
7. if (path[i] == v)
8. return false;
9.
10. return true;
11. }
Example

graph[1][4] = 1.
Thus Hamiltonian graph.

1 2 3 4
4 4 1 0 1 0 1
3 3 2 1 0 1 1
2 2 3 0 1 0 1
path[] 1 1 graph[][] 4 1 1 1 0
Applications
• Painting road lines,
• Plowing roads after a snowstorm,
• Checking meters along roads,
• Garbage pickup routes, etc.
Planar Graphs
• If a graph G can be drawn on a plane (or a sphere) so that
the edges only intersect at vertices, then it is planar.
• Such a drawing of a planar graph is a planar embedding of
the graph.

• Planar Nonplanar
Contd…

• A non-planar graph converted in to a planar graph.


Euler's formula
• For a finite, connected, planar graph, if
– v is the number of vertices,
– e is the number of edges, and
– f is the number of faces (regions bounded by
edges, including the outer, infinitely large region).
– Then,
v–e+f=2
Example

8 – 11 + 5 = 2
Graph Coloring
• Special case of labeling graph elements subject to
certain constraints.
– Traditionally, "colors" are used as labels.
• Several forms
– Vertex coloring. Color vertices of a graph such that
no two adjacent vertices share the same color.
– Edge coloring. Color edges such that no two adjacent
edges share the same color.
– Face coloring of a planar graph. Assign color to each
face or region so that no two faces that share a
boundary have the same color.
Contd…

• The simplest form is vertex coloring.


• Formally,
If C be the set of colors, then find a function
f : V→ C
– such that if there is an edge (vw), then f(v) ≠ f(w),
and
– C is of minimum cardinality.
Definitions

• k-coloring. Coloring using at most k colors.


• Chromatic number of a graph (χ(G)). The smallest
number of colors needed to color a graph G.
• k-colorable. A graph that can be assigned a k-
coloring is k-colorable.
• Note: k-coloring partitions the vertex set into k
independent sets.
Sequential Coloring – Greedy Approach
• sequentialColoringAlgorithm(graph = (V,E))
1. Put vertices in a certain order {v1,v2,…,vn}.
2. Put colors in a certain order {c1,c2,…,ck}.
3. for i = 1 to n
4. j = the smallest index of color that does
not appear in any neighbor of vi.
5. color(vi) = j.
If every node in G has degree ≤ d, then the
algorithm uses at most d + 1 colors for G.
Example v1 v2 v3 v4

v5 v6 v7 v8
Order of vertices
v1 v2 v3 v4 v5 v6 v7 v8
c1 c1 c2 c1 c2 c2 c3 c4
Number of colors required ≤ Maximum degree + 1
≤ max{deg(v1), deg(v2), deg(v3), deg(v4), deg(v5),
deg(v6), deg(v7), deg(v8)} + 1
≤ max{3, 3, 2, 2, 2, 4, 5, 3} + 1
≤5+1
≤ 6.
Welsh–Powell algorithm
• If vertices are ordered according to their degrees
(decreasing), then

The number of colors needed to color the graph is


at most
max min( i, deg(vi )  1)
i
Number of colors for graph at slide 42.
• Number of colors required ≤ max min( i, deg(vi )  1)
i
• If vertices are arranged as {v1,v2,v3,v4,v5,v6,v7,v8}
≤ max{min(1,4), min(2,4), min(3,3), min(4,3),
min(5,3), min(6,5), min(7,6), min(8,4)}
≤ max{1,2,3,3,3,5,6,4}
≤ 6.
• Arrangement as per degrees {v7,v6,v1,v2,v8,v3,v4,v5}
≤ max{min(1,6), min(2,5), min(3,4), min(4,4),
min(5,4), min(6,3), min(7,3), min(8,3)}
≤ max{1,2,3,4,4,3,3,3}
≤ 4.
Example

v1 v2 v3 v4

v5 v6 v7 v8

v7 v6 v1 v2 v8 v3 v4 v5
c1 c2 c3 c1 c3 c2 c3 c2
m-Coloring Problem
• Given an undirected graph and a number m, determine if
the graph can be colored with at most m colors such that
no two adjacent vertices have the same color.
• Input:
– An adjacency matrix, graph[V][V] where V is the
number of vertices in the graph.
– An integer m which is maximum number of colors that
can be used.
• Output:
– An array color[V] containing colors assigned to all the
vertices in the range 1 to m. False will be returned, if
the graph cannot be colored with m colors.
Backtracking
1. If all colors are assigned,
2. Print vertex assigned colors.
3. Else
4. Trying all possible colors, assign a color to the
vertex.
5. If color assignment is possible, recursively assign
colors to next vertices.
6. If color assignment is not possible, de-assign
color, return False.
Example – Check for 3-colorability
v2 v3 v5 v6

7 c2 c1 c2
6 c3 c1 c2 c3 v7 v1 v0 v4
5 c2 c1 c2
0 1 2 3 4 5 6 7
4 c3 c1 c2 c3 0 0 1 1 0 1 1 1 0
3 c1 1 1 0 1 0 1 0 1 0
2 1 1 0 0 0 0 0 1
2 c3 c1 c2 c3 3 0 0 0 0 1 1 0 1
1 c2 c1 c2 4 1 1 0 1 0 0 0 0
5 1 0 0 1 0 0 0 0
0 c1 6 1 1 0 0 0 0 0 0
7 0 0 1 1 0 0 0 0
Applications
• Scheduling.
• Frequency Assignment.
• Register Allocation.
• Bipartite Graphs.
• Map Coloring, etc. v2 v3 v5 v6

v7 v1 v0 v4
Definitions
• Diagraph or directed graph.
• A cycle in a diagraph G is a set of edges, {(v1, v2), (v2, v3),
..., (vr −1, vr)} where v1 = vr .
• A diagraph is acyclic if it has no cycles. Also known as
directed acyclic graph (DAG).
• DAGs are used in many applications to indicate
precedence among events. For example,
– Inheritance between classes.
– Prerequisites between courses of a degree program.
– Scheduling constraints between the tasks of a projects.
Example – Prerequisites between courses
142 → 143 → 378 → 370 → 321 → 341 → 322 →
326 → 421 → 401
Example – Scheduling
Topological Sort
• Let, G = (V,E) be a DAG
• Linear ordering of all its
vertices such that if G contains
an edge (u,v), then u appears
before v in the ordering.
• Can also be viewed as an
ordering of vertices
along a horizontal line so
that all directed edges
go from left to right.
Kahn's algorithm
• L ← Empty list that will contain the sorted elements
• S ← Set of all nodes with no incoming edge
1. while S is non-empty do
2. remove a node n from S
3. add n to tail of L
4. for each node m with an edge e from n to m do
5. remove edge e from the graph
6. if m has no other incoming edges then
7. insert m into S
8. if graph has edges then
9. return error (graph has at least one cycle)
10. else
11. return L (a topologically sorted order)
Example – 1
B C

A F

D E

A B D Vertex In-degree
A 0
B C
B 1
C D E C 1
D E D 2
E 2
E
F 0
F
A B D
Contd… B C

L: Ø C D E
S: A → F
D E

F
L: A
S: F → B

L: A → F
S: B
A B D
Contd… B C

L: A → F → B C D E
S: C
D E

F
L: A → F → B → C
S: D

L: A → F → B → C → D
Required
S: E topological
sort is
AFBCDE
L: A → F → B → C → D → E NULL
S: Ø
Topological Sort using DFS
• Perform DFS. Sort vertices in decreasing order of
their finishing time.
1 2 3 4

5 6 7 8

9 10 11 12
Contd…
Starting Finishing
Vertex
Time Time
A 1 10
B 6 9
C 7 8
D 2 5
E 3 4
F 11 12

• Required topological sort is F A B C D E.


Topological Sort using DFS – Implementation
1. T = []
2. visited = []
3. topological_sort( cur_vert, N, adj[][] )
4. { visited[cur_vert] = true
5. for i = 0 to N
6. if adj[cur_vert][i] is true and visited[i] is false
7. topological_sort(i, N, adj[][])
8. T.insert_in_beginning(cur_vert)
9. // T.push(cur_vert) }
Example
• T[]: 5 4 4→5→2→3
4 5
→1→0
3 2
2 3
1 1
0 0

0 1 2 3 4 5
5 0 1 1 1 1 1 1 0 0 0 0 0 0 0
4 0 0 0 0 0 0 1 1 0 0 0 0 0 0
3 0 0 0 0 1 1 1 2 0 0 0 1 0 0
2 0 0 0 1 1 1 1 3 0 1 0 0 0 0
1 0 0 0 0 0 1 1 4 1 1 0 0 0 0
• Visited[] 0 0 0 1 1 1 1 1 5 1 0 1 0 0 0
Example

• Topological sort:
0→1→2→3→4→5
Isomorphism
• Two graphs are isomorphic when the vertices of one
can be re-labeled to match the vertices of the other
in a way that preserves adjacency.
• Formally,
Graphs G1 and G2 are isomorphic if there exists
a one-to-one function, called an isomorphism,
f: V(G1) → V(G2)
such that uv is an element of E(G1) if and only
if f(u)f(v) is an element of E(G2).
Example

G1 G2 G3
• G1 and G2
f(a) = J, f(b) = K, f(c) = M, and f(d) = L.
• G1 and G3
f(a) = a, f(b) = d, f(c) = c, and f(d) = b.
Example
• f(a) = 1
• f(b) = 6
• f(c) = 8
• f(d) = 3
• f(g) = 5
• f(h) = 2
• f(i) = 4
• f(j) = 7
Graph Isomorphism Problem
• Determining whether two finite graphs are isomorphic or
not?
• It is neither an NP-complete problem nor a P-problem,
although this has not been proved (Skiena 1990).
• There is a famous complexity class called graph isomorphism
complete which is thought to be entirely disjoint from
both NP-complete and from P.
• Applications:
– Cheminformatics,
– Mathematical chemistry (identification of chemical compounds),
and
– Electronic design automation (verification of equivalence of
various representations of the design of an electronic circuit).
Connected and Disconnected Graph
• A graph is connected if all the vertices are
connected to each other, i.e.
– A path exists between every pair of vertices.
– No unreachable vertices.

• A graph is disconnected if it is not connected.


• A graph with just one vertex is connected.
• An edgeless graph with two or more vertices is
disconnected.
Example
Cut Vertex
• A vertex v of a graph G is a cut vertex or an
articulation vertex of G if the graph G−v consists of
a greater number of components than G.
Contd...
• A graph is separable if it is not connected or if there
exists at least one cut vertex in the graph.
Cut-set
• A cut set of the connected graph G = (V, E) is an
edge set F ⊆ E such that
1. G − F (remove the edges of F one by one) is not
connected, and
2. G − H is connected whenever H ⊂ F.
Example

• {e1, e4}, {e6, e7}, {e1, e2, e3}, {e8}, {e3, e4, e5, e6}, {e2,
e5, e7}, {e2, e5, e6} and {e2, e3, e4}.
Cut
• A cut is a partition of the vertices of a graph into
two disjoint subsets.
• Formally,
In a graph G = (V,E), a pair of subsets V1 and V2 of V
satisfying
V = V1 ∪ V2, V1 ∩ V2 = ∅, V1 ≠ ∅, V2 ≠ ∅
is called a cut (or a partition) of G. Denoted as (V1, V2).
• It is also defined as an edge set.
cut (V1, V2) = {those edges with one end vertex in V1
and the other end vertex in V2}.
Contd…
• In an unweighted undirected Min-cut
graph, the size or weight of
a cut is the number of edges
crossing the cut.

• In a weighted graph, Max-cut


the value or weight is
defined by the sum of the
weights of the edges
crossing the cut.
MAXIMUM FLOW

Max-Flow Min-Cut Theorem (Ford Fukerson’s Algorithm)


What is Flow Network?

 Flow network is a directed graph G = (V,E) such that


each edge has a non-negative capacity c(u,v) ≥ 0.
 Two distinguished vertices exist in G namely:
 Source (produces flow and denoted by s): In-degree
of this vertex is 0.
 Sink (consumes flow and denoted by t): Out-degree

of this vertex is 0.
 Flow in a network is an integer-valued function f
defined on the edges of G satisfying 0 ≤ f (u,v) ≤
c(u,v), for every edge (u,v) in E.
Contd…

 Each edge (u,v)  E has a non-negative capacity


c(u,v).
 If (u,v)  E assume c(u,v)=0.
 Two special vertices, source s and sink t.
 Every vertex v  V is on some path from s to t.
Capacity
c(s,v1) = 16
c(v1,s) = 0
c(v2,s) = 0
c(v4,t) = 4 …
Conditions for Flow Network
 For each edge (u,v) in E, the flow f (u,v) in G is a real valued
function f : V × V → R that satisfies the following properties:

 Capacity constraints – The flow along an edge can not exceed


its capacity.
 Skew symmetry – The net flow from u to v must be the
opposite of the net flow from v to u.

 Flow conservation – Unless u is s or t, the net flow to a node is


zero.
The Value of a Flow

 The value of a flow is given by:

| f |  f ( s, v)  f (v, t )
vV vV

 The flow into the sink node (t) is same as flow


going out from the source node (s) and thus the
flow is conserved.
 Total amount of flow from source s is equal to
total amount of flow into the sink t.
Example

 A flow f in G with value | f | = 19.


 Each edge (u, v) is labeled by
f (u,v)/c(u,v). Slash notation Edge Capacity Flow Comment
separates the flow and capacity; it (s,v1) 16 11 11 ≤ 16 Valid
does not indicate division. (s,v2) 13 8 8 ≤ 13 Valid
| f |  f ( s, v)  f (v, t )  19
vV vV (v2,v1) 4 1 1 ≤ 4 Valid
 The flow across nodes v1, v2 , v3,
(v1,v3) 12 12 12 ≤ 12 Valid
and v4 are also conserved as flow
(v3,v2) 9 4 4 ≤ 9 Valid
into them = flow out of them.
(v2,v4) 14 11 11 ≤ 14 Valid
 v1: 11 + 1 = 12
 v2 : 8 + 4 = 1 + 11 (v4,v3) 7 7 7 ≤ 7 Valid
 v3: 12 + 7 = 4 + 15, and (v3,t) 20 15 15 ≤ 20 Valid
 v4: 11 = 7 + 4. (v4,t) 4 4 4 ≤ 4 Valid
The Maximum Flow Problem

 Given: In simple terms maximize


 Graph G(V,E), the s to t flow, while ensuring
that the flow is feasible.
 f (u,v) = flow on edge (u,v),
 c(u,v) = capacity of edge (u,v),
 s = source node, t = sink node.
 Maximize: | f |
 Subject to:  f (u, v)  f (v, u)  0, u  s, t
vV vV

 f (s, v)   f (v, t ) 
vV vV
f

0  f (u, v)  c(u, v), (u, v)  E


Cuts of Flow Networks

 A Cut in a network is a partition of V into S and T


(T = V – S) such that s (source) is in S and t (target)
is in T. 2 9 5

Cut
10 15 15 10
4

s 5 3 8 6 10 t

4 6 15 10
15

4 30 7
Capacity of Cut (S,T)
c( S , T )   c(u, v)
uS ,vT
2 9 5

Cut
10 15 15 10
4

s 5 3 8 6 10 t

4 6 15 10
15

4 30 7
Capacity = 30
Min Cut

 Min s-t cut (also called as a Min Cut) is a


cut of minimum capacity
2 9 5

Min s-t Cut


10 15 15 10
4

s 5 3 8 6 10 t

4 6 15 10
15

4 30 7 Capacity = 28
Flow of Min Cut

 Let f be the flow and let (S,T) be a cut. Then | f | ≤


capacity(S,T). Capacity = c(v1,v3) + c(v2,v4)
= 12 + 14 = 26.
Flow = f(v1,v3) + f(v2,v4) - f(v3,v2)
= 12 + 11 – 4 = 19.
Ford-Fulkerson Method

 FORD-FULKERSON-METHOD(G,s,t)

1. initialize flow f to 0

2. while there exists an augmenting path p in the


residual network Gf

3. augment flow f along p

4. return f
Residual Network

 Given a flow network G = (V,E) and a flow f, the


residual network of G induced by f is Gf = (V,Ef),
where
Ef = {(u,v)  V × V : cf (u,v) > 0} with residual
capacity cf
Contd…
Augmenting Path

 An augmenting path p is a simple path from s to t


on a residual network Gf.
 The maximum capacity by which we can increase
the flow on each edge in an augmenting path p is
known as the residual capacity of p, given by

c f ( p)  min{c f (u, v) : (u, v) is on p}


Max-flow min-cut theorem
 If f is a flow in a flow network G = (V,E) with
source s and sink t, then the following conditions
are equivalent:
1. f is a maximum flow in G.
2. The residual network Gf contains no augmenting
paths.
3. | f | = c(S,T) for some cut (S,T) of G.
Note:
If | f | = c(S,T), then c(S,T) is the required min-cut.
Basic Ford-Fulkerson Algorithm

 Line 7. Add residual capacity to the flow over an existing


edge (u,v) in E.
 Line 8. Subtract residual capacity from the flow over an
existing edge (v,u) in E.
Example
Flow = 4

 Initial flow = 0. Thus original network and initial


residual network is same.
 Path: s → v1 → v3 → v2 → v4 → t.
 Residual capacity: min(16, 12, 9, 14, 4) = 4.
 All edges in path exists in E, so add 4 to the initial
flow for all the edges in the path (0 + 4 = 4).
Contd…

 Update residual network using,


Contd…

 Path: s → v2 → v1 → v3 → t.
 Residual capacity: min(13, 4, 8, 20) = 4.
 All edges in path exists in E, so add 4 to the flow for
all the edges in the path.
Flow = 4 + 4
Contd…

 Update residual network.


 Path: s → v1 → v2 → v3 → t.
Flow = 4 + 4 + 4
 Residual capacity: min(12, 4, 4, 16) = 4.
 Edges (v1,v2) and (v2,v3) in path doesn't exist in E, so
subtract residual capacity (4) from the previous flow for the
edges (v2,v1) and (v3,v2). For both the edges updated flow is
4 – 4 = 0.
Contd…

 Update residual network.


 Path: s → v2 → v4 → v3 → t.
 Residual capacity: min(9, 10, 7, 12) = 7.
 All edges in path exists in E, so add 7 to the flow for all
the edges in the path.
Flow = 4 + 4 + 4 + 7
Contd…

 Update residual network.


 Path: s → v1 → v3 → t.
 Residual capacity: min(8, 4, 5) = 4.
 All edges in path exists in E, so add 4 to the flow for all
the edges in the path.
Flow = 4 + 4 + 4 + 7 + 4
Contd…

 Update residual network.


 No path exists in the residual network from s to t.
 Loop terminates and the final flow is
Flow = 4 + 4 + 4 + 7 + 4 = 23

 | f | = 23.
 c({s, v1,v2,v4},{v3,t}) = 23.
4
Example 2 2
1
5
3 1 1
2 4 2
4 s t
2 5 3 2
1 1
3 1 1
2 3
2 4
s t
3 2 4
1 2 5
1
3 3 1 1
2/2 4 2/2
s t
3 2
Flow = 2 1
3
4
Contd… 2
1
5
3 1 1
2/2 4 2/2
4 s t
2 5 3 1 2
1
3 1 1
2 2
3
4
s t
3 2 4
1 2 5
1
3 3 1 1
2/2 4 2/2
s t
2/3 1 2/2
Flow = 2 + 2
3
4
Contd… 2
1
5
3 1 1
2/2 4 2/2
4 s t
2 5 2/3 1 2/2
1
3 1 1
2
3
4
s t
1 1/4
1 2 2 5
2 1
1/3 1 1/1
3
2/2 4 2/2
s t
2/3 1 2/2
Flow = 2 + 2 + 1
3
1/4
Contd… 2
1
5
1/3 1 1/1
3 2/2 4 2/2
s t
2 1 5 2/3 1 2/2
2 1 1
1 1
2 3
4
s t
1 2
1
2  | f | = 5.
3
 c({s,2,3,4,5},{t}) = 5.

Flow = 2 + 2 + 1 = 5
Analysis

 If f * is the maximum flow, then algorithm executes


while loop of lines 3–8 at most | f *| times, assuming
flow increases by at least one unit in each iteration.
 The time to find a path in a residual network is O(E).
 The time to update capacity and flow values is O(1).
 Each iteration of while loop thus takes O(E) time, as
does the initialization in lines 1–2.
 Thus, the total running time of the FORD-
FULKERSON algorithm is O(E | f *|).
Edmonds-Karp algorithm

 It’s similar to a Ford-Fulkerson method.

 It chooses the augmenting path as a shortest


path from s to t in the residual network.

 Edmonds-Karp algorithm runs in O(VE2) time.


Applications

 Application area of max flow min cut is very


vast.

 Interested students may refer the document


available at
http://www.cs.princeton.edu/~wayne/cs423/lec
tures/max-flow-applications
Objectives

• Introduction to Dynamic Programming


• Difference between Greedy Method and
Dynamic Programming
• Memoization and Tabular method
• Examples of Memoization and Tabular
method
• Matrix Chain multiplication problem
THANKS !
Objectives

• 0-1 knapsack problem


• Recursive formula for solving 0-1 knapsack using Dynamic
programming approach
• Example of 0-1 knapsack problem
• Algorithm and time complexity
Longest Common Subsequence Problem using DP

• Longest Common Subsequence (LCS)Problem


• Dynamic Programming Approach
• Example of LCS
• Algorithm using DP approach
• Time Complexity
THANKS !
Objectives

• Matrix multiplication
• Matrix chain multiplication
• Deriving formula using DP and How to use
the formula
• Example of matrix chain multiplication
THANKS !
Optimal Binary Search Tree (OBST)
• Binary Search Tree
• Possible no. of BST for given ‘n’
• Cost of BST
• Recursive formula for constructing
OBST
• Example of Constructing OBST using
DP
1 Backtracking

So far, we covered the following algorithm design techniques:

1. incremental,

2. divide-and-conquer,

3. dynamic programming,

4. greedy.

Each technique allows us to make progress, either implied (incremental ap-


proach), or as part of the optimal substructure (divide-and-conquer, dy-
namic programming, greedy approach). If we cannot use the problem’s
structure to make progress and we want an exact, not an approximate solu-
tion, we can use the backtracking technique, but only as a last resort.

1.1 Navigating a maze

Consider the problem of navigating a maze. In general, one does not know
the structure of the maze when beginning. In fact, a maze may be con-
structed not to have any particular structure. So there is no obvious way to
break down the problem into smaller pieces.
One way to solve a maze is to try every potential solution in the search
space, i.e. every possible sequence of steps from the maze entrance to its
exit. In order to narrow down and systematize the search, we will do the
following: if we hit a dead end, then we back up until the last choice that
we made and make another choice.

1.2 A definition

The basic strategy of backtracking is to systematically explore the space of


all potential solutions. It does so by:

• extending a partial solution in some feasible way,

• trying another extension if the extension fails,

1
• backing up to a smaller partial solution when the options for extending
a partial solution are exhausted.

We should use backtracking only as a last resort since it is expensive.

2 n-Queens problem

Input: An integer n ≥ 4.

Output: A placement of n queens on an n × n chessboard so that no two


can attack each other.

Recall that a queen may move any length along a diagonal or any length
along a row or column. So, for example, if n = 4, a solution is (1, 3), (2, 1), (3, 4)
and (4, 2).
In order to use backtracking for this problem, we need to describe what a
potential solution looks like.

2.1 Potential solutions

As seen in the example above, a potential solution is n points in an n × n


array:

(x1 , y1 ), (x2 , y2 ), ..., (xn , yn ).

Note that an exhaustive search will check (n2 )n possible solutions, one for
each set of n coordinates. Let us use backtracking to narrow down the
search space. To do this, we need to develop tests that check whether a
partial solutions cannot be extended to a complete solution.

2.2 Analyzing the partial solutions and pruning

We can make several observations that eliminate some of the possible partial
solutions:

2
1. There must be exactly one queen per row. This is because two queens
in a row would allow them to attack each other, while fewer than one
queen per row would not allow n queens to be located on the board.
This means that the solutions must have the form:

(1, y1 ), (2, y2 ), ..., (n, yn ).

2. There must be exactly one queen per column, for the same reason
as above. This means that (y1 , y2 , ..., yn ) must be a permutation of
(1, 2, ..., n).
The search space has thus been decreases to ”only” n! permutations.

3. There is at most one queen on each diagonal, since two queens on a


diagonal would be able to attack each other. Suppose we have queens
at positions (a, b) and (c, d) in a possible solution. This restriction
means that |a − c| = |b − d|.

We say that we prune a partial solution if we do not extend it any further


because it cannot be extended to a complete solution.

2.3 The strategy

We will generate all n! permutation until we find a solution. We do this by


building up the solutions one position at a time.

Step 1: Choose a column for the queen in row 1.

Step i: Given the positions of the first i − 1 queens (1, y1 ), (2, y2 ), ..., (i −
1, yi−1 ), choose the position for (i, yi ). If you cannot complete Step
i for some choice of previous positions (1, y1 ), (2, y2 ), ..., (i − 1, yi−1 ),
then backtrack and re-choose the position for (i − 1, yi−1 ).

We will prune a partial solution if it forces the placement of two queens


on the same column or diagonal. Consider the example of the 4-Queens
problem:

Step 1: Place a queen in the 1st row. The choices are (1, 1), (1, 2), (1, 3), (1, 4).
We first choose (1, 1)

3
Step 2: Place a queen in the 2nd row. Given our choise of (1, 1), we
cannot choose either (2, 1) [same column] or (2, 2) [same diagonal].
The remaining choices are (2, 3), (2, 4).

Step 3: Given our choices of (1, 1), (2, 3) we cannot choose (3, 1) [same
column], (3, 2) [same diagonal], (3, 3) [same column], or (3, 4) [same
diagonal]. So we eliminate choice (1, 1), (2, 3).

Step 2: Given our choice of (1, 1), the remaining choice in row 2 is (2, 4).

Step 3: Given the choice (1, 1), (2, 4), we cannot choose (3, 1) [same col-
umn], (3, 3) [same diagonal], (3, 4) [same column]. Our only choice is
(3, 2).

Step 4: Given our partial solution (1, 1), (2, 4), (3, 2), we cannot choose
any of (4, 1) [same column], (4, 2) [same diagonal], (4, 3) [same diago-
nal] or (4, 4) [same column]. Since we have no more choices in step 3
and step 2, we backtrack to step 1 and eliminate partial solution (1, 1).

Step 1: We choose (1, 2).

Step 2: given our choice of (1, 2), we cannot place a queen in (2, 1) [same
diagonal], (2, 2) [same column] or (2, 3) [same diagonal]. That leaves
only (2, 4) as a choice.

Step 3: Given choice (1, 2), (2, 4), we cannot place a queen in (3, 2) [same
column], (3, 3) [same diagonal] or (3, 4) [same column]. This leaves
only choice (3, 1).

Step 4: Given choice (1, 2), (2, 4), (3, 1), we cannot place a queen in (4, 1)
[same column], (4, 2) [same column], or (4, 4) [same column]. This
leaves only choice (4, 3).

A solution is thus: (1, 2), (2, 4), (3, 1), (4, 3). Draw the tree representing our
backtracking steps as an exercise!

2.4 Algorithms

We will consider a recursive version of the n-Queens algorithm. Let Q[1...n]


store the positions of the n queens, i.e. Q[i] will be the column of the queen
in row i.

4
// extends a partial solution of size k-1 to one of size k,
// unless the partial solution is a complete solution, in
// which case it is output; the initial call is NQueens(Q,1,N)
NQueens(Q,k,n)
if k=n+1 then
output Q[1..n]
quit // if only one solution is needed
for i = 1 to n
Q[k] = i
if Feasible(Q, k)
then NQueens(Q, k+1, n)

In both algorithms, we use a procedure Feasible(Q, k) that checks whether


Q[1..k] is feasible, given that Q[1..k − 1] is. This procedure checks that no
queen in rows 1, ..., k − 1 is in the same column or diagonal as the queen in
position (k, Q[k]).

Feasible(Q, k)
for i = 1 to k-1
if Q[k] = Q[i] or |Q[i] - Q[k]| = |i-k| then
return false
return true

3 General recursive backtracking

Backtrack(partial solution)
if partial solution is a solution then
output solution
return (or quit)
for all possible extensions of partial solution
if extension is feasible then
Backtrack(extension)

4 Graph Coloring

Input: A (undirected) graph G = (V, E) and an integer k.

5
Output: An assignment of one of k colors to each node, such that adjacent
nodes get different colors.

For example, let G = (V, E) where V = {1, 2, 3, 4} and E = {(1, 2), (2, 3), (2, 4), (3, 4)}
and suppose that k = 3. A valid coloring c of G is: c(1) = R, c(2) =
G, c(3) = B, c(4) = R.

Example 1 A classical theorem in graph theory, the Four Color Theorem,


proved in 1976 using a computer, states that any planar graph can be properly
colored with four colors. So, for example, K5 and K3,3 are not planar.

Example 2 The register allocation problem is a graph coloring problem in


disguise.

4.1 Potential solutions

Suppose that |V | = n. Then (c1 , c2 , ..., cn ) is a possible coloring of G where


ci is the color of node i in G. Note that there are k n possible colorings. A
coloring is feasible or valid if no two adjacent nodes are given the same color,
that is, if (i, j) ∈ E then ci 6= cj .
Consider a graph G = (V, E) where V = {1, 2, 3, 4} and E = {(1, 2), (1, 3),
(2, 3), (2, 4), (3, 4)} and let k = 3. There are six valid colorings of G given
in the following table:

node p q r s t u
1 R R G G B B
2 G B B R R G
3 B G R B G R
4 R R G G B B

Note that all these colorings are sort of equivalent. They all share the
following structure:

• The same color is used for both node 1 and node 4. For colorings p
and q, it is R, for colorings r and s, it is G, and for colorings t and u,
it is B.

6
• Nodes 2 and 3 must have distinct colors different from each other and
from the color used for nodes 1 and 4.

Definition 1 Two colorings are equivalent if one can be transformed into


another by permuting the k colors.

4.2 Using backtracking to find valid colorings

We will use the following strategy to find all valid colorings of a graph
G = (V, E):

1. Order nodes arbitrarily.

2. Assign the first node a color.

3. Given a partial assignment of colors (c1 , c2 , ..., ci−1 ) to the first i − 1


nodes, try to find a color for the i-th node in the graph.

4. If there is no possible color for the i-th node given the previous choices,
backtrack to a previous solution.

Consider the graph G given earlier, where E = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)}
and k = 3.

Step 1: Choose a color for node 1. It can be one of: R, B or G. Say we


choose R.

Step 2: Given partial coloring (R), we choose a color for node 2. It can
be one of: G or B. Say we choose G.

Step 3: Given partial coloring (R, G), we choose a color for node 3. It
cannot be either R or G, so it must be B since k = 3.

Step 4: Given partial coloring (R, G, B), we choose a color for node 4. It
cannot be B or G, so it must be R. This gives the coloring (R, G, B, R)
which is coloring p.
We have no more choices of colors in step 4, and in step 3. We have
one choice in step 2.

Step 2: Given partial coloring (R), we choose a different color for node 2.
We choose B for node 2.

7
Step 3: Given partial coloring (R, B), we choose a color for node 3. It
cannot be R nor B, so it must be G.

Step 4: Given partial coloring (R, B, G), we choose a color for node 4. It
cannot be B or G, so it must be R. This gives the coloring (R, B, G, R)
which is coloring q.
We have no more choices of colors in steps 4, 3 and 2. We go back to
step 1.

Step 1: We choose a different color for node 1, say B. This will produce
a branch in the tree equivalent to the first branch where R and B are
switched. Thus we will get the colorings t and u.
If we choose G for node 1 then, we will again get a branch equivalent
to the first one with R and G swapped. This will produce the colorings
r and s.

4.3 Algorithm

We now give a recursive version of the graph coloring algorithm. Let C[1...j−
1] be a partial coloring for the first j − 1 nodes.

Color(C,j,k,n)
if j = n+1 then
output C
return or quit
for i = 1 to k
C[j] = i
if valid(C,j,n) then
Color(C,j+1,k,n)

where

Valid(C,j,n)
for all neighbors v of j with v < j
if C[v] = C[j] then
return false
return true

8
4.4 Pruning

If we are simply looking for a single solution, we can cut off the equivalent
branches of the tree to save time. For example, the three main branches of
the backtracking tree obtained in section 4.2 all gave equivalent solutions.
We need only consider the first branch if we want a single solution.
The following algorithm prunes the tree to remove equivalent branches. It
uses the following strategy:

• Keep track of the largest color used so far.

• Try the previously used colors first.

• Never try more than one new color.

ColorP(C,j,last,k,n)
if j = n+1 then
output C
return or quit
for i = 1 to last //try old colors first
C[j] = i
if valid(C,j,n) then
ColorP(C,j+1,last,k,n)
if last < k then
C[j] = last + 1
ColorP(C,j+1,last+1,k,n)

5 Subset sum

We now return to a problem introduced in homework 3.

Input: A set T = {t1 , t2 , ..., tn } of positive integers; an integer M .


P
Output: A subset S of T such that x∈S x = M .

You solved this problem using dynamic programming. Let us apply back-
tracking to it.

9
5.1 Potential solutions

Let (x1 , x2 , ..., xn ) be a representation of a potential solution such that xi = 1


if ti ∈ S and xi = 0 if ti 6∈ S.

Example 3 Let T = {4, 7, 6, 3, 1} and M = 10.

(1, 0, 1, 0, 0) gives 4 + 6 = 10, so it is a valid solution.

(0, 0, 1, 1, 1) gives 6 + 3 + 1 = 10, so it is a valid solution.

(1, 0, 0, 1, 1) gives 4 + 3 + 1 = 8, so it is not a valid solution.

We will use the following backtracking strategy:

1. Start with S = {}.

2. Given a partial list (of length i − 1) of elements used, check if element


i can be used.

3. Backtrack or prune as needed.

Let us develop a test for pruning a partial solution X = (x1 , x2 , ..., xk ). We


can clearly prune X if

k
X
xi ti > M
i=1

Furthermore, X can be pruned if

k
X n
X
xi ti + ti < M.
i=1 i=k+1

Let us implement this strategy on input T = {4, 7, 6, 3, 1} and M = 10. In


drawing the tree, we will label each internal node with the sum so far.

Step 1: Choose 4 or not?


Suppose we don’t and get partial solution (0).

10
Step 2: Choose 7 or not?
Suppose we don’t and get partial solution (0, 0).

Step 3: Choose 6 or not?


Suppose we don’t and get partial solution (0, 0, 0). Since 3 + 1 = 4 <
10, this partial solution won’t lead to a solution so we prune.
Suppose we choose 6, and get partial solution (0, 0, 1).

Step 4: Choose 3 or not?


Suppose we don’t and get partial solution (0, 0, 1, 0). Since 6 + 1 =
7 < 10, this partial solution won’t lead to a solution, so we prune.
Suppose we choose 3, and get partial solution (0, 0, 1, 1).

Step 5: Choose 1 or not?


Suppose we don’t and get partial solution (0, 0, 1, 1, 0). Since 6 + 3 =
9 < 10, this partial solution won’t lead to a solution, so we prune.
Suppose we choose 1, and get solution (0, 0, 1, 1, 1).
We have no more choices in steps 4, 5 and 6.

Step 2: Suppose we choose 7 and get partial solution (0, 1).

Step 3: Choose 6 or not?


Suppose we don’t and get partial solution (0, 1, 0).

Step 4: Choose 3 or not?


Suppose we don’t and get partial solution (0, 1, 0, 0). Since 7 + 1 =
8 < 10, this partial solution won’t lead to a solution, so we prune.
Suppose we choose 3, and get solution (0, 0, 1, 0, 1, 0).
We have no more choices to make in step 4.

Step 3: Suppose we choose 6 and get partial solution (0, 1, 1). Since 7+6 =
13 > 10, this is not a solution.
We have no more choices to make in steps 3 and 2.

Step 1: Suppose we choose 4, and get partial solution (1).

Step 2: Choose 7 or not?


Suppose we don’t and get partial solution (1, 0).

11
Step 3: Choose 6 or not?
Suppose we don’t and get partial solution (1, 0, 0). Since 4 + 3 + 1 =
8 < 10, this partial solution won’t lead to a solution, so we prune.
Suppose we choose 6, and get solution (1, 0, 1, 0, 0, 0).
We have no more choices to make in step 3.

Step 2: Suppose we choose 7 and get partial solution (1, 1). Since 4 + 7 =
11 > 10, this is not a solution.

5.2 The algorithm

SubsetSum(S,T,k,n)
if k = n+1 then
output S
return
S[k] = 0
if InRange(S,T,k,n) then
SubsetSum(S, k+1, n)
S[k] = 1
if InRange(S,T,k,n) then
SubsetSum(S, k+1, n)

where

InRange(S,T,k,n)
S1 = Sum_{i=1}^{k} T[i]S[i]
if S1 > M then
return false
S2 = Sum_{i=k+1}^{n} T[i]
if S1 + S2 < M then
return false
return true

12
String Matching

Finding all occurrences of a pattern in


the text.
Applications
• Spell Checkers.
• Search Engines.
• Spam Filters.
• Intrusion Detection System.
• Plagiarism Detection.
• Bioinformatics – DNA Sequencing.
• Digital Forensics.
• Information Retrieval, etc.
String Matching Problem
• Let,
• Text T [1..n] is an array of length n.
• Pattern P [1..m] is an array of length m ≤ n.
• P and T are drawn from a finite alphabet Σ.
– Σ = {0,1} or Σ = {a, b, …, z}.
• Example:
–T=abcabaabcabac
–P=abaa
Contd… T 1
a
2
b
3
c
4
a
5
b
6
a
7
a
8
b
9
c
10
a
11
b
12
a
13
c
shift = 0 P a b a a
shift = 1 P a b a a
shift = 2 P a b a a
shift = 3 P a b a a
shift = 4 P a b a a
shift = 5 P a b a a
shift = 6 P a b a a
shift = 7 P a b a a
shift = 8 P a b a a
shift = 9 P a b a a
Contd…

• Shift s a valid shift, if P occurs with shift s in T.


– 0 ≤ s ≤ n – m and T [s + 1..s + m] = P [1..m].
• Otherwise, shift s is an invalid shift.
• String-matching problem means finding all valid
shifts with which a given pattern P occurs in a given
text T .
Naive String Matching Algorithm
NAIVE-STRING-MATCHER(T,P)
1. n = T.length
2. m = P.length
3. for s = 0 to n – m
4. if P [1..m] == T [s + 1..s + m]
5. print “Pattern occurs with shift” s

• Complexity: O((n – m + 1)m)


Example

• Text T = acaabc, and pattern P = aab.


Rabin-Karp Algorithm
• Uses elementary number-theoretic notions.
– Equivalence of two numbers modulo a third
number.
• Let, Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
• String of k consecutive characters represents a
length-k decimal number.
– Thus, character string 31415 corresponds to the
decimal number 31,415.
• Note:
– In the general case, each character is a digit in
radix-d notation, where d = |Σ|.)
Example
P 3 1 4 1 5 mod 13 = 7

T 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1

mod 13
8
9
3
A few calculations…
• For a pattern P [1..m], let p denote its corresponding
value in radix-d notation.
• Using Horner’s rule, p can be computed in time ϴ(m).
p = P [m] + d(P [m – 1] + d(P [m – 2] + … + d(P [2] + dP[1]) …)).

• Similarly, for a text T [1..n], let ts denotes the radix-d


notation value of the length-m substring T [s + 1..s +
m], for s = 0, 1, …,n – m.
• Again, t0 can be computed from T [1..m] in ϴ(m).
Contd…
• Each of the remaining values t1, t2, …, tn – m can be
computed in constant time.
– Subtracting dm – 1T [s + 1] removes the high-order digit
from ts, multiplying the result by d shifts the number left
by one digit position, and adding T [s + m + 1] brings in
the appropriate low-order digit.
ts +1 = d (ts – dm – 1T [s + 1]) + T [s + m + 1].
• Let h = dm – 1, then
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1].
• Modulus:
– p modulo q takes ϴ(m) time.
– For all ts, ts modulo q takes ϴ(n – m + 1) time.
Algorithm
Example – 1
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 1: s = 0, t0 = 8, p = 7.
p == t0 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t1 = 10 (8 – 3 (2)) + 2 (mod 13)
t1 = 10 (2) + 2 (mod 13) = 22 (mod 13) = 9.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 2: s = 1, t1 = 9, p = 7.
p == t1 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t2 = 10 (9 – 3 (3)) + 3 (mod 13)
t2 = 10 (0) + 3 (mod 13) = 3 (mod 13) = 3.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 3: s = 2, t2 = 3, p = 7.
p == t2 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t3 = 10 (3 – 3 (5)) + 1 (mod 13) = 10 (-12) + 1 (mod 13)
Because -12 mod 13 = 1 t3 = 10 (1) + 1 (mod 13) = 11 (mod 13) = 11.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 4: s = 3, t3 = 11, p = 7.
p == t3 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t4 = 10 (11 – 3 (9)) + 4 (mod 13) = 10 (-16) + 4 (mod 13)
Because -16 mod 13 = 10 t4 = 10 (10) + 4 (mod 13) = 104 (mod 13) = 0.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 5: s = 4, t4 = 0, p = 7.
p == t4 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t5 = 10 (0 – 3 (0)) + 1 (mod 13)
t5 = 1 (mod 13) = 1.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 6: s = 5, t5 = 1, p = 7.
p == t5 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t6 = 10 (1 – 3 (2)) + 5 (mod 13) = 10 (-5) + 5 (mod 13)
Because -5 mod 13 = 8 t6 = 10 (8) + 5 (mod 13) = 85 (mod 13) = 7.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.
Step 7: s = 6, t6 = 7, p = 7.
p == t6 → Yes.
Character by character matching p[1..5] == T[7..11].
{3 1 4 1 5} == {3 1 4 1 5}. Match, hence s = 6 is a valid shift.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t7 = 10 (7 – 3 (3)) + 2 (mod 13) = 10 (-2) + 2 (mod 13)
Because -2 mod 13 = 11 t = 10 (11) + 2 (mod 13) = 112 (mod 13) = 8.
7
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 8: s = 7, t7 = 8, p = 7.
p == t7 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t8 = 10 (8 – 3 (1)) + 6 (mod 13)
t8 = 10 (5) + 6 (mod 13) = 56 (mod 13) = 4.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 9: s = 8, t8 = 4, p = 7.
p == t8 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t9 = 10 (4 – 3 (4)) + 7 (mod 13) = 10 (-8) + 7 (mod 13)
Because -8 mod 13 = 5 t9 = 10 (5) + 7 (mod 13) = 57 (mod 13) = 5.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 10: s = 9, t9 = 5, p = 7.
p == t9 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t10 = 10 (5 – 3 (1)) + 3 (mod 13)
t10 = 10 (2) + 3 (mod 13) = 23 (mod 13) = 10.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 11: s = 10, t10 = 10, p = 7.


p == t10 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t11 = 10 (10 – 3 (5)) + 9 (mod 13) = 10 (-5) + 9 (mod 13)
Because -5 mod 13 = 8 t11 = 10 (8) + 9 (mod 13) = 89 (mod 13) = 11.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 12: s = 11, t11 = 11, p = 7.


p == t11 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t12 = 10 (11 – 3 (2)) + 9 (mod 13)
t12 = 10 (5) + 9 (mod 13) = 59 (mod 13) = 7.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.
Step 13: s = 12, t12 = 7, p = 7.
p == t12 → Yes.
Character by character matching p[1..5] == T[13..17].
{3 1 4 1 5} == {6 7 3 9 9}. Mismatch occurs at first character.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t13 = 10 (7 – 3 (6)) + 2 (mod 13) = 10 (-11) + 2 (mod 13)
Because -11 mod 13 = 2 t
13 = 10 (2) + 2 (mod 13) = 22 (mod 13) = 9.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 14: s = 13, t13 = 9, p = 7.


p == t13 → No.
s < 14 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 13)
t14 = 10 (9 – 3 (7)) + 1 (mod 13) = 10 (-12) + 1 (mod 13)
Because -12 mod 13 = 1 t14 = 10 (1) + 1 (mod 13) = 11 (mod 13) = 11.
Contd…
Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Text (T) 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1
Pattern: 3 1 4 1 5
Σ = {0, 1, …, 9} d = |Σ| = 10 q = 13
• n = 19, m = 5, n – m = 14.
• h = 105-1 mod 13 = 104 mod 13 = 3.
• p = (3 × 104 + 1 × 103 + 4 × 102 + 1 × 101 + 5 × 100) mod 13 = 7.
• t0 = (2 × 104 + 3 × 103 + 5 × 102 + 9 × 101 + 0 × 100) mod 13 = 8.

Step 15: s = 14, t14 = 11, p = 7.


p == t14 → No.
s < 14 → No.

Step 16: s = 15.


Loop terminates.
Index
Example – 2 1

Text (T) a
2

a
3

b
4

b
5

c
6

a
7

b
8

a
ASCII 97 97 98 98 99 97 98 97
• n = 8, m = 3, n – m = 5. Pattern: c a b
• h = 263-1 mod 3
= 262 mod 3 = 1.
Σ = {a, b, …, z} d = |Σ| = 26 q=3
• p = (99 × 262 + 97 × 261 + 98 × 260) mod 3 = 1.
• t0 = (97 × 262 + 97 × 261 + 98 × 260) mod 3 = 2.

Step 1: s = 0, t0 = 2, p = 1.
p == t0 → No.
s < 5 → Yes.
ts +1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 3)
t1 = 26 (2 – 1 (97)) + 98 (mod 3)
t1 = 26 (2 – 1 (1)) + 2 (mod 3)
(because 97 mod 3 = 1 and 98 mod 3 = 2)
= 26 (1) + 2 (mod 3)
= 28 (mod 3) = 1.
Index
Contd… 1

Text (T) a
2

a
3

b
4

b
5

c
6

a
7

b
8

a
ASCII 97 97 98 98 99 97 98 97

Pattern: c a b

Step 2: s = 1, t1 = 1, p = 1.
p == t1 → Yes.
Character by character matching p[1..3] == T[2..4].
{c a b} == {a b b}. Mismatch occurs at first character.
s < 5 → Yes.
ts + 1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 3)
t2 = 26 (1 – 1 (97)) + 99 (mod 3)
t2 = 26 (1 – 1 (1)) + 0 (mod 3)
(because 97 mod 3 = 1 and 99 mod 3 = 0)
= 26 (0) (mod 3)
= 0.
Index
Contd… 1

Text (T) a
2

a
3

b
4

b
5

c
6

a
7

b
8

a
ASCII 97 97 98 98 99 97 98 97

Pattern: c a b

Step 3: s = 2, t2 = 0, p = 1.
p == t2 → No.
s < 5 → Yes.
ts + 1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 3)
t3 = 26 (0 – 1 (98)) + 97 (mod 3)
t3 = 26 (0 – 1 (2)) + 1 (mod 3)
(because 97 mod 3 = 1 and 98 mod 3 = 2)
= 26 (0 – 2) + 1 (mod 3)
= 26 (0 + 1) + 1 (mod 3)
(because 3's complement of -2 = 1)
= 26 (1) + 1 (mod 3) = 27 (mod 3) = 0.
Index
Contd… 1

Text (T) a
2

a b
3 4

b
5

c
6

a
7

b
8

a
ASCII 97 97 98 98 99 97 98 97

Pattern: c a b

Step 4: s = 3, t3 = 0, p = 1.
p == t3 → No.
s < 5 → Yes.
ts + 1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 3)
t4 = 26 (0 – 1 (98)) + 98 (mod 3)
t4 = 26 (0 – 1 (2)) + 2 (mod 3)
(because 98 mod 3 = 2)
= 26 (0 – 2) + 2 (mod 3)
= 26 (0 + 1) + 2 (mod 3)
(because 3's complement of -2 = 1)
= 26 (1) + 2 (mod 3) = 28 (mod 3) = 1.
Index
Contd… 1

Text (T) a
2

a
3

b
4

b
5

c
6

a
7

b
8

a
ASCII 97 97 98 98 99 97 98 97

Pattern: c a b

Step 5: s = 4, t4 = 1, p = 1.
p == t4 → Yes.
Character by character matching p[1..3] == T[5..7].
{c a b} == {c a b}. Match, hence s = 4 is a valid shift.
s < 5 → Yes.
ts + 1 = d (ts – hT [s + 1]) + T [s + m + 1] (mod 3)
t5 = 26 (1 – 1 (99)) + 97 (mod 3)
t5 = 26 (1 – 1 (0)) + 1 (mod 3)
(because 97 mod 3 = 1 and 99 mod 3 = 0)
= 26 (1 – 0) + 1 (mod 3)
= 26 (1) + 1 (mod 3) = 27 (mod 3) = 0.
Index
Contd… 1

Text (T) a
2

a
3

b
4

b
5

c
6

a
7

b
8

a
ASCII 97 97 98 98 99 97 98 97

Pattern: c a b

Step 6: s = 5, t5 = 0, p = 1.
p == t5 → No.
s < 5 → No.

Step 7: s = 6.
Loop terminates.

Text (T) a a b b c a b a

2 1 0 0 1 0
Complexity
• Takes ϴ(m) preprocessing time.
• Worst-case running time is O(m (n – m + 1)).‫‏‬
– Example: P = am and T = an, each of the [n – m +
1] possible shifts is valid.
• In many applications, there are a few valid shifts
(say some constant c). In such applications, the
expected matching time is only O(n – m + 1) + cm),
plus the time required to process spurious hits.
Contd…
• Probabilistic analysis
– The probability of a false positive hit for a random
input is 1/q.
– The expected number of false positive hits is O(n/q)‫‏‬.
– The expected run time is O(n) + O(m(v + n/q))), if v is
the number of valid shifts.‫‏‬
• Choosing q ≥ m and having only a constant number of
hits, then the expected matching time is O(n + m).
• Since m ≤ n, this expected matching time is O(n).
Knuth-Morris-Pratt Algorithm
• Based on the concept of prefix function for a
pattern.
– Encapsulates knowledge about how the pattern
matches against shifts of itself.
– This information can be used to avoid testing of
invalid shifts.
• T: b a c b a b a b a a b c b a b
• P: ababaca
Contd…
Contd…

• Given that pattern characters P [1..q] match text


characters T [s + 1..s + q], what is the least shift
s' > s such that for some k < q,
P [1..k] = T[s' + 1..s' + k], where s' + k = s + q?
i.e. s' = s + (q – k)
• Best case k = 0. Skips (q – 1) shifts.
Contd… Note: Pk = P[1..k]. Similarly, Pq = P[1..q].

• In other words, knowing that Pq is a suffix of Ts + q, find the


longest proper prefix Pk of Pq that is also a suffix of Ts + q.

• Since suffix Pk of Ts + q should also be suffix of Pq. Thus, an


equivalent statement is “determine the greatest k < q, such
that Pk is a suffix of Pq”.
Prefix Function
q 1 2 3 4 5 6 7
• P: a b a b a c a k 0 0 1 2
• Pq = P[1..q]
• P1 = {a}. Pk = {}. No matching prefix and suffix of P1.
• P2 = {a b}. Pk = {}. No matching prefix and suffix of P2.
• P3 = {a b a}. Pk = {a}. P4 = {a b a b}. Pk = {a b}.
Prefix Suffix k Prefix Suffix k
a a 1 a b 1
ab ba 2 ab ab 2
aba bab 3
Contd… q 1 2 3 4 5 6 7
k 0 0 1 2 3 0 1
• P: a b a b a c a
• Pq = P[1..q]
• P5 = {a b a b a}. Pk = {a b a}. • P7 = {a b a b a c a}. Pk = {a}.
Keep the longest. Prefix Suffix k
Prefix Suffix k a a 1
a a 1 ab ca 2
ab ba 2 aba aca 3
aba aba 3 abab baca 4
abab baba 4 ababa abaca 5
ababac babaca 6
• P6 = {a b a b a c}. Pk = {}. No matching prefix and suffix of P6 as 'c'
does not appear in any of the proper prefix.
Prefix Function
q→

k→

• Given a pattern P [1..m], the prefix function for the


pattern P is the function Π : {1, 2, …, m} → {0, 1,
…, m – 1} such that
Π [q] = max {k : k < q and Pk is a proper suffix of Pq.
• It contains the length of the longest prefix of P that
is a proper suffix of Pq.
Contd…
KMP Algorithm
Complexity
• The COMPUTE-PREFIXFUNCTION runs in ϴ(m) time as the while
loop in lines 6–7 executes at most m – 1 times altogether.
1. Line 4 starts k at 0, and the only way to increase k is the increment
operation in line 9, which executes at most once per iteration of the
for loop of lines 5–10. Thus, the total increase in k is at most m – 1.
2. Second, since k < q upon entering the for loop and each iteration of
the loop increments q and k < q always. Assignments in lines 3 and 10
ensure that Π[q] < q for all q = 1, 2, …, m, which means that each
iteration of the while loop decreases k.
3. Third, k never becomes negative.
– Altogether, the total decrease in k from the while loop is bounded
from above by the total increase in k over all iterations of the for loop,
which is m – 1.
• Using similar analysis, the matching time of KMP-MATCHER is
ϴ(n).
Example (Preprocessing)
• m = 7, Π[1] = 0, k = 0.
Step 1: q = 2, k = 0 P[k + 1] == P[q]
(if) P[1] == P[2]. Mismatch.
Π[2] = 0.
Step 2: q = 3, k = 0. P[k + 1] == P[q]
(if) P[1] == P[3]. Match. k++ = 1
Π[3] = 1.
Step 3: q = 4, k = 1. P[k + 1] == P[q]
(if) P[2] == P[4]. Match. k++ = 2
Π[4] = 2.
Contd…
Step 4: q = 5, k = 2 P[k + 1] == P[q]
(if) P[3] == P[5]. Match. k++ = 3
Π[5] = 3.
Step 5: q = 6, k = 3. P[k + 1] == P[q]
(while) P[4] == P[6]. Mismatch. k = Π[k] = 1
(while) P[2] == P[6]. Mismatch. k = Π[k] = 0
(if) P[1] == P[6]. Mismatch.
Π[6] = 0.
Step 6: q = 7, k = 0. P[k + 1] == P[q]
(if) P[1] == P[7]. Match. k++ = 1
Π[7] = 1.
Step 7: q=8 Loop terminates.
Contd… n = 15
m=7
(Matching)
• T: b a c b a b a b a b a c a a b

• P: a b a b a c a

Step 1: i = 1, q = 0 P[q + 1] == T[i]


(if) P[1] == T[1]. Mismatch.
b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 2: i = 2, q = 0 P[q + 1] == T[i]


(if) P[1] == T[2]. Match. q++ = 1

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 3: i = 3, q = 1 P[q + 1] == T[i]


(while) P[2] == T[3]. Mismatch.
q = Π[q] = 0
b a c b a b a b a b a c a a b

a b a b a c a

(if) P[1] == T[3]. Mismatch.


b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 4: i = 4, q = 0 P[q + 1] == T[i]


(if) P[1] == T[4]. Mismatch.

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 5: i = 5, q = 0 P[q + 1] == T[i]


(if) P[1] == T[5]. Match. q++ = 1

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 6: i = 6, q = 1 P[q + 1] == T[i]


(if) P[2] == T[6]. Match. q++ = 2

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 7: i = 7, q = 2 P[q + 1] == T[i]


(if) P[3] == T[7]. Match. q++ = 3

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 8: i = 8, q = 3 P[q + 1] == T[i]


(if) P[4] == T[8]. Match. q++ = 4

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 9: i = 9, q = 4 P[q + 1] == T[i]


(if) P[5] == T[9]. Match. q++ = 5

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 10: i = 10, q = 5 P[q + 1] == T[i]


(while) P[6] == T[10]. Mismatch. q = Π[q] = 3
b a c b a b a b a b a c a a b

a b a b a c a
(if) P[4] == T[10]. Match. q++ = 4
b a c b a b a b a b a c a a b
Shifts skipped = 1 and first three
characters are not compared.
a b a b a c a
Comparison starts from P[4].
Contd… n = 15
m=7

Step 11: i = 11, q = 4 P[q + 1] == T[i]


(if) P[5] == T[11]. Match. q++ = 5

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 12: i = 12, q = 5 P[q + 1] == T[i]


(if) P[6] == T[12]. Match. q++ = 6

b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 13: i = 13, q = 6 P[q + 1] == T[i]


(if) P[7] == T[13]. Match. q++ = 7
b a c b a b a b a b a c a a b

a b a b a c a
• q == m. Yes.
– Pattern occurs with shift i – m =13 – 7 = 6.
– q = Π[q] = 1.
Contd… n = 15
m=7

Step 14: i = 14, q = 1 P[q + 1] == T[i]


(while) P[2] == T[14]. Mismatch. q = Π[q] = 0
b a c b a b a b a b a c a a b

Shifts skipped = 5 and first character is not a b a b a c a


compared. Comparison starts from P[2].
(if) P[1] == T[14]. Match. q++ = 1
b a c b a b a b a b a c a a b

a b a b a c a
Contd… n = 15
m=7

Step 15: i = 15, q = 1 P[q + 1] == T[i]


(if) P[2] == T[15]. Match. q++ = 2
b a c b a b a b a b a c a a b

a b a b a c a
Step 16: i = 16
Loop terminates.
Boyer-Moore algorithm
• An efficient string searching algorithm that has
been the standard benchmark for practical string
search literature.
• Developed by Robert S. Boyer and J Strother
Moore in 1977
• A simplified version of it or the entire algorithm is
often implemented in text editors for the “search”
and “substitute” commands.
• The reason is that it works the fastest when the
alphabet is large and the pattern is relatively long.
Contd…
• The execution time of the Boyer-Moore algorithm
can be sub-linear.
– It does not need to check every character of the
text, but rather skips over some of them.
• The BM algorithm gets faster as the pattern
becomes longer.
– It utilizes the information gained from each
unsuccessful attempt to rule out as many
positions as possible of the text where the strings
cannot match.
The main ideas…
• Scan the pattern backwards (right to left).
– The strings are matched from the end of P to the start
of P, i.e. P[m], P[m-1], …P[1].
• The bad character shift rule.
– Avoids repeating unsuccessful comparisons against a
target character (from the text).
• The good suffix shift rule.
– Aligns only matching pattern characters against target
characters already successfully matched.
• Either rule works alone, but they are more effective
together.
Backward Scan
• Align the start of P with the start of T.
• Characters in P and T are then compared starting at
index m in P and k in T, moving backward.
• The comparisons continue until either the
beginning of P is reached (which means there is a
match) or a mismatch occurs.
• Example:
– T: This picture shows a nice view of the park.
– P: future
Contd…
• T: This picture shows a nice view of the park.
• P: future
1. P[6] == T[12] = e, 2. P[5] == T[11] = r,
3. P[4] == T[10] = u, 4. P[3] == T[9] = t,
5. P[2] == T[8] → u ≠ c
– Mismatch. Shift P to the right (relative to the text)
and start comparisons again from the right end.
• The length of shift is given by the two rules:
– The bad character shift rule.
– The good suffix shift rule.
Bad Character Shift Rule
• Upon mismatch, skip alignments until
– Mismatch becomes a match, or
– P moves past the mismatched character.
1. T: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
P: C C T T T T G C (can be a match as C exists at position 2)
2. T: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
P: Gs =C3T C C T T T T G C (A does not exist in pattern)
3. T: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
s = 10
P: G C T T C TGCTTCCTTTTGC
Contd…
• Assume that a mismatch occurs between characters
P[i] (= a) and T[i + j] (= b) during an attempt at shift j.
– That is, P[i + 1 .. m] = T[i + j + 1 .. j + m] = u and P[i] ≠ T[i + j].

• If “b” is not contained anywhere in P, then shift the


pattern P completely past T[i + j].
Contd…
• Otherwise, shift the pattern P until the rightmost
occurrence of the character “b” in P gets aligned
with T[i + j].
Preprocessing
• For all characters x in the alphabet, define the
function R(x) to compute the bad character shift.
• R(x) = 0 , if x does not occur in P.
max { i < m| P[i] = x} , otherwise
• When
P[i + 1 .. m] = T[i + j + 1 .. j + m] = u and
P[i] ≠ T[i + j] and T[i + j] = b,
• Then, shift P to the right by max {1, i - R(b)}.
• R(x) can be computed in O(m) time.
Contd…
If the right-most
occurrence of b in P[1..m-
1] is at k < i, then P[k] and
T[i + j] get aligned.

If the right-most
occurrence of b in P[1..m-
1] is at k > i, the pattern is
shifted to the right by one.

If b does not occur in


P[1..m-1], then shift = i, and
the pattern is next aligned
with T[i + j + 1.. i + j + m].
Example ∑
R(x)
A
0
C
max (1, 2) = 2
G
7
T
max (3, 4, 5, 6) = 6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

T: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
P: C C T T T T G C
j = 0. Shift = 0 + max {1, i - R(b)} = 0 + max {1, 5 – R(C)}
= 0 + max { 1, 5 – 2} = 0 + 3 = 3.

T: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
P: CC T T T TGC
j = 3. Shift = 3 + max {1, i - R(b)} = 3 + max {1, 7 – R(A)}
= 3 + max { 1, 7 – 0} = 3 + 7 = 10.

T: G C T T C T G C T A C C T T T T G C G C G C G C G C G G A A
P: CC T T T TGC
Good Suffix Shift Rule
1. T: C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A
P: C T T A C T T A C
(TAC can be aligned with the rightmost occurrence TAC)
2. T: C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A
s=4
P: CTTACTTAC
(TACTTAC has no rightmost occurrence, go for longest prefix-
suffix match alignment in between CTTACTTAC and TACTTAC)
3. T: C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A
s=8
P: CTTACTTAC
Contd…
• Let P[i .. m] = T[i + j .. j + m] = u and P[i - 1] ≠ T[i + j - 1].
• The good-suffix shift consists in aligning the segment u = T[i
+ j .. j + m] (= P[i .. m]) with its rightmost occurrence in
pattern (which is not a suffix) that is preceded by a
character different from P[i – 1].

• The original weaker version of this rule did not require that
the character preceding the rightmost occurrence of u
should not be P[i – 1].
Contd…
• If there exists no such segment, the shift consists in
aligning the longest suffix v of T[i + j .. j + m] with a
matching prefix of P.
Contd…
• If no such shift is possible, i.e., the longest suffix v
of T[i + j .. j + m] which matches a prefix of P, is
empty, then shift P to the right by P.length (= m).
Example – 1
1. T: C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A
P: C T T A C T T A C

(TAC can’t be aligned with the rightmost occurrence TAC as


preceding character (T) is same in both the occurrences.)
Thus, the longest prefix-suffix match alignment in between
CTTACTTAC and TAC is used to compute the shift increment)

2. T: C G T G C C T A C T T A C T T A C T T A C T T A C G C G A A
P: CTTACTTAC
Example – 2
1...3456789……

• Align text[7..9] = "ted" with some character


sequence "xed" in the pattern, such that x ≠ l.

• In this example the character sequence pat[1..3] =


"ped" is the required one, so shift pattern such that
pat[1..3] gets aligned with text[7..9].
Example – 3

• The character sequence "ed" is not appearing again


in pattern, so check for longest suffix matching. In
this case a single character 'd' is the required match,
thus align pat[1] with text[21].
Example – 4

• In this case,
– The character sequence "led" is not appearing
again in pattern.
– The longest suffix matching also results in an
empty string.
• So shift the pattern completely.
Preprocessing (Compute L(i))
• For i = 2,.., m + 1, compute L(i) as follows:
– L(i) is the largest position, less than m, such that P[i..m]
matches a suffix of P[1..L(i)] and, moreover, this suffix is
not preceded by character P[i-1].
• If no such copy of P[i..m] exists, then let L(i) = 0.
Contd… (Compute L(i))
• Since P[m + 1..m] = ε, L(m + 1) is the right-most
position j such that P[j] ≠ P[m] (or 0 if all characters
are equal).

• L(i) gives the right end position of the rightmost


copy of P[i..m], which is not a suffix of pattern, and
is not preceded by P[i - 1].

• The values L(i) can be computed in O(m) time.


Contd…
• For a given i, the value L(i) < m is computed as
follows:
– Take the suffix P[i..m] = u.
– Search for the rightmost occurrence of u in P
such that the preceding letter is not P[i-1].
– If there is such an occurrence then L(i) = the
right-end position of this occurrence of u.
– Otherwise, L(i) = 0
Example
• P: antecedence
• P.length = 11, i = 2, 3, …, 11, 12

• L(12) = 10, P[12..11] = ε, antecedence


• L(11) = 8, P[11] = e, antecedence
• L(10) = 6, P[10..11] = ce, antecedence
• L(9) = 0, P[9..11] = nce antecedence
• L(8) = ... = L(2) = 0.
Contd… (Compute I(i))
• For i ≥ 2, let l(i) = the length of the largest suffix of
P[i..m] that is also a prefix of pattern, if one exists.
• Otherwise, l(i) = 0.
Example
• P: ababa
• P.length = 5, i = 2, 3, 4, 5, 6

• l(6) = 0 (since P[m+1..m] = ε)


• l(5) = 1 (P[5] = a, the largest suffix is 'a')
• l(4) = 1 (P[4..5] = ba, the largest suffix is 'a')
• l(3) = 3 (P[3..5] = aba, the largest suffix is 'aba')
• l(2) = 3 (P[2..5] = baba, the largest suffix is 'aba')
Computing Shifts
• The computed values L(i) and l(i) are used to get the
length of a shift using the good suffix rule.
• Suppose, an occurrence of P[i..m] exists but there is
a mismatch on P[i – 1]
Contd…
• If L(i) > 0, then shift P by m – L(i) characters to the
right, i.e., the prefix of length L(i) of the shifted
pattern aligns with the suffix of length L(i) of the
unshifted pat.
Contd…
• If L(i) = 0, then the good shift rule shifts pattern by
m – l(i) characters.
Contd…
• If mismatch occurs for the first comparison, i.e., on P[m],
then shift the pattern P by one position to the right.

• When an occurrence of P has been found, then shift


pattern to the right by m – l(2) positions, to align a prefix of
pattern with the longest matching proper suffix of the
occurrence.
– l(2) = the length of the largest suffix of P[2..m] that is also a prefix
of pattern, if one exists.
Boyer Moore Algorithm
• Given a pattern P of length m and text T of length n.
• Pre-processing
– Compute the values L(i) and l(i) for 2 ≤ i ≤ m+1.
– Also compute R(x) for all characters x from the alphabet.
• Matching
– Let index k represents the right-end of the current
occurrence of pattern that is being matched to the text.
– Thus, a shift of pattern is implemented by increasing k
with the appropriate amount of positions.
Contd…
1. k = m
2. while k ≤ n do begin
3. i = m, h = k
4. while i > 0 and P[i] == T[h] do begin
5. i - -, h - -
6. end
7. if i = 0 then begin
8. pattern found at a shift of h
9. k = k + m – I(2)
10. end
11. else
12. shift pattern (increase k) by the maximum amount
determined by the bad character rule and the good suffix
rule.
13. end
14. end
2 3 4 5 6 7 8 9 10
Example – 1 L: 0 0 0 0 0 6 0 7 8
A C G T I: 1 1 1 1 1 1 1 1 0
R: 3 8 7 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

T: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
P: G T A G C G G C G

Step 1: k = 9. Mismatch at i = 9.
Shift using BC = max{ 1, i – R(T)} = max(1,9-2) = 7.
Shift using GS = 1 (first character mismatched)
Maximum shift = 7.
k = k + 7 = 9 + 7 = 16
A C G T 2 3 4 5 6 7 8 9 10
Contd… R: 3 8 7 2 L: 0 0 0 0 0 6 0 7 8
I: 1 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

T: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
P: G T AG C GG C G
1 2 3 4 5 6 7 8 9 i

Step 2: k = 16. Mismatch at i = 6.


Shift using BC = max{ 1, i – R(C)} = max(1,6-8) = 1.
Shift using GS:
As L(i + 1) = L(7) = 6 ≠ 0.
Thus, shift = m – L(i + 1) = 9 – 6 = 3.
Maximum shift = 3.
k = k + 3 = 16 + 3 = 19
A C G T 2 3 4 5 6 7 8 9 10
Contd… R: 3 8 7 2 L: 0 0 0 0 0 6 0 7 8
I: 1 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

T: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
P: G T AG C GG C G
1 2 3 4 5 6 7 8 9 i

Step 3: k = 19. Mismatch at i = 3.


Shift using BC = max{ 1, i – R(C)} = max(1,3-8) = 1.
Shift using GS:
As L(i + 1) = L(4) = 0.
Thus, shift = m – I(i + 1) = 9 – 1 = 8.
Maximum shift = 8.
k = k + 8 = 19 + 8 = 27
A C G T 2 3 4 5 6 7 8 9 10
Contd… R: 3 8 7 2 L: 0 0 0 0 0 6 0 7 8
I: 1 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

T: G T T A T A G C T G A T C G C G G C G T A G C G G C G A A
P: G T AG C GG C G
1 2 3 4 5 6 7 8 9 i

Step 4: k = 27. Pattern found at T[k – m + 1..k]. i = 0.


Shift using BC = 1 (pattern found).
Shift using GS = m – I(2) = 9 – 1 = 8.
Maximum shift = 8.
k = k + 8 = 27 + 8 = 35.

Step 5: k > n. 35 > 29. Loop terminates.


A C G T 2 3 4 5 6 7 8 9
Example – 2 R: 7 2 6 0 L: 0 0 0 6 0 4 1 7
I: 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: G C A T C G C A G A G A G T A T A C A G T A C G
P: G C A G A G A G

Step 1: k = 8. Mismatch at i = 8.
Shift using BC = max{ 1, i – R(A)} = max(1,8-7) = 1.
Shift using GS = 1 (first character mismatched)
Maximum shift = 1.
k=k+1=8+1=9
A C G T 2 3 4 5 6 7 8 9
Contd… R: 7 2 6 0 L: 0 0 0 6 0 4 1 7
I: 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: G C A T C G C A G A G A G T A T A C A G T A C G
P: G C AGAGAG
1 2 3 4 5 6 7 8 i

Step 2: k = 9. Mismatch at i = 6.
Shift using BC = max{ 1, i – R(C)} = max(1,6-2) = 4.
Shift using GS:
As L(i + 1) = L(7) = 4 ≠ 0.
Thus, shift = m – L(i + 1) = 8 – 4 = 4.
Maximum shift = 4.
k = k + 4 = 9 + 4 = 13
A C G T 2 3 4 5 6 7 8 9
Contd… R: 7 2 6 0 L: 0 0 0 6 0 4 1 7
I: 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: G C A T C G C A G A G A G T A T A C A G T A C G
P: G C AGAGAG
1 2 3 4 5 6 7 8 i

Step 3: k = 13. Pattern found at T[k – m + 1..k]. i = 0.


Shift using BC = 1 (pattern found).
Shift using GS = m – I(2) = 8 – 1 = 7.
Maximum shift = 7.
k = k + 7 = 13 + 7 = 20
A C G T 2 3 4 5 6 7 8 9
Contd… R: 7 2 6 0 L: 0 0 0 6 0 4 1 7
I: 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: G C A T C G C A G A G A G T A T A C A G T A C G
P: G C AGAGAG
1 2 3 4 5 6 7 8 i

Step 4: k = 20. i = 6.
Shift using BC = max{ 1, i – R(C)} = max(1,6-2) = 4.
Shift using GS:
As L(i + 1) = L(7) = 4 ≠ 0.
Thus, shift = m – L(i + 1) = 8 – 4 = 4.
Maximum shift = 4.
k = k + 4 = 20 + 4 = 24
A C G T 2 3 4 5 6 7 8 9
Contd… R: 7 2 6 0 L: 0 0 0 6 0 4 1 7
I: 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: G C A T C G C A G A G A G T A T A C A G T A C G
P: G C AGAGAG
1 2 3 4 5 6 7 8 i

Step 5: k = 24. i = 7.
Shift using BC = max{ 1, i – R(C)} = max(1,7-2) = 5.
Shift using GS:
As L(i + 1) = L(8) = 1 ≠ 0.
Thus, shift = m – L(i + 1) = 8 – 1 = 7.
Maximum shift = 7.
k = k + 7 = 24 + 7 = 31

Step 6: k > n. 31 > 24. Loop terminates.


Complexity
• If BM algorithm uses only the strong good suffix rule, then
it has O(n) worst-case time complexity if the pattern does
not occur in the text.
• If the pattern does appear in the text, then the algorithm
runs in O(mn) worst-case.
• However, by slightly modifying this algorithm, it can achieve
O(n) worst-case time complexity in all cases.
– This was first proved by Galil in 1979.
• It can be proved that the BM algorithm has O(n) time
complexity also when it uses both shift rules (Bad character
and Good suffix shift rules)
• On natural language texts the running time is almost always
sub-linear.
Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X

• Solve it using all the string matching algorithm


giving total
– Number of shifts and
– Number of character comparisons.

• For Rabin-Karp
– Σ = {0, 1}, q = 13, X = 0, Y=1
Naïve approach
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
P: X Y X Y Y X Y X Y X X
P: X Y X Y Y X Y X Y X X
P: X Y X Y Y X Y X Y X X
P: X Y X Y Y X Y X Y X X
P: X Y X Y Y X Y X Y X X
P: X Y X Y Y X Y X Y X X
P: X Y X Y Y X Y X Y X X
P: Total shifts X Y X Y Y X Y X Y X X
= 24 – 11 + 1 = 14
P: X Y X Y Y X Y X Y X X
P: Total character comparisons X Y X Y Y X Y X Y X X
P: = 4 + 1 + 2 + 5 + 1 + 11 + 1 X Y X Y Y X Y X Y X X
+ 3 + 1 + 1 + 5 + 1 + 11 + 1
P: X Y X Y Y X Y X Y X X
= 48
P: X Y X Y Y X Y X Y X X
Σ = {0, 1}, q = 13, X = 0, Y=1
Rabin–Karp h = 211-1 mod 13 = 10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
0 1 0 0 1 0 1 0 1 1 0 1 0 1 0 1 1 0 1 0 1 0 0 1
P: X Y X Y Y X Y X Y X X • p = (29 + 27 + 26 + 24+ 22) mod 13 = 9
0 1 0 1 1 0 1 0 1 0 0 • T[1..11] = t0 = (29 + 26 + 24 + 22+ 21) mod 13 = 0
• T[2..12] = t1 = (2(0 – 10.0) + 1) mod 13 = 1
• T[3..13] = t2 = (2(1 – 10.1) + 0) mod 13 = 8
• p = 9.
• T[4..14] = t3 = (2(8 – 10.0) + 1) mod 13 = 4
• t7 = 9. Compare T[8..18] with P[1..11].
• T[5..15] = t4 = (2(4 – 10.0) + 0) mod 13 = 8
– T[8] = P[1] = X
– T[9] = P[2] = Y • T[6..16] = t5 = (2(8 – 10.1) + 1) mod 13 = 10
– T[10] = Y and P[3] = X. Mismatch • T[7..17] = t6 = (2(10 – 10.0) + 1) mod 13 = 8
• t12 = 9. Compare T[13..23] with P[1..11]. • T[8..18] = t7 = (2(8 – 10.1) + 0) mod 13 = 9
– Match. • T[9..19] = t8 = (2(9 – 10.0) + 1) mod 13 = 6
• T[10..20] = t9 = (2(6 – 10.1) + 0) mod 13 = 5
Total shifts = 24 – 11 + 1 = 14 • T[11..21] = t10 = (2(5 – 10.1) + 1) mod 13 = 4
• T[12..22] = t11 = (2(4 – 10.0) + 0) mod 13 = 8
Total character comparisons = 3 + 11 = 14 • T[13..23] = t12 = (2(8 – 10.1) + 0) mod 13 = 9
• T[14..24] = t13 = (2(9 – 10.0) + 1) mod 13 = 6
KMP
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
Π: 0 0 1 2 0 1 2 3 4 3 1
• i = 1, q = 0. T[1] = P[1] (Shift = 0). q = 1.
• i = 2, q = 1. T[2] = P[2] (Shift = 0). q = 2.
• i = 3, q = 2. T[3] = P[3] (Shift = 0). q = 3.
• i = 4, q = 3. T[4] ≠ P[4] (Shift = 0). q = Π[3] = 1.
T[4] ≠ P[2] (Shift = 2). q = Π[1] = 0.
T[4] = P[1] (Shift = 3). q = 1.
• i = 5, q = 1. T[5] = P[2] (Shift = 3). q = 2.
• i = 6, q = 2. T[6] = P[3] (Shift = 3). q = 3.
• i = 7, q = 3. T[7] = P[4] (Shift = 3). q = 4.
• i = 8, q = 4. T[8] ≠ P[5] (Shift = 3). q = Π[4] = 2.
T[8] = P[3] (Shift = 5). q = 3.
• i = 9, q = 3. T[9] = P[4] (Shift = 5). q = 4.
• i = 10, q = 4. T[10] = P[5] (Shift = 5). q = 5.
Contd…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
Π: 0 0 1 2 0 1 2 3 4 3 1 • i = 23, q = 10.
• i = 11, q = 5. T[11] = P[6] (Shift = 5). q = 6. T[23] = P[11] (Shift = 12). q = 11.
• i = 12, q = 6. T[12] = P[7] (Shift = 5). q = 7. Pattern found at shift 12. q = Π[11] = 1.
• i = 13, q = 7. T[13] = P[8] (Shift = 5). q = 8. • i = 24, q = 1.
• i = 14, q = 8. T[14] = P[9] (Shift = 5). q = 9. T[24] = P[2] (Shift = 22). q = 2.
• i = 15, q = 9. T[15] = P[10] (Shift = 5). q = 10.
• i = 16, q = 10. T[16] ≠ P[11] (Shift = 5). q = Π[10] = 3.
T[16] = P[4] (Shift = 12). q = 4.
• i = 17, q = 4. T[17] = P[5] (Shift = 12). q = 5.
• i = 18, q = 5. T[18] = P[6] (Shift = 12). q = 6. Total shifts = 6
• i = 19, q = 6. T[19] = P[7] (Shift = 12). q = 7.
• i = 20, q = 7. T[20] = P[8] (Shift = 12). q = 8. Total character comparisons = 28
• i = 21, q = 8. T[21] = P[9] (Shift = 12). q = 9.
• i = 22, q = 9. T[22] = P[10] (Shift = 12). q = 10.
2 3 4 5 6 7 8 9 10 11 12
Boyer–Moore L: 0 0 0 0 0 0 0 0 0 10 9

X Y I: 1 1 1 1 1 1 1 1 1 1 0

R: 10 9

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
• Step 1: k = 11. Shift = 0.
BC = max(1,10 – 9) = 1; GS = 11 – 10 = 1;
k = 11 + 1 = 12. Shift = 0 + 1 = 1.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
• Step 2: k = 12. Shift = 1.
BC = max(1,11 – 9) = 2; GS = 1;
k = 12 + 2 = 14. Shift = 1 + 2 = 3.
X Y 2 3 4 5 6 7 8 9 10 11 12
Contd… R: 10 9 L: 0 0 0 0 0 0 0 0 0 10 9
I: 1 1 1 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
• Step 3: k = 14. Shift = 3.
BC = max(1,11 – 9) = 2; GS = 1;
k = 14 + 2 = 16. Shift = 3 + 2 = 5.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
• Step 4: k = 16. Shift = 5.
BC = max(1,11 – 9) = 2; GS = 1;
k = 16 + 2 = 18. Shift = 5 + 2 = 7.
X Y 2 3 4 5 6 7 8 9 10 11 12
Contd… R: 10 9 L: 0 0 0 0 0 0 0 0 0 10 9
I: 1 1 1 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
• Step 5: k = 18. Shift = 7.
BC = max(1,10 – 9) = 1; GS = 11 – 10 = 1;
k = 18 + 1 = 19. Shift = 7 + 1 = 8.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
• Step 6: k = 19. Shift = 8.
BC = max(1,11 – 9) = 2; GS = 1;
k = 19 + 2 = 21. Shift = 8 + 2 = 10.
X Y 2 3 4 5 6 7 8 9 10 11 12
Contd… R: 10 9 L: 0 0 0 0 0 0 0 0 0 10 9
I: 1 1 1 1 1 1 1 1 1 1 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
• Step 7: k = 21. Shift = 10.
BC = max(1,11 – 9) = 2; GS = 1;
k = 21 + 2 = 23. Shift = 10 + 2 = 12.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

T: X Y X X Y X Y X Y Y X Y X Y X Y Y X Y X Y X X Y
P: X Y X Y Y X Y X Y X X
• Step 8: k = 23. Shift = 12.
Pattern found. Total shifts = 8
k = 21 + 11 - 1 = 31. Shift = 12 + 10 = 22.
Total character
comparisons = 20

You might also like