Graphs 2: MST & Kruskal's Introduction
Graphs 2: MST & Kruskal's Introduction
Graphs 2: MST & Kruskal's Introduction
Graphs 2
● Is always connected.
● Contains no cycle.
If there are n vertices and e edges in the graph, then any spanning tree
corresponding to that graph contains n
vertices and n
-1 edges.
1
● A connected and undirected graph can have more than one spanning tree.
● Spanning tree is free of loops, i.e., it is acyclic.
● Removing any one of the edges will make the graph disconnected.
● Adding an extra edge to the spanning tree will create a loop in the graph.
In a weighted graph, the MST is a spanning tree with minimum weight than all other
spanning trees of that graph. Refer to the image below for better understanding.
Kruskal’s Algorithm:
This algorithm is used to find MST for the given graph. It builds the spanning tree by
adding edges one at a time. We start by picking the edge with minimum weight,
adding that edge into the MST, and increasing the count of edges in the spanning
tree by one. Now, we will be picking the minimum weighted edge by excluding the
already chosen ones and correspondingly increasing the count. While choosing the
edge, we will also make sure that the graph remains acyclic after including the
2
same. This process will continue until the count of edges in the MST reaches n
-1.
Ultimately, the graph obtained will be MST.
3
This is the final MST obtained using Kruskal’s algorithm. It can be checked manually
that the final graph is the MST for the given graph.
Cycle Detection
While inserting a new edge in the MST, we have to check if introducing that edge
makes the MST cyclic or not. If not, then we can include that edge, otherwise not.
Now, let’s figure out a way to detect the cycle in a graph. The following are the
possible cases:
● By including an edge between the nodes A and B, if both the nodes A and B
are not present in the graph, then it is safe to include that edge as including
it, will not bring a cycle to the graph.
● Out of two vertices, if anyone of them has not been visited (or not present in
the MST), then that vertex can also be included in the MST.
● If both the vertices are already present in the graph, they can introduce a
cycle in the MST. It means we can’t use this method to detect the presence of
the cycle.
Now, before adding an edge to the MST, we will check if a path between two
vertices of that edge already exists in the MST or not. If not, then it is safe to add
that edge to the MST.
As discussed in previous lectures, the time complexity of the hasPath f unction is
O(E+V), where E is the number of edges in the graph and, V is the number of
vertices. So, for (n-1) edges, this function will run (n-1) times, leading to bad time
complexity, as in the worst case, E = V2.
4
Union-find algorithm:
Before adding any edge to the graph, we will check if the two vertices of the edge lie
in the same component of MST or not. If not, then it is safe to add that edge to the
MST.
● We will assume that initially, the total number of disjoint sets is equal to the
number of vertices in the graph starting from 0 to n-1.
● We will maintain a parent array specifying the parent vertex of each of the
vertex of the graph. Initially, as each vertex belongs to the different disjoint
set (connected component), hence each vertex will be its own parent.
● Now, before inserting any edge into the MST, we will check the parent of the
vertices. If their parent vertices are equal, they belong to the same connected
component; hence it is unsafe to add that edge. Otherwise, we can add that
5
edge into the MST, and simultaneously update the parent array so that they
belong to the same component(Refer to the code on how to do so).
6
7
Note: W
hile finding the parent of the vertex, we will be finding the topmost
parent(Oldest Ancestor). For example: suppose, the vertex 0 and the vertex 1 were
connected, where the parent of 0 is 1, and the parent of 1 is 1. Now, while
determining the parent of the vertex 0, we will visit the parent array and check the
vertex at index 0. In our case, it is 1. Now we will go to index 1 and check the parent
of index 1, which is also 1. Hence, we can’t go any further as the index is the parent
of itself. This way, we will be determining the parent of any vertex.
The time complexity of the union-find algorithm becomes O(V) for each vertex in
the worst case due to skewed-tree formation, where V is the number of vertices in
the graph. Here, we can see that time complexity for cycle detection has
significantly improved compared to the previous approach.
8
Till now, we have studied the logic behind Kruskal’s algorithm for finding MST. Now,
let’s discuss how to implement it in code.
Consider the code below and follow the comments for a better understanding.
#include <iostream>
#include <algorithm>
using namespace std;
class Edge { // Class that store values for each vertex
public:
int source;
int dest;
int weight;
};
bool compare(Edge e1, Edge e2) { / / Comparator function used to sort edges
return e1.weight < e2.weight; // Edges will sorted in order of their weights
}
int findParent(int v, int *parent) { / / Function to find the parent of a vertex
if (parent[v] == v) { // Base case, when a vertex is parent of itself
return v;
}
// Recursively called to find the topmost parent of the vertex.
return findParent(parent[v], parent);
}
void kruskals(Edge *input, int n, int E) {
sort(input, input + E, compare); / / In-built sort function: Sorts the edges in
// increasing order of their weights
Edge *output = new Edge[n-1]; // Array to store final edges of MST
int *parent = new int[n]; // Parent array initialized with their indexes
for (int i = 0; i < n; i++) {
parent[i] = i;
}
int count = 0; / / To maintain the count of number of edges in the MST
9
10
In our code, we have the following three steps: (Here, the total number of vertices is
n, and the total number of edges is E)
Hence, the total time complexity of Kruskal’s algorithm becomes O(E log(E) + n.E).
This time complexity is bad and needs to be improved. We can’t reduce the time
taken for sorting, but the time taken for cycle detection can be improved using
another algorithm named Union by Rank and Path Compression. You need to
explore this on yourselves. The basic idea in these algorithms is that we will be
avoiding the formation of skewed-tree structure, which reduces the time
complexity for each vertex to O(log(E)).
Prim’s Algorithm
This algorithm is used to find MST for a given undirected-weighted graph (which
can also be achieved using Kruskal’s Algorithm).
In this algorithm, the MST is built by adding one edge at a time. In the beginning,
the spanning tree consists of only one vertex, which is chosen arbitrarily from the
set of all vertices of the graph. Then the minimum weighted edge, outgoing from
this vertex, is selected and simultaneously inserted into the MST. Now, the tree
11
contains two edges. Further, we will be selecting the edge with the minimum weight
such that one end is already present there in the MST and the other one from the
unselected set of vertices. This process is repeated until we have inserted a total of
(n-1) edges in the MST.
12
13
Implementation:
● We are considering the starting vertex to be 0 with a parent equal to -1, and
weight is equal to 0 (The weight of the edge from vertex 0 to vertex 0 itself).
● The parent of all other vertices is assumed to be NIL, and the weight will be
equal to infinity, which means that the vertex has not been visited yet.
● We will mark the vertex 0 as visited and rest as unvisited. If we add any
vertex to the MST, then that vertex will be shifted from the unvisited section
to the visited section.
● Now, we will update the weights of direct neighbors of vertex 0 with the edge
weights as these are smaller than infinity. We will also update the parent of
these vertices and assign them 0 as we reached these vertices from vertex 0.
● This way, we will keep updating the weights and parents, according to the
edge, which has the minimum weight connected to the respective vertex.
14
#include <iostream>
#include <climits>
using namespace std;
int findMinVertex(int *weights, bool *visited, int n) {
int minVertex = -1; / / Initialized to -1 means there is no vertex till now
for (int i = 0; i < n; i++) {
// Conditions : the vertex must be unvisited and either minVertex value is -1
// or if minVertex has some vertex to it, then weight of currentvertex
// should be less than the weight of the minVertex.
if (!visited[i] && (minVertex == -1 || weights[i] < weights[minVertex])) {
minVertex = i;
}
}
return minVertex;
}
void prims(int **edges, int n) {
int *parent = new int[n];
int *weights = new int[n];
bool *visited = new bool[n];
// Initially, the visited array is assigned to false and weights array to infinity.
for(int i = 0; i < n; i++) {
visited[i] = false;
weights[i] = INT_MAX;
}
// Values assigned to vertex 0.(the selected starting vertex to begin with)
parent[0] = -1;
weights[0] = 0;
for (int i = 0; i < n-1; i++) {
// Find min vertex
int minVertex = findMinVertex(weights, visited, n);
visited[minVertex] = true;
// Explore unvisited neighbors
for (int j = 0; j < n; j++) {
if(edges[minVertex][j] != 0 && !visited[j]) {
if(edges[minVertex][j] < weights[j]) {
// updating weight array and parent array
weights[j] = edges[minVertex][j];
parent[j] = minVertex;
}
}
}
}
15
16
● The time complexity for finding the minimum weighted vertex is O(n) for
each iteration. So for (n-1) edges, it becomes O(n2).
● Similarly, for exploring the neighbor vertices, the time taken is O(n2).
It means the time complexity of Prim’s algorithm is O(n2). We can improve this in
the following ways:
● For exploring neighbors, we are required to visit each and every vertex
because of the adjacency matrix. We can improve this by using an adjacency
list instead of a matrix.
● Now, the second important thing is the time taken to find the minimum
weight vertex, which is also taking a time of O(n2). Here, out of the available
list, we are trying to figure out the one with minimum weight. This can be
optimally achieved using a priority queue where the priority will be taken as
weights of the vertices. This will take O(log(n)) time complexity to remove a
vertex from the priority queue.
These optimizations can lead us to the time complexity of O((n+E)log(n)), which is
much better than the earlier one. Try to write the optimized code by yourself.
Dijkstra’s Algorithm
This algorithm is used to find the shortest distance between any two vertices in a
weighted non-cyclic graph.
Here, we will be using a slight modification of the algorithm according to which we
will be figuring out the minimum distance of all the vertices from the particular
source vertex.
1. We want to calculate the shortest path between the source vertex C and all
other vertices in the following graph.
17
2. While executing the algorithm, we will mark every node with its minimum
distance to the selected node, which is C in our case. Obviously, for node C
itself, this distance will be 0, and for the rest of the nodes, we will assume
that the distance is infinity, which also denotes that these vertices have not
been visited till now.
18
3. Now, we will check for the neighbors of the current node, which in our case is
A, B, and D. Now, we will add the minimum cost of the current node to the
weight of the edge connecting the current node and the particular neighbor
node. For example, for node B, it’s weight will become minimum(infinity, 0+7)
= 7. This same process is repeated for other neighbor nodes.
4. Now, as we have updated the distance of all the neighbor nodes of the
current node, we will mark the current node as visited.
19
5. After this, we will be selecting the minimum weighted node among the
remaining vertices. In this case, it is node A. Take this node as the current
node.
6. Now, we will repeat the above steps for the rest of the vertices. The pictorial
representation of the same is shown below:
20
21
22
The distances finally marked at each node are minimum from node C.
Implementation:
Let’s look at the code below for a better explanation:(Code is nearly same as that of
Prim’s algorithm, just a change while updating the distance)
#include <iostream>
#include <climits>
using namespace std;
int findMinVertex(int *distance, bool *visited, int n) {
int minVertex = -1;
for (int i = 0; i < n; i++) {
if (!visited[i] && (minVertex == -1 || distance[i] < distance[minVertex]))
{
minVertex = i;
}
}
return minVertex;
}
void dijkstra(int **edges, int n) {
int *distance = new int[n];
bool *visited = new bool[n];
for(int i = 0; i < n; i++) {
visited[i] = false;
distance[i] = INT_MAX;
}
distance[0] = 0; // 0 is considered as the starting node.
for (int i = 0; i < n-1; i++) {
// Find min vertex
int minVertex = findMinVertex(distance, visited, n);
visited[minVertex] = true;
// Explore unvisited neighbors
for (int j = 0; j < n; j++) {
if(edges[minVertex][j] != 0 && !visited[j]) {
// distance of any node will be the current node’s distance + the weight
// of the edge between them
int dist = distance[minVertex] + edges[minVertex][j];
23
24
The time complexity is also the same as that of Prim’s algorithm, i.e., O(n2). This can
be reduced by using the same approaches as discussed in Prim’s algorithm’s
content.
25