Pathfinding and Graph Search Algorithms

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 15

Advanced Algorithm Analysis Assignment

Pathfinding and Graph Search


Algorithms

Submitted by : Nasser Saleh (2023-MScs-28)


Submitted to: Dr. Amjad
Pathfinding and Graph Search
Algorithms
Graph search algorithms explore a graph either for general discovery or explicit
search. These algorithms carve paths through the graph, but there is no expectation
that those paths are computationally optimal. We will cover Breadth First Search and
Depth First Search because they are fundamental for traversing a graph and are often
a required first step for many other types of analysis.
Pathfinding algorithms build on top of graph search algorithms and explore routes
between nodes, starting at one node and traversing through relationships until the
destination has been reached. These algorithms are used to identify optimal routes
through a graph for uses such as logistics planning, least cost call or IP routing, and
gaming simulation.
Specifically, the pathfinding algorithms we’ll cover are:
• Shortest Path, finding the shortest path or paths between two chosen nodes
• All Pairs Shortest Path and Single Source Shortest Path: for finding the shortest
paths between all pairs or from a chosen node to all others
• Minimum Spanning Tree: for finding a connected tree structure with the smallest
cost for visiting all nodes from a chosen node
• Random Walk: because it’s a useful preprocessing/sampling step for machine
learning workflows and other graph algorithms
Graph Search Algorithms
1- Breadth-First Search (BFS)
Breadth-First Search (BFS) is a fundamental graph traversal algorithm used to explore all the nodes
of a graph in breadthward motion, starting from a given source node. It's often employed for finding
the shortest path in unweighted graphs and for a wide range of other graph-related tasks. Here's a
summary of BFS along with its time and space complexities:
Input:
A graph represented as a set of nodes (vertices) and edges.
A source node from which to begin the traversal.
Initialization:
Create a queue (FIFO) to store nodes to be processed.
Mark the source node as visited.
Enqueue the source node into the queue.
Algorithm:
While the queue is not empty:
Dequeue the front node from the queue (the first node added).
Process the node (e.g., record it, perform some operation, or check conditions).
Enqueue all unvisited neighbors of the node into the queue and mark them as visited.
Repeat step 1 until the queue is empty.
Time Complexity:
The time complexity of BFS is typically O(V + E), where V is the number of nodes (vertices) and E
is the number of edges in the graph.
In the worst case, BFS visits all nodes and explores all edges.
Space Complexity:
The space complexity of BFS is O(V) in addition to the input graph, where V is the number of nodes.
This space is used for maintaining the queue and marking nodes as visited.
Key Features:
BFS explores nodes level by level, making it suitable for finding the shortest path in unweighted
graphs.
It can be used to detect connected components, determine reachability, and perform graph-related
searches.
BFS guarantees that the shortest path to any visited node is found first.
In summary, Breadth-First Search (BFS) is a versatile graph traversal algorithm known for its
ability to find the shortest path in unweighted graphs and for a wide range of other graph exploration
tasks. Its time complexity is typically linear in the number of nodes and edges in the graph, making it
efficient for many practical applications.

2- Depth-First Search (DFS)


Depth-First Search (DFS) is a fundamental graph traversal algorithm used to explore all the nodes of
a graph in a depth ward motion, starting from a given source node. It's commonly employed for tasks
like searching, topological sorting, and finding connected components. Here's a summary of DFS
along with its time and space complexities:
Input:
A graph represented as a set of nodes (vertices) and edges.
A source node from which to begin the traversal.
Initialization:
Create a stack (or use a recursive function call stack) to store nodes to be processed.
Mark the source node as visited.
Push the source node onto the stack.
Algorithm:
While the stack is not empty:
Pop the top node from the stack (the last node added).
Process the node (e.g., record it, perform some operation, or check conditions).
Push all unvisited neighbors of the node onto the stack and mark them as visited.
Repeat step 1 until the stack is empty.
Time Complexity:
The time complexity of DFS is typically O(V + E), where V is the number of nodes (vertices) and E
is the number of edges in the graph.
In the worst case, DFS visits all nodes and explores all edges.

Space Complexity:
The space complexity of DFS is determined by the maximum depth of the recursion stack or the size
of the explicit stack data structure, in addition to the input graph.
In the worst case, the space complexity can be O(V) for the recursion stack or O(V + E) for an
explicit stack, where V is the number of nodes.
Key Features:
DFS explores nodes deeply before backtracking, making it suitable for tasks like topological sorting
and finding connected components.
It is often implemented using recursion, but it can also be implemented using an explicit stack.
DFS does not guarantee that the shortest path is found; it may find a longer path before a shorter one.
In summary, Depth-First Search (DFS) is a versatile graph traversal algorithm known for its depth
ward exploration of nodes. It is used in a variety of graph-related tasks and can be implemented
recursively or iteratively using a stack. Its time complexity is typically linear in the number of nodes
and edges in the graph.

Types of Shortest Path Problems:


1. Single-Source Shortest Path: Find the shortest path from one specific source node to all other
nodes in the graph.
2. Single-Destination Shortest Path: Find the shortest path from all nodes in the graph to a
specific destination node.
3. Single-Pair Shortest Path: Find the shortest path between a specific source node and a
specific destination node.
4. All-Pairs Shortest Path: Find the shortest paths between all pairs of nodes in the graph.

Shortest Path Algorithms :


1- Dijkstra's Algorithm

Dijkstra's Algorithm is a graph traversal algorithm used to find the shortest path between two nodes
in a weighted graph. Here's a summary of the algorithm along with its time and space complexities:
Input:
A weighted, directed or undirected graph.
A source node from which to start the search for the shortest path.
Typically, the graph should have non-negative edge weights for Dijkstra's Algorithm to work
correctly.
Initialization:
Initialize a distance array or dictionary, which stores the tentative shortest distances from the source
node to all other nodes. Set the distance to the source node as 0 and infinity for all other nodes.
Maintain a priority queue or min-heap to select the node with the smallest tentative distance
efficiently.
Algorithm:
While there are unvisited nodes:
Select the node with the smallest tentative distance from the priority queue.
This node becomes the current node.
For each neighboring node of the current node:
Calculate the tentative distance from the source through the current node.
If this tentative distance is shorter than the recorded distance for that neighbor, update the neighbor's
distance.
Mark the current node as visited.
Repeat steps 1-3 until the destination node is visited or there are no more unvisited nodes.
Path Reconstruction:
Once the destination node is visited, you can reconstruct the shortest path by backtracking from the
destination node to the source node using the recorded distances and predecessors.
Time Complexity:
With a basic implementation using arrays or lists, Dijkstra's Algorithm has a time complexity of
O(V^2), where V is the number of nodes (vertices) in the graph.
However, by using a priority queue or a min-heap to efficiently select the node with the smallest
tentative distance, the time complexity can be optimized to O(E + V*log(V)), where E is the number
of edges in the graph.
Space Complexity:
The space complexity of Dijkstra's Algorithm is O(V) for storing the distance array or dictionary.
If a priority queue or min-heap is used, the space complexity is O(V) for the priority queue.
Therefore, the overall space complexity is O(V) in most cases.
Dijkstra's Algorithm is a powerful tool for finding shortest paths in graphs, but it's important to note
that it works best when edge weights are non-negative. If negative edge weights or cycles with
negative total weight are present in the graph, a different algorithm like the Bellman-Ford algorithm
should be used.
2- Bellman-Ford Algorithm
The Bellman-Ford Algorithm is a graph traversal algorithm used to find the shortest path from a
source node to all other nodes in a weighted graph. It can handle graphs with negative edge weights
and detect negative weight cycles. Here's a summary of the algorithm along with its time and space
complexities:
Input:
A weighted, directed graph.
A source node from which to start the search for the shortest paths.
There are no restrictions on edge weights; they can be positive, negative, or zero.
Initialization:
Initialize a distance array or dictionary to store the tentative shortest distances from the source node
to all other nodes. Set the distance to the source node as 0 and infinity for all other nodes.
Algorithm:
Repeat the following process V-1 times, where V is the number of nodes in the graph:
For each edge (u, v) in the graph:
Relax the edge: Update the distance to node v if the distance to node u plus the weight of the edge (u,
v) is shorter than the current recorded distance to v.
After V-1 iterations, the distance array will contain the shortest distances from the source node to all
other nodes.
Negative Weight Cycle Detection:
Run the relaxation step once more. If any distance is updated in this additional step, it indicates the
presence of a negative weight cycle in the graph.
Time Complexity:
The Bellman-Ford Algorithm has a time complexity of O(V*E), where V is the number of nodes
(vertices) and E is the number of edges in the graph.
The algorithm iterates V-1 times, and in each iteration, it examines all E edges to potentially update
distances.
Space Complexity:
The space complexity of the Bellman-Ford Algorithm is O(V) for storing the distance array or
dictionary.
Additionally, some implementations may require space for data structures to store the graph itself,
which can be O(V + E).
Key Features:
The Bellman-Ford Algorithm is versatile and can handle graphs with negative edge weights and
detect negative weight cycles.
It is slower than Dijkstra's Algorithm for graphs with non-negative edge weights but is necessary
when negative weights are present.
Detection of negative weight cycles can be useful in various applications, such as identifying
arbitrage opportunities in financial markets or preventing routing loops in network protocols.
In summary, the Bellman-Ford Algorithm is a fundamental algorithm for finding shortest paths in
weighted graphs, even in the presence of negative edge weights or negative weight cycles. However,
it has a higher time complexity compared to Dijkstra's Algorithm for non-negative weight graphs.
3- Floyd-Warshall
The Floyd-Warshall Algorithm is a dynamic programming-based algorithm used to find the shortest
paths between all pairs of nodes in a weighted graph, including graphs with negative edge weights.
Here's a summary of the algorithm along with its time and space complexities:
Input:
A weighted, directed or undirected graph.
The graph may have positive, negative, or zero edge weights.
Initialization:
Initialize a 2D array (matrix) dist of size VxV, where V is the number of nodes in the graph.
Set dist[i][j] to the weight of the edge from node i to node j if it exists; otherwise, set it to infinity.
Set dist[i][i] to 0 for all nodes.
Algorithm:
For each intermediate node k (from 1 to V), perform the following step:
For each pair of nodes i and j (from 1 to V), update dist[i][j] to the minimum of its current value
dist[i][j] and the sum of dist[i][k] and dist[k][j].
After completing all iterations, the dist matrix will contain the shortest distances between all pairs of
nodes.
Negative Cycle Detection:
To detect the presence of negative weight cycles, check if there are any negative values on the
diagonal of the dist matrix. If any diagonal element dist[i][i] is negative, it indicates the presence of a
negative weight cycle involving node i.
Time Complexity:
The Floyd-Warshall Algorithm has a time complexity of O(V^3), where V is the number of nodes
(vertices) in the graph.
It performs V iterations, and in each iteration, it updates all pairs of distances using three nested
loops.
Space Complexity:
The space complexity of the Floyd-Warshall Algorithm is O(V^2) for storing the dist matrix, which
contains the shortest distances between all pairs of nodes.
Key Features:
The Floyd-Warshall Algorithm is capable of finding the shortest paths between all pairs of nodes in a
graph, making it suitable for solving the all-pairs shortest path problem.
It can handle graphs with both positive and negative edge weights.
The algorithm can also detect the presence of negative weight cycles in the graph.
In summary, the Floyd-Warshall Algorithm is a powerful tool for finding shortest paths between
all pairs of nodes in a weighted graph, and it can handle negative edge weights. However, its time
complexity makes it less efficient than other algorithms like Dijkstra's Algorithm and the Bellman-
Ford Algorithm for specific scenarios, such as finding the shortest path from one source node to all
other nodes.

4- The A* (pronounced "A-star") algorithm

The A* (pronounced "A-star") algorithm is a widely used search algorithm that finds the shortest
path from a starting node to a goal node in a graph. It is particularly efficient for pathfinding in
situations where a heuristic function can be used to estimate the cost from the current node to the
goal node. Here's a summary of the A* algorithm along with its time and space complexities:
Input:
A graph with nodes and edges.
A starting node (initial state).
A goal node (target state).
A heuristic function that estimates the cost from a node to the goal node.
Edge costs (non-negative values).
Initialization:
Create an open list (usually implemented as a priority queue or heap) to store nodes that need to be
explored.
Initialize the open list with the starting node and its estimated cost (f-score) as the heuristic value.
Create a closed list (or set) to keep track of nodes that have already been visited.
Initialize a dictionary or array to record the best-known cost from the starting node to each node (g-
score).
Algorithm:
While the open list is not empty:
Pop the node with the lowest f-score from the open list. This node becomes the current node.
If the current node is the goal node, a path has been found.
Otherwise, expand the current node by considering its neighbors.
For each neighbor node:
Calculate the tentative g-score from the starting node to the neighbor through the current node.
If this g-score is lower than the previously recorded g-score for the neighbor (or if the neighbor is not
in the closed list), update the neighbor's g-score and add it to the open list with its f-score (f = g + h).
If the neighbor is already in the open list but with a higher g-score, ignore it.
After processing all neighbors, add the current node to the closed list.
Termination:
The algorithm terminates when the goal node is found, or the open list becomes empty, indicating
that there is no path to the goal node.
Time Complexity:
In the worst case, A* can have exponential time complexity. However, with a good heuristic function
that follows certain conditions (admissibility and consistency), the algorithm often performs
efficiently, and its time complexity is closer to O(b^d), where b is the branching factor, and d is the
depth of the optimal solution.
Space Complexity:
The space complexity of A* is primarily determined by the data structures used to store nodes in the
open and closed lists. In the worst case, it can be exponential. However, in practice, it is often
bounded by O(b^d) due to the limited number of nodes kept in memory at any given time.
Key Features:
A* combines the advantages of uniform cost search (Dijkstra's Algorithm) with informed search by
using a heuristic function.
It is efficient when a good heuristic is available and satisfies the necessary conditions.
A* guarantees the optimal solution if the heuristic is both admissible (never overestimates true costs)
and consistent (satisfies the triangle inequality).
In summary, the A* algorithm is a powerful and widely used pathfinding algorithm that efficiently
finds the shortest path from a start node to a goal node in a graph, especially when a good heuristic is
provided. Its performance depends on the quality of the heuristic and the characteristics of the
problem space.

Minimum Spanning Tree Algorithms


For finding a connected tree structure with the smallest
cost for visiting all nodes from a chosen node
1- Kruskal's Algorithm:
Kruskal's Algorithm is a popular greedy algorithm used for finding the Minimum Spanning Tree
(MST) of a connected, undirected graph. The MST is a subset of edges that connects all nodes in the
graph while minimizing the total edge weight. Here's a summary of Kruskal's Algorithm along with
its time and space complexities:
Input:
A connected, undirected graph.
The graph is represented by a set of nodes and a set of edges, where each edge has a weight.
Initialization:
Create a forest of individual trees, where each node is initially its own tree.
Create a priority queue (min-heap) containing all the edges of the graph, sorted by their weights.
Algorithm:
While the forest contains more than one tree (i.e., it's not yet a single MST):
Remove the edge with the smallest weight from the priority queue.
If adding this edge to the forest does not form a cycle (i.e., it connects two different trees), include it
in the MST by merging the two trees into one.
If adding this edge would create a cycle, discard it.
When the MST is complete (i.e., there's only one tree left in the forest), the algorithm terminates.
Time Complexity:
The time complexity of Kruskal's Algorithm is dominated by the sorting of edges and the disjoint-set
data structure operations.
If the edges are sorted using efficient sorting algorithms (e.g., merge sort or heap sort), the sorting
step takes O(E*log(E)) time, where E is the number of edges.
The disjoint-set data structure operations (union and find) take nearly linear time on average, so they
contribute O(E*log(V)) time, where V is the number of nodes.
Thus, the overall time complexity is typically O(Elog(E)) or O(Elog(V)), depending on the
implementation.
Space Complexity:
The space complexity of Kruskal's Algorithm is mainly determined by the disjoint-set data structure
and the storage of edges and nodes.
In most cases, the space complexity is O(V + E), where V is the number of nodes (for storing
disjoint-set representatives and ranks) and E is the number of edges (for storing the edges).
Key Features:
Kruskal's Algorithm is a greedy algorithm that builds the MST incrementally by adding edges in
increasing order of weight.
It's suitable for sparse graphs with many nodes and relatively few edges.
Kruskal's Algorithm guarantees that the resulting MST has the minimum total edge weight.
In summary, Kruskal's Algorithm is a fundamental algorithm for finding the Minimum Spanning
Tree of a graph. It is efficient and provides an optimal solution for minimizing the total edge weight
while ensuring that all nodes are connected.
2- Prim's Algorithm

Prim's Algorithm is a popular greedy algorithm used for finding the Minimum Spanning Tree (MST)
of a connected, undirected graph. The MST is a subset of edges that connects all nodes in the graph
while minimizing the total edge weight. Here's a summary of Prim's Algorithm along with its time
and space complexities:
Input:
A connected, undirected graph.
The graph is represented by a set of nodes and a set of edges, where each edge has a weight.
Initialization:
Create an empty set to represent the MST.
Choose an arbitrary starting node as the initial node for the MST.
Create a priority queue (min-heap) containing all the edges incident to the initial node, sorted by their
weights.
Algorithm:
While the MST does not include all nodes in the graph (i.e., it's not yet a complete tree):
Remove the edge with the smallest weight from the priority queue.
If adding this edge to the MST does not create a cycle (i.e., it connects a node from the MST to a
node outside the MST), include it in the MST and add the node on the other end of the edge to the
MST as well.
If adding this edge would create a cycle, discard it.
Add all edges incident to the newly added node (if they are not already in the MST) to the priority
queue.
When the MST includes all nodes in the graph, the algorithm terminates.
Time Complexity:
The time complexity of Prim's Algorithm depends on the implementation of the priority queue and
can be optimized in various ways.
Using a binary heap-based priority queue, the overall time complexity is typically O(E*log(V)),
where V is the number of nodes and E is the number of edges.
With Fibonacci heap-based priority queues, the time complexity can be improved to O(E +
V*log(V)).

Space Complexity:
The space complexity of Prim's Algorithm is mainly determined by the data structures used for
bookkeeping, including the set representing the MST, the priority queue, and storage for edges and
nodes.
In most cases, the space complexity is O(V + E), where V is the number of nodes (for storing nodes
in the MST) and E is the number of edges (for storing edges and nodes).
Key Features:
Prim's Algorithm is a greedy algorithm that builds the MST incrementally by adding edges with the
smallest weights.
It's suitable for graphs with a relatively small number of nodes and a dense set of edges.
Prim's Algorithm guarantees that the resulting MST has the minimum total edge weight.
In summary, Prim's Algorithm is a fundamental algorithm for finding the Minimum Spanning Tree
of a graph. It efficiently constructs a tree that spans all nodes while minimizing the total edge weight.

Random Walk :

Random Walk algorithms are computational techniques that simulate or utilize the concept of
random walks to solve various problems or explore data. These algorithms often involve making
random steps or decisions to navigate through a space or search for a solution. Here are some
examples of Random Walk algorithms and their applications:
 Monte Carlo Methods:
Monte Carlo methods use random sampling to estimate numerical results or solve problems that may
be deterministic in nature.
Examples include the Monte Carlo integration method and the Metropolis-Hastings algorithm for
Markov chain Monte Carlo (MCMC) simulations.
Applications include estimating integrals, solving optimization problems, and simulating complex
physical systems.
 Random Walk in Markov Chain Monte Carlo (MCMC):
MCMC algorithms, like the Random Walk Metropolis-Hastings algorithm, perform random walks in
high-dimensional spaces to sample from complex probability distributions.
They are widely used in Bayesian statistics, machine learning, and statistical physics for parameter
estimation and model selection.
 PageRank Algorithm:
Google's PageRank algorithm, used in web search, can be thought of as a random walk on the web
graph.
It assigns importance scores to web pages based on the probability that a random surfer will land on
them by following links.
 Random Walk on Graphs:
Random Walks on graphs are used for various tasks, including node and graph traversal, community
detection, and recommendation systems.
Algorithms like the Random Walk with Restart (RWR) can rank nodes in a graph based on their
importance or relevance to a specific query or starting node.
 Reinforcement Learning:
Some reinforcement learning algorithms use random exploration (exploration-exploitation trade-off)
to learn optimal policies.
Q-learning and SARSA are examples of algorithms that involve random actions to explore an
environment and learn value functions.
Simulated Annealing:
Simulated Annealing is an optimization technique inspired by the annealing process in metallurgy.
It involves random changes to a solution and gradually reduces the randomness over time to find the
optimal solution to a problem.
 Protein Folding Simulation:
Random Walk algorithms are used in molecular biology and bioinformatics to simulate the random
motion of atoms and explore the conformational space of protein molecules.
Such simulations are essential for understanding protein folding and function.
 Randomized Algorithms in Computer Science:
Some randomized algorithms use random choices to achieve approximate solutions to problems
efficiently.
Examples include randomized sorting algorithms like QuickSort and randomized data structures like
skip lists.
 Image and Data Analysis:
In image processing and data analysis, Random Walk algorithms can be applied for tasks such as
image segmentation and clustering.
 Navigation and Robotics:
Random Walk algorithms can be used in robot navigation and path planning when dealing with
uncertain environments or when exploring unknown areas.
These algorithms leverage randomness to explore solution spaces, make decisions, or simulate
complex systems, making them valuable tools in various fields, including optimization, data science,
machine learning, and scientific research. The choice of a specific Random Walk algorithm depends
on the problem being addressed and the desired outcome.

You might also like