Graph Analytics For Python Developers
Lecture 1
Version 1.0
Table of Contents
Basic concepts
What are graphs?
Graph types
NetworkX
Basic concepts in NetworkX
Graph types in NetworkX
Graph Creation
Nodes in NetworkX
Edges in NetworkX
Basic concepts
What are graphs?
Graphs are a way to formally represent a network or a collection of interconnected objects.
In computer science, graphs are common data structures consisting of a finite set of
nodes (or vertices) and a set of edges connecting them. The main components of a graph
are:
● Nodes are structures that represent entities.
● Edges represent connections between these entities.
● Attributes are associated values belonging to either nodes or edges.
Some formal definitions require every graph to have at least one node, although many
treatments (and NetworkX) also accept the empty "null graph". Most of the graphs
we’ll be dealing with are a bit more complex.
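As a minimal sketch that uses no graph library, the three components can be stored in plain Python dictionaries: one mapping each node to its attributes and one mapping each edge (a pair of nodes) to its attributes. The people, the `age` and `since` attribute keys, and the `neighbors` helper are hypothetical examples chosen for illustration.

```python
# Nodes: each entity maps to a dictionary of its attributes
nodes = {
    "alice": {"age": 34},
    "bob": {"age": 29},
    "carol": {"age": 41},
}

# Edges: each connection is a (node, node) pair mapping to its attributes
edges = {
    ("alice", "bob"): {"since": 2015},
    ("bob", "carol"): {"since": 2020},
}


def neighbors(node):
    """Return the set of nodes connected to `node`, treating edges as undirected."""
    result = set()
    for u, v in edges:
        if u == node:
            result.add(v)
        elif v == node:
            result.add(u)
    return result


print(sorted(neighbors("bob")))  # ['alice', 'carol']
```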
Graph types
Depending on whether or not the edges have a direction, we distinguish between two types of
graphs: undirected graphs and directed graphs.
● In an undirected graph, nodes are connected by plain lines, and traversals can be done in
both directions; think of it as a two-way street.
● In a directed graph, or digraph, nodes are connected by arrows that indicate the
orientation of each edge, and traversals can be done only in that one direction; think of it
as a one-way street.
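The difference can be seen with NetworkX, the Python package covered later in this lecture. This sketch assumes NetworkX is installed (`pip install networkx`) and uses its `Graph` and `DiGraph` classes; the node names are arbitrary.

```python
import networkx as nx

# Undirected graph: the edge can be traversed in both directions (two-way street)
undirected = nx.Graph()
undirected.add_edge("A", "B")

# Directed graph (digraph): the edge points from A to B only (one-way street)
directed = nx.DiGraph()
directed.add_edge("A", "B")

print(undirected.has_edge("B", "A"))  # True
print(directed.has_edge("A", "B"))    # True
print(directed.has_edge("B", "A"))    # False

# Reachability follows the arrows: B cannot reach A in the digraph
print(nx.has_path(undirected, "B", "A"))  # True
print(nx.has_path(directed, "B", "A"))    # False

# Successors follow the arrows, predecessors go against them
print(list(directed.successors("A")))    # ['B']
print(list(directed.predecessors("A")))  # []
```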
If a graph has numerical values assigned to its edges, we call it a weighted graph.
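A weighted graph usually stores the numerical value as an edge attribute. The sketch below assumes NetworkX's convention of an attribute named `weight`; the distances are made-up values chosen to show that weights can change which path counts as the shortest.

```python
import networkx as nx

# Each edge carries a numerical "weight" attribute (example values only)
G = nx.Graph()
G.add_edge("A", "B", weight=4)
G.add_edge("B", "C", weight=1)
G.add_edge("A", "C", weight=7)

print(G["A"]["B"]["weight"])  # 4

# Ignoring weights, A-C is the shortest route (one edge);
# by total weight, A-B-C (4 + 1 = 5) beats the direct edge A-C (7)
print(nx.shortest_path(G, "A", "C"))                          # ['A', 'C']
print(nx.shortest_path(G, "A", "C", weight="weight"))         # ['A', 'B', 'C']
print(nx.shortest_path_length(G, "A", "C", weight="weight"))  # 5
```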
A tree is another useful type of graph frequently used in computer science. It is a
connected graph that doesn't contain any cycles. A cycle is a path in a graph where the first
and last vertices are the same and no edge is repeated.
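NetworkX can check both definitions directly. In this sketch, `nx.is_tree` returns `True` only for a connected graph without cycles, while `nx.is_forest` also accepts acyclic graphs that are not connected; the small example graphs are hypothetical.

```python
import networkx as nx

# A path graph 0-1-2-3 is connected and has no cycles, so it is a tree
T = nx.path_graph(4)
print(nx.is_tree(T))  # True

# Adding the edge (3, 0) closes the cycle 0-1-2-3-0, so it is no longer a tree
C = T.copy()
C.add_edge(3, 0)
print(nx.is_tree(C))  # False

# find_cycle returns the edges of one cycle (it raises NetworkXNoCycle if there is none)
print(nx.find_cycle(C))

# An acyclic graph that is NOT connected is a forest rather than a tree
F = nx.Graph([(0, 1), (2, 3)])
print(nx.is_tree(F), nx.is_forest(F))  # False True
```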
A graph that can have more than one edge between a pair of nodes is called a multigraph.
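In NetworkX, parallel edges are what separate a `MultiGraph` from a plain `Graph`. In this sketch (the city names and the `mode` attribute are illustrative assumptions), adding the same pair twice to a `Graph` only updates the single existing edge, while a `MultiGraph` keeps both edges.

```python
import networkx as nx

# A simple Graph keeps at most one edge per pair: the second add_edge updates it
simple = nx.Graph()
simple.add_edge("Paris", "Lyon", mode="train")
simple.add_edge("Paris", "Lyon", mode="plane")
print(simple.number_of_edges("Paris", "Lyon"))  # 1
print(simple["Paris"]["Lyon"])                  # {'mode': 'plane'}

# A MultiGraph stores each parallel edge separately
multi = nx.MultiGraph()
multi.add_edge("Paris", "Lyon", mode="train")
multi.add_edge("Paris", "Lyon", mode="plane")
print(multi.number_of_edges("Paris", "Lyon"))  # 2
print(sorted(d["mode"] for _, _, d in multi.edges(data=True)))  # ['plane', 'train']
```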
What are graph analytics and algorithms?
Graphs make it possible to capture relationships and connections between entities, but
modeling data with graphs is just the start. To discover the real value, we have to step into
the domain of graph analytics. We can use those relationships and connections in data
analysis to extract information about the graph/network.
Graph analytics, also called network analysis, is the use of a graph-based approach to
analyze highly connected data. It is a set of tools that helps us understand relationships
between entities and identify values or uncover insights within the data. Graph algorithms
are a subset of those tools.
An algorithm is a set of precise instructions that outline exactly how to complete a task or
solve a problem. Graph algorithms, specifically, explore the paths and distances between
nodes and how nodes cluster together, and they help us determine the importance of a node in
a graph.
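To make "a set of precise instructions" concrete, here is a minimal sketch of breadth-first search (BFS), a classic graph algorithm that computes the distance, measured in number of edges, from one node to every node reachable from it. The example graph is a hypothetical adjacency dictionary; NetworkX offers the same computation as `nx.single_source_shortest_path_length`.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return the number of edges on a shortest path from `source` to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            # BFS visits nodes in order of distance, so the first visit is via a shortest path
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Undirected example graph: every edge is listed in both directions
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(bfs_distances(adjacency, "A"))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
```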
Some of the most common problems solved by graph algorithms are:
● Finding the shortest path
● Graph coloring
● Cycle detection
● Finding influential nodes
● Community detection
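Each of these problems has a ready-made function in NetworkX. The sketch below runs one option per problem on Zachary's karate club graph, a small social network bundled with NetworkX; the specific choices (greedy coloring, degree centrality as the measure of influence, greedy modularity maximization for communities) are one reasonable pick each, not the only algorithms available.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Zachary's karate club: a small, well-known social network shipped with NetworkX
G = nx.karate_club_graph()

# 1. Shortest path: the fewest edges between two members
path = nx.shortest_path(G, source=0, target=33)
print("Shortest path 0 -> 33:", path)

# 2. Graph coloring: no two adjacent nodes share a color (greedy heuristic)
coloring = nx.greedy_color(G, strategy="largest_first")
print("Colors used:", len(set(coloring.values())))


# 3. Cycle detection: find_cycle raises NetworkXNoCycle when the graph is acyclic
def has_cycle(graph):
    try:
        nx.find_cycle(graph)
        return True
    except nx.NetworkXNoCycle:
        return False


print("Has a cycle:", has_cycle(G))

# 4. Influential nodes: degree centrality = fraction of other nodes a node is linked to
centrality = nx.degree_centrality(G)
top_node = max(centrality, key=centrality.get)
print("Most connected member:", top_node)

# 5. Community detection: densely connected groups found by modularity maximization
communities = greedy_modularity_communities(G)
print("Number of communities:", len(communities))
```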
Social networks
Social media networks such as Instagram, Spotify, and LinkedIn are relationship and
connection-driven applications. With a large number of users and an enormous number of
interactions between them, structuring the data as a graph makes that information easier to
access. Influencer marketing is an emerging trend due to the increasing
number of social media users and increasing customer scepticism with more established
forms of marketing. Graph analytics helps identify influential individuals or communities in
social media networks.
Recommendation engines
Recommendation systems are unavoidable in our daily online journeys. Companies like
Netflix, Facebook, Amazon, and LinkedIn leverage them to help users discover new and
relevant items (products, people, jobs, music), which in turn generates significant
revenue.
Supply chain management
Graph analytics can be applied in multiple sectors, such as process engineering and supply
chain optimization. By analyzing their data in graph form, companies can extract valuable
insights about how their processes and components are performing, identify bottlenecks, and
optimize their structure to minimize operating costs.
Fraud detection
Graph analytics can be applied to financial and insurance data to fight fraud. Organizing
the data as a graph makes it easier to search for hidden patterns and rules that may
indicate criminal activity. Graphs are especially
popular in the domain of real-time fraud detection because they enable flexibility when
modeling transactions and speed when analyzing them.
NetworkX
NetworkX is a Python package for the creation, manipulation, and study of the structure,
dynamics, and functions of complex networks.
It provides a standard programming interface and graph implementation that is suitable for
many real-world applications.
With NetworkX, you can load and store networks in standard and non-standard data formats,
build and analyze network structure, generate many types of random or classic networks, and
much more!
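As a brief sketch of those capabilities, the code below generates a classic network and a random network, then stores one of them as an edge list (a plain-text format NetworkX can read and write) and loads it back. The file name and the parameters `n=20`, `p=0.2`, and `seed=42` are arbitrary assumptions.

```python
import os
import tempfile

import networkx as nx

# Classic network: the complete graph on 5 nodes (every pair of nodes connected)
complete = nx.complete_graph(5)

# Random network: G(n, p) with 20 nodes and edge probability 0.2; the seed makes it reproducible
random_graph = nx.gnp_random_graph(20, 0.2, seed=42)

print(complete.number_of_nodes(), complete.number_of_edges())  # 5 10
print(random_graph.number_of_nodes())                          # 20

# Store the complete graph as an edge list, then load it back with integer node labels
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "complete.edgelist")
    nx.write_edgelist(complete, path, data=False)
    loaded = nx.read_edgelist(path, nodetype=int)

print(sorted(loaded.nodes()))    # [0, 1, 2, 3, 4]
print(loaded.number_of_edges())  # 10
```

Note that an edge list only records edges, so isolated nodes (nodes without any edges) are lost in such a round trip; richer formats such as GraphML preserve them.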
NetworkX resources:
● Official website
● Mailing list
● Source code - GitHub
● Documentation
● NetworkX Guide
Basic concepts in NetworkX
Graph types in NetworkX
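As mentioned before, there are multiple types of graphs. Depending on the structure of the
graph you want to represent, you can use one of the following NetworkX graph classes:
● Graph: undirected; self-loops allowed; parallel edges not allowed.
● DiGraph: directed; self-loops allowed; parallel edges not allowed.
● MultiGraph: undirected; self-loops and parallel edges allowed.
● MultiDiGraph: directed; self-loops and parallel edges allowed.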
Figure: The left graph is a NetworkX Graph, while the right one is a MultiDiGraph.
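The four classes differ in two properties: whether edges are directed and whether parallel edges are allowed. This sketch adds the same edges (including a repeated pair and a self-loop) to an instance of each class and records how each one stores them; the helper `describe` is hypothetical and introduced only for illustration.

```python
import networkx as nx


def describe(graph_class):
    """Build a small graph of the given class and summarize how it stored the edges."""
    G = graph_class()
    G.add_edge(1, 2)
    G.add_edge(1, 2)  # the same pair again: a parallel edge only in multigraphs
    G.add_edge(3, 3)  # a self-loop: allowed by all four classes
    return {
        "directed": G.is_directed(),
        "multigraph": G.is_multigraph(),
        "edges_1_2": G.number_of_edges(1, 2),
        "self_loops": nx.number_of_selfloops(G),
    }


summary = {
    cls.__name__: describe(cls)
    for cls in (nx.Graph, nx.DiGraph, nx.MultiGraph, nx.MultiDiGraph)
}
for name, info in summary.items():
    print(name, info)
```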
Graph Creation
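NetworkX graph objects can be created in one of three ways:
● Graph generators: standard algorithms that create network topologies.
● Importing data from pre-existing (usually file) sources.
● Adding nodes and edges explicitly.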
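A minimal sketch of the three creation routes, assuming NetworkX's standard API: a graph generator (`nx.star_graph`), importing an edge list (here from an in-memory `io.BytesIO` buffer that stands in for a real file), and adding nodes and edges explicitly. The node labels are arbitrary.

```python
import io

import networkx as nx

# 1. Graph generator: a star with hub node 0 connected to nodes 1..4
star = nx.star_graph(4)

# 2. Importing data: an edge list read from a file-like object (one edge per line)
edge_file = io.BytesIO(b"a b\nb c\nc a\n")
triangle = nx.read_edgelist(edge_file)

# 3. Adding nodes and edges explicitly
manual = nx.Graph()
manual.add_edge("x", "y")
manual.add_node("z")  # a node without any edges

print(star.number_of_nodes(), star.number_of_edges())  # 5 4
print(triangle.number_of_edges())                      # 3
print(sorted(manual.nodes()))                          # ['x', 'y', 'z']
```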
Nodes in NetworkX
All NetworkX graph classes allow hashable Python objects (except None) as nodes.
Hashable objects include strings, tuples, integers, and more. Nodes can be added and
manipulated by using the following methods:
● G.add_node(): Add a single node to graph G.
● G.add_nodes_from(): Add multiple nodes to graph G.
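The sketch below shows both methods with different kinds of hashable nodes, and with node attributes passed as `(node, attribute_dict)` pairs. The server names and the `rack` attribute are hypothetical.

```python
import networkx as nx

G = nx.Graph()

# add_node: one node at a time; any hashable object except None is accepted
G.add_node(1)
G.add_node("server-A")
G.add_node((0, 0))  # tuples are hashable, e.g. grid coordinates

# add_nodes_from: any iterable of nodes, optionally as (node, attribute_dict) pairs
G.add_nodes_from([2, 3])
G.add_nodes_from([("server-B", {"rack": 7}), ("server-C", {"rack": 9})])

print(G.number_of_nodes())  # 7
print(G.nodes["server-B"])  # {'rack': 7}

# Unhashable objects (such as lists) and None are rejected
for bad_node in ([1, 2], None):
    try:
        G.add_node(bad_node)
    except (TypeError, ValueError) as err:
        print(f"{bad_node!r} rejected: {type(err).__name__}")
```

One pitfall: because `add_nodes_from()` iterates over its argument, `G.add_nodes_from("spam")` adds the four nodes `'s'`, `'p'`, `'a'`, and `'m'`; use `G.add_node("spam")` to add the string as a single node.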
Edges in NetworkX
Edges often have data associated with them. Any Python object can be assigned as an
edge attribute. Edges can be added and manipulated by using the following methods:
● G.add_edge(): Add an edge between two nodes in graph G.
● G.add_edges_from(): Add multiple edges to graph G.
● G.add_weighted_edges_from(): Add multiple weighted edges to graph G.
● G.remove_edge(): Remove the edge between two nodes in graph G.
● G.remove_edges_from(): Remove multiple edges from graph G.
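The following sketch exercises each of these methods on one small graph. The node names and the `label` and `since` attributes are hypothetical; `weight` is the default attribute key that `add_weighted_edges_from()` uses.

```python
import networkx as nx

G = nx.Graph()

# add_edge: missing endpoints are created automatically; keyword arguments become attributes
G.add_edge("A", "B", label="friends", since=2019)

# add_edges_from: an iterable of (u, v) or (u, v, attribute_dict) tuples
G.add_edges_from([("B", "C"), ("C", "D", {"label": "colleagues"})])

# add_weighted_edges_from: (u, v, w) triples; w is stored under the "weight" key
G.add_weighted_edges_from([("A", "C", 0.5), ("B", "D", 2.0)])

print(G.edges["A", "B"])      # {'label': 'friends', 'since': 2019}
print(G["A"]["C"]["weight"])  # 0.5
print(G.number_of_edges())    # 5

# remove_edge: removes a single edge; raises NetworkXError if the edge does not exist
G.remove_edge("A", "B")
try:
    G.remove_edge("A", "B")
except nx.NetworkXError as err:
    print("Already removed:", err)

# remove_edges_from: removes several edges; edges that are not present are ignored
G.remove_edges_from([("B", "C"), ("X", "Y")])

print(sorted(G.edges()))    # [('A', 'C'), ('B', 'D'), ('C', 'D')]
print(G.number_of_nodes())  # 4: removing edges never removes nodes
```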