Systems Biology 1: Networks


Networks I

Systems Biology – BTEE1131

Alberto J. Martin
Königsberg's bridges (L. Euler, 1736):

Is it possible to find a walk through the city that crosses each bridge exactly once?

"Konigsberg bridges" by Bogdan Giuşcă - Wikimedia Commons

Edges are bridges; nodes are the mainland areas and the islands.

Graph representation of complex systems

Objects (nodes, vertices) connected to one another by edges.

$G = (V, E)$
- $V$ is a set of nodes (vertices): $V := \{1, 2, 3, 4\}$
- $E$ is a set of edges: $E := \{\{1,2\}, \{1,2\}', \{1,4\}, \{1,4\}', \{1,3\}, \{2,3\}, \{4,3\}\}$ (a prime marks a second, parallel edge between the same pair of nodes)

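As a minimal sketch, the node set and edge list of this slide can be written directly in Python. Repeated pairs stand for the parallel edges; the helper name `degrees` is illustrative and not part of the original slides.

```python
from collections import Counter

# Node set V and edge list E from the slide; a repeated pair is a parallel edge.
V = {1, 2, 3, 4}
E = [(1, 2), (1, 2), (1, 4), (1, 4), (1, 3), (2, 3), (4, 3)]

def degrees(nodes, edges):
    """Count how many edge ends touch each node."""
    deg = Counter({v: 0 for v in nodes})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

print(degrees(V, E))  # {1: 5, 2: 3, 3: 3, 4: 3}
```

Node 1 touches five bridges and the other three nodes touch three each.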
Network types:

Regular graphs

Network types: trees

Connected undirected network with no cycles (loops).

- Forest: all network components are trees
- Rooted tree: one node is designated as the root, the top of the hierarchy
- Nodes with only one edge are leaves
- Exactly one path connects any pair of nodes. A tree with n nodes always has n − 1 edges.

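The tree properties above can be checked with a short sketch, assuming nodes are labelled 0..n-1 and edges are given as pairs. The function names `is_tree` and `leaves` are illustrative.

```python
def is_tree(n, edges):
    """True if the undirected graph on nodes 0..n-1 is connected with n - 1 edges."""
    if len(edges) != n - 1:
        return False
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first search from node 0: a tree must reach every node.
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def leaves(n, edges):
    """Nodes with exactly one edge."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [v for v in range(n) if degree[v] == 1]

star = [(0, 1), (0, 2), (0, 3)]
print(is_tree(4, star), leaves(4, star))  # True [1, 2, 3]
```

A triangle plus an isolated node also has n − 1 = 3 edges, but it is rejected because it is not connected.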
Network types: adjacency matrix

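Consider an undirected graph with n nodes, $V = \{1, \dots, n\}$. If the edge between nodes i and j is denoted by (i, j), the adjacency matrix A has elements

$$A_{ij} = \begin{cases} 1 & \text{if edge } (i, j) \text{ exists} \\ 0 & \text{otherwise} \end{cases}$$
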
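A minimal sketch of building this matrix from an edge list, using plain nested lists. The function name `adjacency_matrix` and the three-node example are illustrative.

```python
def adjacency_matrix(nodes, edges):
    """Build A with A[i][j] = 1 if an edge joins nodes i and j, else 0."""
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        A[i][j] = 1
        A[j][i] = 1  # undirected: the matrix is symmetric
    return A

A = adjacency_matrix(["A", "B", "C"], [("A", "B"), ("A", "C")])
print(A)  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```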
Network types: adjacency list

Network types: weighted networks

Additional information encoded in edges:

- Frequency of contact (social networks)
- Data flow (internet)
- Correlation (gene expression)
- Distance/similarity (roads)

Edge weights can take negative or positive values.

Network types: directed networks

- Networks in which each edge has a direction
- Examples: WWW, metabolic networks, gene regulatory networks (GRNs), ...

$$A_{ij} = \begin{cases} 1 & \text{if there is an edge from } j \text{ to } i \\ 0 & \text{otherwise} \end{cases}$$

Network types: directed acyclic networks (DAGs)

- A cycle is a closed loop of edges, with all edges pointing the same way around the loop
- Networks with no cycles are acyclic

Paths

- A path is a sequence of nodes such that every consecutive pair of nodes is connected by an edge
- Paths can be directed or undirected
- Self-avoiding paths do not intersect themselves, e.g. geodesic and Hamiltonian paths
- The length of a path is the number of edges traversed
- Element $A_{ij}$ is 1 if there is an edge from j to i. The product $A_{ik} A_{kj}$ is 1 if there is a path of length 2 from j to i via k, and 0 otherwise.
- The total number $N_{ij}^{(2)}$ of paths of length 2 from j to i is:

$$N_{ij}^{(2)} = \sum_{k=1}^{n} A_{ik} A_{kj} = [A^2]_{ij}$$

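The count of length-2 paths can be checked numerically by squaring a small adjacency matrix. The sketch below uses a hypothetical three-node undirected graph (edges A-B and A-C) and a hand-written `matmul` helper, so no external library is assumed.

```python
def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Nodes ordered A, B, C; edges A-B and A-C.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]

N2 = matmul(A, A)
print(N2)  # [[2, 0, 0], [0, 1, 1], [0, 1, 1]]
```

Entry `N2[1][2] = 1` is the single length-2 path between B and C (via A). The diagonal entries count out-and-back walks, such as A to B and back to A.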
Geodesic Paths

- A geodesic path is a path between two nodes such that no shorter path exists; it is also known as a shortest path
- Geodesic paths are necessarily self-avoiding (they do not intersect themselves)
- Geodesic paths are not necessarily unique
- The graph diameter is the length of the longest geodesic path between any pair of nodes for which a path actually exists

3 geodesic paths between nodes i and j
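For unweighted networks, geodesic distances can be obtained with a breadth-first search. The sketch below is one possible implementation; the example graph (a square a-b-c-d with a pendant node e) is hypothetical and not the network in the figure.

```python
from collections import deque

def bfs_distances(adj, source):
    """Geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest geodesic distance over all pairs that are actually connected."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

adj = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["a", "c"],
    "e": ["c"],
}
print(bfs_distances(adj, "a"))  # {'a': 0, 'b': 1, 'd': 1, 'c': 2, 'e': 3}
print(diameter(adj))            # 3
```

In this example there are two geodesic paths from a to c (via b or via d), which illustrates that shortest paths need not be unique.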


Geodesic (Shortest) paths:

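[Figure: two example graphs. Graph A has nodes A, B and C; graph B has nodes A, B, C, D and E]

- Graph A: path from A to C? From B to C?
- Graph B: path from A to C? From A to E? From E to C?
- Would it be much different with directed networks?
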
Eulerian and Hamiltonian paths:

- An Eulerian path traverses each edge in a network exactly once
  - If there are nodes with more than 2 neighbors, an Eulerian path visits them more than once
- A Hamiltonian path visits each node exactly once
- A network can have none, one or more Eulerian and Hamiltonian paths
- Hamiltonian paths are self-avoiding; Eulerian paths are not necessarily so

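A quick check for Eulerian paths relies on a standard result that is not stated on the slide: a connected undirected graph has an Eulerian path only if it has zero or two nodes of odd degree. The sketch below applies that degree test to the Königsberg edge list; the function names are illustrative and connectivity is assumed rather than checked.

```python
from collections import Counter

def odd_degree_nodes(edges):
    """Nodes touched by an odd number of edge ends."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(v for v, d in deg.items() if d % 2 == 1)

def has_eulerian_path(edges):
    """Degree test only; assumes the graph is connected."""
    return len(odd_degree_nodes(edges)) in (0, 2)

konigsberg = [(1, 2), (1, 2), (1, 4), (1, 4), (1, 3), (2, 3), (4, 3)]
print(odd_degree_nodes(konigsberg))   # [1, 2, 3, 4]
print(has_eulerian_path(konigsberg))  # False
```

All four land masses have odd degree, so no walk can cross every bridge exactly once.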
Geodesic (Shortest) paths:

[Figure: the same two example graphs. Graph A has nodes A, B and C; graph B has nodes A, B, C, D and E]

Eulerian? Hamiltonian?

Network Components:

- A network in which some pairs of nodes are not connected by any path is disconnected
- A network in which a path connects every pair of nodes is connected
- The connected groups of nodes are called components

A network composed of 2 components, A and B. The average path length between A and B is infinite.

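Components of an undirected network can be found by repeatedly exploring from nodes that have not been visited yet. This is a minimal sketch with a hypothetical five-node network; `connected_components` is an illustrative name.

```python
def connected_components(adj):
    """Return the components of an undirected graph as a list of node sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = {start}, [start]
        seen.add(start)
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    stack.append(v)
        components.append(component)
    return components

# Hypothetical network with two components: {1, 2, 3} and {4, 5}.
adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
components = connected_components(adj)
```

A network is connected exactly when this function returns a single component.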
Components in Directed Networks:

- One should consider the direction of the edges coming into and going out of nodes
- This network has two weakly connected components (nodes A{3} and B{3}), and 3 strongly connected components: A{1, 2, 4}, B{1, 2} and B{4}

Examples of weakly connected networks: the internet and the WWW

Centrality measures:

- Which node is the most important?
- Which is the most influential?
- Which is key to keeping the network structure?
- Centralities can be averaged to describe the network as a whole

Degree centrality:

- Node degree is the number of edges connected to the node
- The average degree in an undirected graph is:

$$c = \frac{2m}{n}$$

  for a graph with m edges (2 ends each) and n nodes

Maximum edges and connectance:

- The maximum possible number of edges, Max{e}, in an undirected graph (with no multi-edges or self-edges) is:

$$\mathrm{Max}\{e\} = \frac{1}{2} n (n - 1)$$

- The connectance or connectivity density ρ of a graph is the fraction of Max{e} actually present:

$$\rho = \frac{m}{\mathrm{Max}\{e\}} = \frac{2m}{n(n-1)} = \frac{c}{n-1}$$

- ρ lies in the range $0 \le \rho \le 1$
- Most interesting networks are large, so $\rho \approx c/n$
- For some very large networks $\rho = m/n^2$, with $0 \le \rho \le 1/2$
- Dense network: its density tends to a constant as $n \to \infty$
- Sparse network (e.g. the WWW): $\rho \to 0$ as $n \to \infty$

Maximum edges and connectance:

    A  B  C
A   0  1  1
B   1  0  0
C   1  0  0

- $m = 2$
- $c = \frac{2 \times 2}{3} = \frac{4}{3}$
- $\rho = \frac{2}{3}$

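The same values can be reproduced from the adjacency matrix of the example. This is a small sketch using exact fractions; `connectance_summary` is an illustrative name.

```python
from fractions import Fraction

def connectance_summary(A):
    """Return (m, c, rho) for an undirected graph given its adjacency matrix."""
    n = len(A)
    # Each undirected edge appears twice in A, so count the upper triangle only.
    m = sum(A[i][j] for i in range(n) for j in range(i + 1, n))
    c = Fraction(2 * m, n)                # mean degree c = 2m / n
    rho = Fraction(2 * m, n * (n - 1))    # connectance rho = 2m / (n (n - 1))
    return m, c, rho

A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
m, c, rho = connectance_summary(A)
print(m, c, rho)  # 2 4/3 2/3
```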
Degree in directed graphs

- Nodes have 2 degrees:
  - In-degree: number of incoming edges
  - Out-degree: number of outgoing edges
- The number of edges m is equal to the sum of the in-degrees, or equivalently the sum of the out-degrees, over all vertices

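With the convention used in these slides ($A_{ij} = 1$ if there is an edge from j to i), in-degrees are row sums and out-degrees are column sums of the adjacency matrix. The sketch below uses a hypothetical three-node directed network.

```python
def in_out_degrees(A):
    """In-degrees (row sums) and out-degrees (column sums), with A[i][j] = 1 for an edge j -> i."""
    n = len(A)
    k_in = [sum(A[i][j] for j in range(n)) for i in range(n)]
    k_out = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return k_in, k_out

# Hypothetical directed network with edges 0 -> 1, 0 -> 2 and 1 -> 2.
A = [[0, 0, 0],
     [1, 0, 0],
     [1, 1, 0]]
k_in, k_out = in_out_degrees(A)
print(k_in, k_out)  # [0, 1, 2] [2, 1, 0]
```

Both lists sum to m = 3, the number of edges.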
Closeness centrality

- Measures the mean distance from a node to the other nodes
- If $d_{ij}$ is the length of the geodesic path from i to j, then the mean geodesic distance from node i to all other nodes, $l_i$, is:

$$l_i = \frac{1}{n} \sum_{j} d_{ij}$$

- The influence of nodes on themselves is (usually) discarded:

$$l_i = \frac{1}{n-1} \sum_{j \neq i} d_{ij}$$

- Closeness centrality, $C_i$, is the inverse of $l_i$:

$$C_i = \frac{1}{l_i} = \frac{n}{\sum_{j} d_{ij}}$$

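A minimal sketch of closeness centrality using $C_i = n / \sum_j d_{ij}$ and breadth-first search for the distances. It assumes an unweighted, connected network; the three-node path a-b-c is a hypothetical example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, i):
    """C_i = n / sum_j d_ij; assumes every node is reachable from i."""
    dist = bfs_distances(adj, i)
    return len(adj) / sum(dist.values())

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print({v: closeness(adj, v) for v in adj})  # {'a': 1.0, 'b': 1.5, 'c': 1.0}
```

The middle node b is closest to the others and gets the highest score.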
Betweenness centrality

- Measures the extent to which a node lies on shortest paths between other nodes
- Let $n^i_{st}$ be 1 if node i lies on the geodesic path from s to t, and 0 otherwise. Then the betweenness centrality $x_i$ is given by:

$$x_i = \sum_{st} n^i_{st}$$

- A more general definition: if $n^i_{st}$ is the number of geodesic paths from s to t that pass through i, and $g_{st}$ is the total number of geodesic paths from s to t, then the betweenness centrality of node i is:

$$x_i = \sum_{st} \frac{n^i_{st}}{g_{st}}$$

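The general definition can be sketched by counting geodesic paths with breadth-first search. The code below makes one assumption that the slide leaves open: the sum runs over ordered pairs s ≠ t and skips the cases where i is one of the endpoints. It is a simple illustration for small networks, not an optimized algorithm.

```python
from collections import deque

def bfs_counts(adj, s):
    """Distances from s and number of geodesic paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """x_i = sum over s != t (i not an endpoint) of n^i_st / g_st."""
    info = {s: bfs_counts(adj, s) for s in adj}
    x = {i: 0.0 for i in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in adj:
            if t == s or t not in dist_s:
                continue
            for i in adj:
                if i in (s, t) or i not in dist_s:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a geodesic s -> t if the two legs add up to d(s, t).
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    x[i] += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return x

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path))  # {'a': 0.0, 'b': 2.0, 'c': 0.0}
```

In the path a-b-c, node b lies on the only geodesic between a and c, counted once in each direction.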
Transitivity and clustering coefficient

- Transitivity:
  - Perfect transitivity: the friend of my friend is my friend
  - Partial transitivity: the friend of my friend can be my friend
- If three nodes u, v and w are connected by a path forming a clique (complete graph), they are a closed triad
- The clustering coefficient is the fraction of paths of length 2 in a network that are closed:

$$C = \frac{N_{cp}}{N_{pl2}}$$

  where $N_{cp}$ is the number of closed paths of length two and $N_{pl2}$ is the total number of paths of length two
- For undirected graphs $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $k_i$ is the number of neighbors of i and $e_i$ is the number of connected pairs of neighbors of i; for directed graphs $C_i = \frac{e_i}{k_i (k_i - 1)}$. The network average is $C_{avg} = \frac{1}{n} \sum_{i=1}^{n} C_i$
- C = 1 reflects perfect transitivity, i.e. all network components are cliques; C = 0 means there are no closed triads
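The local clustering coefficient of an undirected network can be sketched as follows. The example graph (a triangle a-b-c with a pendant node d) is hypothetical, and returning 0 for nodes with fewer than two neighbors is an assumed convention.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)) for an undirected graph."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention when no pair of neighbors exists
    # e_i: number of pairs of neighbors of i that are themselves connected.
    e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
C = {v: local_clustering(adj, v) for v in adj}
C_avg = sum(C.values()) / len(C)
```

Here b and c have C = 1, a has C = 1/3 (one connected pair out of three) and d has C = 0.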
Eigenvector centrality

- Measures the importance of a node according to the importance of its neighbors
- Gives each node a score proportional to the sum of the scores of its neighbors
- Can be computed iteratively:

$$x_i' = \sum_j A_{ij} x_j$$

  where $A_{ij}$ is an element of the adjacency matrix and j runs over the neighbors of node i. The equation is recursive.
- If $k_i$ are the eigenvalues of A, and $k_1$ is the largest of them, then the eigenvector centrality $x_i$ is proportional to the sum of the centralities of i's neighbors:

$$x_i = k_1^{-1} \sum_j A_{ij} x_j$$

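The recursive update can be iterated until the scores stop changing (power iteration). The sketch below rescales the scores at every step so that the largest one equals 1; that normalization, the fixed number of iterations and the example graph (triangle 0-1-2 with a pendant node 3) are assumptions for illustration. Plain power iteration needs a connected, non-bipartite network with at least one edge to converge.

```python
def eigenvector_centrality(A, iterations=200):
    """Power iteration on x' = A x; returns scores scaled to max 1 and the estimate of k_1."""
    n = len(A)
    x = [1.0] * n
    k1 = 0.0
    for _ in range(iterations):
        new_x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        k1 = max(new_x)              # estimate of the leading eigenvalue
        x = [value / k1 for value in new_x]
    return x, k1

# Triangle 0-1-2 with a pendant node 3 attached to node 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
x, k1 = eigenvector_centrality(A)
```

Node 0 gets the highest score and the pendant node 3 the lowest, although node 3 is attached to the most central node.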
Centralities

Examples of A) Betweenness centrality, B) Closeness centrality, C) Eigenvector centrality, D) Degree centrality of the same graph.

by Tapiocozzo, Wikimedia Commons


Node centralities

- Degree: number of connections
  - In-degree and out-degree in directed networks
- Closeness: inverse of the average distance to all other nodes

$$C_i = \frac{1}{l_i} = \frac{n}{\sum_j d_{ij}}$$

- Betweenness: shortest paths traversing a node / all shortest paths

$$x_i = \sum_{st} \frac{n^i_{st}}{g_{st}}$$

- Clustering coefficient:

$$C = \frac{N_{cp}}{N_{pl2}}$$

- Eigenvector centrality:

$$x_i = k_1^{-1} \sum_j A_{ij} x_j$$

Network visualization and Analysis

Cytoscape
Centrifuge, Commetrix (dynamic nets), Gephi, ...

Cytoscape

http://www.cytoscape.org/

- Network creation: direct connection to online repositories
- Characterization and analysis
- Many modules available as plugins
- High-quality pictures

Network file formats: Simple interaction file (.sif)

- Each line contains a source node, a relationship type (or edge type), and one or more target nodes:

    source_node1 interaction_type1 target_node2
    source_node3 interaction_type2 target_node4 target_node5
    ...

- Columns are separated by spaces or tabs
- Duplicate entries are ignored
  - Multiple edges between the same nodes must have different edge types

http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats
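A minimal sketch of reading the format described above into a list of (source, interaction, target) triples. It follows only the rules listed on this slide; `parse_sif` is an illustrative helper, not part of Cytoscape, and lines with fewer than three fields are simply skipped.

```python
def parse_sif(text):
    """Return (source, interaction, target) triples from SIF-formatted text."""
    edges = []
    for line in text.splitlines():
        # Tab-delimited if the line contains a tab, otherwise split on spaces.
        fields = line.split("\t") if "\t" in line else line.split()
        fields = [f.strip() for f in fields if f.strip()]
        if len(fields) < 3:
            continue  # blank line or node without edges: nothing to record
        source, interaction, targets = fields[0], fields[1], fields[2:]
        for target in targets:
            edge = (source, interaction, target)
            if edge not in edges:  # duplicate entries are ignored
                edges.append(edge)
    return edges

example = """source_node1 interaction_type1 target_node2
source_node3 interaction_type2 target_node4 target_node5
source_node1 interaction_type1 target_node2"""

edges = parse_sif(example)
```

The second line expands into two edges and the repeated third line is dropped, giving three edges in total.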

sif node and edge attributes

- tsv or csv files that contain attributes:

nodes.tsv
node            color   value
source_node1    red     4
target_node2    green   2
source_node3    red     3
target_node4    green   0

edges.tsv
edge                color   thickness
interaction_type1   red     0.5
interaction_type2   green   0.3
interaction_type3   red     1.5

Network file formats: XGMML (eXtensible Graph Markup
and Modeling Language)

- XML extension of GML
- Detailed network file format
  - Edge and node attributes (color, shape, ...)
  - Also keeps the layout
- Standard export file format in Cytoscape

Network attributes

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<graph id="18762" label="25249" directed="1" cy:documentVersion="3.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cy="http://www.cytoscape.org" xmlns="http://www.cs.rpi.edu/XGMML">
  <att name="networkMetadata">
    <rdf:RDF>
      <rdf:Description rdf:about="http://www.cytoscape.org/">
        <dc:type>Protein-Protein Interaction</dc:type>
        <dc:description>N/A</dc:description>
        <dc:identifier>N/A</dc:identifier>
        <dc:date>2017-09-13 19:30:19</dc:date>
        <dc:title>25249</dc:title>
        <dc:source>http://www.cytoscape.org/</dc:source>
        <dc:format>Cytoscape-XGMML</dc:format>
      </rdf:Description>
    </rdf:RDF>
  </att>
  <att name="shared name" value="netf1_mirtf.tsv" type="string" cy:type="String"/>
  <att name="name" value="netf1_mirtf.tsv" type="string" cy:type="String"/>
  <att name="selected" value="1" type="boolean" cy:type="Boolean"/>
  <att name="networkMetadata" type="string" cy:type="String"/>
  <att name="__Annotations" type="list" cy:type="List" cy:elementType="String">
  </att>
  <att name="layoutAlgorithm" value="Prefuse Force Directed Layout" type="string" cy:type="String" cy:hidden="1"/>

graphics attributes

<graphics>
  <att name="NETWORK_WIDTH" value="1451.0" type="string" cy:type="String"/>
  <att name="NETWORK_CENTER_Y_LOCATION" value="-1597.8129075260113" type="string" cy:type="String"/>
  <att name="NETWORK_CENTER_Z_LOCATION" value="0.0" type="string" cy:type="String"/>
  <att name="NETWORK_DEPTH" value="0.0" type="string" cy:type="String"/>
  <att name="NETWORK_CENTER_X_LOCATION" value="872.8819677113453" type="string" cy:type="String"/>
  <att name="NETWORK_TITLE" value="" type="string" cy:type="String"/>
  <att name="NETWORK_BACKGROUND_PAINT" value="#FFFFFF" type="string" cy:type="String"/>
  <att name="NETWORK_EDGE_SELECTION" value="true" type="string" cy:type="String"/>
  <att name="NETWORK_HEIGHT" value="626.0" type="string" cy:type="String"/>
  <att name="NETWORK_SCALE_FACTOR" value="0.27021982416519186" type="string" cy:type="String"/>
  <att name="NETWORK_NODE_SELECTION" value="true" type="string" cy:type="String"/>
</graphics>

node attributes
<node id="18861" label="nhr-2">
  <att name="shared name" value="nhr-2" type="string" cy:type="String"/>
  <att name="name" value="nhr-2" type="string" cy:type="String"/>
  <att name="selected" value="0" type="boolean" cy:type="Boolean"/>
  <att name="Node_type" value="NTF" type="string" cy:type="String"/>
  <att name="Node_color" value="#FFFFFF" type="string" cy:type="String"/>
  <graphics w="75.0" h="35.0" width="2.0" x="265.03131103515625" type="ROUND_RECTANGLE" fill="#FFFFFF" outline="#000000" z="0.0" y="8.277727127075195">
    <att name="NODE_TRANSPARENCY" value="255" type="string" cy:type="String"/>
    <att name="NODE_SELECTED" value="false" type="string" cy:type="String"/>
    <att name="NODE_LABEL_TRANSPARENCY" value="255" type="string" cy:type="String"/>
    <att name="NODE_DEPTH" value="0.0" type="string" cy:type="String"/>
    <att name="NODE_LABEL_COLOR" value="#000000" type="string" cy:type="String"/>
    <att name="NODE_TOOLTIP" value="" type="string" cy:type="String"/>
    <att name="NODE_BORDER_TRANSPARENCY" value="255" type="string" cy:type="String"/>
    <att name="NODE_SELECTED_PAINT" value="#FFFF00" type="string" cy:type="String"/>
    <att name="NODE_LABEL_POSITION" value="C,C,c,0.00,0.00" type="string" cy:type="String"/>
    <att name="NODE_CUSTOMGRAPHICS_SIZE_3" value="35.0" type="string" cy:type="String"/>
    <att name="NODE_BORDER_STROKE" value="SOLID" type="string" cy:type="String"/>
    <att name="COMPOUND_NODE_PADDING" value="10.0" type="string" cy:type="String"/>
    <att name="COMPOUND_NODE_SHAPE" value="ROUND_RECTANGLE" type="string" cy:type="String"/>
    <att name="NODE_LABEL_FONT_SIZE" value="12" type="string" cy:type="String"/>
    <att name="NODE_LABEL_FONT_FACE" value="SansSerif.plain,plain,12" type="string" cy:type="String"/>
    <att name="NODE_LABEL_WIDTH" value="200.0" type="string" cy:type="String"/>
    <att name="NODE_VISIBLE" value="true" type="string" cy:type="String"/>
  </graphics>
</node>
edge attributes
<edge id="19" label="cog1-mir70" source="187" target="189" cy:directed="1">
  <att name="shared name" value="cog1-mir70" type="string" cy:type="String"/>
  <att name="shared interaction" value="TF-gene" type="string" cy:type="String"/>
  <att name="name" value="cog1-mir70" type="string" cy:type="String"/>
  <att name="selected" value="0" type="boolean" cy:type="Boolean"/>
  <att name="interaction" value="TF-gene" type="string" cy:type="String"/>
  <att name="Color" value="#3399FF" type="string" cy:type="String"/>
  <att name="Color2" value="grey" type="string" cy:type="String"/>
  <graphics fill="#CCCCCC" width="2.0">
    <att name="EDGE_STROKE_SELECTED_PAINT" value="#FF0000" type="string" cy:type="String"/>
    <att name="EDGE_SELECTED" value="false" type="string" cy:type="String"/>
    <att name="EDGE_LABEL_COLOR" value="#000000" type="string" cy:type="String"/>
    <att name="EDGE_LABEL_WIDTH" value="200.0" type="string" cy:type="String"/>
    <att name="EDGE_TRANSPARENCY" value="255" type="string" cy:type="String"/>
    <att name="EDGE_SOURCE_ARROW_UNSELECTED_PAINT" value="#3399FF" type="string" cy:type="String"/>
    <att name="EDGE_LINE_TYPE" value="SOLID" type="string" cy:type="String"/>
    <att name="EDGE_BEND" value="" type="string" cy:type="String"/>
    <att name="EDGE_TARGET_ARROW_UNSELECTED_PAINT" value="#CCCCCC" type="string" cy:type="String"/>
    <att name="EDGE_LABEL_TRANSPARENCY" value="255" type="string" cy:type="String"/>
    <att name="EDGE_LABEL" value="" type="string" cy:type="String"/>
    <att name="EDGE_SOURCE_ARROW_SELECTED_PAINT" value="#FFFF00" type="string" cy:type="String"/>
    <att name="EDGE_TOOLTIP" value="" type="string" cy:type="String"/>
    <att name="EDGE_CURVED" value="true" type="string" cy:type="String"/>
    <att name="EDGE_LABEL_FONT_SIZE" value="10" type="string" cy:type="String"/>
    <att name="EDGE_TARGET_ARROW_SHAPE" value="ARROW" type="string" cy:type="String"/>
    <att name="EDGE_LABEL_FONT_FACE" value="Dialog.plain,plain,10" type="string" cy:type="String"/>
    <att name="EDGE_SOURCE_ARROW_SHAPE" value="NONE" type="string" cy:type="String"/>
Example: adjacency matrix

[Figure: example graph with nodes A, B, C, D and E, next to a 5 × 5 adjacency matrix with rows and columns labelled A to E]

Global properties of networks

- Characterize the network as a whole
  - Say something about what is represented in the network
- Consequence of the properties of the components
  - Complex systems: more than the sum of all their parts...

Small-world networks

- Most nodes are not neighbors of one another, but most nodes can be reached from every other node in a small number of steps
- The typical distance, L, between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes, N, in the network:

$$L \propto \log N$$

- Tendency to have highly connected components (high clustering coefficient)
- Components are connected by hubs (high-degree nodes)
- Examples: neural networks, PPIs, transcription networks

The large scale structure of networks

n is the number of nodes; m is the number of edges; z is the mean degree; l is the mean geodesic distance; α is the exponent of the degree distribution if it follows a power law; $C^{(1)}$ and $C^{(2)}$ are clustering coefficients; r is the degree correlation coefficient. From Newman, "The structure and function of complex networks" (2010).
Degree distribution

- $p_k$ is the fraction of nodes in a network with degree k
  - Equivalently, $p_k$ is the probability that a randomly chosen node has degree k
- The histogram of $p_k$ is the degree distribution of the network
  - It is often plotted as a cumulative distribution function (the probability that a degree is ≥ k)

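Both quantities can be computed from a list of node degrees. The sketch below uses exact fractions and, as an example, the degrees (5, 3, 3, 3) obtained from the Königsberg edge set earlier in the slides; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction

def degree_distribution(degrees):
    """p_k: fraction of nodes with degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: Fraction(counts[k], n) for k in sorted(counts)}

def cumulative_distribution(p):
    """Probability that a degree is >= k, for every observed k."""
    return {k: sum(pk for kk, pk in p.items() if kk >= k) for k in p}

degrees = [5, 3, 3, 3]
p = degree_distribution(degrees)
P = cumulative_distribution(p)
print(p[3], p[5])  # 3/4 1/4
print(P[3], P[5])  # 1 1/4
```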
Degree distribution

- Many of these networks follow a power law, $P_k \sim k^{-\alpha}$
  - Others follow an exponential distribution, $P_k \sim e^{-k/\lambda}$
Scale-free networks

- The degree distribution follows a power law:

$$P(k) \sim k^{-\gamma}$$

- Preferential attachment and hierarchy: hubs are connected to lower-degree nodes, forming subgraphs; subgraphs are connected to each other through connections among hubs
- Typically $2 < \gamma < 3$
- Fault tolerant
- Examples: WWW, co-authorship, PPIs, ...

Error and attack tolerance in complex networks

Self similarity of scale-free networks

Null models

- Random graphs
  - Networks generated by a random process
- Used to answer questions about networks
  - Motifs/graphlets
    - Same global properties as real networks
    - Is this feature random?
- Several types, depending on how they are made
  - They also have different properties...

Erdős–Rényi model [1]

- All edges have the same probability of existing
  - Uniform random graph
- $G(n, p)$: n is the number of nodes and p is the probability of each edge
- Properties:
  - If $np < 1$, then a graph in $G(n, p)$ will almost surely have no connected components of size larger than $O(\log n)$
  - If $np = 1$, then a graph in $G(n, p)$ will almost surely have a largest component whose size is of order $n^{2/3}$
  - If $np \to c > 1$, where c is a constant, then a graph in $G(n, p)$ will almost surely have a unique giant component containing a positive fraction of the vertices. No other component will contain more than $O(\log n)$ vertices.

[1] Erdős, P. & Rényi, A. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5: 17–61 (1960)
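A graph from $G(n, p)$ can be sampled by visiting every pair of nodes and keeping the edge with probability p. This is a minimal sketch; the function name and the optional `seed` argument are illustrative choices.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Sample an undirected G(n, p) graph as a list of edges on nodes 0..n-1."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

edges = erdos_renyi(100, 0.02, seed=42)
mean_degree = 2 * len(edges) / 100  # fluctuates around (n - 1) p = 1.98
```

Setting p = 0 gives an empty graph and p = 1 gives the complete graph with n(n − 1)/2 edges.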
Barabási–Albert model [1]

- Random scale-free graphs
  - Preferential attachment (power-law degree distribution)
- Attachment probability:

$$p_i = \frac{k_i}{\sum_j k_j}$$

  - $p_i$ is the probability that a new node is connected to node i
  - $k_i$ is the degree of node i

[1] Albert, R. & Barabási, A.-L. Reviews of Modern Physics. 74 (1): 47–97 (2002)

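Preferential attachment can be sketched by keeping a list in which every node appears once per edge end, so that a uniform draw from the list selects node i with probability $k_i / \sum_j k_j$. The seed graph (a complete graph on the first m + 1 nodes) and the number m of edges added per new node are assumptions of this sketch, not details given on the slide. Redrawing until m distinct targets are found slightly distorts the probabilities.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node attaches to m existing nodes (n > m >= 1)."""
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on the first m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # One entry per edge end: a uniform draw picks node i with probability k_i / sum_j k_j.
    ends = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in sorted(targets):
            edges.append((new, t))
            ends.extend((new, t))
    return edges

edges = barabasi_albert(20, 2, seed=1)
```

With n = 20 and m = 2 the result always has 3 + 17 × 2 = 37 edges.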
Watts–Strogatz model [1]

- Small-world networks
  - n nodes and mean degree k
- Algorithm:
  - Build a regular network: each node $n_i$ is connected to its k nearest neighbours
  - For each node i ($n_0, n_1, \dots, n_{n-1}$), rewire each edge $(n_i, n_j)$ with $i < j$ to $(n_i, n_k)$ with probability β, provided that $i \neq k$ and $(n_i, n_k)$ does not already exist
- Interpolates between a regular lattice (β = 0) and a random graph (β = 1)
  - The average path length falls as β increases

[1] Watts, DJ & Strogatz, SH, Nature. 393 (6684): 440–442 (1998)

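The two steps of the algorithm can be sketched as follows. The sketch assumes that k is even, that n > k, and that the regular network is a ring in which each node is joined to its k/2 nearest neighbours on each side. These details are not spelled out on the slide.

```python
import random

def watts_strogatz(n, k, beta, seed=None):
    """Ring lattice with k neighbours per node, each edge rewired with probability beta."""
    rng = random.Random(seed)
    edges = set()
    # Step 1: regular ring lattice, stored as (smaller, larger) node pairs.
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            edges.add((min(i, j), max(i, j)))
    # Step 2: rewire edge (i, j), i < j, to (i, new) with probability beta,
    # avoiding self-loops and edges that already exist.
    for i, j in sorted(edges):
        if rng.random() < beta:
            candidates = [v for v in range(n)
                          if v != i and (min(i, v), max(i, v)) not in edges]
            if candidates:
                new = rng.choice(candidates)
                edges.remove((i, j))
                edges.add((min(i, new), max(i, new)))
    return sorted(edges)

lattice = watts_strogatz(10, 4, 0.0)
rewired = watts_strogatz(10, 4, 1.0, seed=1)
```

Rewiring moves edges but never adds or removes them, so both networks have nk/2 = 20 edges.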
- Traditional null models reproduce global properties of real networks
- You might be interested in reproducing local properties

Local properties:

- Characteristics of the network components
  - Nodes, edges, and subsets of them
- Node centralities:
  - How central each node is
  - But we already talked about this
- Subsets of nodes and edges

Network graphlets

- Small subgraphs: few nodes from the original network
  - Network building blocks
  - Induced: they also keep all the connections among the selected nodes
  - Characterized by their number of nodes and connections
- Motifs: graphlets that are statistically over-represented in real networks [1]
  - Depend on the null model used to determine over-representation

[1] Milo et al. Science. 298(5594), (2002)

Network graphlets in undirected networks

Przulj N. Bioinformatics. 23(2), (2007)

Graphlet based metrics

- Graphlet frequency [1]: counts of graphlet occurrences
- Graphlet degree [2]: number of graphlets in which a node participates (or in which orbit)
- Graphlet correlation [3]: Spearman rank correlation between all orbits for all network graphlets
  - Captures dependencies between orbits in the network

[1] Przulj et al. Bioinformatics. 20(18), (2004); [2] Milenković & Przulj. Cancer Informatics 6, (2008); [3] Yaveroğlu et al. Sci. Reports 4, (2014)

Graphlet based metrics: directed networks

Sarajlić et al. Scientific Reports. 6(1), (2016)


3 nodes graphlets in directed networks

Milo et al. Science. 298(5594), (2002)


Network Comparison

- In PPI networks, evolutionary insights are gained from similarities between the networks of different species [1]
- Identification of therapeutic targets: disease GRN vs. non-disease GRN [2,3]
- Types:
  - Alignment based
  - Alignment free
    - Network global/local properties
    - Graphlet-based methods

[1] Singh R et al. PNAS 105(35) (2008); [2] Gaiteri et al. Genes, Brain, and Behavior. 13(1) (2014); [3] de la Fuente, Trends Genet. 26(7):326-33, (2010)

Alignment based

- Only for undirected networks
- Aim to map nodes in one network to nodes in another network, maximizing topological and biological similarity
  - Identification of homologous proteins and subnetworks
  - Knowledge transfer by homology → complements sequence similarity
  - Uncovering of large subnetworks in diverse species
  - Reconstruction of phylogenetic relationships
- Pairwise (two nets) and multiple (many nets, similar to MSA)
- Local (best-matching subnets) vs. global (entire nets)

Clark and Kalita, Bioinformatics 30(16) (2014)

Alignment based methods

- The exact solution is intractable: all methods use approximations
  - First, compute a local topology comparison for all pairs of nodes
  - Then build a weighted bipartite graph ($V_1 \to V_2$) and maximize the weights
- Methods mainly differ in how they perform these steps

Alignment free

- Comparison of network global properties
  - Degree distribution, centrality distribution, shortest paths, ...
- Graphlet based: more accurate than network properties
  - Graphlet frequency
  - Graphlet degree
  - Graphlet correlation

Alignment-free methods do not provide as much information as alignment-based methods, but they are faster.

Directed Networks

- No solution is available for alignment-based comparison
  - Treating the networks as undirected → loss of information
- Alignment-free methods are easier to use:
  - Global properties
  - Graphlet based

Realizations of the same network

- Networks depicting the same system under different conditions
  - Same nodes, different connectivity/centrality between them
- Variations in node centrality
  - Relationships between nodes change
- Variations in graphlets [1]
  - Includes second-order neighbours for graphlets of size 3

[1] Martin et al. PLOS One 2016

