WSC Week1 Graphs

Download as pdf or txt
Download as pdf or txt
You are on page 1of 104

IOTA5001:

Social and Web Computing


Gareth Tyson ([email protected])
ABOUT ME

● Research interests
■ Social computing
■ Internet measurements
■ Web computing

● Contact me
○ Email: [email protected]

2
STRUCTURE OF SESSIONS

● One lecture per week (two hours)


○ Learn fundamental concepts of social and web
computing

● One research seminar per week (one hour)


○ Learn research aspects of web & social computing
○ Delivered by a mix of you and me!

3
OVERVIEW OF COURSE

1. Introduction to social graphs


2. Small World
3. Centrality and social influence Social network
4. Social communities analysis
5. Information dissemination and cascades
6. Viral information and epidemic spread
7. Web search
8. Online advertising
9. Online privacy Web applications
10. Decentralisation in the web
11. Online messaging
12. Data task presentations! Wrapping up
13. Revision & discussions
4
ASSESSMENT

1. Research Paper Presentation: Each week will involve a


research paper presentation by a student. [10% of
course]

2. Literature Review: A critical literature review of research


of your chosen sub-topic. [25% of course]

3. Data Science Task: A data science task, performing a


statistical analysis of an existing Web & Social
Computing dataset. You will be required to write this as
a short report. [50% of course]

5
ASSESSMENT

4. Data Science Presentation: A live presentation,


providing an overview of your data science task. You
will be asked to present your methodology and findings,
following by a Q&A session. [15% of course]

6
TIMELINE (NOT TO SCALE!)

5th Sept 12th Sept 3rd Oct 4th Nov 28th Nov 9th Dec

[CW1] List of [CW2] List of [CW3] Data task [CW2] [CW3]


papers literature topics specification Deadline Deadline
announced announced announced
[CW4] Student
data task
presentation
A BRIEF HISTORY OF THE
WEB…
A BRIEF HISTORY OF THE WEB

Leiner, Barry M., et al. "A brief history of the Internet." ACM SIGCOMM Computer Communication Review (2009)
A BRIEF HISTORY OF THE WEB

Leiner, Barry M., et al. "A brief history of the Internet." ACM SIGCOMM Computer Communication Review (2009)
A BRIEF HISTORY OF SOCIAL MEDIA

11
A BRIEF HISTORY OF SOCIAL MEDIA

What links all these


things together?

12
WEEK 1: NETWORKS, RANDOM
GRAPHS AND METRICS
NETWORKS ARE EVERYWHERE

14
NETWORKS ARE EVERYWHERE

A node is a person

A link means that the


two people are
friends

15
NETWORKS ARE EVERYWHERE

A node is a Twitter
account holder

A link means that


Lisa follows Homer

16
NETWORKS ARE EVERYWHERE

A node is a YouTube
publisher

A link means that


Lisa has watched a
video of Homer

17
NETWORKS ARE EVERYWHERE

18
FACEBOOK FRIENDSHIP NETWORK

19
TERRORIST NETWORK

“SIX DEGREES OF
MOHAMMED ATTA”

UNCLOAKING
TERRORIST
NETWORKS, BY
VALDIS KREBS

20
CRIMINAL NETWORK

21
THE INTERNET

Routers/computers
are the nodes

Cables are the links

22
https://www.wired.com/story/opte-internet-map-visualization/
THE INTERNET
AIRLINE NETWORK

24
METRO NETWORK

25
WHAT CAN NETWORKS REPRESENT?

• Who talks to whom?


• Who is friend with whom?
• What leads to what?
• Who is a relative of whom?
• Who eats whom?
• Who sends messages to whom?

27
IN THIS LECTURE

I will introduce:

• Networks/graphs
• Basic network measures
• Random Graphs

28
A NETWORK IS A GRAPH

A graph G is a tuple (V,E) of a set of vertices V and edges E. An edge in E


connects two vertices in V.

A neighbour set N(v) is the set of vertices adjacent to v:

29
A NETWORK IS A GRAPH
links or edges (E) A
vertices or nodes (V)

E B

D c
A graph G is a tuple (V,E) of a set of vertices V and edges E. An edge in E
connects two vertices in V.

A neighbour set N(v) is the set of vertices adjacent to v:

30
A NETWORK IS A GRAPH
links or edges (E) A
vertices or nodes (V)

E B
neighbour set N(B)

D c
A graph G is a tuple (V,E) of a set of vertices V and edges E. An edge in E
connects two vertices in V.

A neighbour set N(v) is the set of vertices adjacent to v:

31
A NETWORK IS A GRAPH
links or edges (E) A
vertices or nodes (V)

E B
neighbour set N(B)

D c
A graph G is a tuple (V,E) of a set of vertices V and edges E. An edge in E
connects two vertices in V.

A neighbour set N(v) is the set of vertices adjacent to v:

32
A NETWORK IS A GRAPH
G=
V
E

A graph G is a tuple (V,E) of a set of vertices V and edges E. An edge in E


connects two vertices in V. E A

c B
A neighbour set N(v) is the set of vertices adjacent to v: D

33
HOW TO MEASURE A GRAPH

34
NODE DEGREE

The node degree (k) is the number of neighbours of a node

E B
The study of the degree
distribution of networks
allows the classification of D c
networks in different
categories

35
NODE DEGREE

The node degree (k) is the number of neighbours of a node


E.g., Degree of A is 2

E B
The study of the degree
distribution of networks
allows the classification of D c
networks in different
categories

36
DIRECTED & UNDIRECTED GRAPHS
A
A

E B
E B

D C
D C

Undirected Directed
Graph Graph

Example of Undirected Graphs: Facebook, Co-presence

Examples of Directed: Twitter, Email, Phone Calls

37
DIRECTED & UNDIRECTED GRAPHS
A
A

E B
E B

D C
D C

Undirected Directed
Graph Graph
What is in-degree and out-
degree?
Example of Undirected Graphs: Facebook, Co-presence

Examples of Directed: Twitter, Email, Phone Calls

38
PATHS AND CYCLES A

E B

D C

A path is a sequence of nodes in which each pair of


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction. A
E.g. A-E-D is a path in both previous graphs
E B

D C

39
PATHS AND CYCLES A

E B

D C

A path is a sequence of nodes in which each pair of


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction. A
E.g. A-E-D is a path in both previous graphs
E B

D C

40
PATHS AND CYCLES A

E B

D C

A path is a sequence of nodes in which each pair of


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction. A
E.g. A-E-D is a path in both previous graphs
E B

D C

41
PATHS AND CYCLES A

E B

D C

A path is a sequence of nodes in which each pair of


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction. A
E.g. A-E-D is a path in both previous graphs
E B

D C

42
PATHS AND CYCLES A

E B

D C

A path is a sequence of nodes in which each pair of


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction. A
E.g. A-E-D is a path in both previous graphs
E B

D C

43
PATHS AND CYCLES A

E B

A path is a sequence of nodes in which each pair of D C


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction.
E.g. A-E-D is a path in both previous graphs
A
A cycle is a path where the start node is also the
end node E B
E.g. E-A-B-C is a cycle in the undirected graph
D C

44
PATHS AND CYCLES A

E B

A path is a sequence of nodes in which each pair of D C


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction.
E.g. A-E-D is a path in both previous graphs
A
A cycle is a path where the start node is also the
end node E B
E.g. E-A-B-C is a cycle in the undirected graph
D C

45
PATHS AND CYCLES A

E B

A path is a sequence of nodes in which each pair of D C


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction.
E.g. A-E-D is a path in both previous graphs
A
A cycle is a path where the start node is also the
end node E B
E.g. E-A-B-C is a cycle in the undirected graph
D C

46
PATHS AND CYCLES A

E B

A path is a sequence of nodes in which each pair of D C


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction.
E.g. A-E-D is a path in both previous graphs
A
A cycle is a path where the start node is also the
end node E B
E.g. E-A-B-C is a cycle in the undirected graph
D C

47
PATHS AND CYCLES A

E B

A path is a sequence of nodes in which each pair of D C


consecutive nodes is connected by an edge.
If graph is directed the edge needs to be in the right direction.
E.g. A-E-D is a path in both previous graphs
A
A cycle is a path where the start node is also the
end node E B
E.g. E-A-B-C is a cycle in the undirected graph
D C

48
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.

E B

D C

49
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.

E B

D C

50
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.

E B

D C

51
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.

E B

D C

52
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.

E B

D C

53
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.

E B

D C

54
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.

E B

D C

55
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.

E B

D C

56
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.
The diameter of the graph is the maximum distance between any pair of its nodes.

E B

D C

57
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.
The diameter of the graph is the maximum distance between any pair of its nodes.

What’s the diameter?


E B

D C

58
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.
The diameter of the graph is the maximum distance between any pair of its nodes.

d(A,B) = 1
A d(A,C) = 2
d(B,D) = 3
E B

D C

59
PATH LENGTH/DISTANCE
The distance (d) between two nodes in a graph is the length of the shortest path linking
the two graphs.
The diameter of the graph is the maximum distance between any pair of its nodes.

d(A,B) = 1
A d(A,C) = 2
d(B,D) = 3
E B

D C

60
CONNECTIVITY

A graph is connected if there is a path between each pair of nodes.

Example of disconnected graph:

A A

E B E B

D C D C

61
COMPONENTS

A connected component of a graph is the subset of


nodes for which each of them has a path to all others (and
the subset is not part of a larger subset with this property).
Connected components: A-B-C and E-D

E B

D C

62
COMPONENTS

A connected component of a graph is the subset of


nodes for which each of them has a path to all others (and
the subset is not part of a larger subset with this property).
Connected components: A-B-C and E-D
A

A giant component is a connected component E containing


B
a significant fraction of nodes in the network.
Real networks often have one unique giant component:
D largest
C
connected component

63
WHAT DO REAL GRAPHS LOOK
LIKE?

64
SMALL-WORLD PHENOMENON
MILGRAM’S EXPERIMENT
Two random people are connected through
only a few (6) intermediate acquaintances.

Milgram’s experiment (1967) shows the


known “six degrees of separation”:

1. Choose 300 people at random


2. Ask them to send a letter through friends to
a stockbroker near Boston.
3. 64 successful chains.

http://www.bbc.co.uk/news/technology-15844230
65
RECENT REMAKE
• In 2003 the experiment was redone.

• Choice of 18 targets over the world. Choice of senders


(60K) from a commercially obtained email list.

• Website to control the “email” contact from one


participant to the next (two weeks to choose next hop)

• Verification of relationship by receiver (to avoid cheating


by web search).

66
FINDINGS

• Median of 5-7 steps

• Use of “weak ties” and professional relationships

• “Network structure alone is not everything”


• Some different incentives had a high impact on
completion rate of chains
• If the target was in a prominent place (e.g. professor)

67
IT’S STILL NOT RESOLVED…

And the Facebook study that shows the average distance in 2008 was 5.28 hops, while
in 2011 it was 4.74.

68
https://www.facebook.com/notes/facebook-data-team/anatomy-of-facebook/10150388519243859
ERDOS NUMBER

Calculated using a co-authorship graph


69
ERDOS NUMBER

70
ERDOS NUMBER

71
ERDOS NUMBER

72
ERDOS NUMBER

73
ERDOS NUMBER

74
ERDOS NUMBER

Erdos Number: distance from the


mathematician Paul Erdos

75
ERDOS NUMBER
Let’s have a Yehia
number...

76
ERDOS NUMBER

Erdos Number: distance from the


mathematician Paul Erdos
(LEGEND!)

(most people are 4-5 hops away)


based on collaboration.

77
BACON NUMBER

A network of actors who co-starred in


a movie.

Most actors are no more than ~3 hops


from Kevin Bacon.

78
https://oracleofbacon.org/
BACON NUMBER

A network of actors who co-starred in


a movie.

Most actors are no more than ~3 hops


from Kevin Bacon.

79
http://www.buzzfeed.com/jtes/17-people-with-surprising-bacon-numbers#.cf6zK3ope https://oracleofbacon.org/
DUNBAR NUMBER
British anthropologist Robin Dunbar theorised that there’s a limit
to the amount of relationships a person can maintain—roughly
150.

80
CAN WE MODEL THESE GRAPHS?

81
MODELLING GRAPHS

• Can we auto-generate a graph that “looks” like a real graph?

• Lets scientists/nerds have fun playing with graphs that match


certain characteristics

• E.g. we can understand how graphs evolve over time

82
RANDOM GRAPHS

First way to model these networks: Erdos-


Renyi Random Graph [Erdos-Renyi ‘59]:

• G(n,p): graph with n vertices where an edge


exists with independent random probability
0<p<1 for each edge.

• For each node n1, an edge to node n2


exists with probability p.

83
RANDOM GRAPH MODEL

84
DEGREE DISTRIBUTION
OF RANDOM GRAPHS

Is this how a typical social network works? 85


CLUSTERING COEFFICIENT

A B

E C D F

The clustering coefficient defines the proportion of A’s neighbours which are
connected by an edge (are friends).

The number of triangles in which A is involved wrt to the ones it could be involved
in.

86
FORMALLY: CLUSTERING
COEFFICIENT

Local Clustering
Coefficient

Network Clustering
Coefficient

87
FORMALLY: CLUSTERING
COEFFICIENT

Local Clustering Proportion of my friends who are also friends


Coefficient with my other friends...

Network Clustering The average all the node’s local clustering


coefficients
Coefficient

88
CLUSTERING COEFFICIENT:
EXAMPLE

89
CLUSTERING COEFFICIENT:
EXAMPLE

90
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

91
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

92
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

93
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

94
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

95
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

96
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

97
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

98
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

99
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

100
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

101
CLUSTERING COEFFICIENT:
EXAMPLE
Degree = 4

Links between
neighbours = 1

Fraction of possible interconnections


between my neighbour!

102
CLUSTERING COEFFICIENT
OF A RANDOM GRAPH
The clustering coefficient of a random graph is:

The probability that 2 neighbours of a node are connected


is equal to the probability that 2 random nodes are
connected

Is this mirroring the clustering coefficient of real networks?


103
QUESTION & SUMMARY

• Graphs are everywhere!

• We know how to describe them

• We have lots of ways to measure them

• Random graphs are one way of modelling


them...but we need to improve!

104
REFERENCES
• Material from Chapter 1, 2 of
• D. Easley, J. Kleinberg. Networks, Crowds, and Markets:
Reasoning About a Highly Connected World. Cambridge
University Press.
• P. Sheridan Dodds, R. Muhamad, and D. J. Watts. An
Experimental Study of Search in Global Social
Networks. Science 8. August 2003: 301 (5634), 827-
829.
• R. Albert, A. Barabasi. Statistical Mechanics of
Complex Networks. Reviews of Modern Physics (74).
Jan. 2002.
• S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U.
Hwang, Complex Networks: Structure and Dynamics
Physics Reports 424 (2006) 175 .

You might also like