SNA Unit2
SNA Unit2
SNA Unit2
UNIT II
The GraphML format represents an advancement over the previously mentioned formats
in terms of both interoperability and extensibility.
GraphML originates from the information visualization community where a shared
format greatly increases the usability of new visualization methods.
GraphML is therefore based on XML with a schema defined in XML Schema. This has
the advantage that GraphML files can be edited, stored, queried, transformed etc. using
generic XML tools.
Common to all these generic graph representations is that they focus on the graph
structure, which is the primary input to network analysis and visualization.
Attribute data when entered electronic form is typically stored separately from network
data in Excel sheets, databases or SPSS tables.
1
SIT1610 – Social Network Analysis – Unit IV
2
SIT1610 – Social Network Analysis – Unit IV
In research:
Analyzing Wikipedia conflicts (PARC)
Natural language processing (CMU)
Bioinformatics (Maryland)
Particle physics (Nebraska)
Ocean climate simulation (Washington)
2. Cost-efficiency:
Commodity nodes (cheap, but unreliable)
Commodity network
Automatic fault-tolerance (fewer admins)
Easy to use (fewer programmers)
3
SIT1610 – Social Network Analysis – Unit IV
Node specs (Yahoo! terasort): 8 x 2.0 GHz cores, 8 GB RAM, 4 disks (= 4 TB?)
Challenges
Cheap nodes fail, especially if you have many
- Mean time between failures for 1 node = 3 years
- MTBF for 1000 nodes = 1 day
- Solution: Build fault-tolerance into system
Commodity network = low bandwidth
- Solution: Push computation to the data
Programming distributed systems is hard
- Solution: Users write data-parallel “map” and “reduce” functions, system handles
work
distribution and faults
Hadoop Components:
Distributed file system (HDFS)
- Single namespace for entire cluster
- Replicates data 3x for fault-tolerance
MapReduce framework
- Executes user jobs specified as “map” and “reduce” functions
- Manages work distribution & fault-tolerance
4
SIT1610 – Social Network Analysis – Unit IV
Reduce function:
5
SIT1610 – Social Network Analysis – Unit IV
6
SIT1610 – Social Network Analysis – Unit IV
Drawbacks:
7
SIT1610 – Social Network Analysis – Unit IV
8
SIT1610 – Social Network Analysis – Unit IV
For example, the SIOC (Semantically Enabled Online Communities) project aims
at connecting discussions across various types of for a Usenet, discussion boards, blogs,
mailing lists etc by exposing the postings according to a shared ontology.
The key concepts of this ontology are the sioc:User account that is used to create a
sioc:Post, which is part of a sioc:Forum at a certain sioc:Site. A sioc: User is not a
subclass of foaf :Person (as a person may have multiple accounts), but related to the
description of a person using the sioc:account of property. While FOAF has a rich
ontology for characterizing individuals—especially with respect to their online
presence—, but it is rather poor as a vocabulary for describing relationships.
(1) To support the automated integration of social information on a semantical basis and
(2) To capture established concepts in Social Network Analysis.
9
SIT1610 – Social Network Analysis – Unit IV
student. Both the relationship and the roles may be limited in their interpretation
and use to a certain social context.
Ideally, all users of all these services would agree to a single shared typology of social
relations and shared characterizations of relations. However, this is neither feasible nor
necessary. What is required from such a representation is that it is minimal in order to
facilitate adoption and that it should preserve key identifying characteristics such as the
case of identifying properties for social individuals.
Conceptual model
Social relations could be represented as n-ary predicates; however, n-ary relations are
not supported directly by the RDF/OWL languages. There are several alternatives to n-
ary relations in RDF/ OWL
In all cases dealing with n-ary relations we employ the technique that is known as
reification: we represent the relation as a class, whose instances are concrete relations of
that type.
One may recall that RDF itself has a reified representation of statements: the rdf
:Statement object represents the class of statements.
This class has three properties that correspond to the components of a statement,
namely rdf: subject, rdf :predicate, rdf :object.
These properties are used to link the statement instance to the resources involved in the
statement.
In other words relationships become subclasses of the rdf :Statement class. Common
is that the new Relationship class is related to a general Parameter class by the
hasParameter relationship. Relationship types such as Friendship are subclasses of the
Relationship class, while their parameters (such as strength or frequency) are subtypes of
the Parameter class.
Two alternatives:
The first scheme borrows from the design of OWL-S for representing service
parameters, as used in the specification of the profile of a Web Service. Here, parameters
are related by the valued-by metaproperty to their range. For example in an application
Strength may be a subclass of Parameter valued-by integers. The disadvantage of this
solution is that specifying values requires two statements or the introduction of a
10
SIT1610 – Social Network Analysis – Unit IV
constructed property.
The second alternative differs in that the “native “representing meth parameters: the
generic Parameter class is defined as a subclass of rdf :Property. This model has the
advantage that it becomes more natural to represent parameter values and restrictions on
them. The disadvantage is that this solution is not compliant with OWL.
DL Social relations are socially constructed objects: they are constructed in social
environments by assigning a label to a common pattern of interaction between individuals.
Cognitive structuring, works by applying the generic pattern we associate with such a
relationship to the actual state-of-affairs we observe. For example, a student/professor
relationship at the Free University of Amsterdam is defined by the social context of the
university and this kind of relationship may not be recognizable outside of the university.
The below figure shows descriptions and Situations ontology design pattern that provides
a model of context and allows to clearly delineate these two layers of representation.
D&S is a generic pattern for modeling non-physical objects whose intended meaning
results from statements, i.e. it emerges in combination with other entities. For example, a
norm, a plan, or a social role is usually represented as a set of statements and not as a
concept.
11
SIT1610 – Social Network Analysis – Unit IV
D&S is an ontology-design pattern in the sense that it is used as a template for creating
domain ontologies in complex areas. D & S has been successfully applied in a wide
range of real-life ontology engineering projects from representing Service Level
Agreements (SLAs) to the descriptions of Web Services.
Dissolve
A community c(tk−1) in C(tk) has dissolved, when c(tk−1) shares
no URLs with any community in C(tk)
12
SIT1610 – Social Network Analysis – Unit IV
Growth Rate
The growth rate, Rgrow(c(tk−1), c(tk)), represents the increase of URLs per unit time. It
allows us to find most growing or shrinking communities.
Stability
13
SIT1610 – Social Network Analysis – Unit IV
Represents the amount of disappeared, appeared, merged and split URLs per
unit time. A stable community on a topic is the best starting point for finding
interesting changes around the topic.
Disappearance rate
The number of disappeared URLs from c(tk−1) per unit time. Higher disappear
rate means that the community has lost URLs mainly by disappearance.
Merge rate
The number of absorbed URLs from other communities by merging per unit time.
Higher merge rate means that the community has obtained URLs mainly by
merging.
Split Rate
The split rate, Rsplit (c(tk−1), c(tk)), is the number of split URLs from c(tk−1) per
unit time. When the split rate is low, c(tk) is larger than other split communities.
Otherwise, c(tk) is smaller than other split communities.
14
SIT1610 – Social Network Analysis – Unit IV
Other Metrics
The novelty metrics of a main line (c(ti), c(ti+1), ..., c(t j)) is calculated as follows.
Web crawlers are used for automated capture due to the massive size and
amount of information on the Web.
From each archive, a Web graph is built with URLs and links by extracting
anchors from all pages in the archive.
The graph included not only URLs inside the archive, but also URLs outside
pointed to by inside URLs.
The size distribution of communities also follows the power law and its
exponent did not change so much over time. Although the size distribution of
communities is stable, the structure of communities changes dynamically. The
structure of the chart changes mainly by split and merge, in which more than
half of communities are involved.
Split and Merged Communities
Both distributions roughly follow the power law, and show that split or
merge rate is small in most cases.
Growth Rate
The growth rate is small for most of communities, and the graph has
clear y-axis symmetry.
Detecting communities from given social networks are practically important for
the following reasons:
16
SIT1610 – Social Network Analysis – Unit IV
where the sum runs over all pairs of vertices, A is the adjacency matrix, ki is the
degree of vertex i and m is the total number of edges of the network. Modularity
can be rewritten as follows:
would be there if the network were a random network with the same degree for
each vertex. Figure 3.b illustrates the meaning of modularity.
Fig.1 Modularity
The latter formula implicitly shows the definition of a community: a sub network is
a community if the number of edges inside it is larger than the expected number in
modularity’s null model. The modularity of the whole network, taken as a single
community, is zero. Modularity is always smaller than one, and it can be negative
as well.
There are naive methods for dividing given networks into sub networks, such as
graph partitioning, hierarchical clustering, and k-means clustering. The methods
for detecting communities are roughly classified into the following categories:
18
SIT1610 – Social Network Analysis – Unit IV
Divisive Algorithms:
Edge betweenness is the number of shortest paths between all vertex pairs
that run along the edge.
Modularity Optimization:
19
SIT1610 – Social Network Analysis – Unit IV
Spectral Algorithms:
Spectral algorithms are to cut given network into pieces so that the number
of edges to be cut will be minimized. One of the basic algorithms is spectral graph
bipartitioning. The Laplacian matrix L of a network is an n * n symmetric matrix,
with one row and column for each vertex. Laplacian matrix is defined as L =D - A
, where A is the adjacency matrix and D is the diagonal degree matrix with
All eigenvalues of L are real and non-negative, and L has a full set of n real
and orthogonal eigenvectors. In order to minimize the above cut, vertices are
partitioned based on the signs of the eigenvector that corresponds to the second
smallest eigenvalue of L. In general, community detection based on repetitive
bipartitioning is relatively fast.
Other Algorithms:
There are many other algorithms for detecting communities, such as the
methods focusing on random walk, and the ones searching for overlapping cliques.
Network Reduction:
20
SIT1610 – Social Network Analysis – Unit IV
constructed from the bibliography of the book entitled “graph products: structure
and recognition”. The bibliography contains 360 papers written by 314 authors.
This section show how community mining techniques can be applied to the
analysis of scientific collaborations among researchers. Flink is a social network
that describes the scientific collaborations among 681 semantic Web researchers
(http://flink.semanticweb.org/).
The network was constructed based on semantic Web technologies and all
related semantic information was automatically extracted from “Web-accessible
information sources”, such as “Web pages, FOAF profiles, email lists, and
publication archives”. The weights on the links measure the degrees of
collaboration.
21
SIT1610 – Social Network Analysis – Unit IV
(1) Based on an iPDA user’s communication traces, selecting individuals who have
frequently contacted or been contacted with the user during a certain period of
time;
(3) Ranking and recommending new persons who might not be included the
current acquaintance book, the user.
22
SIT1610 – Social Network Analysis – Unit IV
understand network data and convey the result of the analysis. Visualization
often also facilitates qualitative interpretation of network data. With respect to
visualization, network analysis tools are used to change the layout, colors, size
and other properties of the network representation.
Some SNA software can perform predictive analysis.This includes using network
phenomena such as a tie to predict individual level outcomes (often called peer
influence or contagion modeling), using individual-level phenomena to predict
network outcomes such as the formation of a tie/edge (often called homophily
models) or particular type of triad, or using network phenomena to predict other
network phenomena, such as using a triad formation at time 0 to predict tie
formation at time 1.
Gephi Graph exploration and Any system supporting Open Source (GPL3), Gephi[, is an interactive
manipulation software Java 1.6 and OpenGL seeking contributors visualization and
exploration platform for
all kinds of networks and
complex systems,
dynamic and hierarchical
graphs. It is a tool for
people that have to
explore and understand
graphs. The user
interacts with the
representation.
23
SIT1610 – Social Network Analysis – Unit IV
Java Universal network and graph Any platform Open source (BSD JUNG is a Java API
Network/Graph manipulation, supporting Java license) and library that
(JUNG) Framework analysis, provides a common
and visualization and extensible
language for the
modeling, analysis,
and visualization of
relational data. It
supports a variety of
graph types (including
hypergraphs),
supports graph
elements of any type
and with any
properties
Mathematica is a
Mathematica Graph analysis, Windows, Commercial general purpose
statistics, data Macintosh, Linux computation and
visualization, analysis environment.
optimization, image
recognitio
24
SIT1610 – Social Network Analysis – Unit IV
25