Leveraging Social Media Networks For Classification
Leveraging Social Media Networks For Classification
Leveraging Social Media Networks For Classification
DOI 10.1007/s10618-010-0210-x
Received: 10 August 2009 / Accepted: 15 December 2010 / Published online: 14 January 2011
© The Author(s) 2011
Abstract Social media has reshaped the way in which people interact with each
other. The rapid development of participatory web and social networking sites like
YouTube, Twitter, and Facebook, also brings about many data mining opportunities
and novel challenges. In particular, we focus on classification tasks with user inter-
action information in a social network. Networks in social media are heterogeneous,
consisting of various relations. Since the relation-type information may not be avail-
able in social media, most existing approaches treat these inhomogeneous connections
homogeneously, leading to an unsatisfactory classification performance. In order to
handle the network heterogeneity, we propose the concept of social dimension to rep-
resent actors’ latent affiliations, and develop a classification framework based on that.
The proposed framework, SocioDim, first extracts social dimensions based on the net-
work structure to accurately capture prominent interaction patterns between actors,
then learns a discriminative classifier to select relevant social dimensions. SocioDim,
by differentiating different types of network connections, outperforms existing repre-
sentative methods of classification in social media, and offers a simple yet effective
approach to integrating two types of seemingly orthogonal information: the network
of actors and their attributes.
L. Tang (B)
Advertising Sciences, Yahoo! Labs, Santa Clara, CA 95054, USA
e-mail: [email protected]
H. Liu
Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA
e-mail: [email protected]
123
448 L. Tang, H. Liu
1 Introduction
Social media, such as Facebook, MySpace, Twitter, BlogSpot, Digg, YouTube, and
Flickr, has streamlined ways for people to express their thoughts, voice their opinions,
and connect to each other anytime and anywhere. For instance, popular content-sharing
sites like Del.icio.us, Flickr, and YouTube allow users to upload, tag and comment on
different types of content (bookmarks, photos, or videos). Users registered at these sites
can also become friends, fans, or followers of others. The prolific and expanded use of
social media has turned online interactions into a vital part of the human experience.
The election of Barack Obama as the President of United States was partially attributed
to his smart Internet strategy and access to millions of younger voters through a novel
use of social media. As reported in the New York Times, in response to the disputed
Iranian presidential election in 2009, Twitter has emerged as a global communication
tool during the protest.
The increasing online traffic in social media brings about many data mining oppor-
tunities for user profiling, targeting, recommendation, and crowd-opinion analysis.
Take social networking advertising as an example. Advertising in social media has
encountered many challenges.1 A common approach to targeting is to build a model
that maps from user profiles (e.g., the geography location, education level, gender) to
ad categories. Since social media often comes with a friendship network between users
and many daily interactions, how can we exploit this rich user interaction information
to infer the ads that might attract a user? This can be considered as a classification prob-
lem. The key task boils down to classifying users into relevant ad categories. For the
classification problem, some labeled data can be collected. Online activities of users
such as clicking on an ad, purchasing a product, or hobbies on their profiles reflect the
users’ potential interests. In reality, however, the label information is limited. Given
a social network of users with their interaction information, we investigate how to
leverage this rich interaction information for classification tasks in social media.
The analysis of social structures based on network topology has been studied in
social sciences [49]. A traditional social science study involves the circulation of ques-
tionnaires, asking respondents to detail their interaction with others. Then a network
can be constructed based on the response, with nodes representing individuals, and
edges the interactions between them. This type of data collection confines most tradi-
tional social network analysis to a limited scale, typically hundreds of actors at most in
one study. Various relations are present in one network. The relations between actors
are often explicitly known, e.g., actor a is the mother of actor b; actors b and c are
colleagues. This type of relation information enables the anatomy of group process,
group norm and social role analysis [49].
Diverse relations also make up the connections in large-scale networks present in
social media. For example, one user might connect to her friends, relatives, college
1 http://www.nytimes.com/2008/12/14/business/media/14digi.html.
123
Leveraging social media networks 449
2 Problem statement
In this work, we study classification with networked data. For instance, in an adver-
tising campaign, advertisers attempt to deliver ads to those users who are interested in
their products, or similar categories. The outcome of user interests can be represented
using + or −, with + denoting a user is interested in the product and − otherwise. We
assume that the interests of some users are already known. This can be extracted from
user profiles or their response to a displayed ad. The task is to infer the preference of
the remaining users within the same social network.
In social media, some individuals are highly idiosyncratic. Their interests cannot be
captured by merely one class. It is normal to have multiple interests in a user profile.
Rather than concentrating on univariate cases of classification in networked data [29]
(with each node having only one class label), we examine a more challenging task
in which each node in a network can have multiple labels. In this setup, the univar-
iate classification is just a special case. The multi-label classification problem with
networked data can be formally described below:
Given:
– K categories Y = {Y1 , . . . , YK };
123
450 L. Tang, H. Liu
In this part, we briefly review the commonly-used method to handle classification with
networked data and discuss its limitations when applied directly to networks in social
media.
When data instances are connected in a network, they are not independent and identi-
cally distributed (i.i.d.) as in conventional data mining. It is empirically demonstrated
that linked entities have a tendency to belong to the same class [29]. This correlation in
the class variable of connected objects can be explained by the concept of homophily
in social science [30]. Homophily suggests that a connection between similar people
occurs at a higher rate than among dissimilar ones. It is one of the first characteristics
studied by early social network researchers [2,3,50], and holds for a wide variety of
relationships [30]. Homophily is also observed in social media [10,45].
Based on the empirical observation that labels of neighboring entities (nodes) are
correlated, the prediction of one node cannot be made independently, but also depends
on its neighbors. To handle the interdependency, collective inference is widely used
to address classification problems in networked data [20,29,37]. A common Markov
assumption is that, the label of one node depends on that of its neighbors (plus other
attributes if applicable). In particular,
123
Leveraging social media networks 451
Here A is the network, yi the label of node vi , and Ni a set of its “neighbors”. The
neighbors are typically defined as nodes that are 1-hop or 2-hop away from vi in the
network [20,12]. For training, a relational classifier based on the labels of neighbors
(plus other available node attributes) is learned via the labeled nodes V L . For pre-
diction, collective inference [20] is applied to find an equilibrium status such that the
inconsistency between neighboring nodes in the network is minimized. Relaxation
labeling [5], iterative classification [26], and Gibbs sampling [13] are the commonly-
used techniques. All the collective inference variants share the same basic principle: it
initializes the labels of unlabeled nodes V U , and then applies the constructed relational
classifier to assign class labels (or update class membership) for each node while fixing
the labels (or class membership) of its neighboring nodes. This process is repeated
until convergence.
This collective inference procedure shares the same spirit as the harmonic func-
tion [55] in semi-supervised learning. It is shown in [29] that a simple weighted-vote
relational neighborhood classifier [28] based on collective inference yields almost
identical performance as the Gaussian field for semi-supervised learning [55].
123
452 L. Tang, H. Liu
(as shown in Fig. 3) and find out which affiliation is more correlated with the targeted
class label, we can infer the class membership of each actor more precisely. Notice
that an actor can be present in multiple affiliations. For instance, actor 1 belongs to
both Affiliation-1 and Affiliation-2 in the example.
Given a network, differentiating its connections into distinct affiliations is not an
easy task, as the same actor is involved in multiple affiliations. Moreover, one connec-
tion can be associated with more than one affiliation. For instance, one can connect
to another as they are colleagues as well as going to the same sports club frequently.
Rather than capturing affiliations among actors via connection differentiation, we
resort to latent social dimensions, with each dimension representing a plausible affil-
iation of actors. Next, we introduce the concept of social dimensions and describe a
classification framework based on that.
123
Leveraging social media networks 453
Numerical values, instead of boolean values, can also be used to represent affiliation
membership.
We assume that the actor’s label depends on his latent social dimensions. In partic-
ular, we assume
where Si ∈ IRk denotes the social dimensions (latent affiliations) of node vi . This
is fundamentally different from the Markov assumption in Eq. 1 used in collective
inference, which assumes the label of one node relies on that of its neighbors. The
Markov assumption does not capture the weak dependency between nodes that are not
directly connected. In our approach, we assume the labels are determined by latent
social dimensions. The nodes within the same affiliation tend to have similar labels
even though they are not directly connected. Based on the assumption in Eq. 2, we
propose a learning framework called SocioDim to handle the network heterogeneity
for classification. The overview of the framework is shown in Fig. 4. The training is
composed of two phases. We first extract the latent social dimensions Si for each node
and then build a classifier based on the extracted dimensions to learn P (yi |Si ).
123
454 L. Tang, H. Liu
The output should be the social dimensions (S ∈ Rn×k ) of all nodes in the network. It
is desirable that the extracted social dimensions satisfy the following properties:
– Informative. The social dimensions should be indicative of latent affiliations of
actors.
– Plural. The same actor can engage in multiple affiliations, thus having non-zero
entries in different social dimensions.
– Continuous. The actors might have different degrees of association to one affilia-
tion. Hence, a continuous value, rather than discrete {0, 1}, is more favorable.
One key observation is that when actors belong to the same affiliation, they tend to
connect to each other. For example, people of the same department interact with each
other more frequently than any two random people in a network. In order to infer the
latent affiliations, we need to find out a group of people who interact with each other
more frequently than random. This boils down to a classical community detection
problem in networks. Most existing community detection methods partition nodes of
a network into disjoint sets. But in reality, each actor is likely to subscribe to more than
one affiliation. So a soft clustering scheme is preferred to extract social dimensions.
Many approaches developed for clustering on graphs serve the purpose of social
dimension extraction, including modularity maximization [34], latent space mod-
els [18,36], block models [1,35] and spectral clustering [27]. Spectral clustering is
originally proposed to address the partition of nodes in a graph. Spectral clustering
has been shown to work reasonably well in various domains, including graphs, text,
images, and microarray data. It is also proved [52] to be equivalent to a soft version of
the classical k-means algorithm for clustering. We choose spectral clustering to extract
social dimensions due to its effectiveness in various domains and the availability of a
huge number of existing linear algebra packages to help solve the problem. We want to
emphasize that this is not the only method of choice. Alternatives can be plugged in for
this phase. This is also one nice feature of our framework, as it allows for convenient
plug-ins of existing soft clustering packages. Later in the experiment part in Sect. 7.2,
we will compare different strategies to extract social dimensions.
Below, we briefly review the principle of spectral clustering for presentation con-
venience. Spectral clustering was originally proposed to address the problem of parti-
tioning a graph into disjoint sets. Here, the edges of a graph can have weights, denoting
the similarity between nodes. Intuitively, we want to find a partition of the graph, so
that the edges between groups have a small weight and the edges within a group have
a large weight. This is closely related to the minimum-cut problem. For two disjoint
vertex sets B, C ⊂ V , the cut between B and C is defined as
Cut (B, C) = Aij .
vi ∈B, vj ∈C
k
min cut (C1 , C2 , . . . , Ck ) = cut (Ci , V /Ci )
i=1
123
Leveraging social media networks 455
In practice, this might return trivial partitions like a group consisting of only one vertex
separated from the remaining network. To find a somehow “balanced” partition, some
alternative objectives are proposed to take into account the group size. One commonly
used objective is normalized cut [38], which is defined below:
where vol(Ci ) = vj ∈Ci dj , and dj represents the degree of node vj . The coeffi-
cient 1/k is added to normalize the score between 0 and 1. If we define a community
indicator matrix H as
1/ vol(Cj ) if vertex i belongs to community Cj
Hij = (4)
0 otherwise
1
N cut (C1 , C2 , . . . , Ck ) = T r(H T LH )
k
L=D−A
min T r(H T LH )
C1 ,...,Ck
s.t. H T DH = I
H in form of Eq. 4
min T r(H T LH )
H
s.t. H T DH = I
Let
S = D 1/2 H, (5)
123
456 L. Tang, H. Liu
123
Leveraging social media networks 457
)) = T r((SP )T L(SP
T r((S )T L(S P T ) = T r(S T LS)
)) = T r(S T LSP
Essentially, the solutions are equivalent under an orthogonal transformation. But this
non-uniqueness does not affect the discriminative learning if a linear SVM is employed.
The linear SVM with social dimensions S can be considered as a kernel machine with
a linear kernel K = SS T . With an orthogonal transformation P , the new kernel K
does not change:
It follows that the classifier and thus its predictions remain the same. Therefore, the
overall classification is not affected by the non-uniqueness of social dimensions in the
previous phase.
123
458 L. Tang, H. Liu
This does not restrict us from using alternative choices. Any soft clustering scheme
can be used to extract social dimensions in the first phase. The classification learning
phase can also be replaced with any classifier other than SVM. This flexibility enables
the convenient use of many existing software packages developed for clustering or
classification. One minor issue is that, depending on the adopted classifier, the final
decision function might not be unique because of the non-uniqueness of social dimen-
sions as discussed before. It is beyond the scope of this work to study how the final
classification performance fluctuates with respect to social dimensions under different
transformations.
5 Experiment setup
In the experiment below, we will compare our proposed SocioDim framework with
representative collective-inference methods when heterogeneity is present in a net-
work. Before we proceed to the details, we describe the data collected for experiments
and the baseline methods for comparison.
In this work, we focus on classification tasks in social media. We shall examine how
different approaches behave in real-world social networks. Two data sets are collected:
one from BlogCatalog2 and the other from the popular photo sharing site Flickr:3
– BlogCatalog A blog in BlogCatalog is associated with various information pieces
such as the categories the blog is listed under, blog level tags, snippets of 5 most
recent blog posts, and blog post level tags. Bloggers submit their blogs to Blog
Catalog and specify the metadata mentioned above for improved access to their
blogs. This way the blog sites are organized under pre-specified categories. A
blogger also specifies his connections with other bloggers. A blogger’s interests
could be gauged by the categories he publishes his blogs in. Each blogger could list
his blog under more than one category. Note that we only crawl a small portion
of the whole network. Some categories occur rarely, and they demonstrate no
positive correlation between neighboring nodes in the network. Thus, we pick
39 categories with a reasonably large sample pool for evaluation purposes. On
average, each blogger lists their blog under 1.6 categories.
– Flickr is a popular website to host photos uploaded by users, and it has become
an active community platform. Users in Flickr can tag photos and add contacts.
Users can also subscribe to different interest groups ranging from “black and
white photos”4 to a specific subject (say “bacon”5 ). Among the huge network and
numerous groups on Flickr, we randomly chose around 200 interest groups as the
2 http://www.blogcatalog.com/.
3 http://www.flickr.com/.
4 http://www.flickr.com/groups/blackandwhite/.
5 http://www.flickr.com/groups/everythingsbetterwithbacon/.
123
Leveraging social media networks 459
class labels and crawled the contact network among the users subscribed to these
groups for our experiment. The users with only one single connection are then
removed from the data set.
Table 3 lists some statistics of the network data. As seen in the table, the connections
among social actors are extremely sparse. The degree distribution is highly imbal-
anced, a typical phenomenon in scale-free networks. We also compute the normalized
cut score (Ncut) for each category and report the average Ncut. Note that if a category
is nearly isolated from the rest of a network, the Ncut score should be close to 0.
Clearly, for both data sets, the categories are not well-separated from the rest of the
network, demonstrating the difficulty of the classification tasks. Both data sets are
publicly available from authors’ homepages.
We apply SocioDim to both data sets. Spectral clustering is employed to extract social
dimensions. The number of latent dimensions is set to 500 and one-vs-rest linear SVM
is used for discriminative learning. We also compare SocioDim to two representative
relational learning methods based on collective inference (Weighted-Vote Relational
Neighbor Classifier [28] and Link-Based Classifier [26]), and two baseline methods
without learning (a Majority Model and a Random Model):
1
p(yi |Ni ) = wij · p(yj |Nj ) (10)
j wij vj ∈Ni
1
= p(yj |Nj ) (11)
|Ni |
{j :(vi ,vj )∈E}
123
460 L. Tang, H. Liu
where wij in (10) are the weights associated with the edge between node vi and vj .
Eq. 11 is derived, because the networks studied here use {0, 1} to represent con-
nections between actors, and we only consider the first order Markov assumption
that the label of one actor depends on his connected friends). Collective inference
is exploited for prediction. We iterate over each node of the network to predict
its class membership until the membership change is small enough. WVrn has
been shown to work reasonably well for classification in the univariate case and
is recommended as a baseline method for comparison [29].
– Network Only Link-Based Classifier (LBC) [26]. This classifier creates relational
features of one node by aggregating the label information of its neighbors. Then,
a relational classifier can be constructed based on labeled data. In particular, we
use average class membership (as in Eq. 11) of each class as relational features,
and employ SVM to build the relational classifier. For prediction, relaxation label-
ing [5] is utilized as the collective inference scheme.
– Majority Model (MAJORITY). This baseline method uses the label information
only. It does not leverage any network information for learning or inference. It
simply predicts the class membership as the proportion of positive instances in
the labeled data. All nodes are assigned the same class membership. This model
is inclined to predict categories of larger size.
– Random Model (RANDOM). As indicated by the name, this model predicts the
class membership for each node randomly. Neither network nor label information
is used. This model is included for the relative comparison of various methods.
In our experiments, actors often have more than one label. We apply the methods
above to each category independently and report the average performance. Since most
methods yield a ranking of labels rather than an exact assignment, a thresholding pro-
cess is normally required. It has been shown that different thresholding strategies lead
to quite different performances [9,42]. To avoid the effect of thresholding, we assume
the number of labels on the test data is already known and check how the top-ranking
predictions match with the true labels. Two commonly used measures Micro-F1 and
Macro-F1 [42] are adopted to evaluate the classification performance.
6 Experiment results
In this section, we experimentally examine the following questions: How is the classi-
fication performance of our proposed framework compared to that of collective infer-
ence? Does differentiating heterogeneous connections presented in a network yield a
better performance?
We gradually increase the number of labeled nodes from 10% to 90%. For each set-
ting, we randomly sample a portion of nodes as labeled. This process is repeated 10
times, and the average performance is recorded. The performances of different meth-
ods and the standard deviation are plotted in Fig. 5. Clearly, our proposed SocioDim
123
Leveraging social media networks 461
0.4
0.3
0.35
0.25
0.3
Macro−F1
Micro−F1
0.2 0.25
0.15 0.2
0.15
0.1
0.1
0.05
0.05
0 0
0 20 40 60 80 100 0 20 40 60 80 100
Proportion of Labeled Nodes (%) Proportion of Labeled Nodes (%)
Fig. 5 Performance on BlogCatalog with 10, 312 nodes (better viewed in color)
outperforms all the other methods. wvRN, as shown in the figure, is the runner-up most
of the time. MAJORITY performs even worse than RANDOM in terms of Macro-F1,
as it always picks the majority class for prediction.
The superiority of SocioDim over other relational learning methods with collective
inference is evident. As shown in the figure, the link-based classifier (LBC) performs
poorly with few labeled data. This is because LBC requires a relational classifier before
the inference. When samples are few, the learned classifier is not robust enough. This
is indicated by the large deviation of LBC in the figure when labeled samples are less
than 50%. We notice that LBC in this case takes many iterations to converge. wvRN
is more stable, but its performance is not comparable to SocioDim. Even with 90%
of nodes being labeled, a substantial difference between wvRN and SocioDim is still
observed. Comparing all three methods (SocioDim, wvRN and LBC), SocioDim is
the most stable and achieves the best performance.
Compared to BlogCatalog, the Flickr network is larger, with around 100,000 nodes.
In practice, the label information in large-scale networks is often very limited. Here
we examine a similar case. We change the proportion of labeled nodes from 1% to
10%. Roughly, the number of labeled actors increases from around 1, 000 to 10, 000.
The performances are reported in Fig. 6.
123
462 L. Tang, H. Liu
0.35
0.25
0.3
0.2
0.25
Macro−F1
Micro−F1
0.15 0.2
0.1 0.15
0.1
0.05
0.05
0
0
−0.05 −0.05
0 2 4 6 8 10 0 2 4 6 8 10
Proportion of Labeled Nodes (%) Proportion of Labeled Nodes (%)
Fig. 6 Performance on Flickr with 80, 513 nodes (Better viewed in color)
The methods based on collective inference, such as wvRN and LBC, perform poorly.
The LBC fails most of the time (almost like random) and is highly unstable. This can be
verified by the fluctuation of its Micro-F1. LBC tries to learn a classifier based on the
features aggregated from a node’s neighbors. The classifier can be problematic when
the labeled data are extremely sparse and the network is noisy as presented here. While
alternative collective inference methods fail, SocioDim performs consistently better
than other methods by differentiating heterogeneous connections within the network.
It is noticed the prediction performance on both data sets is around 20–30% for
F1-measure, suggesting that social media networks are very noisy. As shown in later
experiments, the performance can be improved when other actor features are also
included for learning.
One might suspect that the poor performance of collective inference is due to the spar-
sity of a given network. As shown in Table 3, the density of the BlogCatalog network
123
Leveraging social media networks 463
Macro−F1
0.2
0.15
0.1
0.05
0
10 20 30 40 50 60 70 80 90
Percentage of Labeled Nodes (%)
is only 6.3 × 10−3 . Flickr is even sparser. When a network is too sparse, Gallagher
et al. [12] suggest expanding the neighborhood of one node by adding “ghost edges”
to connect those nodes that are 2-hop away. Following their idea, we construct a net-
work by linking all nodes that are within 2 hops. After the expansion for BlogCatalog,
the network density leaps to 6.16 × 10−1 . We cannot expand any more as the net-
work becomes almost a complete graph when nodes within 3 hops are connected.
Flickr becomes quite dense after a 2-hop expansion, causing computational problems.
Therefore, we report the performance of collective inference of the expanded network
on BlogCatalog only in Fig. 7, where i denotes the number of hops to consider for
defining the neighborhood. For both wvRN and LBC, the performance deteriorates
after the neighborhood is expanded. This is because of the increase of connections.
Though it alleviates the sparsity problem, it also introduces more heterogeneity.
Collective inference, no matter how we define the neighborhood, is not comparable
to SocioDim.
6.3.2 H2 : Does SocioDim win because nodes within a category are well isolated
from the rest of a network?
123
464 L. Tang, H. Liu
0.6 Ncut=0.48
0.5
F1
0.4
0.3
0.2
Ncut=0.50
0.1
0
0 50 100 150 200
Categories sorted in ascending order by NCut
0.4
0.3
0.2
0.1
−0.1
0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
Category Ncut
set all the nodes not belonging to that category. The normalized cut can be computed.
If a category is well-isolated from a network, the Ncut should be close to 0. The larger
the NCut is, the less the category is isolated from the remaining network. The average
category Ncut scores on BlogCatalog and Flickr are 0.48 and 0.46, respectively, as
reported in Table 3. This implies that most categories are actually well connected to
the remaining network, rather than being an isolated group as one might suppose.
Figure 8 shows the performance of SocioDim on the 195 individual categories of the
Flickr data when 90% of nodes are labeled. As expected, SocioDim performance tends
to decrease when the Ncut increases. An interesting pattern emerges when we plot the
improvement of SocioDim over wvRN with respect to category Ncut in Fig. 9. The
largest improvements happen for those categories that are not well-isolated. Notice
the plus at the bottom left, which corresponds to the case when N cut = 0.12. For this
category, SocioDim achieves 90% F1 as shown in Fig. 8. But the improvement over
wvRN is almost 0 as in Fig. 9. That is, wvRN is comparable. Most of SocioDim’s
improvements occur when N cut > 0.35. Essentially, when a category is not well-iso-
lated from the remaining network, SocioDim tends to outperform wvRN substantially.
123
Leveraging social media networks 465
In the previous subsection, we notice that SocioDim performs considerably better than
wvRN when a category is not so well-separated. We attribute the gain to taking into
account connection heterogeneity. Hence, we adopt a benchmark relational data (imdb
used in [29]) with varying heterogeneity. The imdb network is collected from the Inter-
net Movie Data base, with nodes representing 1377 movies released between 1996
and 2001. The goal is to estimate whether the opening weekend box-office receipts of
a movie “will” exceed 2 million dollars. Two versions of network data are constructed:
imdbprodco and imdball . In imdbprodco , two movies are connected if they share a pro-
duction company. While in imdball , they are connected if they share actors, directors,
producers or production companies. Clearly, the connections in imdball are more het-
erogeneous. Both network data sets have one giant connected component each, with
others being singletons or trivial-size components. Here, we report the performance
on the largest components.
We notice that these two data sets demonstrate different characteristics from the
previously-studied social media data. (1) the connections are denser. For instance,
the density of imdball is 0.049 (7 times denser than BlogCatalog and 27 times than
Flickr). (2) the classification task is also much easier. It is a binary classification task.
The class distribution is balanced, different from the imbalanced distribution present
in the social media data. Hence, we report classification performance in terms of accu-
racy as in [29]. Finally, (3) classes are well separated as suggested by the low category
Ncut in Table 4.
Figure 10 plots the performance of SocioDim and wvRN on imdbprodco and
imdball . When connections are relatively homogeneous (e.g., imdbprodco data),
SocioDim and wvRN demonstrate comparable classification performance. When con-
nections become heterogeneous, the category NCut increases from 0.24 to 0.36. The
performances of both methods decrease as shown in Fig. 10b. For instance, with
50% of nodes labeled, SocioDim’s accuracy decreases from 80% to about 77%,
but wvRN’s performance drops severely, from 80% to around 66% with introduced
heterogeneity.
We notice that the performance decrease of wvRN is most observable when labeled
data are few. With increasing labeled data, wvRN’s performance climbs up. The com-
parison on the two networks of distinctive degree of heterogeneity confirms our original
hypothesis: SocioDim, by differentiating heterogeneous connections, performs better
than collective inference. This effect is more observable when a network presents
heterogeneity and the labeled data are few.
123
466 L. Tang, H. Liu
75
Accuracy (%)
70
65
60
55
SocioDim
wvRN
50
0 20 40 60 80 100
Percentage of Labeled Nodes (%)
(b) 85
80
75
Accuracy (%)
70
65
60
55
SocioDim
wvRN
50
0 20 40 60 80 100
Percentage of Labeled Nodes (%)
In order to get some tangible idea of extracted social dimensions, we examine tags asso-
ciated with each dimension. It is impractical to show tag clouds of all the extracted
dimensions (500 social dimensions for both data sets). Thus, given a category, we
investigate its dimension with the maximum SVM weight and check whether it is
truly informative about the category.
Since we use soft clustering to extract social dimensions, each dimension is repre-
sented by continuous values (as in the last column in Table 2). For simplicity, we pick
the top 20 nodes with the maximum positive values as representatives of one dimen-
sion. For example, nodes 8, 9, 7 and 6 are the top 4 nodes for the social dimension
extracted following spectral clustering in Table 2. Tags of those representative nodes
in one dimension are aggregated as the tags of that dimension. For the sake of clear
visualization, only those tags that occur more than once are kept in a tag cloud with
font size denoting their relative frequency.
Due to the space limit, we showcase only two examples from BlogCatalog. Figure 11
lists the tag clouds of selected social dimensions for categories Health and Animal,
123
Leveraging social media networks 467
(a) (b)
Fig. 11 Social dimensions selected by health and animal, respectively. a Health. b Animal
respectively. Evidently, both are quite relevant to their target categories. Based on
the tag cloud in Fig. 11a, it is not difficult to figure out that the social dimension is
about food and weight loss, which is highly relevant to Health. Similarly, the social
dimension in Fig. 11b is about dogs and pets, thus relevant to Animal. These examples
suggest that our extracted social dimensions are sensible, and that relevant dimen-
sions can be selected accordingly by the classification learning phase of the SocioDim
framework.
In the previous section, we have shown that SocioDim outperforms methods based on
collective inference. In this section, we analyze properties of SocioDim from different
aspects. The following questions will be explored:
– Social media provides more than mere actor network information. How can we
include different types of information in the classification framework?
– Are there any other strategies to extract social dimensions? How do they perform
compared with spectral clustering?
– The current instantiation of SocioDim requires users to provide a parameter to set
the number of social dimensions. How sensitive is the classification performance
to this parameter?
In social media, various kinds of user information besides social networks can be col-
lected. For instance, in blogosphere, people post blogs, write comments and upload
tags. Some users also provide profile information. It is desirable to utilize all the infor-
mation available to achieve more accurate classification. However, the actor features
(e.g., user profiles, social content or tag information) and the networks are presented
in disparate formats, hence some efforts are required for the integration.
One nice property of SocioDim is that it converts a network into features. So, if
actor features are available, it is straightforward to couple the network features with
actor features: simply combine the extracted social dimensions with actor features, and
the discriminative learning procedure determines which features are more informative
123
468 L. Tang, H. Liu
0.35
Macro−F1
Tag Alone
0.3
0.25
Network Alone
0.2
0.15
10% 30% 50% 70% 90%
Proportion of Labeled Nodes
(b) 0.55
Tag + Network
0.5
0.45
Micro−F1
Tag Alone
0.4
0.35
Network Alone
0.3
0.25
10% 30% 50% 70% 90%
Proportion of Labeled Nodes
of a class label. This simple combination of network information and actor features
allows for integration of data in disparate format and can lead to more accurate clas-
sification in general. Here we take BlogCatalog as an example to show the effect. In
BlogCatalog, the blogger can upload some tags descriptive of his blog site. We use
tag information as actor features for the bloggers. The performance of using tag or
network alone, or the combination of the two, are plotted in Fig. 12.
Tags are normally quite descriptive of a blogger while networks tend to be noisy.
It should not be surprising that the performance based on tags alone is better than
the performance based on networks. It is noticed that increasing the labeled samples
does not help much for performance based on tags, partly because some users do not
provide tags. But if we combine the social dimensions extracted from a network with
the tag features, the performance is increased by 3–6%. Networks in social media
are noisy. They provide complementary, though maybe weak, information of user
interests. There are other relational models to capture the dependency of connected
nodes and additional attributes (e.g.,[44,43]), but they normally require a lot of effort.
123
Leveraging social media networks 469
Our proposed SocioDim provides a simple yet effective approach to integrating net-
work information and actor features for accurate classification.
Each social dimension represents one latent affiliation. One actor is allowed to par-
ticipate in multiple different affiliations, hence a soft clustering scheme is preferred
to extract social dimensions. In previous parts, we adopted spectral clustering, which
is proved to equal to a soft version of k-means partition [52], to extract social dimen-
sions. One basic question is: how different is the performance of a hard partition
from that of a soft clustering? Spectral clustering involves the calculation of the top
eigenvectors of a normalized graph Laplacian whereas a k-means partition algorithm
is normally faster and more scalable. Should these two methods be comparable in
terms of classification performance, then a k-means partition should be preferred.
For completeness, we also explore other alternative soft clustering schemes.
Recently, modularity [33] is proposed to calibrate the strength of community structure
in scale-free networks. Modularity is like a statistical test where the null model is a
uniform random graph in which one actor connects to others with uniform probability
while keeping the same degree as a given network. Consider dividing the interac-
tion matrix A of n vertices and m edges into k non-overlapping communities. Let si
denote the community membership of vertex vi , and di represents the degree of vertex
i. For two nodes with degree di and dj respectively, the expected number of edges
between the two in a uniform random graph model is di dj /2m. Modularity measures
how far the interaction deviates from a uniform random graph with the same degree
distribution. It is defined as:
1 di dj
Q= Aij − δ(si , sj ) (12)
2m 2m
ij
ddT
B =A− (13)
2m
123
470 L. Tang, H. Liu
Macro−F1
0.25
0.2
0.15
0.1
0.05
10 20 30 40 50 60 70 80 90
Proportion of Labeled Nodes (%)
(b) 0.5
K−Means Partition
0.45 Modularity Maximization
Spectral Clustering
0.4
Micro−F1
0.35
0.3
0.25
0.2
10 20 30 40 50 60 70 80 90
Proportion of Labeled Nodes (%)
for the subsequent discriminative learning. To be fair, we fix the dimensionality to 500
for all the methods. For k-means, we adopt a similar strategy as in [19] by considering
the connections of each user as features and using k-means with cosine similarity
for clustering. For modularity maximization, we compute the top eigenvectors of the
modularity matrix defined in Eq. 13. The performances on BlogCatalog and Flickr are
plotted in Figs. 13 and 14, respectively.
Clearly, different methods yield quite different performances. This indicates that
the extraction of social dimensions can be crucial to our SocioDim framework. The
difference of soft clustering and hard partition is evident on BlogCatalog. Both spec-
tral clustering and modularity maximization outperform k-means partition. When the
network scales to a larger size as in Flickr, modularity maximization does not show a
strong superiority over hard partition. Indeed, the performance of modularity maximi-
zation and that of k-means partition are comparable on Flickr. Spectral clustering, on
the contrary, excels in all cases. Spectral clustering seems to capture the latent affil-
iations more accurately for within-network classification. A related study shows that
maximizing modularity tends to find communities composed of small clusters [11].
123
Leveraging social media networks 471
0.2
Macro−F1
0.18
0.16
0.14
K−Means Partition
0.12
Modularity Maximization
0.08
1 2 3 4 5 6 7 8 9 10
Proportion of Labeled Nodes (%)
(b) 0.36
0.34
0.32
0.3
Micro−F1
0.28
0.26
K−Means Partition
0.24
Modularity Maximization
0.22 Spectral Clustering
0.2
0.18
1 2 3 4 5 6 7 8 9 10
Proportion of Labeled Nodes (%)
In the experiments above, the social dimensionality is fixed to 500 for SocioDim. In
this subsection, we examine how the performance fluctuates with a varying number of
social dimensions. On both data sets, we vary the dimensionality from 100 to 1, 000
and observe its performance variation. The respective performance changes on Blog-
Catalog and Flickr are plotted in Fig. 15. To make the figure legible, we only plot the
cases when 10, 50 or 90% of nodes in the network are labeled on BlogCatalog, and
1, 5 or 9% on Flickr.
123
472 L. Tang, H. Liu
0.45
0.3
Macro−F1
0.4
Micro−F1
0.25 0.35
0.3
0.2
0.25
0.2
0 200 400 600 800 1000 0 200 400 600 800 1000
Latent Dimensionality Latent Dimensionality
0.2
Macro−F1
Micro−F1
0.35
0.15
0.3
0.1
0.05 0.25
0 200 400 600 800 1000 0 200 400 600 800 1000
Latent Dimensionality Latent Dimensionality
%1 %5 %9
123
Leveraging social media networks 473
above. This provides a high-level guideline to set the parameter, which can save some
time if extensive cross validation is required.
8 Related work
Relational learning [14] refers to the classification when objects or entities are pre-
sented in multiple relations or network format. In this work, we study a special case:
within-network classification [29] when the objects are connected in one network.
The data instances in the network are not independently identically distributed (i.i.d.)
as in conventional data mining. In order to capture the correlation between labels of
neighboring data objects, a Markov dependency assumption is widely adopted. That
is, the label of one node depends on the labels (and attributes) of its neighbors. Based
on the assumption, collective inference [20] is proposed for prediction. Normally, a
relational classifier is constructed based on the relational features of labeled data, and
then an iterative process is required to determine class labels for unlabeled data. It
is shown that a simple weighted vote relational neighborhood classifier [28] works
reasonably well on some benchmark relational data and is recommended as a baseline
for comparison [29].
In our implementation of collective inference, we define the neighborhood to be
the nodes that are only 1-hop away. Gallagher et al. [12] propose to add “ghost edges”
before relational learning when the network is too sparse. The ghost edges essen-
tially connect nodes that are 2 hops away. After the expansion of the neighborhood
of one node for collective inference, a better classification performance is observed.
However, this strategy cannot be applied to networks in social media. In social net-
works, the small-world effect [46] is often observed [4]. That is, any pair of nodes in
a large-scale social network are only several hops away, relating to the well-known
“six degree of separation”. For instance, in our Flickr data, the average degree of one
node is 146. Roughly, the nodes that are two hops away from one node can be as
high as 146 × 146 = 21, 316. Of course, this number is not precise as the friends
of friends may overlap. This huge number of neighbors brings in much more noise
and heterogeneity in connections, which can worsen the performance of collective
inference. This is empirically verified in a smaller BlogCatalog network in Sect. 6.3.1.
Often, a network becomes very dense after neighborhood expansion. As a result, the
scalability can be a concern as well.
There are many more complicated relational models to model the dependence
between connected entities. For instance, probabilistic relational model (PRM) as
introduced in [44,43]. Please refer to [14] for a comprehensive treatment. No doubt
such models are quite powerful to model various dependencies amongst entities,
123
474 L. Tang, H. Liu
though the subsequent inference always requires certain approximation. Their com-
plexity and scalability are often a barrier for practical use. Indeed, Macskassy and
Provost compared wvRN with PRM, and found that wvRN outperforms PRM on sev-
eral relational data sets [29]. Given the extreme simplicity of wvRN and its outstanding
performance, wvRN is adopted as a baseline in our experiments.
Many relational classifiers only capture the local dependency based on the Markov
assumption. To capture the long-distance correlation, the latent group model [32] and
the nonparametric infinite hidden relational model [51] are proposed. Both present
generative models such that the links (and actor attributes) are generated based on
actors’ latent cluster membership. They share a similar spirit as SocioDim. But the
model intricacy and high computational cost for inference hinders their direct appli-
cation to huge networks. So Neville and Jensen in [32] propose to use a clustering
algorithm to find the hard cluster membership of each actor first, and then fix the
latent group variables for later inference. In social media, a network is often very
noisy. Some nodes do not show a strong community membership and hard clustering
might assign them randomly [19]. The resultant community structure can change dras-
tically even with the removal of one single edge in the network. Our social dimensions
are represented as continuous values. Each node is allowed to be involved in different
dimensions in a flexible manner. It is also empirically verified that hard partition is
not comparable to soft clustering in our experiment in Sect. 7.2. Another difference is
that both the latent group model and nonparametric infinite hidden relational model
are generative, while SocioDim allows the plug-in of any discriminative classifier. In
conjunction with the discriminative power of SVM, SocioDim yields more accurate
and stable performances.
123
Leveraging social media networks 475
Extracting latent social dimensions is related to community detection [41]. That has
been an active field in social network analysis, and various methods have been pro-
posed including stochastic block models [35,1], the latent space model [18,17], spec-
tral clustering [27] and modularity maximization [33]. A comprehensive treatment is
presented in [14]. In this work, spectral clustering is employed for SocioDim, but any
other soft clustering methods should also serve the purpose.
Recently, Kumar et al. [22] found that real-world networks consist of a giant con-
nected component with others being singletons and small-size connected components.
Leskovec [23] studied the statistical properties of communities on the giant connected
component and found a similar pattern. The optimal spectral cut always returns a com-
munity of 100 to 200 nodes, loosely connected (say, one or two edges) to the remaining
network. A further comprehensive comparison of various community detection algo-
rithms is reported in [24]. In these papers, most community detection methods focus on
discrete binary cases, i.e., extracting one community from a network based on certain
criterion. Whereas SocioDim employs soft clustering to extract social dimensions, and
typically many more dimensions instead of just one or two are extracted. We believe
a comprehensive comparison of different soft clustering approaches for the extraction
of social dimensions and their scalability is an interesting line of future work.
Social media provides a virtual social networking environment. The presence of partial
label information and networking information allows us to build better classifiers. This
work proposes a novel approach to dealing with heterogeneous connections prevalent
in social media. To differentiate heterogeneous connections, we propose to extract
latent social dimensions via soft clustering such as modularity maximization and spec-
tral clustering. Based on the extracted social dimensions, a discriminative classifier
like SVM can be constructed to determine which dimensions are informative for clas-
sification. Extensive experiments on social media data demonstrated that our proposed
social dimension approach outperforms alternative relational learning methods based
on collective inference, especially when the labeled data are few. It is noticed that some
relational models perform poorly in social media data. This is due to the heterogene-
ity of connections and high irregularity of human interactions in social media. Our
approach, by differentiating disparate types of connections among social actors and
converting network information into conventional features, achieves effective learning
for classification.
Many interesting directions can be explored along with the SocioDim framework.
– The SocioDim framework converts networks into features, thus enabling con-
venient integration of data in disparate formats. How is it compared with other
123
476 L. Tang, H. Liu
relational learning approaches that model network dependency and actor attri-
butes? Is there any more effective method other than simple juxtaposition to inte-
grate social dimensions and actor features?
– In this work, spectral clustering is employed to extract social dimensions. The
resultant dimensions are dense, causing computational problems. On the contrary,
a hard partition delivers sparse social dimensions, as each actor is associated with
only one affiliation. But this constraint also limits its corresponding classification
performance. It is imperative to marry the advantages of both soft clustering and
hard partition. That is, each actor is allowed to participate in more than one affil-
iation, yet the corresponding social dimensions remain sparse. Some preliminary
results aiming at extracting sparse social dimensions have been presented in [40].
On the other hand, Menon and Elkan [31] show that supervised and unsupervised
extraction of social dimensions yield comparable results.
– Another line of research is parallel and distributed computing to handle evolving,
large-scale networks. Luckily, our SocioDim consists of two well studied steps:
spectral clustering and SVM learning. Both have been extended to distributed
cases [16,6,8]. Thus SocioDim can be deployed to harness the power of parallel
computing. In social media, networks are highly dynamic. Each day, new mem-
bers join a social network, and new connections are established among existing
members. It remains a challenge to achieve efficient updates in a parallel setting.
– In the current model,we do not employ the relationship between different labels. In
order to handle this multi-label classification [47], a commonly-used one-vs-rest
scheme [42] is used. In certain scenarios, the labels can present certain struc-
tures like a hierarchical taxonomy. It merits further research to employ both social
networks and label networks for joint classification.
Acknowledgements This research is, in part, sponsored by the Air Force Office of Scientific Research
grant FA95500810132. We thank BlogCatalog and Flickr for providing APIs. We acknowledge Xufei Wang
and Munmun De Choudhury for their help with data collection. We also wish to acknowledge Subbarao
Kambhampati and Pat Langley for their suggestions to improve this work. We thank the anonymous review-
ers wholeheartedly for their expert opinions and constructive suggestions.
References
1. Airodi EM, Blei D, Fienberg SE, Xing EP (2008) Mixed membership stochastic block models. J Mach
Learn Res 9:1981–2014
2. Almack JC (1922) The influence of intelligence on the selection of associates. Sch Soc 16:529–530
3. Bott H (1928) Observation of play activities in a nursery school. Genet Psychol Monogr 4:44–88
4. Chakrabarti D, Faloutsos C (2006) Graph mining: laws, generators, and algorithms. ACM Comput
Surv 38(1):2
5. Chakrabarti S, Dom B, Indyk P (1998) Enhanced hypertext categorization using hyperlinks. In: SIG-
MOD ’98: proceedings of the 1998 ACM SIGMOD international conference on management of data.
ACM, New York, NY, USA, pp 307–318
6. Chang E, Zhu K, Wang H, Bai H, Li J, Qiu Z, Cui H (2007) Psvm: parallelizing support vector machines
on distributed computers. Adv Neural Inf Process Syst 20:1081–1088
7. Chen G, Wang F, Zhang C (2008) Semi-supervised multi-label learning by solving a sylvester equa-
tion. In: Proceedings of the SIAM international conference on data mining, Bethesda, MD, USA,
pp 410–419
123
Leveraging social media networks 477
8. Chen W-Y, Song Y, Bai H, Lin C-J, Chang EY (2010) Parallel spectral clustering in distributed systems.
IEEE Trans Pattern Anal Mach Intell 99
9. Fan R-E, Lin C-J (2007) A study on threshold selection for multi-label classication. Technical report,
National Taiwan University
10. Fiore AT, Donath JS (2005) Homophily in online dating: when do you like someone like yourself?. In:
CHI ’05: CHI ’05 extended abstracts on human factors in computing systems. ACM, New York, NY,
USA, pp 1371–1374
11. Fortunato S, Barthelemy M (2007) Resolution limit in community detection. PNAS 104(1):36–41
12. Gallagher B, Tong H, Eliassi-Rad T, Faloutsos C (2008) Using ghost edges for classification in sparsely
labeled networks. In: KDD ’08: proceeding of the 14th ACM SIGKDD international conference on
Knowledge discovery and data mining. ACM, New York, NY, USA, pp 256–264
13. Geman S, Geman D (1990) Stochastic relaxation, gibbs distributions, and the bayesian restoration of
images, San Francisco, CA, USA, pp 452–472
14. Getoor L, Taskar B (Eds) (2007) Introduction to statistical relational learning. The MIT Press, London,
England
15. Golub GH, Van Loan CF (1996) Matrix computations. 3. Johns Hopkins University Press, Baltimore
16. Graf H, Cosatto E, Bottou L, Dourdanovic I, Vapnik V (2005) Parallel support vector machines: the
cascade svm. Adv Neural Inf Process Syst 17(521-528):2
17. Handcock MS, Raftery AE, Tantrum JM. (2007) Model-based clustering for social networks. J R Stat
Soc A 127(2):301–354
18. Hoff PD, Raftery AE, Handcock MS (2002) Latent space approaches to social network analysis. J A
Stat Assoc 97(460):1090–1098
19. Hopcroft J, Khan O, Kulis B, Selman B (2003) Natural communities in large linked networks. In: KDD
’03: proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and
data mining. ACM, New York, NY, USA, pp 541–546
20. Jensen D, Neville J, Gallagher B (2004) Why collective inference improves relational classification. In:
KDD ’04: proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery
and data mining. ACM, New York, NY, USA, pp 593–598
21. Kondor RI, Lafferty J (2002) Diffusion kernels on graphs and other discrete structures. In: ICML, New
York, NY, USA
22. Kumar R, Novak J, Tomkins A (2006) Structure and evolution of online social networks. In: KDD ’06:
proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data
mining. ACM, New York, NY, USA, pp 611–617
23. Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community struc-
ture in large social and information networks. In: WWW ’08: proceeding of the 17th international
conference on world wide web. ACM, New York, NY, USA, pp 695–704
24. Leskovec J, Lang KJ, Mahoney M (2010) Empirical comparison of algorithms for network community
detection. In: WWW ’10: proceedings of the 19th international conference on World wide web. ACM,
New York, NY, USA, pp 631–640
25. Liu Y, Jin R, Yang L (2006) Semi-supervised multi-label learning by constrained non-negative matrix
factorization. In: AAAI, Orlando, FL, USA
26. Lu Q, Getoor L (2003) Link-based classification. In: ICML: New York, NY, USA
27. Luxburg Uv (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
28. Macskassy SA, Provost F (2003) A simple relational classifier. In: Proceedings of the multi-relational
data mining workshop (MRDM) at the ninth ACM SIGKDD international conference on knowledge
discovery and data mining, ACM Press, New York, NY, USA
29. Macskassy SA, Provost F (2007) Classification in networked data: a toolkit and a univariate case study.
J Mach Learn Res 8:935–983
30. McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks.
Annu Rev Sociol 27:415–444
31. Menon AK, Elkan C (2010) Predicting labels for dyadic data. Data Min Knowl Discov 21(2):327–343
32. Neville J, Jensen D (2005) Leveraging relational autocorrelation with latent group models. In: MRDM
’05: proceedings of the 4th international workshop on Multi-relational mining. ACM, New York, NY,
USA, pp 49–55
33. Newman M (2006) Finding community structure in networks using the eigenvectors of matrices. Phys
Rev E Stat Nonlin Soft Matter Phys 74(3)
34. Newman M (2006) Modularity and community structure in networks. PNAS 103(23):8577–8582
123
478 L. Tang, H. Liu
35. Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. J Am Stat
Assoc 96(455):1077–1087
36. Sarkar P, Moore AW (2005) Dynamic social network analysis using latent space models. SIGKDD
Explor Newsl 7(2):31–40
37. Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T (2008) Collective classification in
network data. AI Mag 29(3):93
38. Shi J, Malik J (1997) Normalized cuts and image segmentation. In: CVPR ’97: proceedings of the
1997 conference on computer vision and pattern recognition (CVPR ’97). IEEE Computer Society,
Washington, DC, USA, pp 731
39. Tang L, Liu H (2009a) Relational learning via latent social dimensions. In: KDD ’09: proceedings of
the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM,
New York, NY, USA, pp 817–826
40. Tang L, Liu H (2009b) Scalable learning of collective behavior based on sparse social dimensions.
In: CIKM ’09: proceeding of the 18th ACM conference on Information and knowledge management.
ACM, New York, NY, USA, pp 1107–1116
41. Tang L, Liu H (1996) Community detection and mining in social media. Synthesis lectures on data
mining and knowledge discovery. Morgan and Claypool Publishers, USA
42. Tang L, Rajan S, Narayanan VK (2009) Large scale multi-label classification via metalabeler. In:
WWW ’09: proceedings of the 18th international conference on world wide web. New York, NY,
USA, pp 211–220
43. Taskar B, Abbeel P, Koller D (2002) Discriminative probabilistic models for relational data. In: UAI,
Edmonton, Canada, pp 485–492
44. Taskar B, Segal E, Koller D (2001) Probabilistic classification and clustering in relational data. In:
IJCAI’01: proceedings of the 17th international joint conference on artificial intelligence. Morgan
Kaufmann Publishers Inc, San Francisco, CA, USA, pp 870–876
45. Thelwall M (2009) Homophily in myspace. J Am Soc Inf Sci Technol 60(2):219–231
46. Travers J, Milgram S (1969) An experimental study of the small world problem. Sociometry
32(4):425–443
47. Tsoumakas G, Katakis I (2007) Multi label classification: an overview. Int J Data Wareh Min 3(3):1–13
48. Tsuda K, Noble WS (2004) Learning kernels from biological networks by maximizing entropy. Bio-
informatics 20:326–333
49. Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge Univer-
sity Press, Cambridge
50. Wellman B (1926) The school child’s choice of companions. J Edu Res 14:126–132
51. Xu Z, Tresp V, Yu S, Yu K (2008) Nonparametric relational learning for social network analysis. In:
KDD’2008 workshop on social network mining and analysis, Las Vegas, NV, USA
52. Zha H, He X, Ding CHQ, Gu M, Simon HD. (2001) Spectral relaxation for k-means clustering. In:
NIPS, Vancouver, Canada, pp 1057–1064
53. Zhou D, Bousquet O, Lal T, Weston J, Scholkopf B (2004) Learning with local and global consis-
tency. In: Advances in neural information processing systems 16: proceedings of the 2003 conference.
Bradford Book, Cambridge, pp 321
54. Zhu X (2006) Semi-supervised learning literature survey. MIT Press, Cambridge, USA
55. Zhu X, Ghahramani Z, Lafferty J (2003) Semi-supervised learning using gaussian fields and harmonic
functions. In: ICML, New York, NY, USA
123