Tracking Community Strength in Dynamic

Prasad Banoth

Tracking Community Strength in Dynamic

Prasad Banoth

visibility

…

description

6 pages

link

1 file

International Journal of Engineering Research in Computer Science and Engineering (IJERCSE) Vol 3, Issue 3, March 2016 Tracking Community Strength in Dynamic Networks [1] K Lakshmi Priya [2] M Pallavi, [3] Prasad B [1] II/IV, [2][3]Associate Professor [1][2][3] Department of CSE, Marri Laxman Reddy Institute of Technology and Management (MLRITM) Hyderabad [1] [email protected][2] [email protected] [3][email protected] Abstract: Analysis on dynamic networks has become a popularly discussed topic today, with more and more emerging data over time. In this paper we investigate the problem of detecting and tracking the variation communities within a given time period. We first define a metric to measure the strength of a community, called the normalized temporal community strength. And then, we propose our analysis framework. The community may evolve over time, either split to multiple communities or merge with others. We address the problem of evolutionary clustering with requirement on temporal smoothness and propose a revised soft clustering method based on non-negative matrix factorization. Then we use a clustering matching method to find the soft correspondence between different community distribution structures. This matching establishes the connection between consecutive snapshots. To estimate the variation rate and meanwhile address the smoothness during continuous evolution, we propose an objective function that combines the conformity of current variation and historical variation trend. In addition, we integrate the weights to the objective function to identify the temporal outliers. An iterative coordinate descent method is proposed to solve the optimization framework. We extensively evaluate our method with a synthetic dataset and several real datasets. The experimental results demonstrate the effectiveness of our method, which is greatly superior to the baselines on detection of the communities with significant variation over time. Keywords: Dynamic, Social network, Community, Tracking, Clustering. I. INTRODUCTION Tracking community strength in dynamic networks has been a hot topic in data mining which has attracted much attention. Recently, there are many studies which focus on discovering communities successively from consecutive snapshots by considering both the current and historical information. However, these methods cannot provide us with much historical or successive information related to the detected communities. Different from previous studies which focus on community detection in dynamic networks, we define a new problem of tracking the progression of the community strength a novel measure that reflects the community robustness and coherence throughout the entire observation period. To achieve this goal, we propose a novel framework which formulates the Problem as an optimization task. The proposed community strength analysis also provides foundation for a wide variety of related applications such as discovering how the strength of each detected community changes over the entire observation period. To demonstrate that the proposed method provides precise and meaningful evolutionary patterns of communities which are not directly obtainable from traditional methods, we perform extensive experimental studies on one synthetic and five real datasets: social evolution, tweeting interaction, actor relationships, bibliography and biological datasets. Experimental results show that the proposed approach is highly effective in discovering the progression of community strengths and detecting interesting communities. In this, we define that a community is with high strength if it has relatively stronger internal interactions connecting its members than the external interactions with the members to the All Rights Reserved © 2016 IJERCSE 548 International Journal of Engineering Research in Computer Science and Engineering (IJERCSE) Vol 3, Issue 3, March 2016 rest of the world. Dense internal interactions and weak external interactions guarantee that the community is under a low risk of member change (current members leaving or/and new members joining). Intuitively, a friend community is “strong” if its members tie together closely and ignore the temptation from the outside world. On the contrary, a friend community is regarded as a “weak” community if it is likely to confront a member alteration situation. To illustrate this concept, Fig. 1(a) shows a toy example; where the nodes represented by the same geometric shape belong to the same community, solid lines represent internal interactions and dash lines represent external interactions. The circle community (i.e. nodes A, B, C and D) is considered to be stronger than the rectangle community (i.e. nodes E, F, G and H), due to the weaker external attractions. On the other hand, node H has a close relationship with the diamond community (i.e. nodes I, J and K), which makes the rectangle community in the risk of losing its members. In other words, the higher strength score a community obtains, the less possible member alternation occurs in it. It is worth noticing that community strength is a measure which synthetically considers both the community cohesion (i.e. how close the members are in a community) and separation (i.e. how distinct a cluster is from the other clusters). Furthermore, community strength should be a temporal measure whose value may change as the network evolves. Here‟s an example in the real world. A set of authors have collaborated closely from 2000 to 2006. During this period, they cooperated frequently among themselves and barely with others outside the community. However, after 2006, because of interest changes, some authors‟ attentions have been attracted to some other fields. Thus the internal cooperation decreased and the external cooperation increased. In this case, this author community‟s strength is high and stable during 2000-2006, but begins to decrease after 2006. As a toy example, in Fig. 1(b) (i.e. the network in the 2ndsnapshot) which evolves from Fig. 1(a) (i.e. the network at the 1snapshot), the strength of the rectangle community decreases, because the internal connections become weaker and external connections become stronger. Discovering the progression of community strengths can offer significant insights in a variety of applications. It can help us discover some interesting community information. Which cannot be directly obtained from traditional community analysis. Interesting examples of communities‟ strength progression can be commonly observed in real-life scenarios. Here we discuss two specific cases in detail. Fig 1: A toy example illustrating community strength Strengths Progression in Actor Community: As a strong actor community, the cooperation should be more frequent between the members themselves than between members and non-members. For example, considering the popular and long-running television sitcom „Friends‟1, its six main actors J. Aniston, C. Cox, M. Perry, M. LeBlanc, L. Kudrow and D. Schwimmer collaborated closely when this sitcom was aired from 1994 to 2004. Let‟s consider each year‟s co-starring relationships as one snapshot. We can see that the strength of this community is very low before 1994 (little cooperation between them), and then dramatically increases and keeps stable from 1994 to 2004 (average 23 episodes each year). Finally, the strength of this community apparently becomes weaker after 2004 (much less cooperation comparing to the previous years). The progression of this actor community‟s strengths shows an interesting pattern of cooperation history among these six actors. Learning the strength progression of actor communities helps we better understand the entertainment industry. Strength Progression in Gene Community: In the biological domain, the interactions between genes change gradually in dynamic gene co expression networks. Thus the strength of gene communities also changes. For example, it has been reported that the expression profiling of some key genes will change as the cancer progresses. In such cases, the corresponding gene communities‟ strength also changes. Discovering the strengths of gene communities throughout a specific disease progression can help us find significant clues in the fields of medicine and biology. For a specific disease All Rights Reserved © 2016 IJERCSE 549 International Journal of Engineering Research in Computer Science and Engineering (IJERCSE) Vol 3, Issue 3, March 2016 if a gene community is found strong only at the early stage, it is very likely to be a crucial trigger for the disease deterioration. From the above cases, we can see that discovering the progression of community strengths helps us understand the underlying behavior of communities. The initial idea was published in which covers the basic definition of community strength and the evolutionary analysis on dynamic networks. By utilizing the community strength value, the consistent communities can be detected and tracked over an observation period. This paper extends the original idea to formulate a solid method with broader applications and provide more supportive and comprehensive experiments. In this paper, our goal is to detect the temporal strength of each detected community throughout all the snapshots so that we can answer the following questions: How does the strength of each community change over the observation period? What are the top-K strong communities throughout the observation? Period. How do the communities from adjacent snapshots influence the strength of each other? To sum up, our main contributions in this paper are as follows: we introduce the notion of progression analysis of community strengths. To the best of our knowledge, this is the first work on analyzing the temporal community quality or structure information considering both time and community information. We formulate the problem as an optimization framework that can effectively detect the temporal strength of communities and track the strength progression pattern. Experiments on the synthetic dataset show the proposed approach is effective on identifying strong communities. On real datasets, interesting and meaningful communities are detected. Case studies suggest that the proposed approach can provide more reasonable results. II. METHODOLOGY The objective of this study is to track the evolution of communities over time in dynamic social networks. We represent a social network by an undirected weighted graph, where the nodes of the graph represent the members of the network, and the edge weights represent the strengths of social ties between members. The edge weights could be obtained by observations of direct inter-action between nodes, such as physical proximity, or inferred by similarities between behavior patterns of nodes. We represent a dynamic social network by a sequence of time snapshots, where the snapshot at time step t is represented by Wt =[wt], the matrix of edge weights at time t. Wt is commonly referred to as the adjacency matrix of the network snapshot. The problem of detecting communities in static networks has been studied by researchers from a wide range of disciplines. Many community detection methods originated from methods of graph partitioning and data clustering. Popular community detection methods include modularity maximization and spectral clustering. In this paper, we address the extension of community detection to dynamic networks, which we call community tracking. We propose to perform community tracking using adaptive evolutionary clustering frameworks, which we now introduce. We present our method for solving the problem of temporal community strength analysis. We begin by introducing the method of partitioning the network from each snapshot into communities in Section 2.1, and then show the method of tracking the strength of each community over time in Section 2.2. 2.1 Community Detection at Each Snapshot Given a series of temporal networks we first partition each network independently into Kt communities at each timestamp t. Due to the change of network; the value of Kt may not be the same across different snapshots. Then we store all the detected communities from all the snapshots in a community pool. To detect communities from each temporal network, we use Non-negative Matrix Factorization (NMF) techniques there are two major reasons to choose NMF: First, it can be easily applied to both hard clustering (i.e. each object belongs to exactly one community) and soft clustering (i.e. each object can belong to multiple communities). The property of soft clustering very well fits many real social scenarios. For instance, each user in social network usually participates in more than one discussion group, as he may have a variety of interested topics. Second, it could uncover the underlying intercommunity Relationships quite accurately, that can be utilized for other related tasks like progression. 2.2 Tracking communities over time There are several additional issues that also need to be addressed in order to track communities over time. The communities detected at adjacent time steps need to be matched so that we can observe how any particular community evolves over time. This can be achieved by ending an optimal permutation of the communities at time t to maximize agreement with All Rights Reserved © 2016 IJERCSE 550 International Journal of Engineering Research in Computer Science and Engineering (IJERCSE) Vol 3, Issue 3, March 2016 those at time t ฀ 1. If the number of Communities at time t is small, it is possible to exhaustively search through all such permutations. This is, however, impractical for many applications. We employ the following heuristic: match the two communities at time t and t ฀1 with the largest number of nodes in agreement, remove these communities from consideration, and match the two communities with the second largest number of nodes in agreement, remove them from consideration, and so on until all communities have been exhausted. Another issue is the selection of the number of community‟s k at each time. Since the evolutionary clustering framework involves simply taking convex combinations of adjacency matrices, any heuristic for choosing the number of communities in ordinary spectral clustering can also be used in evolutionary spectral. III. EXPERIMENTS 3.1 Reality Mining Data Description: The MIT Reality Mining data set was collected as part of an experiment on inferring social networks by monitoring cell phone usage rather than by traditional means such as surveys. The data was collected by recording cell phone activity of 94 students and staff at MIT for over a year. Each phone recorded the Media Access Control (MAC) addresses of nearby Bluetooth devices at minute intervals. Using this device proximity data, we construct a sequence of adjacency matrices where the edge weight between two participants corresponds to the number of intervals where they were in close physical proximity within a time step. We divide the data into time steps of one week, resulting in 46 time steps between August 2004 and June 2005. In this data set, we have partial ground truth to compare against. From the MIT academic calendar, we know the dates of important events such as the beginning and end of school terms. In addition, we know that 26 of the participants were incoming students at the university's business school, while the rest were colleagues working in the same building. Thus we would expect the detected communities to match the participant affiliations, at least during the school terms when students are taking classes. is the same heat map when ordinary community detection at each time is used, which is equivalent to setting _t = 0 in (1). Notice that two clear communities appear in the heat map to the left, where the proposed method is used. The participants above the black line correspond to the colleagues working in the same building, while those below the black line correspond to the incoming business school students. On the heat map to the right, corresponding to ordinary. 3.2 Project Honey Pot Data Description: Project Honey Pot is an ongoing project targeted at identifying spammers. It consists of a distributed network of decoy web pages with trap email addresses, which are collected by automated email address harvesters. Both the decoy web pages and the email addresses are monitored, providing us with information about the harvester and email server used for each spam email received at a trap address. A previous study on the Project Honey Pot data found that harvesting is typically done in a centralized manner. Thus harvesters are likely to be associated with spammers, and in this study we assume that the harvesters monitored by Project Honey Pot are indeed representative of spammers. This allows us to associate each spam email with a spammer so that we can track communities of spammers. Unlike in the previous experiment, we cannot observe direct interactions between spammers. The interactions must be inferred through indirect observations. We take the edge weight between two spammers i and j to be the total number of emails sent by i and j through shared email servers, normalized by the product of the number of email addresses collected by i and by j. Since servers act as resources for spammers to distribute emails, the edge weight is a measure of the amount of resources shared between two spammers. We divide the data set into time steps of one month and consider the period from January 2006 to December 2006. The number of trap email addresses monitored by Project. Honey Pot grows over time, so there is a large of new spammers being monitored at each time step. Some spammers also leave the network over time. Observations: We make several interesting observations about the community structure of this data set and its evolution over time. The importance of temporal smoothing for tracking communities can be seen in Fig. 1. On the left is the heat map of community membership over time when the proposed method is used. On the right All Rights Reserved © 2016 IJERCSE IV. RELATED WORKS 551 International Journal of Engineering Research in Computer Science and Engineering (IJERCSE) Vol 3, Issue 3, March 2016 There have been several other recent works on the problem of tracking communities in dynamic social networks. Proposed to identify communities by graph coloring; however, their framework assumes that the observed network at each time step is a disjoint union of cliques, whereas we target the more general case where the observed network can be an arbitrary graph. Proposed a method for tracking the evolution of communities that applies to the general case of arbitrary graphs. The method involves rest performing ordinary community detection on time snapshots of the network by maximizing modularity. A graph of communities detected at each time step is then created, and meta-communities of communities are detected in this graph to match communities over time. The main drawback of this approach is that no temporal smoothing is incorporated, so the detected communities are likely to be unstable. Other algorithms for evolutionary clustering have also been proposed. Relevant algorithms for community tracking include and which extend modularity maximization and spectral clustering, respectively, to dynamic data. Proposed an evolutionary spectral clustering algorithm for dynamic multi-mode Networks, which have different classes of nodes and interactions. Such an algorithm is particularly interesting for data where both direct and indirect interactions can be observed. However, one shortcoming in these algorithms is that they require the user to determine to choose the values for parameters that control how smoothly the communities evolve over time. There are generally no guidelines on how these parameters can be chosen in an optimal manner. V. EXTENSIBILITY TO OTHER APPLICATIONS By formulating the problem as the task of measuring the community strength over an observation period, we can extend our method to perform some additional tasks. In this section, we explain the extensibility of Algorithm 1to: (1) Measure the impact and consequently the change in community strength based on immediate preceding timestamps, and (2) Identify top-k strongest and weakest communities. 5.1 Community Strength Progression Net The output of Algorithm 1 provides information on how all the communities‟ strength evolve over time. In addition to that, we also want to know how the communities from immediate preceding snapshots (i.e. Ct฀1 and Ct) influence the strength of each other. To illustrate these relationships, we construct a bipartite network that represents the relationship between communities detected at snapshot t-1 and communities detected at snapshot. In such a network, the nodes on the left represent the communities detected at previous timestamp, the nodes on the right represent the communities detected at the current timestamp and the edges connecting the nodes denote the influence transmission between the communities. 5.2 Top-K strongest/weakest communities By applying Algorithm 1, we obtain the community strength for each detected community at each snapshot. Based on this output, we can compute an overall strength for each community, which is useful to identify interesting communities that are the strongest/weakest throughout the entire observation period. There are mainly two methods to aggregate the temporal community Strength scores: unweighted and weighted. In the unweighted case, we can regard each temporal score to be of equal importance and take the sum, i.e.PTt=1 azt. However, in some cases, the community strength is more important at some particular snapshots, e.g. the early stage of cancer. In such a case, we should give different weights to different snapshots and the aggregation function Can be defined as PT t=1 htazt, where ht is the weight for the specific snapshot t. In addition, when choosing the top strongest or weakest communities, we may also want to consider the size of the communities. When the target networks are very sparse, the penalty from the external connections may be very small, thus the penalty from the external interaction would be very limited. In such a case, the community strength value would be biased to the large-size communities which will contain more internal connections. To mitigate this effect, the aggregated function for community z can be redefined as:PT t=1 azt jCzj or PT t=1 htazt jCzj so that the community strength is normalized by its size. VI. CONCLUSION In this paper, we introduced a method for tracking communities in dynamic social networks by adaptive evolutionary clustering. The method incorporated temporal smoothing to stabilize the variation of communities over time. We applied the method to two real data sets and found good agreement between our results and ground truth, when it was available. We also obtained a statistic that can be used for identifying change points. Finally, we were able to All Rights Reserved © 2016 IJERCSE 552 International Journal of Engineering Research in Computer Science and Engineering (IJERCSE) Vol 3, Issue 3, March 2016 track communities where the members were continually changing or perhaps assuming multiple identities, which suggests that the proposed method may be a valuable tool for tracking communities in networks of illegal activity. The experiments highlighted several challenges that temporal tracking of community‟s presents in addition to the challenges present in static community detection. One major challenge is in the validation of communities, both with and without ground truth information. Another major challenge is the selection of the number of communities at each time step. A poor choice for the number of communities may create the appearance of communities merging or splitting when there is no actual change occurring. This remains an open problem even in the case of static networks. The availability of multiple network snapshots may actually simplify this problem since one would expect that the number of communities, much like the community memberships, should evolve smoothly over time. Hence, the development of methods for selecting the number of communities in dynamic networks is an interesting area of future research REFERENCES [1]. Chi, Y., Song, X., Zhou, D., Hino, K., Tseng, B.L.: Evolutionary spectral clustering by incorporating temporal smoothness. In: Proc. 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007) [2]. Eagle, N., Pent land, A., Laser, and D.: Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences 106(36), 15274{15278 (2009) [3]. Falkowski, T., Bartelheimer, J., Spiliopoulou, and M.: Mining and visualizing the evolution of subgroups in social networks. In: Proc. IEEE/WIC/ACM International Conference on Web Intelligence (2006) dependent, multistate, and multiplex networks. Science 328(5980), 876{878 (2010) [7]. Newman, M.E.J.: Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103(23), 8577{8582 (2006) [8]. Project Honey http://www.projecthoneypot.org Pot, [9]. Prince, M., Dahl, B., Holloway, L., Keller, A., Langheinrich, and E.: Understanding how spammers steal your e-mail address: An analysis of the rest six months of data from Project Honey Pot. In: Proc. 2nd Conference on Email and AntiSpam (2005) [10]. Tang, L., Liu, H., Zhang, J., Nazeri, and Z.: Community evolution in dynamic multimode networks. In: Proc. 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008) [11]. Tantipathananandh, C., Berger-Wolf, T., Kempe, D.: A framework for community identification in dynamic social networks. In: Proc. 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007) [12]. von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4), 395{416 (2007) [13]. Xu, K.S., Kliger, M., Hero III, A.O.: Evolutionary spectral clustering with adaptive forgetting factor. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (2010) [14]. Yu, S.X., Shi, J.: Multiclass spectral clustering. In: Proc. 9th IEEE International Conference on Computer Vision (2003) [4]. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, and M.W.: Statistical properties of community structure in large social and information networks. In: Proc. 17th International Conference on the World Wide Web (2008) [5].MIT academic calendar 2004-2005, http://web.mit.edu/registrar/www/calendar0405.html [6]. Mucha, P.J., Richardson, T., Macon, K., Porter, M.A., Onnela, and J.P.: Community structure in time- All Rights Reserved © 2016 IJERCSE 553

Log In

Tracking Community Strength in Dynamic

Sign up for access to the world's latest research

Related papers