The Reality Mining Dataset," CSCS 535 Networks: Theory and Application, 2007
Main Problem/question addressed in the paper
This paper presents a social network analysis of the Reality Mining dataset. The author compares mobile call networks with social networks, noting where they are similar and where they differ. An exploratory analysis of the Reality Mining data (MIT Media Lab), collected through mobile devices and networks, is performed. The paper focuses on the community structure of the underlying phone-call networks and on how that structure relates to the distribution of groups in the real world. It also studies the relationship between the calls made and the locations of the call source and destination, together with factors such as call duration.
Main Claims/Contributions of the work done
Foci of activity and organisational structure are not the only indicators of community formation. An interesting finding about calling behaviour is that the total and average duration of calls was higher when people were not at the same location, whereas there were more calls of median duration when people were collocated. In the call network, call reciprocity (the mutual exchange of calls) was higher when two people mostly called each other while in the same location; the exchange was mostly asymmetric when one person mostly called the other. The algorithm of Clauset, Newman and Moore is used for modularity detection and community formation. Correlations between calls made and spatial location were also studied.
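To make the notion of call reciprocity concrete, here is a minimal sketch. The review does not say how the paper measures reciprocity, so the ratio used here (the smaller directed call count divided by the larger one) is an assumption, as are the function name and the sample names.

```python
from collections import Counter

def pair_reciprocity(calls):
    """Return a reciprocity score in [0, 1] for every pair that exchanged calls.

    `calls` is an iterable of (caller, callee) tuples.
    Assumed definition: min(a->b, b->a) / max(a->b, b->a).
    1.0 means a perfectly mutual exchange, 0.0 means one-way calling.
    """
    counts = Counter(calls)
    scores = {}
    for (a, b), n_ab in counts.items():
        pair = frozenset((a, b))
        if a == b or pair in scores:
            continue  # skip self-calls and pairs already scored
        n_ba = counts.get((b, a), 0)
        scores[pair] = min(n_ab, n_ba) / max(n_ab, n_ba)
    return scores

# Hypothetical call log: ann and bob call each other equally often,
# ann calls cat far more than cat calls back, and ann's calls to dan
# are never returned.
calls = (
    [("ann", "bob")] * 3 + [("bob", "ann")] * 3
    + [("ann", "cat")] * 4 + [("cat", "ann")]
    + [("ann", "dan")] * 2
)
scores = pair_reciprocity(calls)
```

Splitting the call log by whether the two people were collocated and scoring each half separately would give the kind of comparison the paper reports.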
How the paper supports the main claims: A Critical review
Data analysis: The focus is on using community detection algorithms to understand the community structure of the communication network. First, the data obtained from the Reality Mining Project was filtered and cleaned. To visualise the networks, the data was transformed, with the help of Ruby classes, into graph formats suitable for different tools such as GUESS (used for building graphs for display and for gaining macro-level insight into the network structure) and Pajek (used for finding clustering coefficients, community-finding support, etc.). A series of programs was developed to visualise the relationships between mobile users, with nodes representing people in the study or called by someone in the study, and edges representing the aggregated communication between two people. Finally, the Newman Community tool (an agglomerative algorithm for detecting community structure in large networks) was used to compute the modularity and find communities in the network.
Call analysis: A bipartite graph was built between nodes in the study and nodes outside the study. Because the raw Reality Mining data was of relatively little use on its own, a refinement was carried out that first removed all nodes of degree 1; a contact outside the study was included in the graph only if more than two people called that same contact. The resulting bipartite graph of study participants and non-study nodes shows who communicated with whom and gives insight into the groups of people that communicate with each other.
Call location graphs: These graphs were drawn to examine localities and the calling between nearby participants. The graph of who calls whom while collocated shows the number of outgoing calls and identifies the people who receive calls from many different people. A bipartite graph of callers and the locations they call from shows which calls were made and received in the same location.
Review quote: It was concluded that call duration was shorter when people were collocated than when the callers were at distinct locations. There were also people with many incoming calls but very few outgoing calls, and vice versa.
Analysis of communities (Network Measures): The data was analysed using the tools GUESS and Pajek. Network structure was characterised by properties such as clustering coefficient, average path length and diameter. For example, a small-world network is implied by a dense region of participants (high clustering coefficient) together with a small average path length.
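The two small-world indicators mentioned above, clustering coefficient and average path length, can be computed as in the sketch below. This is a plain-Python illustration on a toy undirected graph, not the Pajek or GUESS procedure used in the paper; the function names and the sample edges are hypothetical, and the path-length average only counts pairs that are actually connected.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_clustering(adj):
    """Mean of the local clustering coefficients (0 for degree < 2)."""
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count the links that exist among this node's neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in adj[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Toy graph: a triangle (a, b, c) with one extra contact d attached to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
clustering = average_clustering(adj)
path_length = average_path_length(adj)
```

A high clustering value combined with a short average path, relative to a random graph of the same size, is the pattern the review associates with a small-world network.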
Communities in Complete Dataset vs network of participants: Using the Clauset-Newman modularity algorithm, a modularity of 0.85 with 34 groups was obtained for the whole dataset. This showed that participants called many people outside the study, thus forming a good number of communities. However, since little data was available about the outside people being called, the modularity algorithm was applied only to the participants graph. To analyse communities in the network of participants, two groupings were made: one of people who are likely to be physically closer to each other, and one returned by the Newman modularity algorithm. Real-world groups were then compared with call-network communities. Calls made among the participants were more frequent than the rest. The results clearly show that factors other than spatial proximity are at work: people also associate with others who are not in their foci of activity, owing to the ease of instant communication.
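The modularity score and the agglomerative idea behind it can be sketched as follows. This is a naive greedy merge in the spirit of the Clauset-Newman-Moore algorithm, not the efficient heap-based version and not the Newman Community tool used in the paper. It assumes an undirected, unweighted graph with no duplicate edges and with comparable node labels; all function names and the sample graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, communities):
    """Q = sum over communities of (internal_edges / m) - (degree / 2m)^2."""
    m = len(edges)
    label = {n: i for i, group in enumerate(communities) for n in group}
    internal = defaultdict(int)  # edges with both ends in the community
    degree = defaultdict(int)    # summed degree of the community's nodes
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def greedy_communities(edges):
    """Start from singletons; merge the pair with the largest positive gain."""
    m = len(edges)
    nodes = sorted({n for edge in edges for n in edge})
    groups = {n: {n} for n in nodes}  # community id -> member nodes
    degree = {n: 0 for n in nodes}    # community id -> summed degree
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    while True:
        label = {n: c for c, members in groups.items() for n in members}
        between = defaultdict(int)    # edges running between two communities
        for u, v in edges:
            a, b = label[u], label[v]
            if a != b:
                between[(min(a, b), max(a, b))] += 1
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in sorted(between.items()):
            # Change in Q if communities a and b were merged.
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves modularity any further
        a, b = best_pair
        groups[a] |= groups.pop(b)
        degree[a] += degree.pop(b)
    return sorted(sorted(group) for group in groups.values())

# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
found = greedy_communities(edges)
q = modularity(edges, found)
```

On this toy graph the merge stops once joining the two triangles would lower the score. A value as high as the 0.85 reported for the whole dataset indicates many densely connected groups with few calls between them.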
Shortcomings/limitations
The paper offers only limited insight, since the data was collected from 100 participants. An effective relationship therefore cannot be inferred for the social analysis. The experimental results and the conclusions drawn are based on a dataset that is not rich in behavioural content. The limited number of participants raises scalability issues. Calling patterns between the participants are inferred largely from mobile data and may be of little or no relevance to the real-life network.
The one-mode projection can lead to loss of data and ignores the fact that the number of out-of-study people can be very large; any useful information lost is amplified as this number grows. Data preparation (e.g. cleaning, filtering) is not easy when the dataset is scaled up. The strategies cannot be applied, or will not be as effective, for evolving networks (dynamic rather than static, as in this case). Evaluation of the dataset can be jeopardised by issues such as non-sharable data sources, privacy issues among organisations, countries, etc., and unclear requirements for community structures.
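The projection concern raised above can be illustrated with a small sketch. It assumes the projection in question is the one-mode projection of the bipartite participant/contact graph onto the participants, and it uses a threshold of two callers to stand in for the degree-1 removal described earlier; the function names, the `min_callers` parameter and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def filter_contacts(calls, min_callers=2):
    """Keep out-of-study contacts called by at least `min_callers` participants.

    `calls` is an iterable of (participant, outside_contact) tuples.
    Returns {contact: set(participants who called it)}.
    """
    callers = defaultdict(set)
    for participant, contact in calls:
        callers[contact].add(participant)
    return {c: p for c, p in callers.items() if len(p) >= min_callers}

def project_onto_participants(callers_by_contact):
    """One-mode projection: link two participants for each shared contact.

    The edge weight counts the shared contacts; which contacts they were
    is not kept, and that is the information the projection discards.
    """
    weights = defaultdict(int)
    for participants in callers_by_contact.values():
        for a, b in combinations(sorted(participants), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical data: p1..p3 are participants, x..z are outside contacts.
calls = [
    ("p1", "x"), ("p2", "x"),
    ("p1", "y"), ("p2", "y"), ("p3", "y"),
    ("p3", "z"),  # z is called by one participant only, so it is dropped
]
kept = filter_contacts(calls)
projected = project_onto_participants(kept)
```

Here the projected graph records that p1 and p2 share two contacts but no longer says which ones, and contact z disappears entirely. With many out-of-study contacts, this loss grows accordingly.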
Future extension and ideas
Average call duration patterns can be coupled with user modeling techniques for a more concise analysis. More details and comparisons are needed to explore the relation between the mobile social network and real-life groups. Social patterns can be inferred by analysing network behaviour. Graph patterns can be recognised more effectively using techniques such as:
Modeling: providing intuition into the mechanisms by which networks form and evolve
Summarization: providing a compact representation
Forecasting: representing continuing trends
Anomaly detection: revealing data instances that deviate significantly from the observed trends