Location-Based Events Detection on Micro-Blogs

Luis Alvares

Location-Based Events Detection on Micro-Blogs

Luis Alvares

2012

visibility

…

description

10 pages

link

1 file

The increasing use of social networks generates enormous amounts of data that can be used for many types of analysis. Some of these data have temporal and geographical information, which can be used for comprehensive examination. In this paper, we propose a new method to analyze the massive volume of messages available in Twitter to identify places in the world where topics such as TV shows, climate change, disasters, and sports are emerging. The proposed method is based on a neural network that is used to detect outliers from a time series, which is built upon statistical data from tweets located on different political divisions (i.e., countries, cities). The outliers are used to identify topics within an abnormal behavior in Twitter. The effectiveness of our method is evaluated in an online environment indicating new findings on modeling local people's behavior from different places.

Location-Based Events Detection on Micro-Blogs Augusto Dias Pereira dos Santos, Leandro Krug Wives, Luis Otavio Alvares arXiv:1210.4008v1 [cs.SI] 15 Oct 2012 PPGC/UFRGS, Brazil {adpsantos, wives, alvares}@inf.ufrgs.br Abstract. The increasing use of social networks generates enormous amounts of data that can be used for various types of analysis. Some of these data have temporal and geographical information, which can be used for comprehensive examination. In this paper, we propose a new method to analyze the massive volume of messages available in Twitter to identify places in the world where events such as TV shows, climate change, disasters, and sports are emerging. The proposed approach is based on a neural network used to detect outliers from a time series, which is built upon statistical data from tweets located in different political divisions (i.e., countries, cities). These outliers are used to identify localized events within an abnormal behavior in Twitter. The effectiveness of our method is evaluated in an online environment indicating new findings on modeling local people’s behavior from different places. Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications, Data Mining; H.3.3 [Information Storage and Retrieval]: Retrieval models, Selection process Keywords: Microblogs, Socio-Geographic Analysis, Twitter Stream, Time Series, Neural Network 1. INTRODUCTION Modeling the human behavior has always been an attempt of several scientists, and with social networks this task can be done in many perspectives. Social networks allow people to interact on the Internet as they do in the real world, sharing their lives through text messages, photos, videos, and connecting to friends with comments, likes, quizzes and games. It is important to state that we follow the definition of [Wellman et al. 1996] regarding social networks, who states that when computer networks link people as well as machines, they become social networks. Some social networks in particular focus on sharing users’ short text messages. These are called micro blogs, since they are similar to web blogs but with just a few words, being very attractive to mobile appliances. The most popular micro blog is Twitter, and due to an easy-to-use API it is widely used in many mobile and desktop platforms. Twitter was launched in 2006, and after 6 years it has around 140 million active users sending an average of 340 million tweets, those short messages, per day1 . The public default policy of tweets enables researches of various areas to be done on subjects that may vary from natural language processing and data mining to public health analysis. We suggest reading the first quantitative study [Kwak et al. 2010] on the entire Twitter and its information diffusion to better understand Twitter’s topology, influential identification and trending topics’ behavior. Using Twitter in mobile devices makes it possible to embed geographical information in the tweets. Tweets stored within GPS coordinates or political division names enable us to identify from where these messages were sent and conduct a socio-geographic analysis. Socio-geographic data is very difficult to be obtained. Cellular service providers, vehicle GPS trackers and credit card companies are some examples of businesses that have these data, but lock them with strict security [Ferrari et al. 2012]. Some academic researches even needed to build their own set of data to study some socio-geographic patterns [Li et al. 2008] [Lerin et al. 2011]. 1 http://blog.twitter.com/2012/03/twitter-turns-six.html , Vol. 3, No. 3, October 2012, Pages 1–0??. 2 · A. D. P. Santos, L. K. Wives and L. O. Alvares This is why public data from social networks bring these researches to a new level with live, organic and enormous amount of data. That way, human behavior can be modeled, identifying what the users from a certain city or place are saying about a specific topic and why, i.e., what their impressions are. Along with its real-time nature, Twitter information can be used as a live sensor network, for instance, detecting earthquakes and typhoons [Sakaki et al. 2010] or local social events [Lee and Sumiya 2010]. In this paper, a topic is some subject referred to in a document and which users are talking about at any particular time, and an event means a unique thing that happens at some point of time [Allan et al. 1998] [Allan 2002]. In this context, this paper proposes a new method for using the vast volume of Twitter’s user messages to identify location-based events such as concerts, festivals, disasters, political demonstrations, etc., without having to select keywords. This points to our main contribution on event detection, changing the dimensional space from keywords to places. In this sense, the Twitter’s Streaming API2 method is used to retrieve geo-tagged and time-stamped short text messages at a worldwide coverage. Simple metrics are extracted from these messages, considering political divisions as partitions, creating time series and used as input of a neural network [Heinen et al. 2011] that models the input data based on a regression technique and identifies outliers. Text messages are then parsed to provide semantic information to the events detected. The paper is organized as follows: section 2 presents related works; in section 3, we present the proposed approach for location-based event detection; section 4 illustrates the experimental results and more detail on how the approach solves this task; and section 5 provides the conclusions and discussion of further works. 2. RELATED WORKS This section presents and discusses related works in the fields of Geo-social analysis and event detection, which are the main applications of our work. 2.1 Geo-Social Analysis Despite the early stage of location-based social networks, or social network with some location information, many researches are being conducted to extract some knowledge from geo-social relations, in order to improve the location prediction of individuals in a social network better than with IP-based geo-location. Backstrom et al. [Backstrom et al. 2010] used user-supplied addresses and the network of relation between profiles of the Facebook social network. Besides performing 69.1% of accuracy with their best method, against 57.2% for IP location, some interesting geo-social relations were confirmed, as intuitively known: people living in metropolitan areas are more cosmopolitan; they are more likely to have ties to distant places; the higher the population density, the lower the probability of knowing a person inside a square mile; and, in their data, 96% of people live in areas less dense than 50 people per square mile. For geographic mood characteristics analysis, Mislove et al [Mislove et al. 2010] analyzed tweets posted from September 2006 to August 2009, extracting words containing psychological rating, according to ANEW system [Bradley and Lang 1999], and matching them with the user profile location to identify some mood variations over the week, the hours of the day and the costs of the United States. These messages suggest that the West coast is happier than the East coast, and that happiness peaks occur each Sunday morning, with a trough on Thursday evenings, having the early morning and late evening the highest level of happy tweets. These works model some aspects of human behavior, but using static geographical information. Our study focuses on using information that changes in time and space with greater rate. 2 https://dev.twitter.com/docs , Vol. 3, No. 3, October 2012. JIDM - Journal of Information and Data Management · 3 Due to its real-time property and massive adoption in the world, Twitter can be used as a sensor network for natural and social event detection, sometimes before its coverage by the news media or the government. In Sakaki’s [Sakaki et al. 2010] work, they use geo-located tweets that have keywords related to natural hazard events such as earthquake or shaking to detect such events. With particle filtering, they can estimate the centers of earthquakes and the trajectories of typhoons, detecting 96% of earthquakes, with seismic intensity scale of 3 or more, registered by Japan’s Meteorological Agency. In a recent work, Lee [Lee and Sumiya 2010] developed a system to discover unusual regional social activities using Twitter geo-tagged information. Their framework has four steps: Collecting crowd experiences via Twitter, establishing natural socio-geographic regions, estimating geographical regularity of local crowd behavior, and detecting unusual geo-social events. The first step uses a divide and conquer solution to solve the Twitter Search API restriction of 1,500 results per query. The second uses K-Mean clustering algorithm with Voronoi’s diagram [MacQueen 1967] to create socio-geographic regions, a step that can impact a online system. On the third one, three metrics are estimated for each cluster, hourly: number of tweets, number of users, and movement of local crowd. The last step divides the day in 6-hour periods and calculates the regularities of each cluster’s metric using box plots that can also detect unusual statuses. This method detected 903 unusual activities from 7,200 possible (300 clusters x 6 days x 4 periods (6-h)) and compared to the investigated list of 50 events, from Japan’s local event guide site, 32 of them could be found, resulting in a recall performance of 64% (32/50) plus a precision rate of 3.54% (32/903). We must consider that this list is somewhat restricted, because other unexpected events, off the list, occurred and were detected. Despite the great advances in local event detection, driven primarily by the movement of local crowd’s metric, there are some deprecated issues, unnecessary steps and heavy processing. 2.2 Event Detection Event detection and tracking is a subset of problems from topic detection and tracking (TDT). The early definitions are from [Allan et al. 1998; Allan 2002], in an initiative to investigate the state-ofthe-art on finding and following new events in a stream of broadcast news stories. With the huge amount of information available on-line, the World Wide Web is a fertile source for that kind of event detection, and web mining research is at the crossroad of research from several research communities [Kosala and Blockeel 2000]. Over the last 10 years, user-generated content has come to dominate a large portion of the web and a real-time web has arisen to challenge number of areas of research, notably information retrieval and web data mining [Bermingham and Smeaton 2010]. Becker [Becker et al. 2011] presents a task of event identification on Twitter that is based on text analysis and clustering approaches, and shows numerous categories of features that must be considered: temporal, social, topical, and Twitter-centric. He also analyzes the different features that can impact the performance of a real-time system for event detection. The proposed technique for event identification offers a significant improvement over other approaches, showing that they can identify real-world event content in a large-scale stream of Twitter data. The use of location-based signals in event identification is suggested for future work. A filtered stream of tweets to automatically identify events of interest, using just the volume of tweets generated at any moment of an event, was suggested by [Lanagan and Smeaton 2011] to provide a very accurate means of event detection, as well as an automatic method for tagging events with representative words from the tweet stream. That approach leads to the problem of choosing a set of words and tags that represent a field of interest, missing any other event that doesn’t match it. , Vol. 3, No. 3, October 2012. 4 · A. D. P. Santos, L. K. Wives and L. O. Alvares Fig. 1. Proposed data flow 3. THE PROPOSED APPROACH To achieve the detection of events based on location using the huge amount of data provided by Twitter, we proceeded with the simpler data flow possible that lead to this goal. Figure 1 shows these flow as described below: —Tweets: A crawler collects tweets from Twitter using Streaming API service; —Places Metrics: Creates two time series from the number of tweets and users in a time instance (or bin); —IGMN: The neural network is used to create data models and identify outliers; —Place Outliers: Consist in the time instances that were detected as outliers in both time series; —Events Description: Through the messages contained in the time instance outliers it is possible to evaluate and understand the triggered event. In relation to the crawler, it is important to state that Twitter’s Streaming API is one of many Twitter’s public services available. It allows real-time access to various subsets of public tweets with high throughput. Any message sent to the social network, with public permission and that matches a given query, will be delivered to the crawler. This service has filter parameters such as tracking some keyword occurrences in status messages, following tweets from a specific set of users or specifying a set of geographic bounding boxes to track. In this aspect, it is important to state that, since September 2010, the bounding box can be of worldwide coverage, allowing the retrieval of all tweets in a single query, and thus there is no need any more to build a monitor system as Lee [Lee and Sumiya 2010] suggests. Each status message given by this API contains the text of the message, its creation’s date/time, the message’s id, the id and the full profile information of the user that has sent the message, and, sometimes, both place/country name and latitude/longitude, or just one of them. This happens because this information is sensible and for the sake of privacy the user may state whether or not he wants to share such specific latitude/longitude information or just the place’s name. Current localization technology used by Twitter comprises GPS and GPS-A (which have latitude/longitude information) and originating IP (which has not latitude/longitude information). The location technology used can also be retrieved, if allowed by the user, besides information given by the Twitter’s geographic database (which doesn’t have all world’s countries, provinces/states, cities, neighbors and areas names). For the last problem, we use a geographic database source3 to translate those latitude/longitude information into names that are not known by Twitter . For instance, many Eastern countries and 3 http://geocommons.com/overlays/85161 , Vol. 3, No. 3, October 2012. JIDM - Journal of Information and Data Management · 5 cities have blank names in the service API. So, this step is important because all our analysis is based on grouping tweets in sets of places as shown in Figure 1. This location identification process is made during real-time streaming consumption. Once the messages are localized (i.e., have location information), the next step consists on the identification of events. For this task, as stated before, we use a neural network (IGMN) to analyze time series and find outliers. A time series is a sequence of observations occurring in equal time intervals, having some basic properties/components [Brockwell and Davis 1986]. In a time series there are different components, for instance, seasonal component, trend component, and so on. The seasonal component describes when the time series’ data experience regular changes which recur in some period of time (e.g., daily, weekly, monthly, and so on). The trend component indicates a series with upward or downward long term movement. Thus, the series is stationary when the mean, variance and autocorrelation structures do not change over time, and doesn’t have a trend. A multivariate time series has more than one variable, while a univariate time series has only one variable. Our data can be described as a stationary, seasonal and univariate time series. After the time series analysis is performed, we apply specific metrics to detect events. The metrics used in this work are extracted by grouping the text messages in sets of cities, provinces/states or countries, depending on the amount of information in each instance, then computing the number of users and number of tweets, creating two separate time series. We have chosen simple metrics like these because our intention was to develop a real time on-line event detection system. So we needed to decrease the framework’s processing time. The usage of geographic names improved the framework in two ways: —Despite the linear complexity of K-Means, used on [Lee and Sumiya 2010], there is no need to use clustering algorithms, since the message clustering is based on political divisions; —We increased the amount of analyzed tweets using all types of messages: —With and without GPS features; and/or —With and without places’ names. Once this splitting is done, we have a set of m messages for each political division chosen. Thus the metrics are collected for each time instance (1 minute, 10 minutes, 1 hour, 6 hours, etc.) during a period of d days, creating a time series. Lee’s approach [Lee and Sumiya 2010] splits the day in 6-hour periods and uses box plot statistical analysis to detect outliers. We have discovered that this 6-hour period can hide some interesting detailed information about events happening in these political division areas, because the tilt’s curve is relevant in a 6-hour slice, smoothing the mid curve outliers and empowering the beginning/end period data outliers. Beyond that, box plot is a univariate statistical tool [Härdle and Simar 2012] and the Twitter stream has a temporal dependency, as can be observed in Figure 2. The term univariate has different meaning in time series analysis, it refers to a time series that consists of single (scalar) observations recorded sequentially over equal time increments, time is in fact an implicit variable in the time series4 . For the outliers detection task we use the Incremental Gaussian Mixture Network (IGMN) [Heinen et al. 2011], a neural network that creates and continually adjusts probabilistic models consistent to all sequentially presented data, after each data point presentation, and without the need to store any past data points. Its learning process is aggressive, or "one-shot", meaning that only a single scan through the data is necessary in order to obtain a consistent model. Compared to (S)ARIMA [Brockwell and Davis 1986] has equivalent root mean square error without the need to pre-understand the time series components and data correlation imposed to (S)ARIMA’s parameters, that facilitates the process of adding new places to the framework. The incremental process is another advantage 4 http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc44.htm , Vol. 3, No. 3, October 2012. 6 · A. D. P. Santos, L. K. Wives and L. O. Alvares against (S)ARIMA that needs a long period of data to model time series, that makes it possible to extend the framework for real-time analysis of Twitter stream. After the outliers detection phase, each outlier represents a time instance that is analyzed for its content. Which event triggers these outliers? We collect all messages in this time instance. Those messages are processed in a search of most frequent words, ignoring stop words. The stop word database needs to be rebuilt to the short text message context, which uses a lot of abbreviations. These top rank words can provide us with a great idea of the triggered event, confirmed or not by the web and news search over the Internet. 4. EXPERIMENTS For performing our experiments we have collected data from Twitter since January 2011. We have adjusted the locations parameter of the Twitter’s Streaming API to the bounding box corresponding to (-179.99, -89.99, 179.99, 89.99), which relates to the entire globe. Today we count with more than 1.4 billion geo-tagged messages, and around 10 million users. Considering this data set we have found that these users produced about 4.1 million geo-tagged tweets per day, where 42.25% contained geographic coordinates, and 93.49% contained places’ names. With an on-line collecting system, a routine calculated countries’ tweets of non-set country messages using the country boundary geographic database in a PostGIS server server5 . Data were stored in a MySQL6 database with a single structure: tweets’ and users’ tables, indexing message id, user id, created at timestamp, country and city columns for faster grouping by clause. A 3-tier architecture provided more concurrence in order to avoid overload in the database; one server is the collector, sending packages of 30 minutes’ data to the data storage computed by the processor that generates the time series, detects the outliers and fetches the most frequent words used to describe the event. The first step to create our time series is to choose a political division or place. We have five types of places, from Twitter definitions: country, admin (province/state), city, neighbor and POI (i.e., points of interest like restaurants, stores, museums, etc.); from wide to narrow areas. The wider the area, the more tweets per second are generated, but some places have a greater rate than others. Besides that, as more restrict is the area, the more local the event, we need a minimum number of messages per time instance in order to make the time series smoother. If we get few tweets per bin, the time series gets fluctuated values. The bin’s size, which determines the amount of messages, needs to be evaluated to each place in order to identify which value gets the best event detection. Figure 2 shows some samples of tweets’ time series generated with a bin of 10 minutes, for visualization purposes, in which it is easy to see a pattern of daily seasonality, represented by 144 values per day. Ordered by the volume of messages per bin, this figure shows events with different characteristics, all of them identified as outliers by our approach. The real date and time in which the event starts is indicated in the figure as its disturbance on the time series: —Oslo bombing event: great disturbance on time series and long duration; —Munich soccer match: great disturbance and short duration; —São Paulo carnival vote counting: small disturbance and short duration Once the bin size is chosen, two time series are made: —Tweets time series (TweetsTS ): each value represents the amount of messages sent to Twitter server in one time instance; 5 http://postgis.refractions.net/ 6 http://www.mysql.com/ , Vol. 3, No. 3, October 2012. JIDM - Journal of Information and Data Management Fig. 2. · 7 Sample of events from Olso, Munich and São Paulo —Users time series (UsersTS ): each value represents the number of unique users who have sent messages to Twitter at that time instance. To obtain the relevant outliers, each time series is modeled by the neural network, which returns the outliers of each one. An outlier is considered relevant when a time instance is detected as an outlier in both time series. It is noteworthy that the IGMN consider the values that are above or below the local likelihood as being outliers. However, in this work, we are only interested in the values above such likelihood, since they represent data beyond the normal volume. Outliers = Intersect(TweetsTS.outliers_above, UsersTS.outliers_above) (1) Another parameter can be tuned to result in better quality events. The IGMN adjusts its models to the presented data using clustering techniques, and the similarity between the inputs is measured by the probability of each input belonging to the existing clusters. In this sense, the standard deviation may be used to indicate when a new cluster must be created, i.e., if the new data is too different from any cluster, this parameter is used to detect if a given input should be considered an outlier, based on the local likelihood. For preliminary analysis and to evaluate the method’s precision over different parameters, we have chosen the city São Paulo, Brazil, as a place (political division), because it is the number one city in the world in volume of tweets with geographic information. For this article the period from 2012-02-19 to 2012-02-24 was selected for those tests be done We begin by examining the performance of the outliers’ detection against the number of events occurred, unique, duplicated and missed events. Events occurred are events that happened in the real world and that were evaluated using the most frequent words in the messages of each bin matching with the result of a local newspaper’s web search, using the time instance date as filter. We test the bin’s size parameter for 1, 5 and 10 minutes, over the same period (Figure 3), the precision rate score is presented with the mentioned metrics (Table I). As bin size increases, it smooth local data likelihood , Vol. 3, No. 3, October 2012. 8 A. D. P. Santos, L. K. Wives and L. O. Alvares · Fig. 3. Bin Size 1 minute 5 minutes 10 minutes Standard Deviations 3 4 5 Tweets time series on different bin’s size and the detected outliers Table I. Precision rate scores on different bin’s size Total Detected Unique Duplicate Missed Outliers Happened Events Detections Events Events 90 22 6 16 0 20 12 4 8 2 7 5 3 2 3 Table II. Precision rate scores on different deviations Total Detected Unique Duplicate Missed Outliers Happened Events Detections Events Events 90 22 6 16 0 31 11 5 6 1 12 8 3 5 3 Precision Rate 24.44% 60.00% 71.43% Precision Rate 24.44% 35.48% 66.67% making outliers the only values with significant difference. Otherwise, some not so substantial events occurred are missed. The next parameter evaluated, standard deviation, was tested with a time instance of 1 minute size and different values of deviations, i.e., 3, 4 and 5. Not surprisingly, the number of outliers detected decreased as the deviation increased (Figure 4), but the change on the precision rate did not evolve like the previous experiment (Table II). Our first assumption is that the 1-minute bin makes the time series rough and sensible to any minimum disturbances, making the deviation parameter tune incapable of getting better results. On the other hand, just increasing the bin’s size will cause the loss of the real-time approach capability, as well as of some events. Therefore, a suggested approach is to combine the tuning of these parameters (a task that is reserved for future work). In the task of evaluating the outliers with real-world events, the use of the most frequent terms allows , Vol. 3, No. 3, October 2012. JIDM - Journal of Information and Data Management Fig. 4. · 9 Tweets time series on different deviation and the detected outliers us to understand the kinds of topics that trigger Twitter users to post significantly more messages than the usual. Firstly, we must understand that cultural aspects can influence social media services usage, so our findings consider, yet, only São Paulo’s social behavior. All events occurred detected by our framework had televised coverage, but some with broad and other with local geographical interest (Table III). This leads us to new perspectives of specializing event detection with only local relevance. 5. CONCLUSIONS This paper presented a new method to discover events based on location over the Twitter stream, using time series analysis, and how this approach can lead to representative outliers with no need to previously select keywords, nor use clustering algorithms for geographic location grouping. This work provides the first step in a series of method to improve the detection of events with local relevance. In future work, we will generate statistical measures of performance and compare our proposition Table III. Event Description Events identified by the proposed approach Terms Soccer match for Copa Libertadores in Venezuela National reality TV show Soccer match on regional championship out the city Riots at carnival vote counting Two soccer games in the regional championship out the city Soccer match on regional championship in the city Corinthians, jogo, libertadores, gol, timão Yuri, fael, bbb, lider, ganhar Corinthians, willian, douglas, gol, jogo Geographical Interest Broad Broad Broad Gaviões, carnaval, nota, fogo, apuração, escola Gol, jogo, bragantino, time, corinthians Local Ganhar, vergonha, deus, palmeiras Local Broad , Vol. 3, No. 3, October 2012. 10 · A. D. P. Santos, L. K. Wives and L. O. Alvares with Lee’s and Becker’s method, and how those frameworks behave in a real-time environment, which can show how IGMN reuse benefits the performance. To do this comparison, we need to compute Lee’s aggregation and dispersion metric, but other metrics with linear processing time can be built in order to consider the users’ movement. To compute our method’s precision and recall rate we intend to use human annotators and a news database to automate the events evaluation. A visualization system is suggested to provide more relevant information to the end user. REFERENCES Allan, J. Topic Detection and Tracking: Event-Based Information Organization. Kluwer Academic Publishers, Norwell, MA, USA, 2002. Allan, J., Carbonell, J., Doddington, G., Yamron, J., Yang, Y., Umass, J. A., Cmu, B. A., Cmu, D. B., Cmu, A. B., Cmu, R. B., Dragon, I. C., Darpa, G. D., Cmu, A. H., Cmu, J. L., Umass, V. L., Cmu, X. L., Dragon, S. L., Dragon, P. V. M., Umass, R. P., Cmu, T. P., Umass, J. P., and Umass, M. S. Topic detection and tracking pilot study: Final report, 1998. Backstrom, L., Sun, E., and Marlow, C. Find me if you can: improving geographical prediction with social and spatial proximity. In Proceedings of the 19th international conference on World wide web. WWW ’10. ACM, New York, NY, USA, pp. 61–70, 2010. Becker, H., Naaman, M., and Gravano, L. Beyond trending topics: Real-world event identification on twitter. In ICWSM, 2011. Bermingham, A. and Smeaton, A. F. Crowdsourced real-world sensing: sentiment analysis and the real-time web. Challenges, 2010. Bradley, M. M. and Lang, P. J. Affective norms for english words (anew): Instruction manual and affective, 1999. Brockwell, P. J. and Davis, R. A. Time series: theory and methods. Springer-Verlag New York, Inc., New York, NY, USA, 1986. Ferrari, L., Mamei, M., and Colonna, M. People get together on special events: Discovering happenings in the city via cell network analysis. In Pervasive Computing and Communications Workshops (PERCOM Workshops), 2012 IEEE International Conference on. pp. 223–228, 2012. Härdle, W. and Simar, L. Applied Multivariate Statistical Analysis. Springer, 2012. Heinen, M. R., Engel, P. M., and Pinto, R. C. Igmn: An incremental gaussian mixture network that learns instantaneously from data flows, 2011. Kosala, R. and Blockeel, H. Web mining research: a survey. SIGKDD Explor. Newsl. 2 (1): 1–15, June, 2000. Kwak, H., Lee, C., Park, H., and Moon, S. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web. WWW ’10. ACM, New York, NY, USA, pp. 591–600, 2010. Lanagan, J. and Smeaton, A. F. Using twitter to detect and tag important events in sports media. In ICWSM, 2011. Lee, R. and Sumiya, K. Measuring geographical regularities of crowd behaviors for twitter-based geo-social event detection. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks. LBSN ’10. ACM, New York, NY, USA, pp. 1–10, 2010. Lerin, P. M., Yamamoto, D., and Takahashi, N. Inferring and focusing areas of interest from gps traces. In Proceedings of the 10th international conference on Web and wireless geographical information systems. W2GIS’11. Springer-Verlag, Berlin, Heidelberg, pp. 176–187, 2011. Li, Q., Zheng, Y., Xie, X., Chen, Y., Liu, W., and Ma, W.-Y. Mining user similarity based on location history. In Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems. GIS ’08. ACM, New York, NY, USA, pp. 34:1–34:10, 2008. MacQueen, J. B. Some methods for classification and analysis of multivariate observations. In Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability, L. M. L. Cam and J. Neyman (Eds.). Vol. 1. University of California Press, pp. 281–297, 1967. Mislove, A., Lehmann, S., Ahn, Y., Onnela, J., and Rosenquist, J. Pulse of the nation: Us mood throughout the day inferred from twitter, 2010. Sakaki, T., Okazaki, M., and Matsuo, Y. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web. WWW ’10. ACM, New York, NY, USA, pp. 851–860, 2010. Wellman, B., Salaff, J., Dimitrova, D., Garton, L., Gulia, M., and Haythornthwaite, C. Computer networks as social networks: Collaborative work, telework, and virtual community. Annual Review of Sociology 22 (1): 213–238, 1996. , Vol. 3, No. 3, October 2012.

Log In

Location-Based Events Detection on Micro-Blogs

Sign up for access to the world's latest research.

Related papers

Related papers