Traffic Condition Is More Than Colored Lines On A Map: Characterization of Waze Alerts
1 Introduction
Participatory Sensing Systems (PSSs) [1,2] are revolutionizing the way we see cities,
societies and the interactions among people. PSSs provide a mobile interface that allows
people carrying smartphones to share data about the environment (or context) they are
in, at any time and place. These systems have the power to contribute to making
ubiquitous computing a reality. Consider the large variety of PSSs already deployed and
functioning at global scale, such as Foursquare^1, Instagram^2, Weddar^3, and Waze^4.
Each of these systems can provide valuable information about an aspect of a given city
or society in almost real time, such as its traffic and weather conditions, local
parties and festivals, and riots. More importantly, the cost of obtaining this data is
almost negligible, since it is distributed among all the people who share it.
^1 http://www.foursquare.com
^2 http://www.instagram.com
^3 http://www.weddar.com
^4 http://www.waze.com
A. Jatowt et al. (Eds.): SocInfo 2013, LNCS 8238, pp. 309-318, 2013.
© Springer International Publishing Switzerland 2013
T.H. Silva et al.
From participatory sensing systems we can derive participatory sensor networks
(PSNs), where each node in the network consists of a user equipped with a mobile
device, sending data to web services. In this direction, we can view PSNs as sensing
layers of a global scale sensor network that uses humans in the sensing process. For
example, from Waze we can obtain a layer about traffic, from Instagram a layer
containing pictures of places, from Foursquare a layer about the category of locations,
and from Weddar a layer about weather conditions. Each layer is responsible for
sensing data related to a certain aspect, for instance traffic or weather conditions, of
a specific area of the globe, such as a country, city, or neighborhood. In this work, we
focus our analysis on a specific sensing layer, the one responsible for sensing traffic
conditions. Data collected from this layer, as well as from the others mentioned above,
have the potential to transform society. They enable the understanding of city dynamics
and the urban behavioral patterns of their inhabitants, supporting smarter decision
making. In fact, real-time traffic maps could convey more than traffic flow conditions
(usually represented by colored lines on the map); for example, they could suggest
routes that cause less pollution to the city or indicate dangerous areas to avoid.
In order to evaluate the potential of the traffic sensing layer, we here analyze
participatory data coming from Waze, the most popular traffic report application. Waze
was created in 2008 and recently had approximately 50 million users [3]. Waze
periodically collects sensor data from mobile phones and uses it to compute the speed of
the devices and thereby infer traffic conditions. The system also offers its users
predefined alerts to report incidents such as traffic jams and police traps, extending
the information about traffic conditions. One of Waze's main features is the engagement
of users in contributing to the common good, i.e., Waze is not just crowdsourcing, but
personal participation [3].
The objective of this work is to characterize the properties of the PSN derived from
Waze, its broad and global spatial coverage as well as its limitations. Moreover, we
discuss different opportunities for application design using data collected from Waze.
For example, such data could be exploited to drive improvements in algorithms for
navigation services and to support quicker identification of car accidents, potholes,
and slippery roads, all of which are valuable pieces of information that are hard to
detect with traditional sensors.
The rest of this paper is organized as follows. In Section 2 we present the related
work. In Section 3 we discuss the participation of humans in the sensing process. In
Section 4 we present the characterization of a PSN derived from Waze. In Section 5 we
illustrate some of the possible applications based on the data shared in Waze. Finally,
in Section 6 we present the conclusions and future work.
2 Related Work
Data obtained from participatory sensor networks (PSNs) may be very complex and,
therefore, a fundamental step in any investigation is to analyze the collected data to
understand its characteristics and usefulness. There are several proposals devoted to
the study of specific characteristics of PSNs. For example, in location sharing services
like Foursquare, Cranshaw et al. [4] presented a model to extract distinct regions of
a city that reflect current collective activity patterns. In a previous work [1], we have
characterized data collected from three distinct PSNs derived from location sharing ser-
vices, such as Foursquare. Among the results, we showed the planetary scale of those
networks, as well as the highly unequal frequency of data sharing, both spatially and
temporally, which is highly correlated with the typical routine of people. In another
previous work [5], we performed the first characterization of Instagram using photos
shared by users, analyzing them from a sensor network point of view. We showed that
photo-sharing systems, particularly Instagram, can also be used to map the
characteristics of urban locations at a low cost.
Quercia et al. [6] studied how social media communities resemble real-life ones.
They tested whether established sociological theories of real-life social networks hold
in Twitter. They found, for example, that social brokers in Twitter are opinion leaders
who take the risk of tweeting about different topics. Frias-Martinez et al. [7] proposed
a technique to determine the most common activities in a city by studying tweeting
patterns. Sakaki et al. [8] studied the real-time interaction of events (e.g., earthquakes)
in Twitter and proposed an algorithm to monitor tweets to detect a target event.
To the best of our knowledge, Fire et al. [9] is the only prior work to analyze Waze.
They showed that it might be possible to identify areas where accidents are more likely
to occur by analyzing user accident reports in Waze.
Our work differs from all previous studies as it is the first characterization of Waze
from a crowdsensing point of view. Moreover, continuing our recent studies [1,5], we
show that traffic alert sharing systems, particularly Waze, can also be exploited for
mapping the characteristics of urban locations at a low cost, providing data that
complement those obtained from other types of systems, such as location or photo
sharing systems. As previously mentioned, we believe that the personal involvement of
users in such systems allows much richer conclusions about traffic conditions to be
inferred than the usual colored traffic jam information provided by on-line traffic
websites. This work also discusses possible ways towards this goal.
4 Characterization of Waze
This section investigates the participatory sensor network (PSN) derived from Waze.
[Figure: two log-scale panels plotting P[X x] against x (# of alerts); the relational
operator and the caption were lost in extraction.]
amount of shared data, because the motivation of users to share data in such systems
is different from Waze. For example, a large supermarket may be visited by a large
number of people on a daily basis, but it is not likely that those people will share many
check-ins or photos at it.
[Figure 4 panels: (a) Histogram (pop. quad.), vertical axis Count, legend entries "data"
and "loglogistic =0.93"; (b) Odds ratio (pop. quad.), vertical axis Odds Ratio; (c) CDF
(all quadrants), vertical axis P[x>X]. Horizontal axes: t (min).]
Fig. 4. Time intervals between consecutive alerts, not necessarily done by the same user
^10 Probability Density Function: $f(x \mid \mu, \sigma) = \frac{1}{\sigma} \frac{1}{x} \frac{e^{z}}{(1+e^{z})^{2}}$; $x \geq 0$, where $z = \frac{\log(x) - \mu}{\sigma}$.
^11 $OR(x) = \frac{CDF(x)}{1 - CDF(x)}$, where CDF(x) is the cumulative distribution function, in this
case, of the inter-sharing time t distribution.
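The log-logistic density and the odds ratio defined in the footnotes can be evaluated
numerically. The following is a minimal sketch in Python, not the implementation used in
this work; all function names are hypothetical. It relies on the fact that, under this
parametrization, the log-logistic CDF is 1/(1 + e^(-z)), so that OR(x) = e^z and the odds
ratio appears as a straight line on log-log axes.

```python
import math


def loglogistic_pdf(x, mu, sigma):
    """f(x | mu, sigma) = (1/sigma) (1/x) e^z / (1 + e^z)^2, z = (log x - mu) / sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    ez = math.exp(z)
    return ez / (sigma * x * (1.0 + ez) ** 2)


def loglogistic_cdf(x, mu, sigma):
    """CDF of the log-logistic distribution: 1 / (1 + e^(-z))."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(cdf_value):
    """OR = CDF / (1 - CDF); infinite when CDF reaches 1."""
    if cdf_value >= 1.0:
        return math.inf
    return cdf_value / (1.0 - cdf_value)


def empirical_odds_ratio(samples, x):
    """Odds ratio at x computed from the empirical CDF of the samples."""
    n = len(samples)
    below = sum(1 for s in samples if s <= x)
    if below == n:
        return math.inf
    return below / (n - below)
```

Comparing `empirical_odds_ratio` over a grid of inter-sharing times with
`odds_ratio(loglogistic_cdf(...))` for fitted parameters is one way to judge the fit
visually, in the spirit of Figure 4b.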
Fig. 5. CCDF of the number of shared alerts (same user)
Fig. 6. Distribution of the geographical distance between consecutive data of the same
person. Panels: (a) all intersharing distances (km); (b) median intersharing distance
(km).
We now analyze the spatial distance between consecutive alerts by the same user,
computed as the haversine distance [13] between the geographic coordinates associated
with both alerts. In Figure 6a, we show the CDF of the distances between consecutive
alerts shared by each user, for all users. Note that a large portion of the distances
are very short: for instance, around 30% are below 1 meter. Such a large fraction of
small distances between consecutive sharings was also observed in photo sharing [5]
and, to a lesser extent, location sharing services [10]. In the latter, Noulas et
al. [10] observed that 20% of the consecutive sharings by the same user were in
locations up to 1 km apart. For photos and alerts, this fraction rises to approximately
45% and 80%, respectively. This suggests that users tend to share multiple alerts in the
same location.
In Figure 6b, we show similar results for the distribution of the median distance
between consecutive sharings for each user. That is, even when aggregating results per
user, we still observe that alerts are shared at very short distances: around 15% of
users share alerts 1 meter apart from each other.
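The distance computation described above can be sketched as follows. This is a minimal
illustration, not the code used in this work: the Earth radius constant, the
`(timestamp, lat, lon)` tuple layout, and all function names are assumptions made for
the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def consecutive_distances_km(alerts):
    """Distances between consecutive alerts of a single user.

    `alerts` is a list of (timestamp, lat, lon) tuples (hypothetical layout);
    alerts are ordered by timestamp before pairing.
    """
    ordered = sorted(alerts)
    return [
        haversine_km(a[1], a[2], b[1], b[2])
        for a, b in zip(ordered, ordered[1:])
    ]


def fraction_within(distances, threshold_km):
    """Empirical CDF value: share of distances at or below threshold_km."""
    if not distances:
        return 0.0
    return sum(1 for d in distances if d <= threshold_km) / len(distances)
```

Calling `fraction_within(distances, 1.0)` on the pooled distances of all users gives one
point of a curve like the one in Figure 6a; applying it to per-user medians corresponds
to Figure 6b.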
more opportunities to share data in location and photo sharing systems (e.g., in a night
club or in a concert).
Intense user activity during the weekends, as observed in location sharing
services [1,14] and photo sharing services [5], is not observed here. This might
indicate that the reasons motivating users to contribute alerts are distinct from those
motivating check-ins. In Figure 7b, we show the average number of data sharings
throughout the day, separately for weekdays (Monday to Friday) and weekends (Saturday
and Sunday). Note the two clear peaks of activity, one around 7 to 8 AM and the other
around 6 PM, coinciding with typical rush hours in urban areas. This result is different
from the three clear peaks previously observed in location sharing services [1,14],
around breakfast, lunch and dinner times, as well as from the two peaks during lunch and
dinner times in photo sharing [5].
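The weekday and weekend profiles discussed above can be computed along the following
lines. This is a sketch under stated assumptions, not the procedure used in this work:
timestamps are taken to be `datetime` objects in local time, the average is taken over
the days that have at least one alert, and the function names are hypothetical.

```python
from datetime import datetime


def hourly_profile(timestamps):
    """Average number of alerts per hour of day, split by weekday and weekend.

    Returns {"weekday": [24 floats], "weekend": [24 floats]}.
    Averages are over the distinct days observed in each group (an assumption).
    """
    counts = {"weekday": [0] * 24, "weekend": [0] * 24}
    days = {"weekday": set(), "weekend": set()}
    for ts in timestamps:
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        kind = "weekend" if ts.weekday() >= 5 else "weekday"
        counts[kind][ts.hour] += 1
        days[kind].add(ts.date())
    return {
        kind: [c / max(len(days[kind]), 1) for c in counts[kind]]
        for kind in counts
    }


def peak_hours(profile, top=2):
    """Hours of the day with the highest average activity, highest first."""
    return sorted(range(24), key=lambda h: profile[h], reverse=True)[:top]
```

For example, `peak_hours(hourly_profile(alert_times)["weekday"])` returns the two busiest
hours on weekdays, which can then be compared with the rush-hour peaks described above.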
[Figure 7 panels: (a) total # of data per day of the week (Mon to Sun); (b) total # of
data per hour of the day, weekday vs. weekend. Caption lost in extraction.]
We now analyze the hourly variations of alert sharings in six large cities: Chicago
and New York (Figure 8a; see footnote 13); Belo Horizonte and Sao Paulo in Brazil
(Figure 8b); and London and Paris (Figure 8c). Note that the curve of each city follows
the general trend observed for all locations (Figure 7b).
[Figure 8 panels: (a) Chicago and New York; (b) Belo Horizonte and Sao Paulo; (c) London
and Paris. Axes: normalized # of data vs. hour of the day.]
Fig. 8. Alerts sharing throughout the day in different cities around the world
We can also observe that the peaks reflect distinct rush times that are related to the
common working hours of different cities. In Chicago (Figure 8a) the morning peak is
around 7 AM, as in the two European cities (Figure 8c). In contrast, in New York and in
the Brazilian cities (Figure 8b), the morning peak is usually one hour later, suggesting
that people tend to leave for work later in those cities. The second most pronounced
peak in both American cities is around 5 PM, which is similar to the European cities.
However, this is distinct from the Brazilian cities, which have a peak of activity
around 6 PM.
^13 Each curve is normalized by the maximum number of alerts shared in the city in question.
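The per-city normalization and the comparison of peak times can be illustrated with a
short sketch. It is only an illustration under stated assumptions: each city is
represented by a list of 24 hourly alert counts, and the morning and evening windows
passed to `peak_hour` are hypothetical choices, not values taken from this work.

```python
def normalize_by_max(hourly_counts):
    """Scale a curve so that its maximum is 1, as done for each city curve."""
    peak = max(hourly_counts)
    if peak == 0:
        return [0.0] * len(hourly_counts)
    return [c / peak for c in hourly_counts]


def peak_hour(hourly_counts, start, end):
    """Hour in the half-open window [start, end) with the most alerts."""
    return max(range(start, end), key=lambda h: hourly_counts[h])


def rush_peaks(hourly_counts, morning=(5, 12), evening=(15, 21)):
    """Morning and evening peak hours; the window bounds are assumed values."""
    return (
        peak_hour(hourly_counts, *morning),
        peak_hour(hourly_counts, *evening),
    )
```

Applying `rush_peaks` to the curve of each city gives pairs of hours that can be compared
across cities, for instance to check whether one city peaks an hour later than another.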
To complement this analysis, we performed, from July 16th to July 18th, an hourly
collection of traffic conditions in Paris, using Google Maps. We note that the times of
the observed peaks match the intense traffic conditions reported by Google Maps
relatively well, whereas the reduced activity before and after the peaks also matches
better traffic conditions. This suggests that this information could be used to assure
the quality of, and improve, traffic condition information services, such as those
offered by Google Maps.
4.6 Discussion
We showed the planetary scale of the studied PSN, derived from Waze. We also showed
the highly unequal frequency of data sharing, both spatially and temporally, which is
highly correlated with the typical routine of people. Our characterization provided a
deep understanding of the properties of this particular PSN, revealing its potential to
drive various studies on city dynamics and urban social behavior, as discussed next.
This section discusses some possible situations where a PSN derived from Waze can
be exploited to build new services and applications. As discussed in Section 4.1, the
problem most often reported by Waze users is traffic jams. Since this is a common cause
of complaints and many other problems may end up resulting in traffic jams, a natural
question that arises is: What are the causes of traffic jams? This is not an easy
question to answer, and the answer may vary from place to place. However, the alerts
shared in Waze might help us to understand the causes.
More specifically, we note that the analysis of the traffic alerts can lead to a more
detailed investigation of traffic conditions. For instance, the real-time identification
of locations with potholes or animals on the road, which are hard to detect with
traditional sensors, becomes more feasible when users participate in the sensing
process. This is useful for discovering non-obvious reasons for a frequent traffic jam.
Besides that, such detection opens opportunities for various services, such as helping
smart cars to correctly identify problems on the road.
In the same direction, the identification of problematic roads might also be possible
by looking at the number of alerts reported on a road. For example, if we take the top
five reported locations in Belo Horizonte (Cristiano Machado Av; Raja Gabaglia Av;
Contorno Av; Beltline Rd; Amazonas Av) and Paris (A15; N104; A6 - E15; A13 - E05;
N118), we observe that all of them are roads that typically present traffic problems,
especially during rush hours. This shows that it is possible to identify problematic
roads using traffic alerts. However, the main advantage of using a PSN of traffic alerts
lies not in discovering commonly problematic roads, but in detecting unusual ones. This
is possible thanks to the capability of Waze alerts to describe real-time incidents,
which can help to discover particular patterns that are not generally known.
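Both ideas above, ranking roads by the number of alerts and flagging roads with unusual
activity, can be sketched in a few lines. This is a hypothetical illustration rather
than a method evaluated in this work: the input layout, the function names, and the
`factor` and `min_alerts` thresholds are all assumptions.

```python
from collections import Counter


def top_roads(alert_roads, k=5):
    """The k roads with the most alerts; `alert_roads` has one road name per alert."""
    return [road for road, _ in Counter(alert_roads).most_common(k)]


def unusual_roads(current, baseline, factor=3.0, min_alerts=5):
    """Roads whose current alert count is well above their usual level.

    `current` maps road -> alert count in the period under inspection.
    `baseline` maps road -> typical count for a comparable period.
    A road is flagged when it has at least `min_alerts` alerts and at least
    `factor` times its baseline (roads with no history use a baseline of 1).
    Both thresholds are illustrative assumptions.
    """
    flagged = []
    for road, count in current.items():
        usual = max(baseline.get(road, 0.0), 1.0)
        if count >= min_alerts and count >= factor * usual:
            flagged.append(road)
    return sorted(flagged)
```

With such a split, a road that is always congested appears in `top_roads` but not in
`unusual_roads`, while a normally quiet road with a sudden burst of alerts is flagged.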
This information could be used to improve algorithms for navigation services. Besides
that, traffic information services, such as Bing Maps, could also benefit from this
information to assure the quality of the reported traffic conditions, as we observed in
Section 4.5. Moreover, an urban planner could use this information to assess the
effectiveness of previous roadworks. For instance, it could be verified whether
roadworks on Cristiano Machado Avenue, a very problematic road in Belo Horizonte, were
effective in reducing the number of problems reported on that road.
References
1. Silva, T.H., Vaz de Melo, P.O.S., Almeida, J.M., Loureiro, A.A.F.: Challenges and
opportunities on the large scale study of city dynamics using participatory sensing.
In: Proc. of ISCC 2013, Split, Croatia (July 2013)
2. Burke, J., Estrin, D., Hansen, M., Parker, A., Ramanathan, N., Reddy, S., Srivastava, M.B.:
Participatory sensing. In: Workshop on World-Sensor-Web, WSW 2006 (2006)
3. Goel, V.: Maps That Live and Breathe With Data. The New York Times (June 2013)
4. Cranshaw, J., Schwartz, R., Hong, J.I., Sadeh, N.: The Livehoods Project: Utilizing Social
Media to Understand the Dynamics of a City. In: Proc. of ICWSM 2012, Dublin, Ireland
(2012)
5. Silva, T.H., Vaz de Melo, P.O.S., Almeida, J.M., Salles, J., Loureiro, A.A.F.: A picture of
Instagram is worth more than a thousand words: Workload characterization and application.
In: Proc. of DCOSS 2013, Cambridge, USA, pp. 123-132 (May 2013)
6. Quercia, D., Capra, L., Crowcroft, J.: The social world of twitter: Topics, geography, and
emotions. In: Proc. of ICWSM 2012, Dublin, Ireland (June 2012)
7. Frias-Martinez, V., Soto, V., Hohwald, H., Frias-Martinez, E.: Characterizing urban
landscapes using geolocated tweets. In: Proc. of SocialCom 2012, Washington, USA (2012)
8. Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes twitter users: real-time event
detection by social sensors. In: Proc. of WWW 2010, Raleigh, USA, pp. 851-860. ACM (2010)
9. Fire, M., Kagan, D., Puzis, R., Rokach, L., Elovici, Y.: Data mining opportunities in geosocial
networks for improving road safety. In: Proc. of IEEEI 2012, pp. 1-4 (2012)
10. Noulas, A., Scellato, S., Mascolo, C., Pontil, M.: An Empirical Study of Geographic User
Activity Patterns in Foursquare. In: Proc. of ICWSM 2011, Barcelona, Spain (2011)
11. Vaz de Melo, P.O.S., Faloutsos, C., Loureiro, A.A.: Human dynamics in large communication
networks. In: Proc. of SDM 2011, Mesa, USA (2011)
12. Malmgren, R.D., Stouffer, D.B., Motter, A.E., Amaral, L.A.N.: A poissonian explanation for
heavy tails in e-mail communication. PNAS 105(47), 18153-18158 (2008)
13. Sinnott, R.W.: Virtues of the Haversine. Sky and Telescope 68(2), 159+ (1984)
14. Cheng, Z., Caverlee, J., Lee, K., Sui, D.Z.: Exploring Millions of Footprints in Location
Sharing Services. In: Proc. of ICWSM 2011, Barcelona, Spain (2011)