User-Centered Perspectives for Automotive Augmented Reality
Victor Ng-Thow-Hing∗1, Karlin Bark1, Lee Beckwith1, Cuong Tran1, Rishabh Bhandari2, Srinath Sridhar3
1 Honda Research Institute USA, Mountain View, CA, USA
2 Stanford University, Stanford, CA, USA
3 Max-Planck-Institut für Informatik, Saarbrücken, Germany
∗e-mail: [email protected]
ABSTRACT
Augmented reality (AR) in automobiles has the potential to significantly alter the driver’s user experience. Prototypes developed in
academia and industry demonstrate a range of applications from
advanced driver assist systems to location-based information services. A user-centered process for creating and evaluating designs
for AR displays in automobiles helps to explore what collaborative
role AR should serve between the technologies of the automobile
and the driver. In particular, we consider the nature of this role
along three important perspectives: understanding human perception, understanding distraction, and understanding human behavior. To minimize driver distraction, we argue that AR applications should focus solely on tasks that involve the immediate local driving environment and not secondary task spaces. Consistent depth
cues should be supported by the technology to aid proper distance
judgment. Driving aids supporting situation awareness should be
designed with knowledge of current and future states of road users,
while focusing on specific problems. Designs must also take into
account behavioral phenomena such as risk compensation, inattentional blindness and an over-reliance on augmented technology in
driving decisions.
Index Terms: H.5.1 [Multimedia Information Systems]: Artificial, augmented, and virtual realities; H.5.2 [User Interfaces]: User-centered design
1 INTRODUCTION
Augmented reality (AR) in automobiles can potentially alter the
driver’s user experience in significant ways. With the emergence of
new technologies like head-up displays (HUDs) that are AR-capable, designers can now provide visual aids and annotations that alter what the driver focuses on and how they accomplish the driving
task. While this can potentially alleviate cognitive load and create
more enjoyment in the driving task, it can also introduce new risks.
In contrast to AR applications on smartphones or tablets, the windshield offers a direct and larger field of view of the actual environment. Automobiles can be equipped with a wider variety of powerful sensors and computational power. In addition, automobiles are generally constrained to roadways, which helps to limit the domain of possible contexts applications need to focus on. However, driver distraction is a clear danger. The National Highway Traffic Safety
Administration (NHTSA) in the United States identifies three types
of driver distraction[30]: visual distraction (eyes off road), cognitive distraction (mind off driving), and manual distraction (hands
off the wheel). AR can directly affect the first two types.
There are looming societal trends that the future automobile must
deal with. In most developed countries, the number of elderly drivers is growing. Many of these older drivers must choose between a radical change in their lifestyle and the risk of continued
driving with impaired visual capabilities[45]. AR can be used to
increase saliency of important elements in the driver’s view and has
the potential for augmenting the driver’s situation awareness. These
benefits can trickle down to all drivers, regardless of age. Since AR
in the car provides a personal display to the driver, information can
be customized for the individual, both in terms of content and in
visual parameters to match the driver's visual and cognitive capacities. In fact, there is a growing number of advanced driver assistance systems (ADAS) that provide aids to the primary task of driving, such as pedestrian detection[16] and blind spot detection[40].
Cars will have access to a vast array of new information about
their environment through sensors and internet connectivity. AR
can serve as an intermediary for presenting this information as part
of an immersive experience, while keeping the driver’s eyes on the
road and not on a secondary display on the dashboard. New sensors capable of depth measurements provide higher precision information about objects in the environment[24]. AR can provide
a natural way to convey spatiotemporal information aligned with
the moving positions of objects relative to the ego-centric view of
the driver. The potential for greater state understanding that comes
with improved intelligence in the car can also be communicated
with augmented reality. This can be seen with AR navigation aids
that directly display paths in the driver’s view[15, 20].
The demonstration of autonomous vehicles on public roads[27] has been seen as a viable way of dealing with the dual problems of a growing number of elderly drivers who can no longer safely drive and increased channels of incoming information that can distract the driver. However, to increase acceptance and comfort levels with this technology, the car needs to communicate its decision-making process and perceptual awareness[31] so the driver is assured the car is operating safely. AR, combined with
other modalities such as sound or haptics, can serve as the intermediary for driver-car collaboration. In particular, the car can
communicate its own situation awareness and driving plans to the
passenger of autonomous vehicles using an AR interface. These
human-machine interfaces must be developed simultaneously with
technologies for driver automation.
For all these promising areas of application of automotive AR,
the stakes are high. Visual perception plays a large role in determining a driver’s situation awareness of the environment. Situation awareness is the perception of environment elements in time
and space. If information is presented incorrectly, either in style
or content, situation awareness may be improperly represented or
driver distraction can occur, leading to dangerous driving conditions. Massive deployment of these technologies without any design guidelines for application developers can produce an unacceptable number of accidents. Dying is a bad user experience.
AR can be a medium of collaboration between the driver and the
technology of the automobile. Technology enables better sensing of
a car’s immediate environment, access to location-based services,
and new modalities of interaction with a vehicle. AR can provide
a way for drivers to interface with these technologies using a visual modality that is integral to the driving task. There is a need
to design both the AR display technology and the applications that
they will host carefully, considering the behavioral and physiological constraints of the driver.
1.1 Contributions
This paper describes how we adapted a design framework to incorporate user-centered perspectives in automotive AR solutions. We
discuss three important perspectives for successfully designing and
implementing augmented reality in automobiles. First, we must
understand the visual perception of humans. Here, we are referring to the visual processing that occurs in the brain, prior to higher-level recognition processes such as object recognition or assessing situation awareness. We present our own studies testing the
importance of consistency among multiple depth perception cues.
Second, we examine issues of driver distraction and how AR can
influence it. Finally, we seek to understand how inherent aspects
of natural human behavior should influence the design of solutions
for driving aids. Specifically, it is important not to inadvertently
introduce undesirable behavioral changes that can create dangerous
driving habits. Throughout our discussion of these three points, we
will illustrate with examples we are currently designing and developing.
1.2 Outline
We review the related work in automotive AR solutions in Section
2, followed by a description of our design process in Section 3. The
three user-centered perspectives are discussed in detail in Section
4, Section 5, and Section 6. Conclusions are in Section 7.
2 RELATED WORK
One advantage an AR-based display has over a basic HUD that conveys driving data like vehicle speed or other state information is the emphasis on contextual information as it relates to the
external environment of the car. Tönnis et al.[42] created AR driving aids for longitudinal and lateral driver assistance that visually
conform to the road’s surface. Park et al.[33] found that drivers had
faster response times to lane-changing information when it was directly augmented in perspective over the road surface compared to
being displayed as 2-D icons. Participants described the AR representation as intuitive, clear and accurate. This information can help
disambiguate instructions by using augmented graphics in consistent perspective with environmental elements in the driver’s view.
In the literature, these types of HUDs have been described as being
contact-analog[35].
Clear and intuitive information annotating elements in the driver's field of view, unencumbered by the need to wear sensor apparatus, has tempted many researchers to explore AR use in car
navigation[20, 29]. Medenica et al.[28] demonstrated that an AR-based navigation system caused drivers to spend more attention looking ahead on the road than non-AR systems. A hybrid system that smoothly transitioned between egocentric route information and a 2-D overhead map was simulated by Kim and Dey[22]
to show its potential to reduce navigation errors and reduce driver
distraction for elderly drivers.
The perceptual issues with augmented reality display systems
have been identified and categorized. For stereoscopic displays,
Drascic and Milgram[11] examined sources of perceptual mismatches with calibration, inter-pupillary distances, luminance and
conflicting depth cues of accommodation and vergence. Kruijff
et al.[23] expanded the examination of issues to include new mobile device displays. In these mobile displays, differences in the
viewer’s and display’s field of view and the viewing offset angle
of the real world and the display can affect perception when using
AR applications. Mobile displays also suffer from a lack of depth
cues that can lead to underspecified depth of objects in the scene.
In Section 4.1, we quantify how inconsistent depth cues can lead to
inaccurate estimates of distances for driving applications.
A classification of presentation principles in AR was described
by Tönnis and Plecher[43]. The focus of the classification was
on different implementational details of AR solutions: temporality,
dimensionality, registration, frame of reference, referencing, and
mounting. Fröhlich et al.[15] describe a design space focused on
different aspects of realistic visualizations (constituents and styles).
These categories are related to the technical implementation of the
AR solution. In contrast, the goal of our work is to examine AR
solutions in the context of how they influence human cognition and
how human cognition should play a role in creating the design of
a solution. The incorporation of these user-centered perspectives
should occur during the early conceptual design phases, before a particular prototype's presentation and implementation have been finalized.
3 DESIGN PROCESS
With these perspectives of the driver in mind, the problem becomes less one of how to implement or describe an idea technically and more one of finding the appropriate form of solution to a driver's problem.
3.1 Understanding the Problem
The process of building up rationale for our designs begins with
understanding the problem. Research reports[5] documenting the
causes and types of accidents can give important statistics to help prioritize problem areas based on factors such as societal impact and number of casualties. However, they do not provide the personal perspective of the driver. To understand the problem from
the driver’s perspective, contextual inquiry and design can be employed as others have done for human-machine interface (HMI)
design[17]. We conducted in-car interviews with different demographic groups to understand their daily driving habits, concerns,
and how driving is integrated into their daily work/life schedules (Figure 1). Conducting the interview in their own car allows interviewees to demonstrate directly how they interact with elements in
their cars.
Figure 1: In-car interviews conducted during the contextual inquiry
process.
3.2 Ideation
Once information is gathered from interviews and other research reports, the process of analyzing and organizing this information into
prevailing themes begins the creative process of idea generation or
ideation[21]. To facilitate this process, we employed a structured
brainstorming technique called affinity diagramming[7]. Observations from interviews as well as facts gathered from other research
reports are written down on paper notes and organized according
to similar themes or affinities (Figure 2). This grouping can be done
at several layers to develop a hierarchical structure.
We interviewed 12 people: 6 elderly drivers (age over 60) and 6 Generation-Y drivers (ages 20-30). Through affinity diagramming we identified themes of safety, traffic, integration with mobile devices, and exploration. In the area of safety, elderly drivers had concerns about their personal abilities regarding awareness of their surroundings and navigating intersection scenarios with pedestrians. The Generation-Y drivers mentioned more scenarios regarding the behavior of other drivers, such as other people not seeing them. Unintentionally, we discovered individuals in both groups who could be
classified as extreme driving enthusiasts. They explicitly avoided
all distractions, such as phone use and music in the car, in order
to concentrate on perfecting execution of driving maneuvers and
developing better situation awareness. Studying these individuals
inspired our designers to create solutions to help normal drivers become engaged in the primary task of driving rather than focusing
on creating solutions involving distracting secondary tasks. We developed several solution concepts centered on situation awareness,
focusing on providing drivers better information to improve their
driving decisions rather than explicitly telling drivers what maneuvers to make. The aim of the former strategy is to engage the driver
in understanding the relevant situation awareness factors for a particular driving situation. We felt that following the latter strategy
would make drivers more reliant on the AR technology and might
weaken their inherent driving skills over time.
Figure 2: Affinity diagrams for ideation. Blue notes represent groups collecting commonly-themed observations (in yellow). Pink notes are higher-level categories that group themes together.
3.3 Sketching and prototyping
From the pool of solution concepts, the most promising are chosen and undergo development from early conceptual sketches, to
computer-generated test animations to working prototypes in our
driving simulator before eventual field testing in a vehicle. Figure
5 depicts the typical lifecycle of a driving aid. We refine key ideas
and choices at the early sketch stages, before investing engineering
effort to build higher fidelity prototypes. For example, we used GoPro cameras[19] mounted on the windshield to record video footage
of 120 degree field-of-view, driver perspectives of various road situations. To simulate the restricted field of view, a white paper circle the same radius as our HUD outgoing lens was placed on the
dashboard to visually indicate the HUD field of view via the circle’s reflection off the windshield (Figure 3). We could then use
video editing software to prototype different augmented graphics over this footage, using the field-of-view guides to limit where we could place augmented graphics. Higher fidelity prototypes eventually involved implementing driving aids to work within our driving simulator, which projects road scenery on a 120 degree curved screen while augmented graphics are shown on a see-through HUD display in front (Figure 4).

Figure 3: A white circle was placed on the dashboard so that its reflection could indicate the usable field of view of the HUD.

Figure 4: Driving simulator and HUD prototype: augmented graphics appear on the see-through windshield HUD display while road scenery is projected on the curved screen behind.
It is during the sketching process, where we work out the mechanics and appearance of a design, that we take into consideration
the three user perspectives. For example, when designing the left-turn aid, one might have been tempted to draw the left-turn path the car should take, but we realized during our interviews that the main concern was deciding the timing of the left turn. One interviewee said, "It's difficult for me to judge how far away the oncoming traffic is, how fast they are going, and if it is safe to proceed." Further low-fidelity computer animations confirmed that adding a green turning trajectory did not accentuate the danger of an oncoming vehicle. In fact, it diverted attention away from the other car. Instead of telling drivers what to do, our solution (Figure 14) provides additional cues to the driver to enhance their situation awareness, but the decision to turn is left up to them.
3.4 Evaluation
Design techniques such as contextual inquiry, ideation and prototyping can be employed to identify needs, conceive solutions and
implement them at various levels of fidelity[4]. This is important
because our current understanding of the neurological processes
mapping human perception to human behavior is imperfect. Only
through observational evidence and evaluation of solution ideas can
we validate if AR designs are useful.
Since experiencing AR in automobiles is not common, many diverse solutions must be examined in order to converge to a common
set of design patterns that have established utility. In order to do so
in a timely fashion, early evaluation should be done prior to expensive investment of resources in implementing actual working prototypes. Designers may attempt to anticipate how users will react, but human behavior is often unpredictable. These early evaluations can help identify whether the designs are effective and allow for rapid iteration to help designers come up with a compelling solution.

Figure 5: Typical design lifecycle of AR driving applications (e.g., yielding right-of-way): from user research, the ideation process creates solution concepts which are refined from initial sketches to a series of higher fidelity prototypes.
In one example that illustrates the usefulness of early evaluation,
a driving aid designed to deter drivers from prematurely cutting off
a pedestrian’s right-of-way was evaluated to observe both initial
visceral reactions and effectiveness. Early concepts for the aid focused on visible textual barriers that increased the spatial footprint
of the pedestrian, which we hoped would make the driver leave a
bigger margin of safety around them.
For low-fidelity prototyping, we initially transitioned from hand-drawn sketches (Figure 5) to using pre-rendered graphics over pre-recorded driving scenery (Figure 6 and Figure 14). Registration of 3-D graphic elements with moving elements from video was done using a process called matchmoving[10]. These animations were presented to drivers for user acceptance testing. We recorded initial visceral reactions and visual responses using eye-tracking equipment[39].
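To make the registration step concrete, the following sketch shows the core operation that matchmoving enables: once camera intrinsics and a per-frame pose have been recovered from the recorded footage, any 3-D overlay element can be projected into pixel coordinates for compositing. This is a minimal illustration under those assumptions, not our production pipeline, and the matrix values are hypothetical.

```python
import numpy as np

def project_point(K, R, t, X_world):
    """Project a 3-D world point into pixel coordinates with a pinhole
    model. K: 3x3 intrinsics; R, t: camera pose for one video frame,
    as recovered by a matchmoving tool."""
    X_cam = R @ X_world + t      # world -> camera coordinates
    if X_cam[2] <= 0:            # behind the camera: not visible
        return None
    x = K @ X_cam                # perspective projection
    return x[:2] / x[2]          # normalize to pixel coordinates

# Hypothetical frame: anchor an overlay to a road point 15 m ahead.
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
print(project_point(K, R, t, np.array([2.0, 0.0, 15.0])))
```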
For the yielding aid, positive reactions were recorded from an
elderly user:
”The visuals are helpful and confirm when it is safe
to proceed... This is a very good confirmation and a
second opportunity to be cautious and yield. I prefer
this... it reminds me when my wife accompanies me
when driving... she is usually vigilant and confirms such
risks too... I like this!”
Other useful suggestions on color, placement and motion of the
aid were noted by users.
Although qualitative observations provide insightful feedback,
evaluations that can capture the user’s behavior or intended actions
are also a necessity. A challenge is to develop low-fidelity prototypes that allow user enactments for AR with correct driving mechanics implemented. One method that our group has employed
for these enactments is to use short, focused clips of real-life driving footage and animation overlays to estimate how a user would
act in different situations. Users can be asked to indicate when they
would perform different actions using push buttons or gaming input devices. We focus on identifying one or two basic parameters
to measure, which keeps the study short and simple.
In the case of the yielding example, the main intent of the yield
text was to provide a virtual barrier that would encourage drivers
Figure 6: Text is used as a barrier to protect pedestrians.
to wait a longer period of time before driving through the crossing.
A short, two-minute study was conducted to observe whether or
not drivers presented with the yield text would indeed wait longer.
With a single question in mind, a controlled study with numerous
measures was not necessary. Instead, participants were shown a
short, ten-second video clip that depicted a scenario with pedestrians (Figure 6) and were asked to indicate the time at
which they would proceed past the pedestrians after the pedestrians finished crossing the street. Half the participants were shown
video clips with the driving aid, while the other half were shown the
original recorded footage. Surprisingly, we found that the driving aid did not have the intended effect. In fact, participants who viewed the video without the driving aid tended to wait longer (roughly 0.5 s) before proceeding than those who had the driving aid. This evaluation, which only took one day to conduct, provided valuable data highlighting the ineffectiveness of the design. Although further follow-up studies are needed to determine precisely why this was the case, early detection of this behavior allows the team to iterate and improve upon the design prior to conducting a larger, controlled study.
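The analysis for such a single-question study can stay equally lightweight. A minimal sketch of one way to compare the two groups' wait times is shown below; the sample values are hypothetical stand-ins (only the roughly 0.5 s difference is reported above), and Welch's t-test is one reasonable choice, not necessarily the test we used.

```python
from scipy import stats

# Hypothetical wait times (seconds after the pedestrians cleared)
# for the two groups in the yield-text study.
with_aid    = [1.2, 0.9, 1.4, 1.1, 1.0, 1.3]
without_aid = [1.8, 1.5, 1.9, 1.6, 1.7, 1.4]

# Welch's t-test: does the aid change how long drivers wait?
t, p = stats.ttest_ind(without_aid, with_aid, equal_var=False)
diff = sum(without_aid) / len(without_aid) - sum(with_aid) / len(with_aid)
print(f"mean difference = {diff:.2f} s, p = {p:.3f}")
```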
Once a solution has been refined, one can then identify and
examine design variables, perhaps using proposed classification schemes like the one described in [43]. Since detailed studies require a
large investment of time, recruitment of subjects and evaluation of
collected data, they should be done only after a candidate idea has
been sufficiently iterated and refined to the point of being testable.
Controlled studies can be employed to identify which design variables affect driving performance. Physiological measures such as
eye-tracking and skin conductance sensors, in addition to driving
metrics recorded from driving simulation sessions, can be used to
quantitatively evaluate a driver’s performance. Eye-tracking in particular can help confirm if a driver’s visual perceptions are being
influenced by AR elements applied to the driver’s field of view.
Driving simulations are an effective tool for controlled studies.
Using driving simulations, we can test driving behavior as well as
model dangerous situations that are not feasible with real driving.
Many of the perceptual and behavioral processes in driving are not conscious, so driving simulation is a very important tool to observe driver behavior while using the AR-based driving aids. Furthermore, the basic application can be tested for usability before additional research and development effort is invested in sensors and computer recognition algorithms for detecting elements in the car's environment. Nevertheless, additional engineering effort is required to implement the aids to the stage where they can work correctly with driving data generated from a driving simulator.
In the next three sections, we further examine several user perspectives that must be taken into account when conceiving AR applications for the car.
4 UNDERSTANDING HUMAN PERCEPTION
In this section, we discuss several perceptual cues and processes that are important to consider when adding synthetic imagery to the visual field in AR automotive applications.

4.1 Depth Perception
Proper handling of depth perception is important in automotive AR for the driver to gauge distances of augmented objects with respect to real objects in the scene. In Tönnis et al.[44], a study using an egocentric view of a 3-D arrow to guide the driver's attention produced slower reaction times than an indirect bird's-eye view locating the area of immediate danger. The results may seem counterintuitive, as one would think the arrow in the egocentric view would be more direct at localizing the source of attention. However, the 3-D arrow was rendered as a projection on a 2-D display, which the authors speculated might have been the reason for the slower reaction times, as observers were missing important stereoscopic depth cues. Current techniques to display AR utilize a see-through display by reflecting the computer graphics imagery off a windshield or a see-through surface mounted near the windshield. However, there are significant challenges in displaying images properly to the driver with correct and consistent depth cues.

For a 2-D display, monocular cues such as size, occlusion and perspective can be modeled directly with the standard computer graphics pipeline. However, with the introduction of optical see-through displays[2], the eyes must be able to make sense of an entire combined scene of synthetic and real elements at the same time. Accommodation is an important depth cue where the muscles in the eye actively change its optical power to change focus at different distances to maintain a clear image. Vergence is the simultaneous inward rotation of the eyes towards each other to maintain a single binocular image when viewing an object. If an image is created directly on the windshield, as our eyes converge to points beyond the windshield into the environment, a distracting double image of the windshield visual elements will appear. The fact that the eye can only focus on one distance at a time presents a problem when displaying AR imagery on see-through displays.

Displays that generate images directly from the windshield, such as by exciting embedded transparent phosphors in the glass via laser projection[46], will cause the eye to focus directly on the surface of the windshield, making the entire scene beyond the car's windshield out of focus. Furthermore, the head position must be tracked to align the virtual image with the real objects it is meant to augment in order to properly simulate motion parallax. Although head-tracking technology has become more robust with face recognition algorithms, it still requires additional hardware and computer vision processing, which can add to latency in the proper rendering of graphics.

Methods that use optical combiner-HUDs can generate imagery beyond the windshield. However, most HUDs in cars today, including solutions that directly reflect a smartphone or tablet display off the windshield, result in an image in a fixed focal plane a relatively close distance beyond the windshield, not more than three meters[3]. Consequently, drivers will not be able to gauge depth correctly, because the depth cues on the HUD display will not match the depth of the objects in the environment they are meant to augment, which are typically beyond 5 meters. Displays must be built with dynamic focal planes that can be adjusted to arbitrary distances, from 5 meters to several hundred meters beyond the car, providing enough time for a driver to see potential hazards and safely react to them.

Figure 7: Diagram of the HUD prototype with a virtual sign generated. The sign distance depicted is much shorter than the actual distances used in the study, for space reasons.

Figure 8: Actual prototype shown without its cover, displaying the optics and actuated projector displays. Inset: virtual guiding lanes aligned with the ground plane are generated with the prototype from the driver's point of view.

Figure 9: Percentage errors for estimating distances of an AR street sign using different depth cues for 16 subjects.
We built a prototype HUD display (Figures 7 and 8) that has actuated optical elements that enable the focal plane to be dynamically adjusted from 5m to infinity. When testing with a group of 16 participants (8 males and 8 females), aged 19-78 (µ = 42, σ = 20), there was a clear difference in depth estimation when people were asked to judge the distance of a virtual stop sign drawn at different distances (from 7m-20m) using a fixed focal plane at 7m compared to one that can adjust its focal plane to match the targeted distance (Figure 9). Subjects were given three depth cue conditions: size only, focus only (accommodation), and size+focus. Size cues alone had the largest distance estimation percentage errors among the three conditions. The mean percentage error in depth estimation by participants with size-based cues was 22% compared to 9.5% with focal cues (p < 0.01). The size+focus condition showed no significant difference from the focus-only condition. Interestingly, for the size-only condition, participants estimated the depth to be close to the fixed focal length that was set for that condition (7 meters). This suggests that the accommodation depth cue can dominate potentially weaker cues like size, reinforcing the need to express accurate depth cues for AR elements.
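As a concrete illustration of the error metric behind these numbers, the sketch below scores a single distance judgment. The trial values are hypothetical, chosen to mirror the observed collapse of size-only estimates toward the fixed 7m focal plane.

```python
def percent_error(estimated_m, actual_m):
    """Unsigned percentage error used to score one distance judgment."""
    return 100.0 * abs(estimated_m - actual_m) / actual_m

# Hypothetical size-only trial: the sign is rendered 12 m away, but the
# focal plane is fixed at 7 m and the estimate collapses toward it.
print(percent_error(estimated_m=7.5, actual_m=12.0))  # 37.5
```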
See-through display systems must be able to adjust their focal planes over a large range of distances beyond 5m to properly implement augmented reality driving aids. Incorrect depth perception can lead to incorrect driver decisions.
4.2 Field of view
Although humans are capable of over 180 degrees of horizontal field of view[23], only the central 2 degrees of the fovea have the highest visual acuity[14]. This visual acuity contributes to the driver's useful field of view (UFOV), which is defined as the visual area over which information can be extracted at a brief glance without head movements[18]. Due to the limited size of the UFOV, drivers need to turn their heads to attend to different parts of the environment. From a driver's viewpoint in a moving vehicle, the most stable parts of an image are further away, which coincides with where drivers should be directing their gaze to have time to react to changes in road conditions. Parts of the view at the periphery of a windshield correspond to objects only a few meters in front of a vehicle. For a fast moving
vehicle, augmentations at the periphery of a windshield would give
drivers little time to react to them before they leave the driver’s field
of view. For these reasons, it may not be necessary to build an AR
display with a field of view that covers the entire windshield. This
is fortuitous, as it is technically difficult to build a full windshield
display with optical-combiner HUD technology due to constraints
placed on the optics caused by available dashboard space. In addition, the field of view of the HUD display will be fixed in size
as determined by the optical design (our prototype’s FOV is 20 degrees). These restrictions on the field of view imply that most AR
applications should concentrate on the forward view for displaying
augmentations. For creating situation awareness of entities beyond
the field of view, other strategies may need to be employed. Using low-fidelity prototyping methods as seen in Figure 3, we can
design solutions that take this restricted field of view into account.
In Figure 10, a cautionary callout for making the driver aware of pedestrians at intersection crossings was created to direct the gaze of drivers to pedestrians that may exist outside of their UFOV.

Figure 10: The pedestrian on the extreme left is out of the HUD's field of view. The field of view boundary is denoted by a dashed red curve (added to the diagram for clarity). Note: augmented graphics are on a see-through display while the city and pedestrians are from a driving simulator projected on a screen in front of the display. The warning sign call-out with the white line helps to direct the driver's attention to the pedestrian out of the field of view.
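The geometry behind this design choice can be sketched with a simple bearing test: given an object's position in the driver's frame, check whether it falls inside the HUD's fixed field of view, and if not, trigger an attention-directing callout like the one in Figure 10. The 20 degree value matches our prototype; the object positions are hypothetical.

```python
import math

HUD_FOV_DEG = 20.0  # fixed by the optical design, as in our prototype

def bearing_deg(lateral_m, forward_m):
    """Horizontal bearing of an object in the driver's frame."""
    return math.degrees(math.atan2(lateral_m, forward_m))

def in_hud_fov(lateral_m, forward_m, fov_deg=HUD_FOV_DEG):
    return abs(bearing_deg(lateral_m, forward_m)) <= fov_deg / 2.0

# A pedestrian 3 m to the left and 8 m ahead sits at about -21 degrees,
# outside a 20 degree HUD, so a call-out at the FOV edge is needed.
if not in_hud_fov(-3.0, 8.0):
    print("draw call-out at FOV edge pointing toward the pedestrian")
```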
5 UNDERSTANDING DISTRACTION
AR can potentially add to visual and cognitive distraction. If an AR application offers interactivity with the user (e.g., with gesture-based input), it contributes to manual distraction as well[30]. Any kind of cognitive distraction directly influences visual distraction. Research has shown that even in the absence of manual distraction when using a hands-free phone, driver interference effects are attributed more to the cognitive component of conversations while driving[32]. This can be explained by examination of the human
attention system.
5.1 Attention system
Inattention during driving accounts for 78% of crashes and 65%
of near crashes [36]. The human visual system has an attention
system where the gaze is first attracted to regions of high saliency.
In visual design, searching by color has been shown to have the
fastest time compared to size, brightness, or geometric shape [6].
The sensitivity to these cues is primed by the current task a person is doing, known as selective visual attention[9]. Once the eye has
fixated on the visual region, the higher visual acuity regions process this visual information to infer greater contextual information,
such as object recognition, followed by cognitive processing to create situation awareness that can incorporate the temporal behavior
of all objects in the scene. Secondary tasks that are not related
to the immediate driving environment, such as phone conversations,
can cause the attention system to suppress relevant cues important
for the primary task of driving. Similarly, putting augmented reality elements related to secondary task information such as calendar
items or music album covers can dangerously redirect the attention
system away from important driving cues.
Instead, augmented reality can be used to better regulate traffic behavior with markings that adapt to changing contextual situations. In unaugmented driving, civil engineers already do this with lane markings and road signs, which have proven to be very useful for indicating static aspects of traffic. Augmented reality can extend this with dynamic markings. In Figure 10, a virtual crosswalk can be placed on the roadway for a pedestrian crossing the road, with the pattern moving in the direction of the pedestrian's walk, increasing the saliency of the region as well as projecting the pedestrian's intended pathway into the future.
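One possible parameterization of such a dynamic marking is sketched below: the crosswalk's stripe pattern is scrolled along the pedestrian's heading at their walking speed, so the region both gains saliency and hints at the pedestrian's future path. The stripe period and walking speed are illustrative assumptions.

```python
def crosswalk_phase(walk_speed_mps, t_seconds, stripe_period_m=0.5):
    """Texture phase (0..1) that scrolls the stripe pattern in the
    pedestrian's walking direction, proportional to their speed."""
    return (walk_speed_mps * t_seconds / stripe_period_m) % 1.0

# Each rendered frame shifts the stripes by this phase along the
# pedestrian's heading (here, a 1.4 m/s walk at 30 frames per second).
for frame in range(3):
    print(crosswalk_phase(1.4, frame / 30.0))
```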
One important element of an attention system is the suppression
of elements surrounding the focus of attention[9]. The introduction
of physical, digital billboards with brighter self-illumination and
animation has been shown to draw the attention of drivers more
than regular billboards[12]. If the visuals of the advertisements are
more salient than surrounding cues, there is the danger that relevant
driving hazards could be suppressed. Inattentional blindness [26]
can occur where the driver stays fixated on some visual items, ignoring unexpected, but important changes elsewhere in the visual
field. On the other hand, increasing saliency can be used beneficially to highlight potential hazards on the road like pedestrians
and potholes. Using cues such as motion or strong colors with sufficient contrast against the background scenery not only makes such elements more distinguishable, it also offers an important way of informing the driver that they are synthetic rather than real.
Care must be taken in choosing the cues used to highlight hazards. Schall et al.[38] found that static cues for hazards actually led to longer reaction times than using no cues. However, more dynamic cues did reduce reaction times compared to the static cue
conditions in their studies. Schall et al.[37] further explored the use
of AR cues for elderly drivers and found they improved detection
of hazardous objects of low visibility, while not interfering with the
detection of nonhazardous objects. This shows potential promise of
AR technology being used to compensate for deficiencies in vision
as people age.
Further usability studies would be needed to design the proper balance between increasing desirable saliency and avoiding unwanted distraction. Perhaps visual highlights that fade quickly over time can avoid inattentional blindness. Experiments employing eye-tracking measurement technology[39] can help designers evaluate the eye-gaze behavior of their AR visual artifacts. We utilized this technology when designing a left turn driving aid that helps drivers estimate oncoming vehicle speed by projecting a path in front of the oncoming car corresponding to its position 3 seconds in the future (Figure 14). One concern regarding the design of the projected path was that drivers would fixate on the projected path, distracting them from observing the road scenery. Eye gaze behavior was tested on different types of visual stylings of the projected
path. Though initial qualitative feedback indicated that a chevron-style design was more pleasing to users, eye gaze recordings also indicated that users tended to fixate a little more on the chevron-styled path, with a greater number of rapid eye movements. Therefore, a less eye-catching but effective solid red path was chosen. In a pilot study to test the effectiveness of the projected red path (Figure 11), we found that users fixated on the movement of oncoming vehicles and rarely fixated on the projected path. In post-study questionnaires, participants also indicated that the projected path was clearly visible, indicating that they were able to use their peripheral vision to notice the driving aid while their primary visual attention was spent on other objects in their view.

Figure 11: Sample snapshot of eye gaze measurement for the left turn driving aid. The blue circle indicates where the participant's eyes are focused. Participants did not fixate on the projected path.
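A minimal sketch of the kind of area-of-interest (AOI) analysis behind these observations: count fixations that land inside the screen region covered by the projected path versus the oncoming vehicle. The coordinates are hypothetical; real gaze data came from eye-tracking equipment[39].

```python
def fixations_in_aoi(fixations, aoi):
    """Count fixations inside a rectangular area of interest (AOI).
    fixations: list of (x, y) gaze points in screen pixels;
    aoi: (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = aoi
    return sum(1 for x, y in fixations
               if xmin <= x <= xmax and ymin <= y <= ymax)

# Hypothetical samples: compare dwell on the path vs the oncoming car.
gaze = [(410, 520), (630, 300), (650, 310), (660, 305)]
path_aoi, car_aoi = (350, 480, 520, 600), (600, 260, 720, 360)
print(fixations_in_aoi(gaze, path_aoi), fixations_in_aoi(gaze, car_aoi))
```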
5.2 Cognitive Dissonance
Cognitive dissonance describes the discomfort one feels when maintaining two conflicting beliefs. This phenomenon affects the visual perception of physical environments[1]. In the context of AR, if the graphic elements are not registered correctly, or if they exist in two different coordinate systems (2-D image plane vs. 3-D real world), the brain may try to reconcile the two spaces and the driver may become distracted or even misinterpret the visual scene. Popular AR applications currently overlay 2-D image labels on video images of the world. However, when transitioning to a see-through display in a car, one cannot focus simultaneously on elements of the real world and their corresponding
2-D annotations because they may exist at different focal depths.
These effects may become even more disconcerting when stationary 2-D text is placed against a moving 3-D background. Studies
[34] have shown that layering labels at different depths rather than
in the same visual plane can improve visual search times due to improved clarity and legibility. In Sridhar and Ng-Thow-Hing[41], an
AR application places address labels directly in the same 3-D space
as the buildings they describe (see Figure 12). Labels move with
the buildings so that position and context stay consistent with the
changing scenery.
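A sketch of the placement idea, assuming building positions are already known in world coordinates: anchoring the label in the same 3-D space as the building, rather than on a fixed 2-D plane, lets the ordinary rendering pipeline keep the label's position, scale and depth cues consistent as the scenery moves.

```python
import numpy as np

def label_anchor(building_pos_world, offset_up_m=3.0):
    """World-space anchor for an address label: keep it at the
    building's own depth so accommodation and vergence cues match
    the object it annotates, raised slightly for legibility."""
    return np.asarray(building_pos_world) + np.array([0.0, offset_up_m, 0.0])

# Building 40 m ahead and 12 m to the right; the label is rendered
# like any other 3-D object, so it tracks the building as the car moves.
print(label_anchor([12.0, 0.0, 40.0]))  # [12.  3. 40.]
```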
Figure 12: Address labels are displayed in the same 3-D space as the physical road.
6 UNDERSTANDING HUMAN BEHAVIOR
So far, the first two perspectives dealt with the experience of the driver over relatively short timeframes when AR systems are activated in specific driver contexts. There have been studies examining the behavioral changes that can be induced through long-term usage of these driving aids. In de Winter[8], continuous haptic guidance in the gas pedal or steering wheel, while reducing mental workload, may inhibit retention of robust driving skills. The author concludes that supplementary information should not be provided continuously, but only as needed. In Norman[31], a strong case is made for the need to continually involve the driver in interactions with the vehicle, so that a collaborative partnership is established between the driver and the technology of the car. This creates a greater sense of trust and feeling of control when interacting with the car's sensors and driving control systems. Lee and See[25] explain how a breakdown of trust between technology and people can result in misuse and disuse of technology automation. In the case of AR-based driving aids, drivers will either turn off or ignore the AR-based information (disuse) or incorrectly rely on this information (misuse), with potentially disastrous outcomes if a critical assumption of the technology's accuracy is violated.

6.1 Collaboration between human and machine
AR can serve as one important part of a multi-modal interface that facilitates collaboration between a driver and the technological capabilities of the automobile. Lee and See[25] describe how the development of trust can be influenced by the visual display of a technology's interface, with trust increasing as realism increases. This may give AR-based displays an advantage over indirect map-based displays for navigation, because drivers can unambiguously see elements in their real view annotated directly with the information the aid provides. Visuals can also shape visceral reactions through color, motion or metaphor, influencing the level of trust of the driver towards the AR technology in the car.
Visual displays are often the intermediary for interpreting data that would otherwise not be easily interpretable by a driver. For example, sensors in the car that can detect vehicles in the surrounding environment can use AR aids to show the neighboring vehicles in a visual manner that is easy to understand. In Figure 13, we designed a visual aid to be placed in the sky on a plane parallel to the ground. Detected cars in the car's immediate environment have corresponding grid squares highlighted. The transparency of a grid square indicates proximity to the vehicle, with fully opaque squares implying the vehicle is directly adjacent to the driver. As grid squares become more opaque or transparent, the driver gets a sense of whether cars are approaching or moving away.

Figure 13: The safety grid identifies the location of vehicles surrounding a car.
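A minimal sketch of the proximity-to-opacity mapping described above; the linear falloff and 30m sensing range are illustrative assumptions rather than tuned values from our aid.

```python
def grid_square_opacity(distance_m, max_range_m=30.0):
    """Map a detected vehicle's distance to square opacity: 1.0 (fully
    opaque) when directly adjacent, fading to 0.0 at the range limit."""
    d = min(max(distance_m, 0.0), max_range_m)
    return 1.0 - d / max_range_m

# A car 6 m away renders as a mostly opaque square (~0.8), while one
# 27 m away is barely visible (~0.1).
print(grid_square_opacity(6.0), grid_square_opacity(27.0))
```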
6.2 Risk Compensation
Risk compensation describes the phenomenon in which the presence of extra safety information causes drivers to engage in riskier behavior, because they may feel more confident about their surroundings[31]. When multiple vehicles have advanced AR systems, the proper estimation of situation awareness may be inhibited if drivers deliberately change their behavior in response to this information. When we left the safety grid driving aid continuously on (Figure 13), we noticed that test drivers began engaging in more lane changes. One solution to prevent this behavior may be to only display the grid when imminent danger is present.
Another method to counteract this effect is to make things appear more dangerous than they really are, to help prevent complacency[31]. We designed a left turn aid that helps drivers estimate oncoming vehicle speed by projecting a path in front of the oncoming car corresponding to its position 3 seconds in the future (Figure 14). The rationale is that as a driver makes a left turn at an intersection, the oncoming car increases its occupied area on the road via the projection, in a manner proportional to its speed and danger level.

Figure 14: Oncoming cars show their projected path three seconds in the future to help drivers gauge their speed for making left turns.
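The geometry of the projection is deliberately simple, as sketched below: the drawn path extrapolates the oncoming car's current speed over the three-second horizon, so the path length itself is the danger cue. Constant-velocity extrapolation along the car's heading is assumed here.

```python
def projected_path_length(speed_mps, horizon_s=3.0):
    """Distance the oncoming car will cover over the look-ahead
    horizon at its current speed; this is the length of the drawn path."""
    return speed_mps * horizon_s

# A car at 50 km/h (~13.9 m/s) visibly claims ~42 m of road ahead of
# it, so faster (more dangerous) cars occupy more of the intersection.
print(projected_path_length(50 / 3.6))
```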
6.3 Situation Awareness
A driver’s behavior is directly a result of a decision-making process.
Situation awareness describes how operators (in this case drivers)
maintain state and future state information of elements in their environment. Proper situation awareness is critical for maintaining
safe driving behavior. In Endsley[13], situation awareness is viewed
from an individual’s perspective and comprises three levels: I) Perception of elements in the environment, II) Comprehension of their
meaning and III) Projection of future system states. In Section 5.1,
we can understand how AR’s manipulation of saliency can help
achieve level I. However, AR can also assist in the cognitive tasks needed to achieve levels II and III by allowing the computer to infer this information.
As driving is a spatiotemporal task, one way to do this is to develop systems that translate numerical meter readings directly into spatiotemporal representations using carefully designed visuals. In Tönnis et al.[42], an AR braking bar indicates the distance in
front of the car where it will eventually stop if the brake is fully depressed. There is no need to estimate this from the velocity shown
in the speedometer.
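The braking bar's placement reduces to textbook kinematics. The sketch below assumes constant deceleration; Tönnis et al.[42] do not specify their model, and the 8 m/s^2 figure is an illustrative value for hard braking on dry pavement.

```python
def braking_bar_distance(speed_mps, decel_mps2=8.0):
    """Distance to a full stop under constant deceleration, the
    quantity the braking bar places directly on the road:
    d = v^2 / (2 * a)."""
    return speed_mps ** 2 / (2.0 * decel_mps2)

# At 100 km/h (~27.8 m/s) the bar sits ~48 m ahead of the car; the
# driver never has to derive this from the speedometer reading.
print(braking_bar_distance(100 / 3.6))
```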
The braking bar exclusively uses the state information from the
driver’s car. AR can also enhance the situation awareness of other
road vehicles and pedestrians surrounding the driver. In our left
turn driving aid (Figure 14), we automatically draw the projected
path of the other oncoming cars several seconds in the future based
on their current velocity. By articulating what is going on in the external world with other cars, we are augmenting a driver's ability to understand information in the world and supporting the driver's own decision process rather than directly instructing the driver what to do.
We have found that the best strategy for building driving aids for
achieving better situation awareness is to focus on specific problems where driving accidents are prevalent. For the left turn aid, we
were motivated by the fact that 22 percent of pre-crash events occur when a vehicle is making a left turn [5]. In contrast, the safety
grid (Figure 13) was not designed with a specific situation in mind.
As a result, in initial versions of the design, the driver was unsure
how to use the safety grid or was tempted to leave it on all the time, as a dangerous substitute for the driver's own gaze checks to build situation awareness. We are subsequently redesigning the aid to focus on specific
highway problems involving neighboring cars.
7 CONCLUSIONS
In this paper, we have described three important perspectives centered on the driver: understanding driver perception, driver distraction and driver behavior. We believe that these considerations are
just as important as the technical components needed to implement
effective AR solutions in the car. Indeed, the design process needs
to consider these human characteristics when conceiving solutions
for augmented reality in the vehicle. We also describe a design process that can be used to help create and evaluate driving solutions
employing augmented reality. We intend to continue to pursue this
process for the current and future driving aids we will design.
AR has the potential to contribute to both of the technological efforts toward driver automation and driver enhancement. For driver enhancement, AR can provide better situation awareness and increase the saliency of driving hazards or other important elements on the road. For autonomous driving, AR can serve the important role of communicating with the driver and building trust in the car's decisions, confirming that it perceives objects and the rationale for its decisions[31]. For example, AR can show where a planned lane change will occur and what triggers in the environment (e.g., a slow car) instigated the action. This allows the driver to understand the car's state and reduces the anxiety of not knowing what actions a car may take next. In addition, AR can
be used to convey the degree of uncertainty a car has about perceived elements on the road to allow the driver to decide if manual
intervention is necessary. If this is not done, a polished rendering
of an AR element may falsely lead the driver to build an incorrect
assumption of situation awareness.
Several major questions not addressed here must be considered.
How do multiple driving aids interact together if used at the same
time? If people suddenly were to stop using AR after prolonged
use, how will their subsequent driving behavior be affected in a
non-AR equipped car? Will driving aids improve a driver’s performance in a non-AR car? Will native driving skills deteriorate
as drivers become overly dependent on AR-based information for
maintaining situation awareness? The successful application of AR
for automobiles will require well-motivated solutions that carefully
consider the driver’s mental capabilities, dynamic conditions encountered while driving, and technology to implement these solutions.
ACKNOWLEDGEMENTS
The authors wish to thank Tom Zamojdo and Chris Grabowski
for their stimulating conversations, sage advice, and assistance in
building our HUD prototypes.
REFERENCES
[1] E. Balcetis and D. Dunning. Cognitive dissonance and the perception of natural environments. Psychological Science, pages 917–921,
2007.
[2] O. Bimber and R. Raskar. Spatial Augmented Reality: Merging Real
and Virtual Worlds. A K Peters, 2005.
[3] BMW. Sbt e60 - head-up display. Technical report, BMW AG - TIS,
2005.
[4] B. Buxton. Sketching User Experiences. Morgan Kaufmann, 2007.
[5] E. Choi. Crash factors in intersection-related crashes: An on-scene
perspective. Technical report, NHTSA, 2010.
[6] R. Christ. Research for Evaluating Visual Display Codes: An Emphasis on Colour Coding, pages 209–228. John Wiley & Sons, 1984.
[7] R. Curedale. Design Methods 1: 200 Ways to Apply Design Thinking.
Design Community College Inc., 2012.
[8] J. de Winter. Preparing drivers for dangerous situations: A critical reflection on continuous shared control. Systems, Man, and Cybernetics
(SMC), pages 1050–1056, 2011.
[9] R. Desimone and J. Duncan. Neural mechanisms of selective visual
attention. Annual Reviews Neuroscience, 18:193–222, 1995.
[10] T. Dobbert. Matchmoving: The Invisible Art of Camera Tracking.
Sybex, 2nd edition, 2013.
[11] D. Drascic and P. Milgram. Perceptual issues in augmented reality.
In Proc. SPIE: Stereoscopic Displays and Virtual Reality Systems III,
volume 2653, pages 123–134, 1996.
[12] T. Dukic, C. Ahistrom, C. Patten, C. Kettwich, and K. Kircher. Effects
of electronic billboards on driver distraction. Traffic Injury Prevention,
2012.
[13] M. Endsley. Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1):32–64, 1995.
[14] M. Fairchild. Color Appearance Models. Addison Wesley Longman, 1998.
[15] P. Fröhlich, R. Schatz, P. Leitner, M. Baldauf, and S. Mantler. Augmenting the driver’s view with realtime safety-related information. In
Proceedings of the 1st Augmented Human International Conference,
pages 1–11, 2010.
[16] D. Gavrila and S. Munder. Multi-cue pedestrian detection and tracking
from a moving vehicle. International Journal of Computer Vision,
73(1):41–59, 2007.
[17] A. Gellatly, C. Hansen, M. Highstrom, and J. Weiss. Journey: General
motors’ move to incorporate contextual design into its next generation
of automotive hmi designs. In Proceedings of the Second International
Conference on Automotive User Interfaces and Interactive Vehicular
Applications (AutomotiveUI 2010), pages 156–161, 2010.
[18] K. Goode, K. Ball, M. Sloane, D. Roenker, D. Roth, R. Myers, and
C. Owsley. Useful field of view and other neurocognitive indicators
of crash risk in older adults. Journal of Clinical Psychology in Medical
Settings, 5(4):425–440, 1998.
[19] GoPro. GoPro Hero3 camera. http://gopro.com, 2012.
[20] D. Harkin, W. Cartwright, and M. Black. Decomposing the map: using head-up display for vehicle navigation. In Proceedings of the 22nd
International Cartographic Conference (ICC 2005), 2005.
[21] B. Jonson. Design ideation: the conceptual sketch in the digital age.
Design Studies, 26(6):613–624, 2005.
[22] S. Kim and A. Dey. Simulated augmented reality windshield display
as a cognitive mapping aid for elder driver navigation. In Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems,
pages 133–142, 2009.
[23] E. Kruijff, J. Swan II, and S. Feiner. Perceptual issues in augmented
reality revisited. In Proceedings of the 9th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 3–12, 2010.
[24] J. Langheim, A. Buchanan, U. Lages, and M. Wahl. Carsense-new environment sensing for advanced driver assistance systems. In Proceedings of the IEEE Intelligent Vehicle Symposium, pages 89–94, 2001.
[25] J. D. Lee and K. A. See. Trust in automation: designing for appropriate reliance. Human Factors, 46(1):50–80, 2004.
[26] A. Mack. Inattentional blindness: looking without seeing. Current
Directions in Psychological Science, pages 180–184, 2003.
[27] J. Markoff. Google cars drive themselves, in traffic. New York Times,
2010.
[28] Z. Medenica, A. L. Kun, T. Paek, and O. Palinko. Augmented reality
vs. street views: a driving simulator study comparing two emerging
navigation aids. In Proceedings of the 13th International Conference
on Human Computer Interaction with Mobile Devices and Services,
pages 265–274, 2011.
[29] W. Narzt, G. Pomberger, A. Ferscha, D. Kolb, R. Müller, J. Wieghardt,
H. Hörtner, and C. Lindinger. Augmented reality navigation systems.
Universal Access in the Information Society, 4(3):177–187, 2006.
[30] NHTSA. Distraction. http://www.nhtsa.gov/Research/Crash+Avoidance/Distraction, 2010.
[31] D. Norman. The Design of Future Things. Basic Books, 2007.
[32] L. Nunes and M. A. Recarte. Cognitive demands of hands-free-phone
conversation while driving. Transportation Research Part F: Traffic
Psychology and Behavior, 5(2):133–144, 2002.
[33] K. S. Park, I. H. Cho, G. B. Hong, T. J. Nam, J. Y. Park, S. I. Cho, and
I. H. Joo. Disposition of Information Entities and Adequate Level of
Information Presentation in an In-Car Augmented Reality Navigation
System, volume 4558, pages 1098–1108. Springer Berlin Heidelberg,
2007.
[34] S. Peterson, M. Axholt, and S. Ellis. Objective and subjective assessment of stereoscopically separated labels in augmented reality. Computers & Graphics, pages 23–33, 2009.
[35] T. Poitschke, M. Ablassmeier, G. Rigoll, S. Bardins, S. Kohlbecher,
and E. Schneider. Contact-analog information representation in an
automotive head-up display. In Proceedings of the 2008 symposium
on Eye tracking research & applications, pages 119–122, 2008.
[36] P. Salmon, N. Stanton, and K. Young. Situation awareness on the road:
review, theoretical and methodological issues, and future directions.
Theoretical Issues in Ergonomics Science, 13(4):472–492, 2012.
[37] M. Schall Jr., M. Rusch, J. Lee, J. Dawson, G. Thomas, N. Aksan, and
M. Rizzo. Augmented reality cues and elderly driver hazard perception. Human Factors, 55(3):643–658, 2013.
[38] M. Schall Jr., M. Rusch, J. Lee, S. Vecera, and M. Rizzo. Attraction
without distraction: Effects of augmented reality cues on driver hazard
perception. Journal of Vision, 10(7):236, 2010.
[39] SensoMotoric Instruments (SMI). SMI eye tracking glasses. http://www.smivision.com, 2012.
[40] M. Sotelo and J. Barriga. Blind spot detection using vision for automotive applications. J. Zhejiang University Science A, 9(10):1369–1372,
2008.
[41] S. Sridhar and V. Ng-Thow-Hing. Generation of virtual display surfaces for in-vehicle contextual augmented reality. In Proceedings of
the 11th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 317–318, 2012.
[42] M. Tönnis, C. Lange, and G. Klinker. Visual longitudinal and lateral
driving assistance in the head-up display of cars. In Proceedings
of the 6th International Symposium on Mixed and Augmented Reality
(ISMAR), pages 91–94, 2007.
[43] M. Tönnis and D. A. Plecher. Presentation Principles in Augmented Reality - Classification and Categorization Guidelines. Technical report, Technische Universität München, 2011.
[44] M. Tönnis, C. Sandor, C. Lange, and H. Bubb. Experimental evaluation of an augmented reality visualization for directing a car driver's
attention. In Proceedings of the 4th IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR ’05), pages 56–59,
2005.
[45] J. Wood and R. Troutbeck. Elderly drivers and simulated visual impairment. Optometry & Vision Science, 72(2), 1995.
[46] W. Wu, F. Blaicher, J. Yang, T. Seder, and D. Cui. A prototype of
landmark-based car navigation using a full-windshield head-up display system. In Proceedings of the 2009 workshop on Ambient media
computing, pages 21–28, 2009.