Learning a Group-Aware Policy for Robot Navigation
Kapil Katyal∗1,2 , Yuxiang Gao∗2 , Jared Markowitz1 ,
Sara Pohland3 , Corban Rivera1 , I-Jeng Wang1 , Chien-Ming Huang2
Abstract— Human-aware robot navigation promises a range
of applications in which mobile robots bring versatile assistance to people in common human environments. While
prior research has mostly focused on modeling pedestrians as
independent, intentional individuals, people move in groups;
consequently, it is imperative for mobile robots to respect
human groups when navigating around people. This paper
explores learning group-aware navigation policies based on
dynamic group formation using deep reinforcement learning.
Through simulation experiments, we show that group-aware
policies, compared to baseline policies that neglect human
groups, achieve greater robot navigation performance (e.g.,
fewer collisions), minimize violation of social norms and discomfort, and reduce the robot’s movement impact on pedestrians.
Our results contribute to the development of social navigation
and the integration of mobile robots into human environments.
I. INTRODUCTION
Mobile robots that are capable of navigating crowded
human environments in a safe, efficient, and socially appropriate manner hold promise in bringing practical robotic
assistance to a range of applications, including security patrol, emergency response, and parcel delivery. An increasing
body of research has focused on the challenging quest to
enable human-aware robot navigation, accounting for human
movements that are fast, dynamic, and follow delicate social
norms [1], [2], [3], [4]. For example, prior research has
treated humans as dynamic obstacles to avoid collisions
(e.g., [5]), investigated strategies to avoid getting stuck in
human crowds (e.g., [6]), and explored how to model social
norms to allow for socially appropriate robot navigation
(e.g., [7], [8], [9]). However, prior works have mainly treated
people as individual, independent entities in robot navigation.
The majority of people, however, walk in groups [10],
[11]; an empirical study showed that up to 70% of pedestrians in a commercial environment walked in groups [12].
Consequently, it is imperative that a mobile robot respects
human grouping (e.g., not to cut through a social group)
during its navigation in a human environment. In particular,
in this work, we consider the problem of a robot interacting
with dynamic human groups—people walking together in
groups—rather than standing groups that are commonly
seen in social events (e.g., [13]). While substantial efforts
have been made to model and understand dynamic groups
(e.g., [12], [14], [15]), how mobile robots should navigate
effectively and appropriately around dynamic human groups
is under-explored.
∗ Both authors contributed equally to this work.
1 Johns Hopkins University Applied Physics Lab, Laurel, MD, USA.
2 Dept. of Comp. Sci., Johns Hopkins University, Baltimore, MD, USA.
3 Dept. of EECS, UC Berkeley, Berkeley, CA, USA.
[Fig. 1 image: a robot navigating around, rather than cutting through, a pedestrian group; panels A-D.]
Fig. 1: The objective of this work is to learn a navigation
policy that allows the robot to safely reach its goal while
minimizing impact to individual and groups of pedestrians.
In this work, we explore robot navigation in crowds of
human groups. Our approach is to learn navigation policies
that allow the robot to safely reach its desired goal while
minimizing impact to individual and groups of pedestrians
(Fig. 1). Our contributions include:
• A reinforcement learning (RL) algorithm that combines
robot navigation performance and group-aware social
norms for learning a robust policy;
• A novel reward function that uses the convex hull of
a group as the group space to minimize impact to
pedestrian groups and improve navigation performance;
• Software extensions to the CrowdNav simulation environment [16] to support social navigation research; and
• Experimental results that demonstrate the efficacy of
our learned policy with respect to robot navigation
performance, human navigation performance, and maintenance of social norms.
II. BACKGROUND AND RELATED WORK
The goal of human-aware robot navigation is to enable
robots to move safely, efficiently, and socially appropriately
in natural human environments. To achieve safe and efficient
navigation, prior research has investigated reactive methods
for motion planning [17], [18] and considered modeling
pedestrian intention [19]. To realize social appropriateness,
previous works have explored learning from human data
[7], [20], [21], [22], [23], [24] and used handcrafted rules
as planning constraints [8], [25], [26], [27]. One notable
approach is the Social Force Model [28], which is based on
the proxemics theory [29] and attempts to model pedestrian
social motion using a combination of attractive and repulsive
forces. This approach has been adapted and extended for
crowd simulation [30] and robot navigation [31], [32].
While prior research on human-aware robot navigation
has mostly regarded humans as individual agents, increasing
efforts have considered how mobile robots should interact
with human social groups. We consider human social groups
as two categories—static and dynamic social groups. Static
social groups are commonly seen in social events (e.g.,
a cocktail party), where people gather together in small
groups for conversation. Dynamic social groups are groups
of people walking together, and possibly engaging in conversations during walking, toward shared destinations. Previous
research has investigated how to enable robots to recognize
[33] and approach [34], [35] static, standing social groups,
while taking account of the size and formation of the groups.
Though the detection and modeling of dynamic social
groups present additional challenges when compared to static
groups, they are critical in enabling socially appropriate
robot navigation in human crowds. Prior research has generally explored methods to capture intra-group coherence
(e.g., [36], [37]) and inter-group differences (e.g., [38])
in dynamic groups. For instance, salient turn signals in
groups of humans that share the same navigation goals can
be used to enhance trajectory prediction and subsequently
improve the social-awareness in robot path planning [39].
However, it has been shown that group properties may be
different in static and dynamic settings. For example, the
o-space formation commonly observed in static groups is
not necessarily apparent in dynamic groups [40]. To address
such difference, a set of dynamic constraints for o-space
based on the walking direction and group cohesion was
proposed for effective robot navigation [40]. Additionally,
dynamic groups also bear unique properties that mobile
robots may take advantage of during their navigation in
crowded environments. As an example, robots may “group
surf” human groups by following their movement flows [41].
Methodologically, modern machine learning techniques
have fueled the advances of human-aware robot navigation. In particular, Recurrent Neural Networks (RNNs) and
Generative Adversarial Networks (GANs) have been shown
to be able to accurately predict human motion for individuals [42], [43] and groups [15]. However, RNN and
GAN based methods only predict human motion and do
not generate navigation policies for mobile robots. Reinforcement learning approaches are increasingly used for
learning navigation policies. For example, prior works have
leveraged inverse reinforcement learning to imitate humans
and realize socially appropriate movements [13], [23]. Deep
Reinforcement Learning (DRL) has also been used for robot
navigation [44], [45]; in particular, attention-based DRL has
been demonstrated to capture human-human and human-robot interactions in crowded environments [16], [46].
Different from these prior works, we explicitly include
group modeling, rather than a simple consideration of pairwise interactions between individuals in a crowd [45]. In
addition, our approach uses a more compact representation
of group space by computing a polygon based on the convex
hull of the pedestrians instead of the F-formation as in
prior work [40]. Moreover, our approach considers a range
of metrics including social compliance and pedestrian and
robot performance. Finally, our work uses violations of the
group space as a reward term to learn socially appropriate
movements around groups of pedestrians.
III. PRELIMINARIES
A. Problem Formulation
Our objective is to learn a controller that allows a robot to
navigate to a desired goal while maintaining social norms and
avoiding collisions with groups of pedestrians. We formulate
our approach using reinforcement learning (RL) to learn a
policy that meets the objectives stated above. In this Markov decision process, the robot uses observations
to generate a state vector, S, and chooses an action, A, that
maximizes expectation of the future reward, R.
B. State and Action Space
The state space, S, consists of observable state information
for each pedestrian i, represented as Pedi as well as internal
state of the robot represented as Rob as described by Eq. 1.
Here, $p_x$ and $p_y$ are the x and y coordinates of the position, $v_x$ and $v_y$ are the x and y coordinates of the velocity, $rad$ is the radius of the pedestrian or the robot, $g_x$ and $g_y$ represent the x and y goal positions, $v_{pref}$ is the preferred velocity, and $\theta$ is the turn angle.

$$Ped_i = [p_x, p_y, v_x, v_y, rad],$$
$$Rob = [p_x, p_y, v_x, v_y, rad, g_x, g_y, v_{pref}, \theta],$$
$$S_i = [Ped_i, Rob] \quad (1)$$
In our simulation, we assume a robot with holonomic
kinematics that receives vx and vy commands. The action
space is discretized into 5 speeds ranging from 0.2 to 1.0 m/s
and 16 rotations ranging from 0 to 2π plus a stop command
resulting in 81 possible actions.
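To make the discretization concrete, the following is a minimal sketch of how such an 81-element action set could be enumerated, assuming evenly spaced speeds and headings; the function name and spacing choices are illustrative rather than taken from the CrowdNav code.

```python
import numpy as np

def build_action_space():
    """Enumerate the discrete action set described above:
    5 speeds x 16 headings plus a stop action = 81 actions.
    Each action is a holonomic (vx, vy) velocity command."""
    speeds = np.linspace(0.2, 1.0, 5)                          # m/s
    headings = np.linspace(0, 2 * np.pi, 16, endpoint=False)   # rad
    actions = [(0.0, 0.0)]                                     # stop command
    for s in speeds:
        for h in headings:
            actions.append((s * np.cos(h), s * np.sin(h)))
    return actions

actions = build_action_space()
assert len(actions) == 81
```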
C. CrowdNav Simulation Environment
We leverage the CrowdNav simulation environment [16]
for training and evaluating our group aware RL policy. This
environment provides a learning and simulation framework
that allows us to model scenes of pedestrians and robots
interacting while reaching their target goals. We extend
this framework by allowing groups of pedestrians to be
instantiated with similar starting and end goals. We further
leverage our group aware social force model described in the
next section to model the motion of the groups of pedestrians
as they interact with other pedestrians and the robot.
D. Group Aware Social Force Model
We use an extended Social Force Model (SFM) [12] to
simulate dynamic social groups. In the extended SFM, each
individual's motion, as defined in Eq. 2, is driven by a combination of an attractive force $\vec{f}_i^{\,des}$ that drives them to a desired goal, the obstacle repulsive force $\vec{f}_i^{\,obs}$, the sum of social repulsive forces from other agents $\sum_j \vec{f}_{ij}^{\,social}$, and a new group term $\vec{f}_i^{\,group}$ defined by Eq. 3.

$$\frac{d\vec{v}_i}{dt} = \vec{f}_i^{\,des} + \vec{f}_i^{\,obs} + \sum_j \vec{f}_{ij}^{\,social} + \vec{f}_i^{\,group} \quad (2)$$

The group term is defined as the sum of the attractive forces between group members $\vec{f}_i^{\,att}$, the repulsive force between group members $\vec{f}_i^{\,rep}$, and a gaze force $\vec{f}_i^{\,gaze}$ that steers the agents to keep the center of mass of the social group within their vision field to simulate in-group social interactions:

$$\vec{f}_i^{\,group} = \vec{f}_i^{\,att} + \vec{f}_i^{\,rep} + \vec{f}_i^{\,gaze} \quad (3)$$
We developed a custom Python implementation of the extended Social Force Model 1 for the CrowdNav environment,
following the implementation of PEDSIM C++ library [47]
and the ROS implementation by Vasquez et al. [48].
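As a rough illustration of the group term in Eq. 3, the sketch below computes attraction toward the group's center of mass, repulsion between overlapping members, and a gaze correction. The gains, thresholds, and functional forms here are placeholders of our own choosing; the actual model follows Moussaïd et al. [12] and the PySocialForce implementation.

```python
import numpy as np

def group_force(pos_i, vel_i, member_positions,
                k_att=1.0, k_rep=2.0, k_gaze=4.0, threshold=1.0):
    """Illustrative version of Eq. 3: attraction toward the group centroid
    beyond a comfort distance, repulsion when members overlap, and a gaze
    term that turns the agent toward the group's center of mass when it
    leaves the agent's vision field. Constants are assumptions."""
    com = np.mean(member_positions, axis=0)          # group center of mass
    to_com = com - pos_i
    dist = np.linalg.norm(to_com)

    f_att = k_att * to_com if dist > threshold else np.zeros(2)

    f_rep = np.zeros(2)
    for p in member_positions:
        d = pos_i - p
        if 0 < np.linalg.norm(d) < 0.5 * threshold:  # too close to a member
            f_rep += k_rep * d

    # Gaze term: active only when the center of mass falls outside an
    # assumed 180-degree field of view around the walking direction.
    speed = np.linalg.norm(vel_i)
    f_gaze = np.zeros(2)
    if speed > 1e-6 and dist > 1e-6:
        angle = np.arccos(np.clip(np.dot(vel_i / speed, to_com / dist), -1, 1))
        if angle > np.pi / 2:
            f_gaze = k_gaze * (angle - np.pi / 2) * to_com / dist

    return f_att + f_rep + f_gaze                    # f_group in Eq. 3
```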
IV. A PPROACH
To evaluate our group aware policy, we extend the existing
CrowdNav simulation environment [16] to represent pedestrian motion in groups. We accomplish this by stochastically
sampling the number of groups per episode using a Poisson
distribution (λ = 1.2) [49] and then randomly assigning
pedestrians to the groups. Each pedestrian within a group
has similar start and goal positions. The average number of
groups and group size for five pedestrians are 2.5 and 1.96,
respectively. For ten pedestrians, the number of groups and
group size increase to 4.9 and 2.0, respectively.
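One plausible reading of this sampling procedure is sketched below; the clipping of the Poisson draw to a valid range and the uniform assignment of pedestrians to groups are our assumptions, not details taken from the simulator code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_group_assignment(num_pedestrians, lam=1.2):
    """Stochastically assign pedestrians to groups for one episode:
    draw the number of groups from a Poisson distribution (lambda = 1.2),
    clip it to a valid range, and assign each pedestrian to a random group."""
    num_groups = int(np.clip(rng.poisson(lam), 1, num_pedestrians))
    assignment = rng.integers(0, num_groups, size=num_pedestrians)
    # Note: a sampled group may end up empty; this sketch does not re-balance.
    return [np.flatnonzero(assignment == g) for g in range(num_groups)]

groups = sample_group_assignment(5)  # e.g., [array([0, 3]), array([1, 2, 4])]
```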
A. Policy based on Convex Hull of Group
To train the policy, we use a multi-term reward function
that encourages the robot to reach its goal while maintaining
social norms and avoiding collisions with groups of pedestrians. In particular, we focus on social norms that minimize
discomfort to individuals and discourage intersections with
a group of pedestrians.
Our reward function is given by Eq. 4, where dgoal is the
distance from the robot to the goal, dcoll. = 0.6 is the distance
between the centers of entities beneath which a collision is
considered to have occurred, di is the distance between the
robot and pedestrian i, ddisc. = dcoll. + 0.2 is the minimum
“comfortable” distance between a robot and a pedestrian (as
in [16]), and dj is the distance from the robot to the edge
of the convex hull surrounding group j:
$$
\begin{aligned}
R(t) = {} & C_{prog.}\,\big(d_{goal}(t-1) - d_{goal}(t)\big) \\
& + C_{goal}\,\delta\big(d_{goal}(t) < d_{coll.}\big) \\
& - C_{disc.} \sum_i \big(d_{disc.} - d_i(t)\big)\,\delta\big(d_{coll.} \le d_i(t) \le d_{disc.}\big) \\
& - C_{coll.} \sum_i \delta\big(d_i(t) < d_{coll.}\big) \\
& - C_{group} \sum_j \delta\big(d_j(t) < d_{coll.}\big) \qquad (4)
\end{aligned}
$$
1 https://github.com/yuxiang-gao/PySocialForce
The multiple objectives are weighted via the following constants: Cprog. = 0.1, Cgoal = 1.0, Cdisc. = 0.5, Ccoll. = 0.25,
and Cgroup = 0.25. The first term encourages the robot
to progress toward the goal, allowing us to remove the
initial imitation learning phase in [16]. The second, third,
and fourth terms encourage the robot to reach the goal,
avoid close encounters with pedestrians, and avoid collisions,
respectively. The last term encourages the robot to follow
group social norms by penalizing any group space violation.
To determine the dj terms, we first compute a polygon
representing the convex hull of the positions of all members
of the pedestrian group. We then calculate the minimum
distance between the robot and the polygon and penalize
the robot for intruding into this space. Note that group
membership information is only needed during training and
is not needed for evaluation or deployment of the policy.
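The following sketch shows how the group-distance terms $d_j$ and the reward of Eq. 4 could be computed, assuming SciPy and Shapely are available for the convex hull and point-to-polygon distance; the fallback for groups with fewer than three members is our assumption, and the actual implementation may differ in these details.

```python
import numpy as np
from scipy.spatial import ConvexHull
from shapely.geometry import Point, Polygon

# Reward weights and distance thresholds taken from Eq. 4 and the text.
C_PROG, C_GOAL, C_DISC, C_COLL, C_GROUP = 0.1, 1.0, 0.5, 0.25, 0.25
D_COLL, D_DISC = 0.6, 0.8

def group_distance(robot_xy, member_xy):
    """Minimum distance from the robot to the convex hull of a group
    (the d_j term in Eq. 4). With fewer than 3 members the hull
    degenerates, so we fall back to the nearest member (an assumption)."""
    pts = np.asarray(member_xy, dtype=float)
    robot = np.asarray(robot_xy, dtype=float)
    if len(pts) < 3:
        return float(np.min(np.linalg.norm(pts - robot, axis=1)))
    hull = Polygon(pts[ConvexHull(pts).vertices])
    return hull.distance(Point(robot))       # 0 if the robot is inside

def reward(d_goal_prev, d_goal, ped_dists, group_dists):
    """Illustrative implementation of the multi-term reward in Eq. 4."""
    r = C_PROG * (d_goal_prev - d_goal)                      # progress term
    r += C_GOAL * (d_goal < D_COLL)                          # goal reached
    r -= C_DISC * sum((D_DISC - d) for d in ped_dists
                      if D_COLL <= d <= D_DISC)              # discomfort
    r -= C_COLL * sum(d < D_COLL for d in ped_dists)         # collisions
    r -= C_GROUP * sum(d < D_COLL for d in group_dists)      # group violations
    return r
```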
B. Neural Network Architecture
Our overall network architecture is depicted in Fig. 2. As
in [16], we used an attention-based network architecture to
represent navigation policies. For each pedestrian, a vector of
quantities representing the pedestrian was first concatenated
with a vector representing the robot and passed through the
first multi-layer perceptron (MLP) in the network (MLP1 ).
The resulting feature vector was concatenated with the mean
value of the outputs of MLP1 for all pedestrians and used
to compute an attention score αi for each pedestrian via
MLP3 . The output of MLP1 was also passed through MLP2
to generate a separate “robot-pedestrian interaction vector”
that was then multiplied by αi to generate a weighted feature
vector ωi for each pedestrian. The ωi were summed for
all pedestrians and the result was passed through MLP4 ,
ultimately leading to separate policy and value heads. The
filter parameters for the neural network architecture are
described in Table I. In summary, our architecture matched
that of [16] without the interaction module and with 1) a
softmax layer being added to produce a categorical policy
output and 2) a single fully-connected layer with 100 neurons
connecting to a scalar value head.
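A rough PyTorch sketch of this architecture is given below. The layer widths follow Table I, but the exact wiring of the value head and the handling of batching are our reading of the text rather than the released code.

```python
import torch
import torch.nn as nn

def mlp(dims, last_relu=False):
    """Build a simple MLP from a list of layer widths."""
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2 or last_relu:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class GroupAwarePolicy(nn.Module):
    """Attention-based actor-critic head following Table I.
    Input: a (num_peds, joint_dim) tensor of pedestrian-robot pairwise states."""
    def __init__(self, joint_dim, num_actions=81):
        super().__init__()
        self.mlp1 = mlp([joint_dim, 150, 100], last_relu=True)
        self.mlp2 = mlp([100, 100, 50])
        self.mlp3 = mlp([100 * 2, 100, 100, 1])
        self.mlp4 = mlp([50, 150, 100, 100], last_relu=True)
        self.policy_head = nn.Sequential(nn.Linear(100, num_actions),
                                         nn.Softmax(dim=-1))
        self.value_head = nn.Sequential(nn.Linear(100, 100), nn.ReLU(),
                                        nn.Linear(100, 1))

    def forward(self, joint_states):
        e = self.mlp1(joint_states)                        # per-pedestrian features
        global_feat = e.mean(dim=0, keepdim=True).expand_as(e)
        scores = self.mlp3(torch.cat([e, global_feat], dim=-1))
        alpha = torch.softmax(scores, dim=0)               # attention over pedestrians
        h = self.mlp2(e)                                   # robot-pedestrian interaction vectors
        crowd = (alpha * h).sum(dim=0)                     # weighted sum over pedestrians
        feat = self.mlp4(crowd)
        return self.policy_head(feat), self.value_head(feat)
```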
C. PPO Algorithm
Our agents were trained using proximal policy optimization (PPO; [50]), a leading model-free, actor-critic approach.
Hyperparameters were chosen to mimic those used for Atari
in [50], with the exceptions of shorter windows (16 steps)
and more windows per batch (64). This change was made
to accommodate the shorter episodes of CrowdNav while
maintaining the number of experiences per batch.
D. Training Details
We used the Adam optimizer [51] with learning rate set
to 2.5 × 10−4 and epsilon set to 1.0 × 10−5 . In the RL
policy, the discount factor, γ, was set to 0.99 and the credit
assignment variable, λ, was set to 0.95. We trained our policy
for 7000 iterations yielding approximately 4.8M steps and
reaching a maximum reward of approximately 1.7 based on
the reward definition described in Eq. 4.
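For reference, the hyperparameters reported in Secs. IV-C and IV-D can be collected into a single configuration; the field names below are illustrative and do not map to any particular PPO implementation.

```python
# Hypothetical consolidation of the training hyperparameters reported above;
# field names are illustrative, values are those stated in the text.
ppo_config = dict(
    optimizer="Adam",
    learning_rate=2.5e-4,
    adam_epsilon=1.0e-5,
    discount_gamma=0.99,
    gae_lambda=0.95,            # credit assignment parameter
    rollout_length=16,          # shorter windows than the Atari defaults in [50]
    rollouts_per_batch=64,
    training_iterations=7000,   # reported to yield ~4.8M steps and reward ~1.7
)
```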
Fig. 2: On the left is the CrowdNav Simulation Environment [16] that provides pedestrian and robot state information as
well as the reward to the policy. The right figure represents the neural network architecture used for our attention-based,
actor-critic policy. The pedestrian and robot state vectors are concatenated to represent a pairwise combined state vector; the
outputs of the network are the policy π over potential actions and the value V of the current state. The gray and green blocks
indicate features from individual pedestrians. The blue blocks indicate aggregate features across pedestrians. The argmax of
the policy is chosen as the action, which subsequently is sent to the CrowdNav Environment to control the robot.
Network Layer | Output Features / Activation
MLP1 | 150, ReLU, 100, ReLU
MLP2 | 100, ReLU, 50
MLP3 | 100, ReLU, 100, ReLU, 1
MLP4 | 150, ReLU, 100, ReLU, 100, ReLU
Linear1 | 81
Linear2 | 1

TABLE I: Network Layer Filter Parameters
V. EXPERIMENTAL EVALUATION
A. Experimental Setup
The goal of our experiment is to assess the efficacy of
our group-aware navigation policy. Our experiments involved
four settings determined by two factors: the number of
pedestrians and the number of groups. We explored both
5-person and 10-person settings as well as a single group
and a stochastic number of groups as described in Sec. IV.
We used the Circle Crossing scenario where groups of
pedestrians started and ended around the perimeter of a circle
(radius = 4 m) during training and evaluation. The robot’s
starting and end positions were set to ensure that the robot would
go through the center of the circle and interact with the
pedestrian groups. We evaluated our trained policy on 250
trials with randomly initialized starting and ending pedestrian
positions for the four experimental settings.
B. Metrics
Our evaluation was focused on 1) robot navigation performance, 2) pedestrian navigation performance, and 3) social
compliance. For robot navigation performance, our metrics
represent the quality of the robot’s ability to navigate to the
goal quickly without collision:
• Successes: Number of trials in which the robot reached the goal.
• Collisions: Number of trials in which a collision occurred.
• Timeouts: Number of trials in which the robot did not reach the
goal within the allotted time (25 seconds).
• Time to Goal: Average of the number of seconds the
robot needed to reach the goal for all trials.
• Mean Robot Velocity: The average velocity of the
robot at each time step during all trials.
To assess pedestrian performance, we measured the impact
of the robot’s behavior on the desired pedestrian motion:
• Mean Pedestrian Velocity: The average velocity of the
pedestrians during the trials.
• Mean Pedestrian Angle: The average angular deviation between the pedestrian's observed motion and the direct vector to the pedestrian's goal. This metric seeks to measure the disturbance from the optimal trajectory to the goal caused by the robot's policy (see the sketch following this list).
Finally, to assess social norms, we quantified how the
robot maintained social distance among individual pedestrians and limited intersections with groups of pedestrians:
• Group Intersections: The number of group intersections by the robot that occurred during the trials.
• Individual Discomfort: The mean distance between the
robot and the pedestrians aggregated over all pedestrians
when the robot violates the discomfort threshold.
• Pedestrian Social Force: The mean social force applied to pedestrian i. This is equal to the sum of the forces applied to pedestrian i from the other pedestrians and the robot, j, as described in Sec. III-D (i.e., $\sum_j \vec{f}_{ij}$). This metric captures how the robot's motion may directly or indirectly impact human pedestrians' motions.
• Robot Social Force: The mean social force applied to the robot, r, from other pedestrians, j, as described in Sec. III-D (i.e., $\sum_j \vec{f}_{rj}$). This metric captures how the robot's motion may be impacted by human pedestrians.
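As referenced above, the sketch below illustrates one way the Mean Pedestrian Angle metric could be computed from logged pedestrian states; the handling of near-zero velocities is our assumption.

```python
import numpy as np

def mean_pedestrian_angle(velocities, positions, goal):
    """Average angular deviation (in degrees) between a pedestrian's observed
    velocity and the direct vector from its position to its goal; a sketch of
    the Mean Pedestrian Angle metric under our reading of its definition."""
    angles = []
    for v, p in zip(velocities, positions):
        to_goal = np.asarray(goal) - np.asarray(p)
        if np.linalg.norm(v) < 1e-6 or np.linalg.norm(to_goal) < 1e-6:
            continue  # skip steps where the angle is undefined
        cos = np.dot(v, to_goal) / (np.linalg.norm(v) * np.linalg.norm(to_goal))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return float(np.mean(angles)) if angles else 0.0
```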
Method | # Groups | # Peds. | Succ. ↑ | Coll. ↓ | TO ↓ | Mean Time (s) ↓ | Mean Robot Vel. (m/s) ↑ | Mean Ped. Vel. (m/s) ↑ | Mean Ped. Ang. (°) ↓ | Grp. Inters. ↓ | Ind. Discomfort ↓ | Ped. Social Force ↓ | Robot Social Force ↓
SARL | 1 | 5 | 237 | 11 | 2 | 8.24* | 0.962 | 1.170* | 3.76 | 143 | 3.10* | 0.375* | 0.523
SocialNCE (w/o CL) | 1 | 5 | 187 | 60 | 3 | 10.54* | 0.777* | 1.139* | 3.61 | 64 | 3.06* | 0.391* | 0.545*
SocialNCE | 1 | 5 | 213 | 31 | 6 | 10.87* | 0.796* | 1.166* | 3.43 | 50 | 2.72* | 0.349 | 0.473
Group Aware (ours) | 1 | 5 | 236 | 9 | 5 | 8.92 | 0.964 | 1.183 | 3.59 | 15 | 1.29 | 0.351 | 0.482
SARL | 2.548 | 5 | 238 | 12 | 0 | 8.23* | 0.964 | 1.136 | 5.99* | 151 | 2.87 | 0.522* | 0.716*
SocialNCE (w/o CL) | 2.548 | 5 | 174 | 76 | 0 | 9.73* | 0.797* | 1.107* | 5.71 | 50 | 3.91* | 0.539* | 0.648
SocialNCE | 2.548 | 5 | 206 | 44 | 0 | 9.68* | 0.817* | 1.125* | 5.25* | 56 | 2.36 | 0.489 | 0.582*
Group Aware (ours) | 2.548 | 5 | 242 | 8 | 0 | 8.81 | 0.961 | 1.146 | 5.59 | 22 | 2.63 | 0.485 | 0.657
SARL | 1 | 10 | 222 | 23 | 5 | 8.59* | 0.955 | 1.161* | 4.11 | 176 | 4.20* | 0.395* | 0.707*
SocialNCE (w/o CL) | 1 | 10 | 127 | 121 | 2 | 10.92* | 0.733* | 1.090* | 4.17* | 85 | 3.30 | 0.465* | 0.944*
SocialNCE | 1 | 10 | 167 | 76 | 7 | 11.32* | 0.728* | 1.131* | 4.02 | 94 | 4.29* | 0.413* | 0.795*
Group Aware (ours) | 1 | 10 | 232 | 14 | 4 | 9.87 | 0.956 | 1.174 | 3.93 | 29 | 2.31 | 0.366 | 0.597
SARL | 4.884 | 10 | 239 | 10 | 1 | 8.72* | 0.960* | 1.089* | 8.09* | 258 | 4.94* | 0.681* | 0.964*
SocialNCE (w/o CL) | 4.884 | 10 | 141 | 109 | 0 | 10.20 | 0.693* | 1.022* | 8.18* | 95 | 5.71* | 0.764* | 0.986*
SocialNCE | 4.884 | 10 | 186 | 64 | 0 | 9.89* | 0.772* | 1.074 | 7.81* | 156 | 5.71* | 0.684* | 0.928*
Group Aware (ours) | 4.884 | 10 | 241 | 9 | 0 | 10.21 | 0.918 | 1.108* | 7.07 | 20 | 2.29 | 0.599 | 0.849
TABLE II: This table summarizes our results across 5 and 10 pedestrians. The best values are marked in bold.
Asterisks indicate statistically significant results (p < .05) when compared to our group aware policy. We show that our
policy is able to achieve comparable or better robot navigation performance while allowing pedestrians to achieve faster
velocities. In addition, we show our policy leads to more socially compliant navigation indicated by significantly fewer
instances of group intersection while reducing individual discomfort.
[Fig. 3 image: left, trajectories under the SARL policy (top row) and the group aware policy (bottom row) at time steps 0, 8, 16, and 24; right, plots of the distances between robot and pedestrians (top) and pedestrian velocities (bottom) over time.]
Fig. 3: The figure on the left shows a representative example of the robot navigating through the crowd of pedestrians over
time using both the SARL and the group aware policy. The SARL policy chooses actions that cut through the group of
pedestrians and influences the group formation, while our group aware policy chooses actions that move around the group
with minimal disturbance. The figure on the right shows the average distance between the pedestrian and the robot (top)
and the average pedestrian velocity (bottom) over time. Here, we show that the group aware policy results in increased distance
to the pedestrians while allowing the pedestrians to maintain faster speeds.
C. Results
We conducted independent two-tailed t-tests to compare
our group-aware policy to the following baseline policies.
• SARL: Crowd aware RL policy that uses an attention
mechanism to model human-robot interaction to encourage socially compliant navigation [16].
• SocialNCE (w/o CL): Social NCE policy without contrastive loss. This policy uses behavior cloning with a trained SARL policy as the expert [52].
• SocialNCE: Social NCE policy. This policy uses a contrastive loss function to represent ‘negative’ examples
such as collisions to improve social compliance [52].
Table II summarizes the robot and pedestrian navigation
performance as well as the social compliance results. Overall,
the group aware policy generally led to a higher number of successful trials, while allowing the pedestrians to travel at faster speeds with less disturbance toward their goals. The group aware policy led to significantly fewer instances where the robot navigated through a group. The group aware policy also resulted in lower overall individual discomfort. Finally, we observe that our policy reduced the overall social forces applied to the pedestrians and robot.
We note that the robot with the group aware policy took
longer to reach the goal compared to the SARL policy.
However, by and large, we do not see significant differences
in robot speed between the two approaches. The group-aware
robot took a longer path to its goal due to its preference for navigating around groups of pedestrians to avoid group intersections, whereas the SARL robot aimed to reach its goal even at the cost of cutting between groups of pedestrians.
Fig. 3 illustrates an example of such behavior—the SARL
policy chose a path that cut through the pedestrian group
whereas, in the same scenario, our group-aware policy chose
a path around the group. The resulting group-aware behavior ultimately enabled greater group cohesion and less disruption, while reducing group and individual discomfort. We
additionally computed the distances between the pedestrians
and the robot for both policies (Fig. 3 top-right), as well as
the velocities of the pedestrians (Fig. 3 bottom-right), over
time. During interaction between the robot and the pedestrians, we observe that the distance between the pedestrians
and the robot was lower for the SARL policy, consistent with the results reported in Table II. Further, we see that the
average pedestrian velocity decreased substantially during
the times of interaction in the SARL policy; however, we
do not see similar decreases in our group-aware policy.
D. Hardware Experiments
To demonstrate the applicability of our group aware policy
to real-world systems, we implemented our policy on the
Boston Dynamics Spot robot. The policy was trained entirely
in the CrowdSim environment before the demonstration,
then evaluated on the Spot robot as a proof of concept.
In this demonstration, the robot’s start and goal positions
were predetermined such that the robot reached a point
approximately six meters in front of its startup position.
The radius and preferred velocity of the robot and all of the
pedestrians were approximated ahead of time and set when
the robot and pedestrian objects were defined.
During the demonstration, the position and orientation
of the robot and each of the pedestrians were determined using an OptiTrack motion capture system, and the velocity of each agent was then approximated. While the
robot was outside of a small radius of the goal position, the
observable state of the pedestrians and the full state of the
robot were determined and used as inputs to our group aware
policy. The policy then provided the predicted best action as
an x and y velocity, which was mapped to a ROS command
used to control the robot. We decreased the speed in both the
x and y directions by a factor of two to ensure that the robot
moved at an appropriate speed to compensate for hardware
limitations.
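A minimal sketch of this action-to-command mapping is shown below, assuming a rospy node publishing geometry_msgs/Twist messages; the node and topic names, and the exact Spot driver interface, are assumptions rather than details of our system.

```python
import rospy
from geometry_msgs.msg import Twist

SPEED_SCALE = 0.5  # halve vx and vy to compensate for hardware limitations

# Node and topic names are illustrative; the actual Spot interface may differ.
rospy.init_node("group_aware_policy_demo")
cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)

def send_action(vx, vy):
    """Map the policy's holonomic (vx, vy) action to a ROS velocity command."""
    cmd = Twist()
    cmd.linear.x = SPEED_SCALE * vx
    cmd.linear.y = SPEED_SCALE * vy
    cmd_pub.publish(cmd)
```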
As illustrated in Fig. 1, we demonstrated that the policy
worked on a real-world robotic system by having pedestrians
move around the robot in various formations in a
way that interfered with the path of the robot. To demonstrate
the effectiveness of the policy, the pedestrians followed paths
that varied from the simple circle crossing movements seen
by the simulated robot during training. Videos of several
of these demonstrations can be found in the supplementary
video attachment.
VI. DISCUSSION
Towards achieving socially appropriate robot navigation in
human environments, this paper explores group-aware behaviors that respect pedestrian group formations and trajectories,
while minimally sacrificing robot navigation performance.
Our approach utilizes deep reinforcement learning and considers group formation during training. Our results show
that the learned policy is able to achieve a higher number of successful trials in which the robot reached the goal, fewer collisions, and less impact on the pedestrians' motion toward their goals. In addition, we show that our learned
policy not only reduced the number of group violations (e.g.,
cutting through the group) but also decreased the individual
discomfort and social forces applied to the pedestrians and
robot. Our approach, however, resulted in an increase of the
robot’s total time to goal compared to the SARL baseline
that did not consider group formation. This increase of robot
navigation time was expected as the robot sought to move
around groups as opposed to navigating through them (Fig.
3). However, our results show that even though the total
time to goal increased, the average velocity of the robot was
mostly unaffected by the group aware policy.
Our exploration indicates several directions for future research. First, we would like to determine how well our
learned policy reflects actual human motion through groups
of pedestrians. Second, we would like to investigate whether
we can bootstrap our learned policy with imitation learning using observations of humans navigating groups of
pedestrians. Third, we would like to investigate different
representations of group space beyond the convex hull approach described in this paper. We speculate that considering additional parameters, such as social interaction during
movement, the specific formation of the group, and environmental cues (e.g., social space), may contribute to learning
more socially compliant navigation policies. Additionally,
while this paper focuses mostly on robot and pedestrian
navigation performance, as well as group intersections and
discomfort, there are other factors to consider for socially
appropriate behavior, such as how to pass and follow human
groups. Finally, while learning a single policy works for a
limited number of environments, choosing amongst a library
of policies depending on the density of pedestrians, the
layout of the environment, and the local culture can lead to
better navigation performance once deployed. For example,
respecting human grouping may not always be possible (e.g.,
in a narrow corridor). It is therefore important for a mobile
robot to selectively choose a context suitable policy in order
to achieve efficient, safe, and socially appropriate navigation
in crowded human environments.
ACKNOWLEDGMENTS
This work was supported by the Johns Hopkins University
Institute for Assured Autonomy.
REFERENCES
[1] K. Charalampous, I. Kostavelis, and A. Gasteratos, “Recent trends in
social aware robot navigation: A survey,” Robotics and Autonomous
Systems, vol. 93, pp. 85–104, 2017.
[2] A. Pandey, S. Pandey, and D. Parhi, “Mobile robot navigation and
obstacle avoidance techniques: A review,” Int Rob Auto J, vol. 2, no. 3,
p. 00022, 2017.
[3] T. Kruse, A. K. Pandey, R. Alami, and A. Kirsch, “Human-aware robot
navigation: A survey,” Robotics and Autonomous Systems, vol. 61,
no. 12, pp. 1726–1743, 2013.
[4] J. Rios-Martinez, A. Spalanzani, and C. Laugier, “From proxemics
theory to socially-aware navigation: A survey,” International Journal
of Social Robotics, vol. 7, no. 2, pp. 137–153, 2015.
[5] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to
collision avoidance,” IEEE Robotics & Automation Magazine, vol. 4,
no. 1, pp. 23–33, 1997.
[6] P. Trautman, J. Ma, R. M. Murray, and A. Krause, “Robot navigation
in dense human crowds: Statistical models and experimental studies
of human–robot cooperation,” The International Journal of Robotics
Research, vol. 34, no. 3, pp. 335–356, 2015.
[7] A. Robicquet, A. Sadeghian, A. Alahi, and S. Savarese, “Learning
social etiquette: Human trajectory understanding in crowded scenes,”
in European conference on computer vision. Springer, 2016, pp.
549–565.
[8] A. Bera, T. Randhavane, R. Prinja, and D. Manocha, “Sociosense:
Robot navigation amongst pedestrians with social and psychological
constraints,” in 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS). IEEE, 2017, pp. 7018–7025.
[9] R. Mead and M. J. Matarić, “Autonomous human–robot proxemics:
socially aware navigation based on interaction potential,” Autonomous
Robots, vol. 41, no. 5, pp. 1189–1201, 2017.
[10] M. Costa, “Interpersonal distances in group walking,” Journal of
Nonverbal Behavior, vol. 34, no. 1, pp. 15–26, 2010.
[11] A. F. Aveni, “The not-so-lonely crowd: Friendship groups in collective
behavior,” Sociometry, pp. 96–99, 1977.
[12] M. Moussaïd, N. Perozo, S. Garnier, D. Helbing, and G. Theraulaz,
“The walking behaviour of pedestrian social groups and its impact on
crowd dynamics,” PloS one, vol. 5, no. 4, p. e10047, 2010.
[13] B. Okal and K. O. Arras, “Learning socially normative robot navigation behaviors with bayesian inverse reinforcement learning,” in 2016
IEEE International Conference on Robotics and Automation (ICRA).
IEEE, 2016, pp. 2889–2895.
[14] Z. Yücel, F. Zanlungo, T. Ikeda, T. Miyashita, and N. Hagita, “Deciphering the crowd: Modeling and identification of pedestrian group
motion,” Sensors, vol. 13, no. 1, pp. 875–897, 2013.
[15] N. Bisagno, B. Zhang, and N. Conci, “Group lstm: Group trajectory
prediction in crowded scenarios,” in Proceedings of the European
conference on computer vision (ECCV), 2018, pp. 0–0.
[16] C. Chen, Y. Liu, S. Kreiss, and A. Alahi, “Crowd-robot interaction:
Crowd-aware robot navigation with attention-based deep reinforcement learning,” 2019 International Conference on Robotics and Automation (ICRA), pp. 6015–6022, 2019.
[17] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal
n-body collision avoidance,” in Robotics research. Springer, 2011,
pp. 3–19.
[18] Y. Luo, P. Cai, A. Bera, D. Hsu, W. S. Lee, and D. Manocha,
“Porca: Modeling and planning for autonomous driving among many
pedestrians,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp.
3418–3425, 2018.
[19] K. D. Katyal, G. D. Hager, and C.-M. Huang, “Intent-aware pedestrian
prediction for adaptive crowd navigation,” in 2020 IEEE International
Conference on Robotics and Automation (ICRA). IEEE, 2020, pp.
3277–3283.
[20] M. Luber, L. Spinello, J. Silva, and K. O. Arras, “Socially-aware robot
navigation: A learning approach,” in 2012 IEEE/RSJ International
Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 902–
907.
[21] M. Shiomi, F. Zanlungo, K. Hayashi, and T. Kanda, “Towards a
socially acceptable collision avoidance for a mobile robot navigating
among pedestrians using a pedestrian model,” International Journal
of Social Robotics, vol. 6, no. 3, pp. 443–455, 2014.
[22] M. Sebastian, S. B. Banisetty, and D. Feil-Seifer, “Socially-aware
navigation planner using models of human-human interaction,” in 2017
26th IEEE International Symposium on Robot and Human Interactive
Communication (RO-MAN). IEEE, 2017, pp. 405–410.
[23] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially
compliant mobile robot navigation via inverse reinforcement learning,”
The International Journal of Robotics Research, vol. 35, no. 11,
pp. 1289–1307, 2016. [Online]. Available: https://doi.org/10.1177/
0278364915619772
[24] S. Pellegrini, A. Ess, K. Schindler, and L. Van Gool, “You’ll never
walk alone: Modeling social behavior for multi-target tracking,” in
2009 IEEE 12th International Conference on Computer Vision. IEEE,
2009, pp. 261–268.
[25] C. Johnson and B. Kuipers, “Socially-aware navigation using topological maps and social norm learning,” in Proceedings of the 2018
AAAI/ACM Conference on AI, Ethics, and Society, 2018, pp. 151–157.
[26] R. Kirby, R. Simmons, and J. Forlizzi, “Companion: A constraint-optimizing method for person-acceptable navigation,” in RO-MAN
2009-The 18th IEEE International Symposium on Robot and Human
Interactive Communication. IEEE, 2009, pp. 607–612.
[27] X.-T. Truong and T.-D. Ngo, “Dynamic social zone based mobile
robot navigation for human comfortable safety in social environments,”
International Journal of Social Robotics, vol. 8, no. 5, pp. 663–684,
2016.
[28] D. Helbing and P. Molnar, “Social force model for pedestrian dynamics,” Physical review E, vol. 51, no. 5, p. 4282, 1995.
[29] E. T. Hall, The hidden dimension. Garden City, NY: Doubleday,
1966, vol. 609.
[30] F. Farina, D. Fontanelli, A. Garulli, A. Giannitrapani, and D. Prattichizzo, “Walking ahead: The headed social force model,” PloS one,
vol. 12, no. 1, p. e0169734, 2017.
[31] G. Ferrer and A. Sanfeliu, “Proactive kinodynamic planning using
the extended social force model and human motion prediction in
urban environments,” in 2014 IEEE/RSJ International Conference on
Intelligent Robots and Systems. IEEE, 2014, pp. 1730–1735.
[32] P. Ratsamee, Y. Mae, K. Ohara, T. Takubo, and T. Arai, “Human–
robot collision avoidance using a modified social force model with
body pose and face orientation,” International Journal of Humanoid
Robotics, vol. 10, no. 01, p. 1350008, 2013.
[33] M. Swofford, J. C. Peruzzi, N. Tsoi, S. Thompson, R. Martín-Martín, S. Savarese, and M. Vázquez, “Improving social awareness
through dante: A deep affinity network for clustering conversational
interactants,” arXiv preprint arXiv:1907.12910, 2019.
[34] Y. Kato, T. Kanda, and H. Ishiguro, “May i help you?-design of
human-like polite approaching behavior,” in 2015 10th ACM/IEEE
International Conference on Human-Robot Interaction (HRI). IEEE,
2015, pp. 35–42.
[35] J. V. Gómez, N. Mavridis, and S. Garrido, “Fast marching solution
for the social path planning problem,” in 2014 IEEE International
Conference on Robotics and Automation (ICRA). IEEE, 2014, pp.
1871–1876.
[36] A. Taylor, D. M. Chan, and L. D. Riek, “Robot-centric perception
of human groups,” ACM Transactions on Human-Robot Interaction
(THRI), vol. 9, no. 3, pp. 1–21, 2020.
[37] M. Luber and K. O. Arras, “Multi-hypothesis social grouping and
tracking for mobile robots.” in Robotics: Science and Systems, 2013.
[38] J. Shao, C. Change Loy, and X. Wang, “Scene-independent group
profiling in crowd,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2014, pp. 2219–2226.
[39] V. V. Unhelkar, C. Pérez-D’Arpino, L. Stirling, and J. A. Shah,
“Human-robot co-navigation using anticipatory indicators of human
walking motion,” in 2015 IEEE International Conference on Robotics
and Automation (ICRA). IEEE, 2015, pp. 6183–6190.
[40] F. Yang and C. Peters, “Social-aware navigation in crowds with
static and dynamic groups,” in 2019 11th International Conference
on Virtual Worlds and Games for Serious Applications (VS-Games),
2019, pp. 1–4.
[41] Y. Du, N. J. Hetherington, C. L. Oon, W. P. Chan, C. P. Quintero,
E. Croft, and H. M. Van der Loos, “Group surfing: A pedestrian-based approach to sidewalk robot navigation,” in 2019 International
Conference on Robotics and Automation (ICRA). IEEE, 2019, pp.
6518–6524.
[42] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and
S. Savarese, “Social lstm: Human trajectory prediction in crowded
spaces,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 961–971.
[43] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social
gan: Socially acceptable trajectories with generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 2018, pp. 2255–2264.
[44] L. Tai, G. Paolo, and M. Liu, “Virtual-to-real deep reinforcement
learning: Continuous control of mobile robots for mapless navigation,”
in 2017 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS). IEEE, 2017, pp. 31–36.
[45] C. Chen, S. Hu, P. Nikdel, G. Mori, and M. Savva, “Relational graph
learning for crowd navigation,” arXiv preprint arXiv:1909.13165,
2019.
[46] Y. F. Chen, M. Liu, M. Everett, and J. P. How, “Decentralized noncommunicating multiagent collision avoidance with deep reinforcement learning,” in 2017 IEEE international conference on robotics
and automation (ICRA). IEEE, 2017, pp. 285–292.
[47] C. Gloor, “Pedsim: A microscopic pedestrian crowd simulation system,” 2003.
[48] D. Vasquez, B. Okal, and K. O. Arras, “Inverse reinforcement learning algorithms and features for robot navigation in crowds: an experimental comparison,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2014, pp. 1341–1346.
[49] J. S. Coleman and J. James, “The equilibrium size distribution of freely-forming groups,” Sociometry, vol. 24, no. 1, pp. 36–45, 1961.
[50] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” 2017, arXiv preprint arXiv:1707.06347.
[51] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6980
[52] Y. Liu, Q. Yan, and A. Alahi, “Social nce: Contrastive learning of socially-aware motion representations,” arXiv preprint arXiv:2012.11717, 2020.