
Learning a Group-Aware Policy for Robot Navigation

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)


arXiv:2012.12291v2 [cs.RO] 17 Nov 2021

Kapil Katyal*1,2, Yuxiang Gao*2, Jared Markowitz1, Sara Pohland3, Corban Rivera1, I-Jeng Wang1, Chien-Ming Huang2

* Both authors contributed equally to this work.
1 Johns Hopkins University Applied Physics Lab, Laurel, MD, USA.
2 Dept. of Comp. Sci., Johns Hopkins University, Baltimore, MD, USA.
3 Dept. of EECS, UC Berkeley, Berkeley, CA, USA.

Abstract— Human-aware robot navigation promises a range of applications in which mobile robots bring versatile assistance to people in common human environments. While prior research has mostly focused on modeling pedestrians as independent, intentional individuals, people move in groups; consequently, it is imperative for mobile robots to respect human groups when navigating around people. This paper explores learning group-aware navigation policies based on dynamic group formation using deep reinforcement learning. Through simulation experiments, we show that group-aware policies, compared to baseline policies that neglect human groups, achieve greater robot navigation performance (e.g., fewer collisions), minimize violation of social norms and discomfort, and reduce the robot's movement impact on pedestrians. Our results contribute to the development of social navigation and the integration of mobile robots into human environments.

I. INTRODUCTION

Mobile robots that are capable of navigating crowded human environments in a safe, efficient, and socially appropriate manner hold promise in bringing practical robotic assistance to a range of applications, including security patrol, emergency response, and parcel delivery. An increasing body of research has focused on the challenging quest to enable human-aware robot navigation, accounting for human movements that are fast, dynamic, and follow delicate social norms [1], [2], [3], [4]. For example, prior research has treated humans as dynamic obstacles to avoid collisions (e.g., [5]), investigated strategies to avoid getting stuck in human crowds (e.g., [6]), and explored how to model social norms to allow for socially appropriate robot navigation (e.g., [7], [8], [9]).

However, prior works have mainly treated people as individual, independent entities in robot navigation. The majority of people, however, walk in groups [10], [11]; an empirical study showed that up to 70% of pedestrians in a commercial environment walked in groups [12]. Consequently, it is imperative that a mobile robot respects human grouping (e.g., not cutting through a social group) during its navigation in a human environment. In particular, in this work, we consider the problem of a robot interacting with dynamic human groups—people walking together in groups—rather than the standing groups commonly seen in social events (e.g., [13]). While substantial efforts have been made to model and understand dynamic groups (e.g., [12], [14], [15]), how mobile robots should navigate effectively and appropriately around dynamic human groups is under-explored.

Fig. 1: The objective of this work is to learn a navigation policy that allows the robot to safely reach its goal while minimizing impact to individuals and groups of pedestrians. (Panels A-D; annotation: "Avoid cutting through pedestrian group.")

In this work, we explore robot navigation in crowds of human groups. Our approach is to learn navigation policies that allow the robot to safely reach its desired goal while minimizing impact to individuals and groups of pedestrians (Fig. 1).
Our contributions include:
• A reinforcement learning (RL) algorithm that combines robot navigation performance and group-aware social norms for learning a robust policy;
• A novel reward function that uses the convex hull of a group as the group space to minimize impact to pedestrian groups and improve navigation performance;
• Software extensions to the CrowdNav simulation environment [16] to support social navigation research; and
• Experimental results that demonstrate the efficacy of our learned policy with respect to robot navigation performance, human navigation performance, and maintenance of social norms.

II. BACKGROUND AND RELATED WORK

The goal of human-aware robot navigation is to enable robots to move safely, efficiently, and socially appropriately in natural human environments. To achieve safe and efficient navigation, prior research has investigated reactive methods for motion planning [17], [18] and considered modeling pedestrian intention [19]. To realize social appropriateness, previous works have explored learning from human data [7], [20], [21], [22], [23], [24] and used handcrafted rules as planning constraints [8], [25], [26], [27]. One notable approach is the Social Force Model [28], which is based on proxemics theory [29] and attempts to model pedestrian social motion using a combination of attractive and repulsive forces. This approach has been adapted and extended for crowd simulation [30] and robot navigation [31], [32].

While prior research on human-aware robot navigation has mostly regarded humans as individual agents, increasing efforts have considered how mobile robots should interact with human social groups. We consider human social groups in two categories: static and dynamic. Static social groups are commonly seen in social events (e.g., a cocktail party), where people gather in small groups for conversation. Dynamic social groups are groups of people walking together toward shared destinations, possibly engaging in conversation while walking. Previous research has investigated how to enable robots to recognize [33] and approach [34], [35] static, standing social groups while taking into account the size and formation of the groups. Though the detection and modeling of dynamic social groups present additional challenges compared to static groups, they are critical to enabling socially appropriate robot navigation in human crowds. Prior research has generally explored methods to capture intra-group coherence (e.g., [36], [37]) and inter-group differences (e.g., [38]) in dynamic groups. For instance, salient turn signals in groups of humans that share the same navigation goals can be used to enhance trajectory prediction and subsequently improve social awareness in robot path planning [39]. However, it has been shown that group properties may differ in static and dynamic settings. For example, the o-space formation commonly observed in static groups is not necessarily apparent in dynamic groups [40]. To address such differences, a set of dynamic constraints for o-space based on walking direction and group cohesion was proposed for effective robot navigation [40]. Additionally, dynamic groups bear unique properties that mobile robots may take advantage of during navigation in crowded environments. As an example, robots may "group surf" human groups by following their movement flows [41].

Methodologically, modern machine learning techniques have fueled the advances of human-aware robot navigation.
In particular, Recurrent Neural Networks (RNNs) and Generative Adversarial Networks (GANs) have been shown to accurately predict human motion for individuals [42], [43] and groups [15]. However, RNN- and GAN-based methods only predict human motion and do not generate navigation policies for mobile robots. Reinforcement learning approaches are increasingly used for learning navigation policies. For example, prior works have leveraged inverse reinforcement learning to imitate humans and realize socially appropriate movements [13], [23]. Deep Reinforcement Learning (DRL) has also been used for robot navigation [44], [45]; in particular, attention-based DRL has been demonstrated to capture human-human and human-robot interactions in crowded environments [16], [46].

Different from these prior works, we explicitly include group modeling, rather than a simple consideration of pairwise interactions between individuals in a crowd [45]. In addition, our approach uses a more compact representation of group space by computing a polygon based on the convex hull of the pedestrians, instead of the F-formation used in prior work [40]. Moreover, our approach considers a range of metrics, including social compliance and pedestrian and robot performance. Finally, our work uses violations of the group space as a reward term to learn socially appropriate movements around groups of pedestrians.

III. PRELIMINARIES

A. Problem Formulation

Our objective is to learn a controller that allows a robot to navigate to a desired goal while maintaining social norms and avoiding collisions with groups of pedestrians. We formulate our approach using reinforcement learning (RL) to learn a policy that meets these objectives. In this Markov decision process, the robot uses observations to generate a state vector, S, and chooses an action, A, that maximizes the expectation of the future reward, R.

B. State and Action Space

The state space, S, consists of the observable state of each pedestrian i, represented as Ped_i, as well as the internal state of the robot, represented as Rob, as described by Eq. 1. Here, p_x and p_y are the x and y coordinates of the position, v_x and v_y are the x and y components of the velocity, rad is the radius of the pedestrian or the robot, g_x and g_y are the x and y goal positions, v_pref is the preferred velocity, and theta is the turn angle.

  Ped_i = [p_x, p_y, v_x, v_y, rad],
  Rob = [p_x, p_y, v_x, v_y, rad, g_x, g_y, v_pref, theta],
  S_i = [Ped_i, Rob]    (1)

In our simulation, we assume a robot with holonomic kinematics that receives v_x and v_y commands. The action space is discretized into 5 speeds ranging from 0.2 to 1.0 m/s and 16 rotations ranging from 0 to 2π, plus a stop command, resulting in 81 possible actions.
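To make the pairwise state of Eq. 1 and the discretized action set concrete, the following minimal sketch (our own illustration, assuming NumPy; the helper names build_state and build_action_space are not from the CrowdNav code) builds the 14-dimensional pairwise state and the 81-action grid described above.

```python
import itertools
import numpy as np

def build_state(ped, robot):
    """Concatenate one pedestrian's observable state with the robot's full
    state, following Eq. 1: Ped_i = [px, py, vx, vy, rad] and
    Rob = [px, py, vx, vy, rad, gx, gy, v_pref, theta]."""
    ped_vec = np.array([ped["px"], ped["py"], ped["vx"], ped["vy"], ped["rad"]])
    rob_vec = np.array([robot["px"], robot["py"], robot["vx"], robot["vy"],
                        robot["rad"], robot["gx"], robot["gy"],
                        robot["v_pref"], robot["theta"]])
    return np.concatenate([ped_vec, rob_vec])  # 14-dim pairwise state S_i

def build_action_space():
    """Discretize holonomic (vx, vy) commands: 5 speeds x 16 headings plus a
    stop command = 81 actions, as described in Sec. III-B."""
    speeds = np.linspace(0.2, 1.0, 5)                         # m/s
    headings = np.linspace(0, 2 * np.pi, 16, endpoint=False)  # rad
    actions = [(0.0, 0.0)]                                    # stop command
    for v, th in itertools.product(speeds, headings):
        actions.append((v * np.cos(th), v * np.sin(th)))
    return actions

assert len(build_action_space()) == 81
```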
C. CrowdNav Simulation Environment

We leverage the CrowdNav simulation environment [16] for training and evaluating our group-aware RL policy. This environment provides a learning and simulation framework that allows us to model scenes of pedestrians and robots interacting while reaching their target goals. We extend this framework by allowing groups of pedestrians to be instantiated with similar start and goal positions. We further leverage our group-aware social force model, described in the next section, to model the motion of the groups of pedestrians as they interact with other pedestrians and the robot.

D. Group-Aware Social Force Model

We use an extended Social Force Model (SFM) [12] to simulate dynamic social groups. In the extended SFM, each individual's motion, as defined in Eq. 2, is driven by a combination of an attractive force f_i^des that drives them to a desired goal, the obstacle repulsive force f_i^obs, the sum of social repulsive forces from other agents Σ_j f_ij^social, and a new group term f_i^group defined by Eq. 3:

  dv_i/dt = f_i^des + f_i^obs + Σ_j f_ij^social + f_i^group    (2)

The group term is defined as the sum of the attractive force between group members f_i^att, the repulsive force between group members f_i^rep, and a gaze force f_i^gaze that steers the agents to keep the center of mass of the social group within their vision field, simulating in-group social interactions:

  f_i^group = f_i^att + f_i^rep + f_i^gaze    (3)

We developed a custom Python implementation of the extended Social Force Model for the CrowdNav environment (https://github.com/yuxiang-gao/PySocialForce), following the implementation of the PEDSIM C++ library [47] and the ROS implementation by Vasquez et al. [48].

IV. APPROACH

To evaluate our group-aware policy, we extend the existing CrowdNav simulation environment [16] to represent pedestrian motion in groups. We accomplish this by stochastically sampling the number of groups per episode using a Poisson distribution (λ = 1.2) [49] and then randomly assigning pedestrians to the groups. Each pedestrian within a group has similar start and goal positions. The average number of groups and group size for five pedestrians are 2.5 and 1.96, respectively. For ten pedestrians, the number of groups and group size increase to 4.9 and 2.0, respectively.

A. Policy Based on Convex Hull of Group

To train the policy, we use a multi-term reward function that encourages the robot to reach its goal while maintaining social norms and avoiding collisions with groups of pedestrians. In particular, we focus on social norms that minimize discomfort to individuals and discourage intersections with a group of pedestrians. Our reward function is given by Eq. 4, where d_goal is the distance from the robot to the goal, d_coll = 0.6 is the distance between the centers of entities beneath which a collision is considered to have occurred, d_i is the distance between the robot and pedestrian i, d_disc = d_coll + 0.2 is the minimum "comfortable" distance between a robot and a pedestrian (as in [16]), and d_j is the distance from the robot to the edge of the convex hull surrounding group j:

  R(t) = C_prog (d_goal(t-1) - d_goal(t)) + C_goal δ(d_goal(t) < d_coll)
         - C_disc Σ_i (d_disc - d_i(t)) δ(d_coll ≤ d_i(t) ≤ d_disc)
         - C_coll Σ_i δ(d_i(t) < d_coll)
         - C_group Σ_j δ(d_j(t) < d_coll)    (4)

The multiple objectives are weighted via the following constants: C_prog = 0.1, C_goal = 1.0, C_disc = 0.5, C_coll = 0.25, and C_group = 0.25. The first term encourages the robot to progress toward the goal, allowing us to remove the initial imitation learning phase used in [16]. The second, third, and fourth terms encourage the robot to reach the goal, avoid close encounters with pedestrians, and avoid collisions, respectively. The last term encourages the robot to follow group social norms by penalizing any group space violation. To determine the d_j terms, we first compute a polygon representing the convex hull of the positions of all members of the pedestrian group. We then calculate the minimum distance between the robot and the polygon and penalize the robot for intruding into this space.
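As a hedged illustration of the group-space term, the sketch below evaluates Eq. 4 using Shapely to form each group's convex hull and to measure the robot's distance to it. The constants come from the text above; the function names and the exact distance convention (distance 0 when the robot is inside the hull) are our own assumptions, not the CrowdNav extension itself.

```python
import numpy as np
from shapely.geometry import MultiPoint, Point

# Weights and thresholds from Eq. 4 (Sec. IV-A).
C_PROG, C_GOAL, C_DISC, C_COLL, C_GROUP = 0.1, 1.0, 0.5, 0.25, 0.25
D_COLL = 0.6            # collision distance between entity centers
D_DISC = D_COLL + 0.2   # minimum "comfortable" robot-pedestrian distance

def group_distance(robot_xy, group_positions):
    """Minimum distance from the robot to the convex hull of one pedestrian
    group (0 if the robot is inside the hull)."""
    hull = MultiPoint([tuple(p) for p in group_positions]).convex_hull
    return hull.distance(Point(robot_xy))

def reward(d_goal_prev, d_goal, ped_dists, robot_xy, groups):
    """One-step reward following Eq. 4. `ped_dists` are robot-pedestrian
    distances d_i; `groups` is a list of lists of pedestrian (x, y) positions."""
    r = C_PROG * (d_goal_prev - d_goal)               # progress toward goal
    r += C_GOAL * (d_goal < D_COLL)                   # goal reached
    r -= C_DISC * sum((D_DISC - d) for d in ped_dists
                      if D_COLL <= d <= D_DISC)       # individual discomfort
    r -= C_COLL * sum(d < D_COLL for d in ped_dists)  # collisions
    r -= C_GROUP * sum(group_distance(robot_xy, g) < D_COLL
                       for g in groups)               # group-space violations
    return r
```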
Note that group membership information is only needed during training; it is not needed for evaluation or deployment of the policy.

B. Neural Network Architecture

Our overall network architecture is depicted in Fig. 2. As in [16], we used an attention-based network architecture to represent navigation policies. For each pedestrian, a vector of quantities representing the pedestrian was first concatenated with a vector representing the robot and passed through the first multi-layer perceptron (MLP) in the network (MLP1). The resulting feature vector was concatenated with the mean value of the outputs of MLP1 over all pedestrians and used to compute an attention score α_i for each pedestrian via MLP3. The output of MLP1 was also passed through MLP2 to generate a separate "robot-pedestrian interaction vector" that was then multiplied by α_i to generate a weighted feature vector ω_i for each pedestrian. The ω_i were summed over all pedestrians and the result was passed through MLP4, ultimately leading to separate policy and value heads. The filter parameters for the neural network architecture are described in Table I. In summary, our architecture matched that of [16] without the interaction module and with 1) a softmax layer added to produce a categorical policy output and 2) a single fully-connected layer with 100 neurons connecting to a scalar value head.

C. PPO Algorithm

Our agents were trained using proximal policy optimization (PPO; [50]), a leading model-free, actor-critic approach. Hyperparameters were chosen to mimic those used for Atari in [50], with the exceptions of shorter windows (16 steps) and more windows per batch (64). This change was made to accommodate the shorter episodes of CrowdNav while maintaining the number of experiences per batch.

D. Training Details

We used the Adam optimizer [51] with the learning rate set to 2.5 × 10^-4 and epsilon set to 1.0 × 10^-5. In the RL policy, the discount factor, γ, was set to 0.99 and the credit assignment variable, λ, was set to 0.95. We trained our policy for 7000 iterations, yielding approximately 4.8M steps and reaching a maximum reward of approximately 1.7 based on the reward definition in Eq. 4.

Fig. 2: On the left is the CrowdNav simulation environment [16], which provides pedestrian and robot state information as well as the reward to the policy. The right side depicts the neural network architecture used for our attention-based, actor-critic policy. The pedestrian and robot state vectors are concatenated to represent a pairwise combined state vector; the outputs of the network are the policy π over potential actions and the value V of the current state. The gray and green blocks indicate features from individual pedestrians; the blue blocks indicate aggregate features across pedestrians. The argmax of the policy is chosen as the action, which is subsequently sent to the CrowdNav environment to control the robot.

TABLE I: Network Layer Filter Parameters
  Network Layer | Output Features / Activation
  MLP1    | 150, ReLU, 100, ReLU
  MLP2    | 100, ReLU, 50
  MLP3    | 100, ReLU, 100, ReLU, 1
  MLP4    | 150, ReLU, 100, ReLU, 100, ReLU
  Linear1 | 81
  Linear2 | 1

V. EXPERIMENTAL EVALUATION

A. Experimental Setup

The goal of our experiment is to assess the efficacy of our group-aware navigation policy. Our experiments involved four settings determined by two factors: the number of pedestrians and the number of groups. We explored both 5-person and 10-person settings as well as a single group and a stochastic number of groups, as described in Sec. IV.
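The grouped episodes themselves can be generated along the following lines. This is a minimal sketch of the sampling described in Sec. IV (Poisson-distributed group count with λ = 1.2, random group assignment, group members sharing nearby start and goal positions); the position jitter and antipodal goals are our own simplifying assumptions rather than the actual CrowdNav extension.

```python
import numpy as np

def sample_episode_groups(n_peds, lam=1.2, radius=4.0, rng=None):
    """Sample a group structure for one circle-crossing episode (sketch).

    The number of groups is Poisson(lam) clipped to [1, n_peds]; pedestrians
    are assigned to groups uniformly at random; members of a group receive
    nearby start positions on the circle and goals on the opposite side.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_groups = int(np.clip(rng.poisson(lam), 1, n_peds))
    assignment = rng.integers(0, n_groups, size=n_peds)  # pedestrian -> group

    pedestrians = []
    for g in range(n_groups):
        theta = rng.uniform(0.0, 2.0 * np.pi)            # group's start angle
        center = radius * np.array([np.cos(theta), np.sin(theta)])
        for i in np.flatnonzero(assignment == g):
            offset = rng.normal(scale=0.3, size=2)       # keep members close
            pedestrians.append({"id": int(i), "group": int(g),
                                "start": center + offset,
                                "goal": -center + offset})  # cross the circle
    return pedestrians

# Example: one 5-pedestrian episode, as in the 5-person settings.
episode = sample_episode_groups(n_peds=5)
```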
We used the Circle Crossing scenario, in which groups of pedestrians start and end around the perimeter of a circle (radius = 4 m), during training and evaluation. The robot's start and end positions were set to ensure that the robot would pass through the center of the circle and interact with the pedestrian groups. We evaluated our trained policy on 250 trials with randomly initialized pedestrian start and end positions for each of the four experimental settings.

B. Metrics

Our evaluation focused on 1) robot navigation performance, 2) pedestrian navigation performance, and 3) social compliance. For robot navigation performance, our metrics represent the quality of the robot's ability to navigate to the goal quickly without collision:
• Successes: Number of trials in which the robot reached the goal.
• Collisions: Number of trials in which a collision occurred.
• Timeouts: Number of trials in which the robot did not reach the goal within the allotted time (25 seconds).
• Time to Goal: Average number of seconds the robot needed to reach the goal, over all trials.
• Mean Robot Velocity: The average velocity of the robot at each time step during all trials.

To assess pedestrian performance, we measured the impact of the robot's behavior on the desired pedestrian motion:
• Mean Pedestrian Velocity: The average velocity of the pedestrians during the trials.
• Mean Pedestrian Angle: The average angular deviation between the pedestrian's observed motion and the direct vector to the pedestrian's goal. This metric seeks to measure the disturbance from the optimal trajectory to the goal caused by the robot's policy.

Finally, to assess social norms, we quantified how well the robot maintained social distance from individual pedestrians and limited intersections with groups of pedestrians:
• Group Intersections: The number of group intersections by the robot that occurred during the trials.
• Individual Discomfort: The mean distance between the robot and the pedestrians, aggregated over all pedestrians, when the robot violates the discomfort threshold.
• Pedestrian Social Force: The mean social force applied to pedestrian i, equal to the sum of the forces applied to pedestrian i by the other pedestrians and the robot, as described in Sec. III-D (i.e., Σ_j f_ij). This metric captures how the robot's motion may directly or indirectly impact pedestrians' motions.
• Robot Social Force: The mean social force applied to the robot r by the pedestrians j, as described in Sec. III-D (i.e., Σ_j f_rj). This metric captures how the robot's motion may be impacted by pedestrians.
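Two of these metrics are illustrated below as a non-authoritative sketch: the Mean Pedestrian Angle as the angle between each pedestrian's velocity and its goal direction, and a per-time-step group-intersection count based on the convex hulls from Sec. IV-A. The paper does not specify the exact counting convention, so that part is our assumption.

```python
import numpy as np
from shapely.geometry import Point

def mean_ped_angle_deg(velocities, positions, goals):
    """Mean Pedestrian Angle: average angular deviation (degrees) between a
    pedestrian's observed velocity and the direct vector to its goal."""
    angles = []
    for v, p, g in zip(velocities, positions, goals):
        v = np.asarray(v, dtype=float)
        to_goal = np.asarray(g, dtype=float) - np.asarray(p, dtype=float)
        nv, ng = np.linalg.norm(v), np.linalg.norm(to_goal)
        if nv < 1e-6 or ng < 1e-6:
            continue  # skip stationary pedestrians or those already at the goal
        cos = np.clip(np.dot(v, to_goal) / (nv * ng), -1.0, 1.0)
        angles.append(np.degrees(np.arccos(cos)))
    return float(np.mean(angles)) if angles else 0.0

def count_group_intersections(robot_traj, hulls_per_step, d_coll=0.6):
    """Group Intersections (one possible convention): number of time steps at
    which the robot is inside, or within d_coll of, any group's convex hull.
    Hulls are recomputed at every step because the groups move."""
    return sum(any(hull.distance(Point(xy)) < d_coll for hull in hulls)
               for xy, hulls in zip(robot_traj, hulls_per_step))
```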
TABLE II: Results across the 5- and 10-pedestrian settings. The best values are marked in bold. Asterisks indicate statistically significant differences (p < .05) when compared to our group-aware policy. Our policy achieves comparable or better robot navigation performance while allowing pedestrians to move at faster velocities. In addition, our policy leads to more socially compliant navigation, indicated by significantly fewer instances of group intersection and reduced individual discomfort.

Method | # Groups | # Peds. | Succ. ↑ | Coll. ↓ | TO ↓ | Time (s) ↓ | Mean Robot Vel. (m/s) ↑ | Mean Ped. Vel. (m/s) ↑ | Mean Ped. Ang. (°) ↓ | Grp. Inters. ↓ | Ind. Discomfort ↓ | Ped. Social Force ↓ | Robot Social Force ↓
SARL               | 1     | 5  | 237 | 11  | 2 | 8.24*  | 0.962  | 1.170* | 3.76  | 143 | 3.10* | 0.375* | 0.523
SocialNCE (w/o CL) | 1     | 5  | 187 | 60  | 3 | 10.54* | 0.777* | 1.139* | 3.61  | 64  | 3.06* | 0.391* | 0.545*
SocialNCE          | 1     | 5  | 213 | 31  | 6 | 10.87* | 0.796* | 1.166* | 3.43  | 50  | 2.72* | 0.349  | 0.473
Group Aware (ours) | 1     | 5  | 236 | 9   | 5 | 8.92   | 0.964  | 1.183  | 3.59  | 15  | 1.29  | 0.351  | 0.482
SARL               | 2.548 | 5  | 238 | 12  | 0 | 8.23*  | 0.964  | 1.136  | 5.99* | 151 | 2.87  | 0.522* | 0.716*
SocialNCE (w/o CL) | 2.548 | 5  | 174 | 76  | 0 | 9.73*  | 0.797* | 1.107* | 5.71  | 50  | 3.91* | 0.539* | 0.648
SocialNCE          | 2.548 | 5  | 206 | 44  | 0 | 9.68*  | 0.817* | 1.125* | 5.25* | 56  | 2.36  | 0.489  | 0.582*
Group Aware (ours) | 2.548 | 5  | 242 | 8   | 0 | 8.81   | 0.961  | 1.146  | 5.59  | 22  | 2.63  | 0.485  | 0.657
SARL               | 1     | 10 | 222 | 23  | 5 | 8.59*  | 0.955  | 1.161* | 4.11  | 176 | 4.20* | 0.395* | 0.707*
SocialNCE (w/o CL) | 1     | 10 | 127 | 121 | 2 | 10.92* | 0.733* | 1.090* | 4.17* | 85  | 3.30  | 0.465* | 0.944*
SocialNCE          | 1     | 10 | 167 | 76  | 7 | 11.32* | 0.728* | 1.131* | 4.02  | 94  | 4.29* | 0.413* | 0.795*
Group Aware (ours) | 1     | 10 | 232 | 14  | 4 | 9.87   | 0.956  | 1.174  | 3.93  | 29  | 2.31  | 0.366  | 0.597
SARL               | 4.884 | 10 | 239 | 10  | 1 | 8.72*  | 0.960* | 1.089* | 8.09* | 258 | 4.94* | 0.681* | 0.964*
SocialNCE (w/o CL) | 4.884 | 10 | 141 | 109 | 0 | 10.20  | 0.693* | 1.022* | 8.18* | 95  | 5.71* | 0.764* | 0.986*
SocialNCE          | 4.884 | 10 | 186 | 64  | 0 | 9.89*  | 0.772* | 1.074  | 7.81* | 156 | 5.71* | 0.684* | 0.928*
Group Aware (ours) | 4.884 | 10 | 241 | 9   | 0 | 10.21  | 0.918  | 1.108* | 7.07  | 20  | 2.29  | 0.599  | 0.849

Fig. 3: The figure on the left shows a representative example of the robot navigating through the crowd of pedestrians over time (time steps 0, 8, 16, and 24) using both the SARL and the group-aware policy. The SARL policy chooses actions that cut through the group of pedestrians and influence the group formation, while our group-aware policy chooses actions that move around the group with minimal disturbance. The figure on the right shows the average distance between the pedestrians and the robot (top) and the average pedestrian velocity (bottom) over time. The group-aware policy results in increased distance to the pedestrians while allowing the pedestrians to maintain faster speeds.

C. Results

We conducted independent two-tailed t-tests to compare our group-aware policy to the following baseline policies:
• SARL: Crowd-aware RL policy that uses an attention mechanism to model human-robot interaction and encourage socially compliant navigation [16].
• SocialNCE (w/o CL): Social NCE policy without contrastive loss. This policy uses behavior cloning with a trained SARL policy as the expert [52].
• SocialNCE: Social NCE policy. This policy uses a contrastive loss function to represent 'negative' examples such as collisions and thereby improve social compliance [52].
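The asterisks in Table II come from such tests. A minimal sketch using SciPy is shown below; the per-trial data layout is our own assumption for illustration.

```python
from scipy import stats

def compare_to_group_aware(per_trial, metric, baseline, ours="Group Aware (ours)"):
    """Independent two-tailed t-test on per-trial values of one metric,
    comparing a baseline policy against the group-aware policy."""
    t, p = stats.ttest_ind(per_trial[baseline][metric], per_trial[ours][metric])
    return t, p, bool(p < 0.05)  # significant at p < .05

# Example usage (illustrative structure only, not the paper's raw data):
# per_trial = {"SARL": {"time_to_goal": [...]},
#              "Group Aware (ours)": {"time_to_goal": [...]}}
# t, p, significant = compare_to_group_aware(per_trial, "time_to_goal", "SARL")
```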
Table II summarizes the robot and pedestrian navigation performance as well as the social compliance results. Overall, the group-aware policy generally led to a higher number of successful trials while allowing the pedestrians to travel at faster speeds with less disturbance on their way to their goals. The group-aware policy led to significantly fewer instances in which the robot navigated through a group, and it also resulted in less individual discomfort overall. Finally, we observe that our policy reduced the overall social forces applied to the pedestrians and the robot.

We note that the robot with the group-aware policy took longer to reach the goal compared to the SARL policy. However, by and large, we do not see significant differences in robot speed between the two approaches. The group-aware robot took a longer path to its goal due to its preference for navigating around groups of pedestrians to avoid group intersections, whereas the SARL robot aimed to reach its goal even at the cost of cutting between groups of pedestrians. Fig. 3 illustrates an example of such behavior: the SARL policy chose a path that cut through the pedestrian group whereas, in the same scenario, our group-aware policy chose a path around the group. The resulting group-aware behavior ultimately enabled greater group cohesion and less disruption, while reducing group and individual discomfort. We additionally computed the distances between the pedestrians and the robot for both policies (Fig. 3, top right), as well as the velocities of the pedestrians (Fig. 3, bottom right), over time. During interaction between the robot and the pedestrians, we observe that the distance between the pedestrians and the robot was lower for the SARL policy, consistent with the results reported in Table II. Further, we see that the average pedestrian velocity decreased substantially during the times of interaction under the SARL policy; we do not see similar decreases with our group-aware policy.

D. Hardware Experiments

To demonstrate the applicability of our group-aware policy to real-world systems, we implemented our policy on the Boston Dynamics Spot robot. The policy was trained entirely in the CrowdSim environment before the demonstration, then evaluated on the Spot robot as a proof of concept. In this demonstration, the robot's start and goal positions were predetermined such that the robot reached a point approximately six meters in front of its startup position. The radius and preferred velocity of the robot and all of the pedestrians were approximated ahead of time and set when the robot and pedestrian objects were defined. During the demonstration, the position and orientation of the robot and each of the pedestrians were determined using an OptiTrack motion capture system, and the velocity of each agent was approximated accordingly. While the robot was outside a small radius of the goal position, the observable state of the pedestrians and the full state of the robot were determined and used as input to our group-aware policy. The policy then provided the predicted best action as x and y velocities, which were mapped to a ROS command used to control the robot. We decreased the speed in both the x and y directions by a factor of two to ensure that the robot moved at an appropriate speed and to compensate for hardware limitations.
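As a hedged sketch of this last step, the policy's holonomic (vx, vy) action could be mapped to a ROS velocity command as follows. The /cmd_vel topic and the geometry_msgs/Twist message are illustrative assumptions; the paper does not specify the exact Spot interface used.

```python
import rospy
from geometry_msgs.msg import Twist

SPEED_SCALE = 0.5  # halve commanded speeds, as described above for the Spot demo

def publish_action(pub, vx, vy):
    """Map the policy's holonomic (vx, vy) action to a ROS velocity command."""
    cmd = Twist()
    cmd.linear.x = SPEED_SCALE * vx  # forward velocity (m/s)
    cmd.linear.y = SPEED_SCALE * vy  # lateral velocity (m/s)
    pub.publish(cmd)

# Example wiring (topic name is an assumption):
# rospy.init_node("group_aware_policy")
# pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
# publish_action(pub, vx=0.4, vy=0.1)
```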
As illustrated in Fig. 1, we demonstrated that the policy worked on a real-world robotic system by having pedestrians move around the robot in various formations that interfered with the robot's path. To demonstrate the effectiveness of the policy, the pedestrians followed paths that varied from the simple circle-crossing movements seen by the simulated robot during training. Videos of several of these demonstrations can be found in the supplementary video attachment.

VI. DISCUSSION

Toward achieving socially appropriate robot navigation in human environments, this paper explores group-aware behaviors that respect pedestrian group formations and trajectories while minimally sacrificing robot navigation performance. Our approach utilizes deep reinforcement learning and considers group formation during training. Our results show that the learned policy achieves a higher number of successful trials in which the robot reached the goal, fewer collisions, and less impact on the pedestrians' motion toward their goals. In addition, we show that our learned policy not only reduced the number of group violations (e.g., cutting through a group) but also decreased the individual discomfort and the social forces applied to the pedestrians and the robot.

Our approach, however, resulted in an increase in the robot's total time to goal compared to the SARL baseline, which did not consider group formation. This increase in robot navigation time was expected, as the robot sought to move around groups rather than navigating through them (Fig. 3). However, our results show that even though the total time to goal increased, the average velocity of the robot was mostly unaffected by the group-aware policy.

Our exploration indicates several directions for future research. First, we would like to determine how well our learned policy reflects actual human motion through groups of pedestrians. Second, we would like to investigate whether we can bootstrap our learned policy with imitation learning using observations of humans navigating groups of pedestrians. Third, we would like to investigate different representations of group space beyond the convex hull approach described in this paper. We speculate that considering additional parameters, such as social interaction during movement, the specific formation of the group, and environmental cues (e.g., social space), may contribute to learning more socially compliant navigation policies. Additionally, while this paper focuses mostly on robot and pedestrian navigation performance, as well as group intersections and discomfort, there are other factors to consider for socially appropriate behavior, such as how to pass and follow human groups. Finally, while learning a single policy works for a limited number of environments, choosing among a library of policies depending on the density of pedestrians, the layout of the environment, and the local culture can lead to better navigation performance once deployed. For example, respecting human grouping may not always be possible (e.g., in a narrow corridor). It is therefore important for a mobile robot to selectively choose a context-suitable policy in order to achieve efficient, safe, and socially appropriate navigation in crowded human environments.

ACKNOWLEDGMENTS

This work was supported by the Johns Hopkins University Institute for Assured Autonomy.

REFERENCES

[1] K. Charalampous, I. Kostavelis, and A. Gasteratos, "Recent trends in social aware robot navigation: A survey," Robotics and Autonomous Systems, vol. 93, pp. 85-104, 2017.
[2] A. Pandey, S. Pandey, and D. Parhi, "Mobile robot navigation and obstacle avoidance techniques: A review," Int Rob Auto J, vol. 2, no. 3, p. 00022, 2017.
[3] T. Kruse, A. K. Pandey, R. Alami, and A. Kirsch, "Human-aware robot navigation: A survey," Robotics and Autonomous Systems, vol. 61, no. 12, pp. 1726-1743, 2013.
[4] J. Rios-Martinez, A. Spalanzani, and C. Laugier, "From proxemics theory to socially-aware navigation: A survey," International Journal of Social Robotics, vol. 7, no. 2, pp. 137-153, 2015.
[5] D. Fox, W. Burgard, and S. Thrun, "The dynamic window approach to collision avoidance," IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23-33, 1997.
[6] P. Trautman, J. Ma, R. M. Murray, and A. Krause, "Robot navigation in dense human crowds: Statistical models and experimental studies of human-robot cooperation," The International Journal of Robotics Research, vol. 34, no. 3, pp. 335-356, 2015.
[7] A. Robicquet, A. Sadeghian, A. Alahi, and S. Savarese, "Learning social etiquette: Human trajectory understanding in crowded scenes," in European Conference on Computer Vision. Springer, 2016, pp. 549-565.
[8] A. Bera, T. Randhavane, R. Prinja, and D. Manocha, "SocioSense: Robot navigation amongst pedestrians with social and psychological constraints," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 7018-7025.
[9] R. Mead and M. J. Matarić, "Autonomous human-robot proxemics: socially aware navigation based on interaction potential," Autonomous Robots, vol. 41, no. 5, pp. 1189-1201, 2017.
[10] M. Costa, "Interpersonal distances in group walking," Journal of Nonverbal Behavior, vol. 34, no. 1, pp. 15-26, 2010.
[11] A. F. Aveni, "The not-so-lonely crowd: Friendship groups in collective behavior," Sociometry, pp. 96-99, 1977.
[12] M. Moussaïd, N. Perozo, S. Garnier, D. Helbing, and G. Theraulaz, "The walking behaviour of pedestrian social groups and its impact on crowd dynamics," PLoS ONE, vol. 5, no. 4, p. e10047, 2010.
[13] B. Okal and K. O. Arras, "Learning socially normative robot navigation behaviors with Bayesian inverse reinforcement learning," in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 2889-2895.
[14] Z. Yücel, F. Zanlungo, T. Ikeda, T. Miyashita, and N. Hagita, "Deciphering the crowd: Modeling and identification of pedestrian group motion," Sensors, vol. 13, no. 1, pp. 875-897, 2013.
[15] N. Bisagno, B. Zhang, and N. Conci, "Group LSTM: Group trajectory prediction in crowded scenarios," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[16] C. Chen, Y. Liu, S. Kreiss, and A. Alahi, "Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning," in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 6015-6022.
[17] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, "Reciprocal n-body collision avoidance," in Robotics Research. Springer, 2011, pp. 3-19.
[18] Y. Luo, P. Cai, A. Bera, D. Hsu, W. S. Lee, and D. Manocha, "PORCA: Modeling and planning for autonomous driving among many pedestrians," IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3418-3425, 2018.
[19] K. D. Katyal, G. D. Hager, and C.-M. Huang, "Intent-aware pedestrian prediction for adaptive crowd navigation," in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 3277-3283.
[20] M. Luber, L. Spinello, J. Silva, and K. O. Arras, "Socially-aware robot navigation: A learning approach," in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 902-907.
[21] M. Shiomi, F. Zanlungo, K. Hayashi, and T. Kanda, "Towards a socially acceptable collision avoidance for a mobile robot navigating among pedestrians using a pedestrian model," International Journal of Social Robotics, vol. 6, no. 3, pp. 443-455, 2014.
[22] M. Sebastian, S. B. Banisetty, and D. Feil-Seifer, "Socially-aware navigation planner using models of human-human interaction," in 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 2017, pp. 405-410.
[23] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, "Socially compliant mobile robot navigation via inverse reinforcement learning," The International Journal of Robotics Research, vol. 35, no. 11, pp. 1289-1307, 2016. [Online]. Available: https://doi.org/10.1177/0278364915619772
[24] S. Pellegrini, A. Ess, K. Schindler, and L. Van Gool, "You'll never walk alone: Modeling social behavior for multi-target tracking," in 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009, pp. 261-268.
[25] C. Johnson and B. Kuipers, "Socially-aware navigation using topological maps and social norm learning," in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018, pp. 151-157.
[26] R. Kirby, R. Simmons, and J. Forlizzi, "Companion: A constraint-optimizing method for person-acceptable navigation," in RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 2009, pp. 607-612.
[27] X.-T. Truong and T.-D. Ngo, "Dynamic social zone based mobile robot navigation for human comfortable safety in social environments," International Journal of Social Robotics, vol. 8, no. 5, pp. 663-684, 2016.
[28] D. Helbing and P. Molnar, "Social force model for pedestrian dynamics," Physical Review E, vol. 51, no. 5, p. 4282, 1995.
[29] E. T. Hall, The Hidden Dimension. Garden City, NY: Doubleday, 1966, vol. 609.
[30] F. Farina, D. Fontanelli, A. Garulli, A. Giannitrapani, and D. Prattichizzo, "Walking ahead: The headed social force model," PLoS ONE, vol. 12, no. 1, p. e0169734, 2017.
[31] G. Ferrer and A. Sanfeliu, "Proactive kinodynamic planning using the extended social force model and human motion prediction in urban environments," in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2014, pp. 1730-1735.
[32] P. Ratsamee, Y. Mae, K. Ohara, T. Takubo, and T. Arai, "Human-robot collision avoidance using a modified social force model with body pose and face orientation," International Journal of Humanoid Robotics, vol. 10, no. 01, p. 1350008, 2013.
[33] M. Swofford, J. C. Peruzzi, N. Tsoi, S. Thompson, R. Martín-Martín, S. Savarese, and M. Vázquez, "Improving social awareness through DANTE: A deep affinity network for clustering conversational interactants," arXiv preprint arXiv:1907.12910, 2019.
[34] Y. Kato, T. Kanda, and H. Ishiguro, "May I help you? - Design of human-like polite approaching behavior," in 2015 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2015, pp. 35-42.
[35] J. V. Gómez, N. Mavridis, and S. Garrido, "Fast marching solution for the social path planning problem," in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1871-1876.
[36] A. Taylor, D. M. Chan, and L. D. Riek, "Robot-centric perception of human groups," ACM Transactions on Human-Robot Interaction (THRI), vol. 9, no. 3, pp. 1-21, 2020.
[37] M. Luber and K. O. Arras, "Multi-hypothesis social grouping and tracking for mobile robots," in Robotics: Science and Systems, 2013.
[38] J. Shao, C. Change Loy, and X. Wang, "Scene-independent group profiling in crowd," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2219-2226.
[39] V. V. Unhelkar, C. Pérez-D'Arpino, L. Stirling, and J. A. Shah, "Human-robot co-navigation using anticipatory indicators of human walking motion," in 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 6183-6190.
[40] F. Yang and C. Peters, "Social-aware navigation in crowds with static and dynamic groups," in 2019 11th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), 2019, pp. 1-4.
[41] Y. Du, N. J. Hetherington, C. L. Oon, W. P. Chan, C. P. Quintero, E. Croft, and H. M. Van der Loos, "Group surfing: A pedestrian-based approach to sidewalk robot navigation," in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 6518-6524.
[42] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, "Social LSTM: Human trajectory prediction in crowded spaces," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 961-971.
[43] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, "Social GAN: Socially acceptable trajectories with generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2255-2264.
[44] L. Tai, G. Paolo, and M. Liu, "Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 31-36.
[45] C. Chen, S. Hu, P. Nikdel, G. Mori, and M. Savva, "Relational graph learning for crowd navigation," arXiv preprint arXiv:1909.13165, 2019.
[46] Y. F. Chen, M. Liu, M. Everett, and J. P. How, "Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 285-292.
[47] C. Gloor, "Pedsim: A microscopic pedestrian crowd simulation system," 2003.
[48] D. Vasquez, B. Okal, and K. O. Arras, "Inverse reinforcement learning algorithms and features for robot navigation in crowds: an experimental comparison," in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2014, pp. 1341-1346.
[49] J. S. Coleman and J. James, "The equilibrium size distribution of freely-forming groups," Sociometry, vol. 24, no. 1, pp. 36-45, 1961.
[50] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[51] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6980
[52] Y. Liu, Q. Yan, and A. Alahi, "Social NCE: Contrastive learning of socially-aware motion representations," arXiv preprint arXiv:2012.11717, 2020.