Ashwin Balakrishna

Papers by Ashwin Balakrishna

Disentangling Dense Multi-Cable Knots

2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

Disentangling two or more cables requires many steps to remove crossings between and within cables. We formalize the problem of multiple cable disentangling and present an iterative, graph-based algorithm, Iterative Reduction Of Non-planar Multiple cAble kNots (IRON-MAN), that outputs moves to remove crossings from the scene. We instantiate it with a learned perception system, inspired by prior work in single-cable untying, to disentangle two-cable twists, three-cable braids, and knots of two or three cables, such as the overhand, square, carrick bend, sheet bend, crown, and fisherman's knots from image input. IRON-MAN keeps track of task-relevant keypoints corresponding to target cable endpoints and crossings and iteratively disentangles the cables by identifying crossings that are critical to knot structure and undoing them. Using a da Vinci surgical robot, we experimentally evaluate the effectiveness of IRON-MAN on the task of untangling a class of multiple cable knots present in the training data, as well as generalizing to novel classes of multiple cable knots involving two to three cables. Results suggest that IRON-MAN is effective in disentangling knots involving up to three cables with 80.5% success, with generalization to knots that are never seen during training on cables that are either distinct or uniform in color.

Predicting Electric Vehicle Charging Station Usage: Using Machine Learning to Estimate Individual Station Statistics from Physical Configurations of Charging Station Networks

arXiv (Cornell University), Apr 2, 2018

Electric vehicles (EVs) have been gaining popularity due to their environmental friendliness and efficiency. EV charging station networks are scalable solutions for supporting increasing numbers of EVs within modern electric grid constraints, yet few tools exist to aid the physical configuration design of new networks. We use neural networks to predict individual charging station usage statistics from the station's physical location within a network. We show that this approach quickly gives accurate estimates of average usage statistics given a proposed configuration, without the need for running many computationally expensive simulations. The trained neural network can help EV charging network designers rapidly test various placements of charging stations under additional individual constraints in order to find an optimal configuration given their design objectives.

Learning Switching Criteria for Sim2Real Transfer of Robotic Fabric Manipulation Policies

2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)

Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies that have been trained with very little simulation data can result in unreliable and dangerous behaviors on physical hardware. On the other hand, excessive training in simulation can cause policies to overfit to the visual appearance and dynamics of the simulator. In this work, we study strategies to automatically determine when policies trained in simulation can be reliably transferred to a physical robot. We specifically study these ideas in the context of robotic fabric manipulation, in which successful sim2real transfer is especially challenging due to the difficulties of precisely modeling the dynamics and visual appearance of fabric. Results in a fabric smoothing task suggest that our switching criteria correlate well with real-world performance. In particular, our confidence-based switching criteria achieve average final fabric coverage of 87.2-93.7% within 55-60% of the total training budget. See https://tinyurl.com/lsccase for code and supplemental materials.

Accelerating Grasp Exploration by Leveraging Learned Priors

2020 IEEE 16th International Conference on Automation Science and Engineering (CASE)

Orienting Novel 3D Objects Using Self-Supervised Learning of Rotation Transforms

2020 IEEE 16th International Conference on Automation Science and Engineering (CASE)

Orienting objects is a critical component in the automation of many packing and assembly tasks. We present an algorithm to orient novel objects given a depth image of the object in its current and desired orientation. We formulate a self-supervised objective for this problem and train a deep neural network to estimate the 3D rotation, parameterized by a quaternion, between these current and desired depth images. We then use the trained network in a proportional controller to reorient objects based on the estimated rotation between the two depth images. Results suggest that in simulation we can rotate unseen objects with unknown geometries by up to 30° with a median angle error of 1.47° over 100 random initial/desired orientations each for 22 novel objects. Experiments on physical objects suggest that the controller can achieve a median angle error of 4.2° over 10 random initial/desired orientations each for 5 objects.

LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Sparse Reward Iterative Tasks

arXiv (Cornell University), Jul 10, 2021

Reinforcement learning (RL) has shown impressive success in exploring high-dimensional environments to learn complex tasks, but can often exhibit unsafe behaviors and require extensive environment interaction when exploration is unconstrained. A promising strategy for learning in dynamically uncertain environments is requiring that the agent can robustly return to learned safe sets, where task success (and therefore safety) can be guaranteed. While this approach has been successful in low dimensions, enforcing this constraint in environments with visual observations is exceedingly challenging. We present a novel continuous representation for safe sets by framing it as a binary classification problem in a learned latent space, which flexibly scales to image observations. We then present a new algorithm, Latent Space Safe Sets (LS3), which uses this representation for long-horizon tasks with sparse rewards. We evaluate LS3 on 4 domains, including a challenging sequential pushing task in simulation and a physical cable routing task. We find that LS3 can use prior task successes to restrict exploration and learn more efficiently than prior algorithms while satisfying constraints. See https://tinyurl.com/latent-ss for code and supplementary material.

Untangling Dense Knots by Learning Task-Relevant Keypoints

arXiv (Cornell University), 2020

Untangling ropes, wires, and cables is a challenging task for robots due to the high-dimensional configuration space, visual homogeneity, self-occlusions, and complex dynamics. We consider dense (tight) knots that lack space between self-intersections and present an iterative approach that uses learned geometric structure in configurations. We instantiate this into an algorithm, HULK: Hierarchical Untangling from Learned Keypoints, which combines learning-based perception with a geometric planner into a policy that guides a bilateral robot to untangle knots. To evaluate the policy, we perform experiments both in a novel simulation environment modelling cables with varied knot types and textures and in a physical system using the da Vinci surgical robot. We find that HULK is able to untangle cables with dense figure-eight and overhand knots and generalize to varied textures and appearances. We compare two variants of HULK to three baselines and observe that HULK achieves 43.3% higher success rates on a physical system compared to the next best baseline. HULK successfully untangles a cable from a dense initial configuration containing up to two overhand and figure-eight knots in 97.9% of 378 simulation experiments with an average of 12.1 actions per trial. In physical experiments, HULK achieves 61.7% untangling success, averaging 8.48 actions per trial. Supplementary material, code, and videos can be found at https://tinyurl.com/y3a88ycu.

Untangling Dense Non-Planar Knots by Learning Manipulation Features and Recovery Policies

Robotics: Science and Systems XVII, 2021

Robot manipulation for untangling 1D deformable structures such as ropes, cables, and wires is challenging due to their infinite-dimensional configuration space, complex dynamics, and tendency to self-occlude. Analytical controllers often fail in the presence of dense configurations, due to the difficulty of grasping between adjacent cable segments. We present two algorithms that enhance robust cable untangling, LOKI and SPiDERMan, which operate alongside HULK, a high-level planner from prior work. LOKI uses a learned model of manipulation features to refine a coarse grasp keypoint prediction to a precise, optimized location and orientation, while SPiDERMan uses a learned model to sense task progress and apply recovery actions. We evaluate these algorithms in physical cable untangling experiments with 336 knots and over 1500 actions on real cables using the da Vinci surgical robot. We find that the combination of HULK, LOKI, and SPiDERMan is able to untangle dense overhand, figure-eight, double-overhand, square, bowline, granny, stevedore, and triple-overhand knots. The composition of these methods successfully untangles a cable from a dense initial configuration in 68.3% of 60 physical experiments and achieves 50% higher success rates than baselines from prior work. Supplementary material, code, and videos can be found at https://tinyurl.com/rssuntangling.

Automating Planar Object Singulation by Linear Pushing with Single-point and Multi-point Contacts

2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), 2019

Singulation is useful for manufacturing, logistics, and service applications; we consider the problem in a planar setting. We propose a novel O(n(n + v)) linear push policy (n denotes the number of objects, v denotes the maximum number of vertices per object), ClusterPush, that can be efficiently computed using clustering. To evaluate the policy, we define singulation distance as the average pairwise distance of polygon centroids given random arrangements of 2D polygonal objects on a surface, and seek pushing policies that can maximize singulation distance. When compared with a brute force evaluation of all candidate pushes in the Box2D simulator using 50,000 pushing scenarios, ClusterPush achieves 70% of the singulation distance achieved using brute force and is 2000x faster. ClusterPush also improves on previous pushing policies and can be used for multi-point pushes with two-point and edge (infinite-point) contacts. Compared with pushes with single-point contacts using ClusterPush, pushes with two-point and edge contacts improve singulation by 7% and 13% respectively. In physical experiments conducted with an ABB YuMi robot on 40 sets of 3-7 blocks, ClusterPush increases singulation distance by 15-30%, outperforming the next best policy by 24% on average. Data and code are available at https://github.com/Jekyll1021/MultiPointPushing.

On-Policy Robot Imitation Learning from a Converging Supervisor

Existing on-policy imitation learning algorithms, such as DAgger, assume access to a fixed supervisor. However, there are many settings where the supervisor may converge during policy learning, such as a human performing a novel task or an improving algorithmic controller. We formalize imitation learning from a "converging supervisor" and provide sublinear static and dynamic regret guarantees against the best policy in hindsight with labels from the converged supervisor, even when labels during learning are only from intermediate supervisors. We then show that this framework is closely connected to a recent class of reinforcement learning (RL) algorithms known as dual policy iteration (DPI), which alternate between training a reactive learner with imitation learning and a model-based supervisor with data from the learner. Experiments suggest that when this framework is applied with the state-of-the-art deep model-based RL algorithm PETS as an improving supervisor, it outpe...

ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning

ArXiv, 2021

Effective robot learning often requires online human feedback and interventions that can cost significant human time, giving rise to the central challenge in interactive imitation learning: is it possible to control the timing and length of interventions to both facilitate learning and limit burden on the human supervisor? This paper presents ThriftyDAgger, an algorithm for actively querying a human supervisor given a desired budget of human interventions. ThriftyDAgger uses a learned switching policy to solicit interventions only at states that are sufficiently (1) novel, where the robot policy has no reference behavior to imitate, or (2) risky, where the robot has low confidence in task completion. To detect the latter, we introduce a novel metric for estimating risk under the current robot policy. Experiments in simulation and on a physical cable routing experiment suggest that ThriftyDAgger’s intervention criteria balance task performance and supervisor burden more effectively ...

Policy Gradient Bayesian Robust Optimization for Imitation Learning

ArXiv, 2021

The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguou...

On-Policy Imitation Learning from an Improving Supervisor

Most on-policy imitation algorithms, such as DAgger, are designed for learning with a fixed supervisor. However, there are many settings in which the supervisor improves during policy learning, such as when the supervisor is a human performing a novel task or an improving algorithmic controller. We consider learning from an “improving supervisor” and derive a bound on the static regret of online gradient descent when a converging supervisor policy is used. We present an on-policy imitation learning algorithm, Follow the Improving Teacher (FIT), which uses a deep model-based reinforcement learning (deep MBRL) algorithm to provide the sample complexity benefits of model-based methods but enable faster training and evaluation via distillation into a reactive controller. We evaluate FIT with experiments on the Reacher and Pusher MuJoCo domains using the deep MBRL algorithm, PETS, as the improving supervisor. To the best of our knowledge, this work is the first to formally consider the s...

ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions

Sample-based learning model predictive control (LMPC) strategies have recently attracted attention due to their desirable theoretical properties and their good empirical performance on robotic tasks. However, prior analysis of LMPC controllers for stochastic systems has mainly focused on linear systems in the iterative learning control setting. We present a novel LMPC algorithm, Adjustable Boundary Condition LMPC (ABC-LMPC), which enables rapid adaptation to novel start and goal configurations and theoretically show that the resulting controller guarantees iterative improvement in expectation for stochastic nonlinear systems. We present results with a practical instantiation of this algorithm and experimentally demonstrate that the resulting controller adapts to a variety of initial and terminal conditions on 3 stochastic continuous control tasks.

Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones

IEEE Robotics and Automation Letters, 2021

Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in uncertain environments requires extensive exploration, but safety requires limiting exploration. We propose Recovery RL, an algorithm which navigates this tradeoff by (1) leveraging offline data to learn about constraint-violating zones before policy learning and (2) separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely. We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task, and an image-based reaching task on a physical robot. We compare Recovery RL to 5 prior safe RL methods which jointly optimize for task performance and safety via constrained optimization or reward shaping and find that Recovery RL outperforms the next best prior method across all domains. Results suggest that Recovery RL trades off constraint violations and task successes 2-80 times more efficiently in simulation domains and 12 times more efficiently in physical experiments. See https://tinyurl.com/rl-recovery for videos and supplementary material.

Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter

2019 International Conference on Robotics and Automation (ICRA), 2019

When operating in unstructured environments such as warehouses, homes, and retail centers, robots are frequently required to interactively search for and retrieve specific objects from cluttered bins, shelves, or tables. Mechanical Search describes the class of tasks where the goal is to locate and extract a known target object. In this paper, we formalize Mechanical Search and study a version where distractor objects are heaped over the target object in a bin. The robot uses an RGBD perception system and control policies to iteratively select, parameterize, and perform one of 3 actions (push, suction, grasp) until the target object is extracted, a time limit is exceeded, or no high-confidence push or grasp is available. We present a study of 5 algorithmic policies for mechanical search, with 15,000 simulated trials and 300 physical trials for heaps ranging from 10 to 20 objects. Results suggest that success can be achieved in this long-horizon task with algorithmic policies in over 95% of instances and that the number of actions required scales approximately linearly with the size of the heap. Code and supplementary material can be found at http://ai.stanford.edu/mech-search.

Kit-Net: Self-Supervised Learning to Kit Novel 3D Objects into Novel 3D Cavities

2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), 2021

In industrial part kitting, 3D objects are inserted into cavities for transportation or subsequent assembly. Kitting is a critical step as it can decrease downstream processing and handling times and enable lower storage and shipping costs. We present Kit-Net, a framework for kitting previously unseen 3D objects into cavities given depth images of both the target cavity and an object held by a gripper in an unknown initial orientation. Kit-Net uses self-supervised deep learning and data augmentation to train a convolutional neural network (CNN) to robustly estimate 3D rotations between objects and matching concave or convex cavities using a large training dataset of simulated depth image pairs. Kit-Net then uses the trained CNN to implement a controller to orient and position novel objects for insertion into novel prismatic and conformal 3D cavities. Experiments in simulation suggest that Kit-Net can orient objects to have a 98.9% average intersection volume between the object mesh and that of the target cavity. Physical experiments with industrial objects succeed in 18% of trials using a baseline method and in 63% of trials with Kit-Net. Video, code, and data are available at https://github.com/BerkeleyAutomation/Kit-Net.

VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation

Robotics: Science and Systems XVI, 2020

Robotic fabric manipulation has applications in cloth and cable management, senior care, surgery, and more. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We address this problem by extending the recently proposed Visual Foresight framework to learn fabric dynamics, which can be efficiently reused to accomplish a variety of different fabric manipulation tasks with a single goal-conditioned policy. We introduce VisuoSpatial Foresight (VSF), which extends prior work by learning visual dynamics on domain randomized RGB images and depth maps simultaneously and completely in simulation. We experimentally evaluate VSF on multi-step fabric smoothing and folding tasks both in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. Furthermore, we find that leveraging depth significantly improves performance for cloth manipulation tasks, and results suggest that leveraging RGBD data for video prediction and planning yields an 80% improvement in fabric folding success rate over pure RGB data. Supplementary material is available at https://sites.google.com/view/fabric-vsf/.

Deep Imitation Learning of Sequential Fabric Smoothing From an Algorithmic Supervisor

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

Sequential pulling policies to flatten and smooth fabrics have applications from surgery to manufacturing to home tasks such as bed making and folding clothes. Due to the complexity of fabric states and dynamics, we apply deep imitation learning to learn policies that, given color (RGB), depth (D), or combined color-depth (RGBD) images of a rectangular fabric sample, estimate pick points and pull vectors to spread the fabric to maximize coverage. To generate data, we develop a fabric simulator and an algorithmic supervisor that has access to complete state information. We train policies in simulation using domain randomization and dataset aggregation (DAgger) on three tiers of difficulty in the initial randomized configuration. We present results comparing five baseline policies to learned policies and report systematic comparisons of RGB vs D vs RGBD images as inputs. In simulation, learned policies achieve comparable or superior performance to analytic baselines. In 180 physical experiments with the da Vinci Research Kit (dVRK) surgical robot, RGBD policies trained in simulation attain coverage of 83% to 95% depending on difficulty tier, suggesting that effective fabric smoothing policies can be learned from an algorithmic supervisor and that depth sensing is a valuable addition to color alone. Supplementary material is available at https://sites.google.com/view/fabric-smoothing.

Safety Augmented Value Estimation From Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks

IEEE Robotics and Automation Letters, 2020

Reinforcement learning (RL) for robotics is challenging due to the difficulty in hand-engineering a dense cost function, which can lead to unintended behavior, and dynamical uncertainty, which makes exploration and constraint satisfaction challenging. We address these issues with a new model-based reinforcement learning algorithm, Safety Augmented Value Estimation from Demonstrations (SAVED), which uses supervision that only identifies task completion and a modest set of suboptimal demonstrations to constrain exploration and learn efficiently while handling complex constraints. We then compare SAVED with 3 state-of-the-art model-based and model-free RL algorithms on 6 standard simulation benchmarks involving navigation and manipulation and a physical knot-tying task on the da Vinci surgical robot. Results suggest that SAVED outperforms prior methods in terms of success rate, constraint satisfaction, and sample efficiency, making it feasible to safely learn a control policy directly on a real robot in less than an hour. For tasks on the robot, baselines succeed less than 5% of the time while SAVED has a success rate of over 75% in the first 50 training iterations. Code and supplementary material are available at https://tinyurl.com/saved-rl.

Disentangling Dense Multi-Cable Knots

2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

Disentangling two or more cables requires many steps to remove crossings between and within cable... more Disentangling two or more cables requires many steps to remove crossings between and within cables. We formalize the problem of multiple cable disentangling and present an iterative, graph-based algorithm, Iterative Reduction Of Non-planar Multiple cAble kNots (IRON-MAN), that outputs moves to remove crossings from the scene. We instantiate it with a learned perception system, inspired by prior work in single-cable untying, to disentangle two cable twists, three cable braids, and knots of two or three cables, such as the overhand, square, carrick bend, sheet bend, crown, and fisherman's knots from image input. IRON-MAN keeps track of task-relevant keypoints corresponding to target cable endpoints and crossings and iteratively disentangles the cables by identifying crossings that are critical to knot structure and undoing them. Using a da Vinci surgical robot, we experimentally evaluate the effectiveness of IRON-MAN on the task of untangling a class of multiple cable knots present in the training data, as well as generalizing to novel classes of multiple cable knots involving two to three cables. Results suggest that IRON-MAN is effective in disentangling knots involving up to three cables with 80.5% success, with generalization to knots that are never seen during training on cables that are either distinct or uniform in color.

Predicting Electric Vehicle Charging Station Usage: Using Machine Learning to Estimate Individual Station Statistics from Physical Configurations of Charging Station Networks

arXiv (Cornell University), Apr 2, 2018

Electric vehicles (EVs) have been gaining popularity due to their environmental friendliness and ... more Electric vehicles (EVs) have been gaining popularity due to their environmental friendliness and efficiency. EV charging station networks are scalable solutions for supporting increasing numbers of EVs within modern electric grid constraints, yet few tools exist to aid the physical configuration design of new networks. We use neural networks to predict individual charging station usage statistics from the station?s physical location within a network. We have shown this quickly gives accurate estimates of average usage statistics given a proposed configuration, without the need for running many computationally expensive simulations. The trained neural network can help EV charging network designers rapidly test various placements of charging stations under additional individual constraints in order to find an optimal configuration given their design objectives.

Learning Switching Criteria for Sim2Real Transfer of Robotic Fabric Manipulation Policies

2022 IEEE 18th International Conference on Automation Science and Engineering (CASE)

Simulation-to-reality transfer has emerged as a popular and highly successful method to train rob... more Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies that have been trained with very little simulation data can result in unreliable and dangerous behaviors on physical hardware. On the other hand, excessive training in simulation can cause policies to overfit to the visual appearance and dynamics of the simulator. In this work, we study strategies to automatically determine when policies trained in simulation can be reliably transferred to a physical robot. We specifically study these ideas in the context of robotic fabric manipulation, in which successful sim2real transfer is especially challenging due to the difficulties of precisely modeling the dynamics and visual appearance of fabric. Results in a fabric smoothing task suggest that our switching criteria correlate well with performance in real. In particular, our confidence-based switching criteria achieve average final fabric coverage of 87.2-93.7% within 55-60% of the total training budget. See https://tinyurl.com/lsccase for code and supplemental materials.

Accelerating Grasp Exploration by Leveraging Learned Priors

2020 IEEE 16th International Conference on Automation Science and Engineering (CASE)

Orienting Novel 3D Objects Using Self-Supervised Learning of Rotation Transforms

2020 IEEE 16th International Conference on Automation Science and Engineering (CASE)

Orienting objects is a critical component in the automation of many packing and assembly tasks. We present an algorithm to orient novel objects given a depth image of the object in its current and desired orientation. We formulate a self-supervised objective for this problem and train a deep neural network to estimate the 3D rotation, parameterized by a quaternion, between these current and desired depth images. We then use the trained network in a proportional controller to reorient objects based on the estimated rotation between the two depth images. Results suggest that in simulation we can rotate unseen objects with unknown geometries by up to 30° with a median angle error of 1.47° over 100 random initial/desired orientations each for 22 novel objects. Experiments on physical objects suggest that the controller can achieve a median angle error of 4.2° over 10 random initial/desired orientations each for 5 objects.
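To make the angle-error metric in the abstract above concrete, here is a minimal sketch of how the angle between a current and a desired orientation can be computed from two quaternions. The function name `quaternion_angle_deg` and the `(w, x, y, z)` component order are assumptions for illustration; they are not taken from the paper or its code.

```python
import math


def quaternion_angle_deg(q1, q2):
    """Angle in degrees of the rotation that takes orientation q1 to q2.

    Quaternions are (w, x, y, z) tuples (an assumed convention).
    Inputs are normalized here, so they need not be unit length.
    """
    n1 = math.sqrt(sum(c * c for c in q1))
    n2 = math.sqrt(sum(c * c for c in q2))
    # abs() handles the double cover: q and -q describe the same rotation.
    dot = abs(sum(a * b for a, b in zip(q1, q2))) / (n1 * n2)
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


# Example: identity versus a 30 degree rotation about the z axis.
identity = (1.0, 0.0, 0.0, 0.0)
half = math.radians(30.0) / 2.0
rot_z_30 = (math.cos(half), 0.0, 0.0, math.sin(half))
print(quaternion_angle_deg(identity, rot_z_30))
```

A median over many such per-trial errors would give a statistic of the kind reported in the abstract, though the exact evaluation protocol is described only in the paper itself.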

LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Sparse Reward Iterative Tasks

arXiv (Cornell University), Jul 10, 2021

Reinforcement learning (RL) has shown impressive success in exploring high-dimensional environments to learn complex tasks, but can often exhibit unsafe behaviors and require extensive environment interaction when exploration is unconstrained. A promising strategy for learning in dynamically uncertain environments is requiring that the agent can robustly return to learned safe sets, where task success (and therefore safety) can be guaranteed. While this approach has been successful in low-dimensions, enforcing this constraint in environments with visual observations is exceedingly challenging. We present a novel continuous representation for safe sets by framing it as a binary classification problem in a learned latent space, which flexibly scales to image observations. We then present a new algorithm, Latent Space Safe Sets (LS3), which uses this representation for long-horizon tasks with sparse rewards. We evaluate LS3 on 4 domains, including a challenging sequential pushing task in simulation and a physical cable routing task. We find that LS3 can use prior task successes to restrict exploration and learn more efficiently than prior algorithms while satisfying constraints. See https://tinyurl.com/latent-ss for code and supplementary material.

Untangling Dense Knots by Learning Task-Relevant Keypoints

arXiv (Cornell University), 2020

Untangling ropes, wires, and cables is a challenging task for robots due to the high-dimensional configuration space, visual homogeneity, self-occlusions, and complex dynamics. We consider dense (tight) knots that lack space between self-intersections and present an iterative approach that uses learned geometric structure in configurations. We instantiate this into an algorithm, HULK: Hierarchical Untangling from Learned Keypoints, which combines learning-based perception with a geometric planner into a policy that guides a bilateral robot to untangle knots. To evaluate the policy, we perform experiments both in a novel simulation environment modelling cables with varied knot types and textures and in a physical system using the da Vinci surgical robot. We find that HULK is able to untangle cables with dense figure-eight and overhand knots and generalize to varied textures and appearances. We compare two variants of HULK to three baselines and observe that HULK achieves 43.3% higher success rates on a physical system compared to the next best baseline. HULK successfully untangles a cable from a dense initial configuration containing up to two overhand and figure-eight knots in 97.9% of 378 simulation experiments with an average of 12.1 actions per trial. In physical experiments, HULK achieves 61.7% untangling success, averaging 8.48 actions per trial. Supplementary material, code, and videos can be found at https://tinyurl.com/y3a88ycu.

Untangling Dense Non-Planar Knots by Learning Manipulation Features and Recovery Policies

Robotics: Science and Systems XVII, 2021

Robot manipulation for untangling 1D deformable structures such as ropes, cables, and wires is challenging due to their infinite dimensional configuration space, complex dynamics, and tendency to self-occlude. Analytical controllers often fail in the presence of dense configurations, due to the difficulty of grasping between adjacent cable segments. We present two algorithms that enhance robust cable untangling, LOKI and SPiDERMan, which operate alongside HULK, a high-level planner from prior work. LOKI uses a learned model of manipulation features to refine a coarse grasp keypoint prediction to a precise, optimized location and orientation, while SPiDERMan uses a learned model to sense task progress and apply recovery actions. We evaluate these algorithms in physical cable untangling experiments with 336 knots and over 1500 actions on real cables using the da Vinci surgical robot. We find that the combination of HULK, LOKI, and SPiDERMan is able to untangle dense overhand, figure-eight, double-overhand, square, bowline, granny, stevedore, and triple-overhand knots. The composition of these methods successfully untangles a cable from a dense initial configuration in 68.3% of 60 physical experiments and achieves 50% higher success rates than baselines from prior work. Supplementary material, code, and videos can be found at https://tinyurl.com/rssuntangling.

Automating Planar Object Singulation by Linear Pushing with Single-point and Multi-point Contacts

2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), 2019

Singulation is useful for manufacturing, logistics, and service applications; we consider the problem in a planar setting. We propose a novel O(n(n + v)) linear push policy (n denotes the number of objects, v denotes the maximum number of vertices per object), ClusterPush, that can be efficiently computed using clustering. To evaluate the policy, we define singulation distance as the average pairwise distance of polygon centroids given random arrangements of 2D polygonal objects on a surface, and seek pushing policies that can maximize singulation distance. When compared with a brute force evaluation of all candidate pushes in Box2D simulator using 50,000 pushing scenarios, ClusterPush achieves 70% of the singulation distance achieved using brute force and is 2000x faster. ClusterPush also improves on previous pushing policies and can be used for multi-point pushes with two-point and edge (infinite-point) contacts. Compared with pushes with single-point contacts using ClusterPush, pushes with two-point and edge contacts improve singulation by 7% and 13% respectively. In physical experiments conducted with an ABB YuMi robot on 40 sets of 3-7 blocks, ClusterPush increases singulation distance by 15-30%, outperforming the next best policy by 24% on average. Data and code are available at https://github.com/Jekyll1021/MultiPointPushing.
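The singulation distance defined in the abstract above (the average pairwise distance of polygon centroids) can be sketched in a few lines. This is an illustrative sketch only: the function name `singulation_distance` is hypothetical, and it assumes the centroids have already been computed and are passed in as 2D points, rather than deriving them from polygon vertices as the paper's code may do.

```python
import math
from itertools import combinations


def singulation_distance(centroids):
    """Average Euclidean distance over all unordered pairs of centroids.

    `centroids` is a sequence of (x, y) points (assumed precomputed).
    Returns 0.0 when fewer than two objects are present.
    """
    pairs = list(combinations(centroids, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Example: three objects whose pairwise distances are 5, 4, and 3.
before_push = [(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
print(singulation_distance(before_push))  # 4.0
```

Comparing this value before and after a candidate push gives a simple way to rank pushes, which is the role the metric plays in the evaluation described above.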

On-Policy Robot Imitation Learning from a Converging Supervisor

Existing on-policy imitation learning algorithms, such as DAgger, assume access to a fixed supervisor. However, there are many settings where the supervisor may converge during policy learning, such as a human performing a novel task or an improving algorithmic controller. We formalize imitation learning from a "converging supervisor" and provide sublinear static and dynamic regret guarantees against the best policy in hindsight with labels from the converged supervisor, even when labels during learning are only from intermediate supervisors. We then show that this framework is closely connected to a recent class of reinforcement learning (RL) algorithms known as dual policy iteration (DPI), which alternate between training a reactive learner with imitation learning and a model-based supervisor with data from the learner. Experiments suggest that when this framework is applied with the state-of-the-art deep model-based RL algorithm PETS as an improving supervisor, it outpe...

ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning

ArXiv, 2021

Effective robot learning often requires online human feedback and interventions that can cost significant human time, giving rise to the central challenge in interactive imitation learning: is it possible to control the timing and length of interventions to both facilitate learning and limit burden on the human supervisor? This paper presents ThriftyDAgger, an algorithm for actively querying a human supervisor given a desired budget of human interventions. ThriftyDAgger uses a learned switching policy to solicit interventions only at states that are sufficiently (1) novel, where the robot policy has no reference behavior to imitate, or (2) risky, where the robot has low confidence in task completion. To detect the latter, we introduce a novel metric for estimating risk under the current robot policy. Experiments in simulation and on a physical cable routing experiment suggest that ThriftyDAgger’s intervention criteria balance task performance and supervisor burden more effectively ...

Policy Gradient Bayesian Robust Optimization for Imitation Learning

ArXiv, 2021

The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguou...

On-Policy Imitation Learning from an Improving Supervisor

Most on-policy imitation algorithms, such as DAgger, are designed for learning with a fixed supervisor. However, there are many settings in which the supervisor improves during policy learning, such as when the supervisor is a human performing a novel task or an improving algorithmic controller. We consider learning from an “improving supervisor” and derive a bound on the static-regret of online gradient descent when a converging supervisor policy is used. We present an on-policy imitation learning algorithm, Follow the Improving Teacher (FIT), which uses a deep model-based reinforcement learning (deep MBRL) algorithm to provide the sample complexity benefits of model-based methods but enable faster training and evaluation via distillation into a reactive controller. We evaluate FIT with experiments on the Reacher and Pusher MuJoCo domains using the deep MBRL algorithm, PETS, as the improving supervisor. To the best of our knowledge, this work is the first to formally consider the s...

ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions

Sample-based learning model predictive control (LMPC) strategies have recently attracted attention due to their desirable theoretical properties and their good empirical performance on robotic tasks. However, prior analysis of LMPC controllers for stochastic systems has mainly focused on linear systems in the iterative learning control setting. We present a novel LMPC algorithm, Adjustable Boundary Condition LMPC (ABC-LMPC), which enables rapid adaptation to novel start and goal configurations and theoretically show that the resulting controller guarantees iterative improvement in expectation for stochastic nonlinear systems. We present results with a practical instantiation of this algorithm and experimentally demonstrate that the resulting controller adapts to a variety of initial and terminal conditions on 3 stochastic continuous control tasks.

Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones

IEEE Robotics and Automation Letters, 2021

Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in uncertain environments requires extensive exploration, but safety requires limiting exploration. We propose Recovery RL, an algorithm which navigates this tradeoff by (1) leveraging offline data to learn about constraint violating zones before policy learning and (2) separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely. We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task, and an image-based reaching task on a physical robot. We compare Recovery RL to 5 prior safe RL methods which jointly optimize for task performance and safety via constrained optimization or reward shaping and find that Recovery RL outperforms the next best prior method across all domains. Results suggest that Recovery RL trades off constraint violations and task successes 2-80 times more efficiently in simulation domains and 12 times more efficiently in physical experiments. See https://tinyurl.com/rl-recovery for videos and supplementary material.
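The two-policy separation described in the abstract above can be illustrated with a small action-selection sketch. Everything named here (`select_action`, `risk_estimate`, `risk_threshold`, and the toy policies) is hypothetical, and scoring the task policy's proposed action against a fixed threshold is an assumption about how "constraint violation is likely" might be operationalized; the abstract does not specify it.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]


def select_action(
    state: State,
    task_policy: Callable[[State], Action],
    recovery_policy: Callable[[State], Action],
    risk_estimate: Callable[[State, Action], float],
    risk_threshold: float = 0.5,  # assumed hyperparameter
) -> Action:
    """Use the task policy unless its proposed action looks too risky."""
    proposed = task_policy(state)
    if risk_estimate(state, proposed) > risk_threshold:
        # Constraint violation judged likely: defer to the recovery policy.
        return recovery_policy(state)
    return proposed


# Toy 1D example: the agent moves right; positions past 1.0 are unsafe.
def toy_task(state):
    return [0.5]


def toy_recovery(state):
    return [-0.5]


def toy_risk(state, action):
    return 1.0 if state[0] + action[0] > 1.0 else 0.0


print(select_action([0.0], toy_task, toy_recovery, toy_risk))  # [0.5]
print(select_action([0.8], toy_task, toy_recovery, toy_risk))  # [-0.5]
```

Keeping the risk check outside both policies means the task policy can be trained on task reward alone, which matches the separation of goals the abstract describes.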

Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter

2019 International Conference on Robotics and Automation (ICRA), 2019

When operating in unstructured environments such as warehouses, homes, and retail centers, robots are frequently required to interactively search for and retrieve specific objects from cluttered bins, shelves, or tables. Mechanical Search describes the class of tasks where the goal is to locate and extract a known target object. In this paper, we formalize Mechanical Search and study a version where distractor objects are heaped over the target object in a bin. The robot uses an RGBD perception system and control policies to iteratively select, parameterize, and perform one of 3 actions (push, suction, grasp) until the target object is extracted, a time limit is exceeded, or no high-confidence push or grasp is available. We present a study of 5 algorithmic policies for mechanical search, with 15,000 simulated trials and 300 physical trials for heaps ranging from 10 to 20 objects. Results suggest that success can be achieved in this long-horizon task with algorithmic policies in over 95% of instances and that the number of actions required scales approximately linearly with the size of the heap. Code and supplementary material can be found at http://ai.stanford.edu/mech-search.

Kit-Net: Self-Supervised Learning to Kit Novel 3D Objects into Novel 3D Cavities

2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), 2021

In industrial part kitting, 3D objects are inserted into cavities for transportation or subsequent assembly. Kitting is a critical step as it can decrease downstream processing and handling times and enable lower storage and shipping costs. We present Kit-Net, a framework for kitting previously unseen 3D objects into cavities given depth images of both the target cavity and an object held by a gripper in an unknown initial orientation. Kit-Net uses self-supervised deep learning and data augmentation to train a convolutional neural network (CNN) to robustly estimate 3D rotations between objects and matching concave or convex cavities using a large training dataset of simulated depth image pairs. Kit-Net then uses the trained CNN to implement a controller to orient and position novel objects for insertion into novel prismatic and conformal 3D cavities. Experiments in simulation suggest that Kit-Net can orient objects to have a 98.9% average intersection volume between the object mesh and that of the target cavity. Physical experiments with industrial objects succeed in 18% of trials using a baseline method and in 63% of trials with Kit-Net. Video, code, and data are available at https://github.com/BerkeleyAutomation/Kit-Net.

VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation

Robotics: Science and Systems XVI, 2020

Robotic fabric manipulation has applications in cloth and cable management, senior care, surgery and more. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We address this problem by extending the recently proposed Visual Foresight framework to learn fabric dynamics, which can be efficiently reused to accomplish a variety of different fabric manipulation tasks with a single goal-conditioned policy. We introduce VisuoSpatial Foresight (VSF), which extends prior work by learning visual dynamics on domain randomized RGB images and depth maps simultaneously and completely in simulation. We experimentally evaluate VSF on multi-step fabric smoothing and folding tasks both in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. Furthermore, we find that leveraging depth significantly improves performance for cloth manipulation tasks, and results suggest that leveraging RGBD data for video prediction and planning yields an 80% improvement in fabric folding success rate over pure RGB data. Supplementary material is available at https://sites.google.com/view/fabric-vsf/.

Deep Imitation Learning of Sequential Fabric Smoothing From an Algorithmic Supervisor

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

Sequential pulling policies to flatten and smooth fabrics have applications from surgery to manufacturing to home tasks such as bed making and folding clothes. Due to the complexity of fabric states and dynamics, we apply deep imitation learning to learn policies that, given color (RGB), depth (D), or combined color-depth (RGBD) images of a rectangular fabric sample, estimate pick points and pull vectors to spread the fabric to maximize coverage. To generate data, we develop a fabric simulator and an algorithmic supervisor that has access to complete state information. We train policies in simulation using domain randomization and dataset aggregation (DAgger) on three tiers of difficulty in the initial randomized configuration. We present results comparing five baseline policies to learned policies and report systematic comparisons of RGB vs D vs RGBD images as inputs. In simulation, learned policies achieve comparable or superior performance to analytic baselines. In 180 physical experiments with the da Vinci Research Kit (dVRK) surgical robot, RGBD policies trained in simulation attain coverage of 83% to 95% depending on difficulty tier, suggesting that effective fabric smoothing policies can be learned from an algorithmic supervisor and that depth sensing is a valuable addition to color alone. Supplementary material is available at https://sites.google.com/view/fabric-smoothing.

Safety Augmented Value Estimation From Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks

IEEE Robotics and Automation Letters, 2020

Reinforcement learning (RL) for robotics is challenging due to the difficulty in hand-engineering a dense cost function, which can lead to unintended behavior, and dynamical uncertainty, which makes exploration and constraint satisfaction challenging. We address these issues with a new model-based reinforcement learning algorithm, Safety Augmented Value Estimation from Demonstrations (SAVED), which uses supervision that only identifies task completion and a modest set of suboptimal demonstrations to constrain exploration and learn efficiently while handling complex constraints. We then compare SAVED with 3 state-of-the-art model-based and model-free RL algorithms on 6 standard simulation benchmarks involving navigation and manipulation and a physical knot-tying task on the da Vinci surgical robot. Results suggest that SAVED outperforms prior methods in terms of success rate, constraint satisfaction, and sample efficiency, making it feasible to safely learn a control policy directly on a real robot in less than an hour. For tasks on the robot, baselines succeed less than 5% of the time while SAVED has a success rate of over 75% in the first 50 training iterations. Code and supplementary material are available at https://tinyurl.com/saved-rl.