Papers by Pannag R Sanketi

arXiv (Cornell University), Jul 28, 2023
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to this category of models as vision-language-action (VLA) models and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object).
We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
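The core of the RT-2 recipe is representing continuous robot actions as text tokens so they can be trained alongside ordinary language. A minimal sketch of one plausible such scheme (the bin count, action range, and function names here are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def actions_to_tokens(action, low=-1.0, high=1.0, num_bins=256):
    """Discretize each continuous action dimension into one of
    `num_bins` integer bins, then render the bins as a text string
    so the action can be trained on like natural-language tokens."""
    action = np.clip(action, low, high)
    bins = np.round((action - low) / (high - low) * (num_bins - 1)).astype(int)
    return " ".join(str(b) for b in bins)

def tokens_to_actions(text, low=-1.0, high=1.0, num_bins=256):
    """Inverse mapping: parse the token string back into
    continuous action values for execution on the robot."""
    bins = np.array([int(t) for t in text.split()])
    return low + bins / (num_bins - 1) * (high - low)
```

With such a mapping, an action vector and a natural-language answer are interchangeable training targets, which is what allows co-fine-tuning on both robot trajectories and web vision-language data.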

arXiv (Cornell University), Jun 13, 2023
We address a benchmark task in agile robotics: catching objects thrown at high speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs, including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and whole-body multimodality, via extensive on-hardware experiments. We conclude with proposals on fusing "classical" and "learning-based" techniques for agile robot control. Videos of our experiments may be found here: https://sites.google.com/view/agile-catching.
Figure 1: (left) Mobile manipulator with lacrosse head catching a ball within a second. (right) Automatic ball thrower with controllable yaw angle and a speed of around 5 m/s.
arXiv (Cornell University), Mar 31, 2020
We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100 Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs, convolving across time, learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned curriculum learning on the task and rewards, policies are capable of developing multi-modal styles, specifically forehand and backhand strokes, whilst achieving an 80% return rate on a wide range of ball throws. We observe that multi-modality does not require any architectural priors, such as multi-head architectures or hierarchical policies.
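Evolutionary search of the kind used here estimates a policy-improvement direction purely from reward evaluations, with no backpropagation through the policy. A minimal antithetic-sampling sketch (the hyperparameters and function names are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def es_step(theta, reward_fn, sigma=0.1, lr=0.05, num_pairs=32, rng=None):
    """One antithetic evolutionary-search update: sample Gaussian
    perturbations, evaluate the reward at theta +/- sigma*eps, and
    move theta along the reward-weighted average perturbation,
    which approximates the reward gradient without backprop."""
    rng = rng if rng is not None else np.random.default_rng(0)
    eps = rng.standard_normal((num_pairs, theta.size))
    r_plus = np.array([reward_fn(theta + sigma * e) for e in eps])
    r_minus = np.array([reward_fn(theta - sigma * e) for e in eps])
    grad = ((r_plus - r_minus)[:, None] * eps).mean(axis=0) / (2 * sigma)
    return theta + lr * grad
```

Because only scalar episode returns are needed, the same loop applies whether `theta` parameterizes a linear controller or the flattened weights of a CNN policy.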

arXiv (Cornell University), Oct 7, 2022
Learning goal-conditioned control in the real world is a challenging open problem in robotics. Reinforcement learning systems have the potential to learn autonomously via trial and error, but in practice the costs of manual reward design, ensuring safe exploration, and hyperparameter tuning are often enough to preclude real-world deployment. Imitation learning approaches, on the other hand, offer a simple way to learn control in the real world, but typically require costly curated demonstration data and lack a mechanism for continuous improvement. Recently, iterative imitation techniques have been shown to learn goal-directed control from undirected demonstration data and improve continuously via self-supervised goal reaching, but results thus far have been limited to simulated environments. In this work, we present evidence that iterative imitation learning can scale to goal-directed behavior on a real robot in a dynamic setting: high-speed, precision table tennis (e.g. "land the ball on this particular target"). We find that this approach offers a straightforward way to do continuous on-robot learning, without complexities such as reward design or sim-to-real transfer. It is also scalable: sample-efficient enough to train on a physical robot in just a few hours. In real-world evaluations, we find that the resulting policy can perform on par with or better than amateur humans (with players sampled randomly from a robotics lab) at the task of returning the ball to specific targets on the table. Finally, we analyze the effect of the initial undirected bootstrap dataset size on performance, finding that a modest amount of unstructured demonstration data provided up front drastically speeds up the convergence of a general-purpose goal-reaching policy.
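The key mechanism that lets undirected "play" data train a goal-directed policy is relabeling: each trajectory is treated as a successful demonstration of whatever outcome it actually achieved. A minimal sketch of this idea (the data layout and the choice of the final state as the achieved goal are illustrative assumptions):

```python
def relabel_with_achieved_goals(trajectory):
    """Turn an undirected demonstration into goal-conditioned
    training pairs by relabeling: the outcome the trajectory
    actually achieved (here, its final state) is treated as the
    goal that conditioned every action along the way."""
    achieved = trajectory[-1]["state"]
    return [
        {"state": step["state"], "goal": achieved, "action": step["action"]}
        for step in trajectory
    ]
```

Supervised learning on such relabeled pairs yields a policy conditioned on goals, which can then practice reaching self-proposed goals to improve without any hand-designed reward.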

arXiv (Cornell University), Jul 13, 2022
Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. The ability to train policies in simulation enables safe exploration and large-scale data collection quickly at low cost. However, prior works in sim-to-real transfer of robotic policies typically do not involve any human-robot interaction, because accurately simulating human behavior is an open problem. In this work, our goal is to leverage the power of simulation to train robotic policies that are proficient at interacting with humans upon deployment. But there is a chicken-and-egg problem: how to gather examples of a human interacting with a physical robot so as to model human behavior in simulation, without already having a robot that is able to interact with a human? Our proposed method, Iterative-Sim-to-Real (i-S2R), attempts to address this. i-S2R bootstraps from a simple model of human behavior and alternates between training in simulation and deploying in the real world. In each iteration, both the human behavior model and the policy are refined. For all training we apply a new evolutionary search algorithm called Blackbox Gradient Sensing (BGS). We evaluate our method on a real-world robotic table tennis setting, where the objective for the robot is to play cooperatively with a human player for as long as possible. Table tennis is a high-speed, dynamic task that requires the two players to react quickly to each other's moves, making for a challenging test bed for research on human-robot interaction. We present results on an industrial robotic arm that is able to cooperatively play table tennis with human players, achieving rallies of 22 successive hits on average and 150 at best. Further, for 80% of players, rally lengths are 70% to 175% longer compared to the sim-to-real plus fine-tuning (S2R+FT) baseline.
For videos of our system in action please see https://sites.google.com/view/is2r.
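The i-S2R alternation described above can be summarized as a short loop: train in simulation against the current human model, deploy to collect real interaction data, and refit the human model. A structural sketch (the function names and their signatures are placeholders standing in for the paper's components, e.g. `train_in_sim` would wrap BGS training):

```python
def iterative_sim_to_real(human_model, policy,
                          train_in_sim, collect_real, refit_human_model,
                          num_iterations=3):
    """Sketch of the i-S2R loop: alternate between training the
    policy in simulation against the current human-behavior model
    and refining that model from newly collected real-world play."""
    for _ in range(num_iterations):
        policy = train_in_sim(policy, human_model)          # e.g. evolutionary search (BGS)
        real_data = collect_real(policy)                    # deploy; play with a human
        human_model = refit_human_model(human_model, real_data)
    return policy, human_model
```

The bootstrap step is the first `train_in_sim` call against a deliberately simple human model; each subsequent iteration tightens the match between simulated and real human behavior.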
Robotics: Science and Systems XIX

arXiv (Cornell University), Mar 26, 2023
Most successes in robotic manipulation have been restricted to single-arm gripper robots, whose low dexterity limits the range of solvable tasks to pick-and-place, insertion, and object rearrangement. More complex tasks such as assembly require dual- and multi-arm platforms, but entail a suite of unique challenges such as bi-arm coordination and collision avoidance, robust grasping, and long-horizon planning. In this work we investigate the feasibility of training deep reinforcement learning (RL) policies in simulation and transferring them to the real world (Sim2Real) as a generic methodology for obtaining performant controllers for real-world bi-manual robotic manipulation tasks. As a testbed for bi-manual manipulation, we develop the "U-Shape Magnetic Block Assembly Task", wherein two robots with parallel grippers must connect 3 magnetic blocks to form a "U" shape. Without a manually designed controller or human demonstrations, we demonstrate that with careful Sim2Real considerations, our policies trained with RL in simulation enable two xArm6 robots to solve the U-shape assembly task with a success rate above 90% in simulation, and 50% on real hardware without any additional real-world fine-tuning. Through careful ablations, we highlight how each component of the system is critical for such simple and successful policy learning and transfer, including task specification, learning algorithm, direct joint-space control, behavior constraints, perception and actuation noises, action delays, and action interpolation. Our results present a significant step forward for bi-arm capability on real hardware, and we hope our system can inspire future research on deep RL and Sim2Real transfer of bi-manual policies, drastically scaling up the capability of real-world robot manipulators.
The accompanying project webpage and videos can be found at: sites.google.com/view/u-shape-block-assembly.

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
In many robot learning tasks, demonstrations are often costly to gather. In some cases the time and effort required to gather a sufficient quantity of demonstrations, or demonstrations of specific state-action trajectories, may be prohibitively expensive or infeasible. Indeed, it is often easier to generate undirected demonstrations or 'play' data. In this work we propose a system that, given a small number of undirected demonstrations, is 1) capable of learning a policy that can solve for specific goals, 2) able to generalize beyond the state-action space of the initial demonstration set, 3) able to progressively improve both goal accuracy and generalizability through self-supervised practice, and 4) usable to train a policy from scratch on a real robot, without requiring sim-to-real transfer. We demonstrate the effectiveness of this algorithm on the problem of goal-directed robot table tennis, both in simulation and on a real robot. We show that the real robot performed better than or comparably to amateur human performance (with players sampled randomly from a robotics lab) on a goal-reaching task. We also analyze the role of the initial demonstration dataset on performance. See video.
2012 American Control Conference (ACC), 2012

IFAC Proceedings Volumes, 2007
The initial 1-2 minutes of operation of an automotive spark-ignition engine, commonly called the "coldstart" period, produces more than 75-80% of the hydrocarbon (HC) emissions in a typical drive cycle. Model-based controller development requires accurate, yet simple, models that can run in real time. Simple, intuitive models are developed to predict both tailpipe hydrocarbon (HC) emissions and exhaust temperature during coldstart. Each of the models is chosen to be a sum of first-order linear systems, based on the experimental data and ease of parameter identification. Inputs to these models are AFR, spark timing, and engine crankshaft speed. A reduced-order thermodynamic model for the catalyst temperature is also developed. The parameters are identified using a least-squares technique. The model estimates for the coldstart are compared with experimental results, showing good agreement.
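A sum of first-order linear systems identified by least squares, as used here, reduces to fitting the coefficients of a simple recursion from measured input/output data. A minimal single-subsystem sketch in discrete time (the model y[k+1] = a*y[k] + b*u[k] and the function name are illustrative simplifications of the paper's models):

```python
import numpy as np

def fit_first_order(u, y):
    """Identify the parameters (a, b) of a discrete first-order
    model y[k+1] = a*y[k] + b*u[k] from input/output data by
    ordinary least squares, mirroring the identification of each
    first-order subsystem from experimental coldstart data."""
    X = np.column_stack([y[:-1], u[:-1]])   # regressors: previous output, input
    theta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return theta  # [a, b]
```

Fitting each subsystem this way keeps identification linear in the parameters, which is what makes the overall model simple enough for real-time use.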

Proceedings of the 16th IFAC World Congress, 2005, 2005
Almost three quarters of the hydrocarbon (HC) emissions emitted by an automobile in a typical drive cycle are produced during the first three minutes of its operation, called the coldstart period. In this paper, we propose a way to decrease coldstart emissions. A model-based paradigm is used to aid the generation of an efficient controller. The controller is built around a mean value engine model and a simplified catalyst model characterized by thermal dynamics, oxygen storage, and static efficiency curves. It is shown that the control of engine-out exhaust gas temperature for faster catalyst light-off could be detrimental to the catalyst. A control scheme comprising engine-out hydrocarbon emissions control and catalyst temperature control through dynamic surface control is developed to reduce the tailpipe emissions. It is shown that reduced tailpipe emissions can be achieved without the risk of damaging the catalyst.
Dynamic Systems and Control, Parts A and B, 2005
IFAC Proceedings Volumes, 2007
The problem of controlling combustion engine emissions during the coldstart period is addressed by designing a MIMO sliding mode controller. The task of the controller is to track a given set of desired profiles of engine-out hydrocarbon emissions and catalyst temperature, using spark timing and fuel injection rate as the inputs. This is an important step in solving the coldstart problem. The throttle is not used as a control input. Different profiles of desired engine-out hydrocarbons and catalyst temperatures are used to analyze the coldstart problem. Simulation results indicate that the controller tracks the desired profiles as long as the inputs are not saturated. The controller presented here could be used as a tool to investigate the optimal input profiles. Experiments are being carried out to validate the simulations.
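Sliding mode control of the kind designed here drives a sliding variable to zero with a switching control law. A minimal scalar analogue, far simpler than the paper's MIMO engine controller (the integrator plant, gains, and function name are all illustrative assumptions):

```python
import math

def simulate_sliding_mode(x0, x_ref, k=2.0, dt=0.01, steps=200):
    """Minimal scalar sliding-mode tracker for the integrator plant
    x' = u: the switching control u = -k*sign(s) drives the sliding
    variable s = x - x_ref to zero, after which the state stays
    chattering tightly around the reference."""
    x = x0
    for _ in range(steps):
        s = x - x_ref
        u = -k * math.copysign(1.0, s) if s != 0 else 0.0
        x += u * dt
    return x
```

The same structure scales to the MIMO case by defining one sliding surface per tracked output (engine-out HC and catalyst temperature) and allocating the switching action across the two inputs, subject to the saturation limits the abstract notes.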
Proceedings of the 12th international ACM SIGACCESS conference on Computers and accessibility, 2010

Dynamic Systems and Control, Parts A and B, 2006
Different strategies have been used in the past to improve the performance of hydrocarbon emissions controllers in SI engines. One of them relies on the use of the model-based control design scheme, which offers the possibility of automating the controller design-to-implementation phase. It also gives the chance to update the model, with a potential reduction in required experiments if physical changes are made to the plant. Under the model-based scheme, an accurate plant model can greatly enhance the development of an effective control system. In particular, acquiring a correct fuel-dynamics model can be crucial in developing a good hydrocarbon emissions controller for coldstart. During this period, the factory AFR (air-fuel ratio) sensor is not active and the engine experiences an abrupt transient, which makes modeling difficult. In this paper, a model that describes fuel dynamics for coldstart is developed. The model uses mainly two parameters: one of them accounts for the fracti...