[Fig. 4: Software design. Our Android application is responsible for high-level computation on the smartphone: graphical user interface, state estimation, data logger (images, vehicle control, indicators, vehicle modes, IMU, GPS, etc.), audio feedback for the user, interface with a game controller (e.g. PS4/Xbox), running the neural network and processing its output, and serial communication. The Arduino program provides the low-level interface to the vehicle: PWM control of the motors, operating the indicator signals, measuring wheel speed, monitoring battery voltage, and serial communication.]

modes, logging, running a neural network, etc. We derive our graphical user interface from the Android Tensorflow Object Detection application [31] and extend it. Our GUI provides the camera feed and buttons to toggle data logging, control modes, and serial communication. It also allows switching between different neural networks to control the vehicle and provides relevant information such as image resolution, inference time and predicted controls. We also integrate voice feedback for operation via the game controller.

The Android ecosystem provides a unified interface to obtain sensor data from any Android smartphone. We build a data logger on top of that in order to collect datasets with the robots. Currently, we record readings from the following sensors: camera, gyroscope, accelerometer, magnetometer, ambient light sensor, and barometer. Using the Android API, we are able to obtain the following sensor readings: RGB images, angular speed, linear acceleration, gravity, magnetic field strength, light intensity, atmospheric pressure, latitude, longitude, altitude, bearing, and speed. In addition to the phone sensors, we also record body sensor readings (wheel odometry, free space estimate and battery voltage), which are transmitted via the serial link.
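To make the structure of such logs concrete, the sketch below shows one possible way to store synchronized, timestamped readings; the field names, CSV layout, and the log_reading helper are illustrative assumptions, not the actual OpenBot log format.

```python
import csv
import time

# Illustrative only: one flat record layout for synchronized sensor logs,
# keyed by a shared monotonic timestamp. The real logger may differ.
FIELDS = ["timestamp_ns", "sensor", "values"]

def log_reading(writer, sensor, values):
    """Write one timestamped reading, e.g. ("gyroscope", [wx, wy, wz])."""
    writer.writerow({
        "timestamp_ns": time.monotonic_ns(),  # single clock for all sensors
        "sensor": sensor,
        "values": " ".join(f"{v:.6f}" for v in values),
    })

with open("sensor_log.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    log_reading(writer, "accelerometer", [0.02, 0.01, 9.81])
    log_reading(writer, "gyroscope", [0.001, -0.002, 0.0005])
    log_reading(writer, "battery_voltage", [11.4])
```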
We leverage the computational power of the smartphone to process the sensory input and compute the robots' actions in real time. While there are many classic motion planning algorithms, we focus on learning-based approaches, which allow for a unified interface. In particular, we rely on the Tensorflow Lite infrastructure, which integrates seamlessly with smartphones [10], [20]. Our Android application features model definitions for object detection and autonomous navigation. These define the input and output properties of the neural network. We build on top of the Tensorflow Object Detection application [31] to detect people and perform visual servoing to follow them.
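As an illustration of the visual-servoing idea (not the exact OpenBot controller), a simple proportional rule that steers toward the detected person's bounding-box center could look as follows; the gains, the normalized box format, and the target-area heuristic are assumptions.

```python
def follow_person_controls(box, k_steer=1.0, target_area=0.25, k_speed=2.0):
    """Map a detected person's bounding box to (left, right) wheel commands.

    box: (xmin, ymin, xmax, ymax), normalized to [0, 1] in image coordinates.
    Returns wheel commands clipped to [-1, 1]. Illustrative sketch only.
    """
    xmin, ymin, xmax, ymax = box
    center_x = 0.5 * (xmin + xmax)           # horizontal position of the person
    area = (xmax - xmin) * (ymax - ymin)     # proxy for distance to the person

    steer = k_steer * (center_x - 0.5)       # positive -> person is to the right
    forward = k_speed * max(0.0, target_area - area)  # slow down when close

    left = max(-1.0, min(1.0, forward + steer))
    right = max(-1.0, min(1.0, forward - steer))
    return left, right

# Example: person slightly right of center and far away -> drive on, turn right.
print(follow_person_controls((0.55, 0.3, 0.75, 0.9)))
```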
We also integrate a model for autonomous navigation inspired by Conditional Imitation Learning [15]. The deployment process is simple. After training a model in Tensorflow, it is converted to a Tensorflow Lite model that can be directly deployed on the smartphone. Both neural networks only rely on the camera feed for their predictions. Neither wheel odometry nor sonar sensor readings are required to produce the raw vehicle controls.
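A minimal sketch of this conversion step, assuming a trained Keras model (the model and file names are placeholders):

```python
import tensorflow as tf

# Load a trained driving or detection model (placeholder path).
model = tf.keras.models.load_model("driving_policy.h5")

# Convert the TensorFlow model to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

# The resulting .tflite file can be bundled with the Android application.
with open("driving_policy.tflite", "wb") as f:
    f.write(tflite_model)
```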
Arduino program. We use an Arduino Nano microcontroller to act as a bridge between the vehicle body and the smartphone. Its main task is to handle the low-level control of the vehicle and to provide readings from vehicle-mounted sensors. The program components are shown on the right in Figure 4. The Arduino receives the vehicle controls and indicator signals via the serial connection. It converts the controls to PWM signals for the motor controller and toggles the LEDs according to the indicator signal. The Arduino also keeps track of the wheel rotations by counting the interrupts triggered by the optical sensors on the left and right front wheels. The sonar sensor emits a sound wave, and the time until the echo is received is measured and used to estimate the free space in front of the vehicle. The battery voltage is calculated by a scaled moving average of measurements at the voltage divider circuit. All measurements are sent back to the Android application through the serial link.
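For concreteness, the free space estimate follows from the sonar's time of flight $t$ as $d = c\,t/2$, where $c \approx 343\,\mathrm{m/s}$ is the speed of sound and the factor of two accounts for the round trip of the pulse. The battery voltage estimate has the form of a scaled moving average, $\bar{V}_k = (1-\alpha)\,\bar{V}_{k-1} + \alpha\, s\, v_k$, where $v_k$ is the raw reading at the voltage divider, $s$ is the divider's scale factor, and $\alpha \in (0,1)$ is a smoothing factor; the specific constants here are an assumption about the form rather than the firmware's exact values.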
C. Comparison to Other Wheeled Robots

We compare to existing robot platforms in Table II. In contrast to other robots, our platform has an abundance of processing power, communication interfaces, and sensors provided by the smartphone. Existing robots often rely on custom software ecosystems, which require dedicated lab personnel to maintain the code, implement new features, and write drivers for new sensors. In contrast, we use Android, one of the largest constantly evolving software ecosystems. All the low-level software for sensor integration and processing already exists and improves without any additional effort by the robotics community. All sensors are already synchronized on the same clock, obviating what is now a major challenge for many existing robots.

IV. EVALUATION

To demonstrate that smartphones are suitable to provide sensing, communication, and compute for interesting robotics applications, we evaluate the presented platform on two applications: person following and autonomous navigation. The experimental setups are described in Section IX and Section X, respectively. We conduct the evaluation using a variety of popular smartphones from the past two years with prices ranging from $120 to $750. The smartphones are carefully selected to cover different manufacturers, chipsets, and sensor suites. Detailed specifications and benchmark scores of the smartphones are provided in Section VIII. In the following, we describe our general evaluation setup and the procedure that ensures a fair comparison; we report the results in Section V.

A. Evaluation metrics

In order to streamline our evaluation while providing a comprehensive performance summary, we use three metrics: distance, success rate, and collisions. The distance is continuous and we report it as a percentage of the complete trajectory. The distance measurement stops if an intersection is missed, a collision occurs, or the goal is reached. The success rate is binary and indicates whether or not the goal was reached. We also count the number of collisions. All results are averaged across three trials.
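A small sketch of how these three metrics could be aggregated across trials; the per-trial record format is an assumption for illustration, not the paper's actual data layout.

```python
from statistics import mean, stdev

def aggregate(trials):
    """Aggregate per-trial metrics into mean +/- standard deviation.

    trials: list of dicts with keys "distance" (fraction of the route covered,
    in [0, 1]), "success" (bool), and "collisions" (int).
    """
    distance = [100.0 * t["distance"] for t in trials]       # percent of trajectory
    success = [100.0 * float(t["success"]) for t in trials]  # percent of successful trials
    collisions = [float(t["collisions"]) for t in trials]
    report = lambda xs: f"{mean(xs):.1f} ± {stdev(xs):.1f}"
    return {
        "Distance (%)": report(distance),
        "Success (%)": report(success),
        "Collisions": report(collisions),
    }

# Example: three trials on one route.
print(aggregate([
    {"distance": 1.00, "success": True,  "collisions": 0},
    {"distance": 0.95, "success": True,  "collisions": 0},
    {"distance": 0.90, "success": False, "collisions": 1},
]))
```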
Platform | Retail Cost | Setup Time [h] | Size [cm] | Weight [kg] | Speed [m/s] | Battery [min] | Actuation | Odometry | Camera | LiDAR | Sonar | IMU | GPS | WiFi | Bluetooth | 3G/4G/5G | Speaker | Microphone | Display | Compute | Ecosystem
AutoRally [18] | $10,000 | 100 | 100x60x40 | 22 | 25 | 20+ | BLDC+Servo | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Mini-ITX PC | ROS
F1/10 [17] | $3600 | 3 | 55x30x20 | 4.5 | 18 | 20+ | BLDC+Servo | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | Jetson TX2 | ROS
RACECAR [32] | $3400 | 10 | 55x30x20 | 4.5 | 18 | 20+ | BLDC+Servo | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | Jetson TX1 | ROS
BARC [14] | $1000 | 3 | 54x28x21 | 3.2 | – | 20+ | BLDC+Servo | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Odroid XU-4 | ROS
MuSHR [16] | $900 | 3 | 44x28x14 | 3 | 11 | 20+ | BLDC+Servo | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | Jetson Nano | ROS
DeepRacer [6] | $400 | 0.25 | – | – | 6 | 15+ | BDC+Servo | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Intel Atom | Custom
DonkeyCar [33] | $250 | 2 | 25x22x12 | 1.5 | 9 | 15+ | BDC+Servo | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | Raspberry Pi | Custom
Duckiebot [9] | $280 | 0.5 | – | – | – | – | 2xBDC | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Raspberry Pi | Custom
Pheeno [13] | $270 | – | 13x11 | – | 0.42 | 300+ | 2xBDC | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ARM Cortex-A7 | Custom
JetBot [8] | $250 | 1 | 20x13x13 | – | – | – | 2xBDC | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Nvidia Jetson | Custom
Create-2 [34] | $200 | – | 34x34x9 | 3.6 | – | – | 2xBDC | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | Custom
Thymio II [19] | $170 | – | 11x11x5 | 0.46 | 0.14 | – | 2xBDC | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | Microcontroller | Custom
AERobot [35] | $20 | 0.1 | 3x3x3 | 0.03 | – | – | 2xVibration | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | Microcontroller | Custom
OpenBot | $50* | 0.25 | 24x15x12 | 0.7 | 1.5 | 45+ | 4xBDC | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Smartphone | Android

TABLE II: Robots. Comparison of wheeled robotic platforms. Top: robots based on RC trucks. Bottom: navigation robots for deployment at scale and in education. "–" indicates that no information is available. *The cost of the smartphone is not included and varies.
TABLE VII: Smartphones. Specifications of the smartphones used in our experiments. We report the overall, graphics, and AI performance according to standard benchmarks. Top: six smartphones used to collect training data. Bottom: smartphones used to test cross-phone generalization. "–" indicates that the score is not available.
[Fig. 9: Training environment. The images depict the environment where the training data was collected.]

For the experiments in the paper, we collect a dataset with the Xiaomi Mi9 which consists of two data units from R1 and six data units from both R2 and R3. Half of the data on R1 is collected with noise and obstacles and the other half without. Half of the data on R2 and R3 is collected with noise and the other half without. The complete dataset contains approximately 45,000 frames, corresponding to 30 minutes worth of data.

H. Additional Experiments

For the following experiments, we collect multiple data units along route R1 in the training environment (Figure 8) with multiple robots and smartphones. We consider a total of four datasets; each dataset consists of 12 data units, or approximately 96 minutes of data, half of which is collected with noise and obstacles. Two datasets are used to investigate the impact of using different phones and the other two to investigate the impact of using different bodies.

Since these policies are trained on more data, we design a more difficult evaluation route, shown in Figure 11. The route contains the same types of maneuvers, but across two different intersections and divided into fewer segments. As a result, small errors are more likely to accumulate, leading to unsuccessful segments and a lower average success rate.
I. Learning from data collected with multiple smartphones

We investigate whether training on data from multiple phones helps generalization and robustness. We train two identical driving policies, one on data acquired with six different phones (Table VII, top) and another with the same amount of data from only one phone, the Xiaomi Mi9; we keep the robot body the same for this set of experiments. We evaluate both policies on the common training phone, the Mi9. We also evaluate both driving policies on three held-out test phones that were not used for data collection and differ in terms of camera sensor and manufacturer (Table VII, bottom). The P30 Lite has the same camera sensor as the Mi9, but is from a different manufacturer. The Pocofone F1 has a different camera sensor, but is from the same manufacturer. The Galaxy Note 10 differs in both aspects, manufacturer and camera sensor.

Evaluation     | Mi9                    | P30 Lite               | Pocofone F1            | Galaxy Note 10
Training       | All | Mi9 | ∆          | All | Mi9 | ∆          | All | Mi9 | ∆          | All | Mi9 | ∆
Distance (%) ↑ | 97±5 | 94±5 | 3        | 85±19 | 80±6 | 5       | 79±7 | 73±1 | 6        | 87±11 | 69±7 | 18
Success (%) ↑  | 92±14 | 83±14 | 9      | 75±25 | 50±0 | 25      | 42±14 | 42±14 | 0      | 67±14 | 42±14 | 25
Collisions ↓   | 0.0±0.0 | 0.0±0.0 | 0.0 | 1.0±1.0 | 0.0±0.0 | 1.0 | 0.3±0.6 | 1.3±0.6 | -1.0 | 1.7±0.6 | 1.3±0.6 | 0.4

TABLE VIII: Autonomous navigation: transfer across smartphones. We report the mean and standard deviation across three trials. Each trial consists of several segments with a total of 2 straights, 2 left turns, and 2 right turns.

The results are summarized in Table VIII. We find that the driving policy trained on data from multiple phones consistently outperforms the driving policy trained on data from a single phone. This effect becomes more noticeable when deploying the policy on phones from different manufacturers and with different camera sensors. However, the driving behavior is sometimes more abrupt, which is reflected by the higher number of collisions. This is probably due to the different fields of view and positions of the camera sensors making learning more difficult. We expect that this will be overcome with more training data.

We also performed some experiments using the low-end Nokia 2.2 phone, which costs about $100. It is able to run our autonomous navigation network at 10 frames per second. Qualitatively, the driving performance is similar to that of the other phones we evaluated. However, since it was unable to make predictions in real time, we did not include it in our main experiments, which were focused on the impact of camera sensor and manufacturer.

[Fig. 11: Evaluation Route 2: Double T-junction, with segments "right-left turn", "closed left turn", "double straight", and "open right" (segment lengths of 5-7 m). Our evaluation route consists of four segments with a total of two straights, two right turns, and two left turns across two intersections. We report mean and standard deviation across three trials.]

J. Learning from data collected with multiple robot bodies

We also investigate whether training on data from multiple robot bodies helps generalization and robustness. One policy is trained on data collected with three different bodies and another with the same amount of data from a single body; we keep the smartphone fixed for this set of experiments. We evaluate both policies on the common training body, B1, which was used during data collection. We also evaluate on a held-out test body, B4.

Evaluation     | Body 1                 | Body 4
Training       | B1-B3 | B1 | ∆         | B1-B3 | B1 | ∆
Distance (%) ↑ | 97±5 | 94±5 | 3        | 94±5 | 92±8 | 2
Success (%) ↑  | 92±14 | 83±14 | 9      | 83±14 | 75±25 | 8
Collisions ↓   | 0.0±0.0 | 0.0±0.0 | 0.0 | 0.0±0.0 | 0.7±0.6 | -0.7

TABLE IX: Autonomous navigation: transfer across robot bodies. We report the mean and standard deviation across three trials. Each trial consists of several segments with a total of 2 straights, 2 left turns, and 2 right turns.

The results are summarized in Table IX. We find that the driving policy that was trained on multiple robot bodies performs better, especially in terms of success rate, where small mistakes can lead to failure. The policy that was trained on a single body sways from side to side and even collides with the environment when deployed on the test body. The actuation of the bodies is noisy due to the cheap components. Every body responds slightly differently to the control signals. Most bodies have a bias to veer to the left or to the right due to imprecision in the assembly or the low-level controls. The policy trained on multiple bodies learns to be robust to these factors of variability, exhibiting stable learned behavior both on the training bodies and on the held-out test body.

Despite the learned robustness, the control policy is still somewhat vehicle-specific, e.g. tied to the differential drive setup and the general actuation model of the motors. An alternative would be to predict a desired trajectory instead and to use a low-level controller to produce vehicle-specific actions. This can further ease the learning process and lead to more general driving policies.
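As a sketch of that alternative (not part of the paper's implementation), a predicted waypoint could be turned into wheel commands for a differential drive roughly as follows; the gains, frame convention, and clipping are assumptions.

```python
import math

def waypoint_to_wheel_commands(x, y, k_v=1.0, k_w=2.0):
    """Convert a predicted waypoint (x forward, y left, meters, robot frame)
    into normalized (left, right) wheel commands for a differential drive.
    Purely illustrative; gains and clipping are assumed values.
    """
    distance = math.hypot(x, y)   # how far away the waypoint is
    heading = math.atan2(y, x)    # bearing to the waypoint
    v = k_v * distance            # drive faster when the waypoint is far
    w = k_w * heading             # turn toward the waypoint
    left = max(-1.0, min(1.0, v - w))
    right = max(-1.0, min(1.0, v + w))
    return left, right

# Example: waypoint 1 m ahead and 0.2 m to the left -> both wheels forward,
# right wheel faster, so the robot curves left.
print(waypoint_to_wheel_commands(1.0, 0.2))
```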
REFERENCES

[1] N. Kau, A. Schultz, N. Ferrante, and P. Slade, "Stanford doggo: An open-source, quasi-direct-drive quadruped," in ICRA, 2019.
[2] F. Grimminger, A. Meduri, M. Khadiv, J. Viereck, M. Wüthrich, M. Naveau, V. Berenz, S. Heim, F. Widmaier, J. Fiene, et al., "An open torque-controlled modular robot architecture for legged locomotion research," arXiv:1910.00093, 2019.
[3] B. Yang, J. Zhang, V. Pong, S. Levine, and D. Jayaraman, "Replab: A reproducible low-cost arm benchmark platform for robotic learning," arXiv:1905.07447, 2019.
[4] A. Gupta, A. Murali, D. P. Gandhi, and L. Pinto, "Robot learning in homes: Improving generalization and reducing dataset bias," in NeurIPS, 2018.
[5] D. V. Gealy, S. McKinley, B. Yi, P. Wu, P. R. Downey, G. Balke, A. Zhao, M. Guo, R. Thomasson, A. Sinclair, et al., "Quasi-direct drive for low-cost compliant robotic manipulation," in ICRA, 2019.
[6] B. Balaji, S. Mallya, S. Genc, S. Gupta, L. Dirac, V. Khare, G. Roy, T. Sun, Y. Tao, B. Townsend, et al., "Deepracer: Educational autonomous racing platform for experimentation with sim2real reinforcement learning," arXiv:1911.01562, 2019.
[7] DJI Robomaster S1, https://www.dji.com/robomaster-s1, accessed: 2020-06-20.
[8] Nvidia JetBot, https://github.com/nvidia-ai-iot/jetbot, accessed: 2020-06-20.
[9] L. Paull, J. Tani, H. Ahn, J. Alonso-Mora, L. Carlone, M. Cap, Y. F. Chen, C. Choi, J. Dusek, Y. Fang, et al., "Duckietown: An open, inexpensive and flexible platform for autonomy education and research," in ICRA, 2017.
[10] A. Ignatov, R. Timofte, A. Kulik, S. Yang, K. Wang, F. Baum, M. Wu, L. Xu, and L. Van Gool, "AI benchmark: All about deep learning on smartphones in 2019," in ICCV Workshops, 2019.
[11] M. Rubenstein, C. Ahler, and R. Nagpal, "Kilobot: A low cost scalable robot system for collective behaviors," in ICRA, 2012.
[12] J. McLurkin, A. McMullen, N. Robbins, G. Habibi, A. Becker, A. Chou, H. Li, M. John, N. Okeke, J. Rykowski, et al., "A robot system design for low-cost multi-robot manipulation," in IROS, 2014.
[13] S. Wilson, R. Gameros, M. Sheely, M. Lin, K. Dover, R. Gevorkyan, M. Haberland, A. Bertozzi, and S. Berman, "Pheeno, a versatile swarm robotic research and education platform," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 884–891, 2016.
[14] J. Gonzales, F. Zhang, K. Li, and F. Borrelli, "Autonomous drifting with onboard sensors," in Advanced Vehicle Control, 2016.
[15] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy, "End-to-end driving via conditional imitation learning," in ICRA, 2018.
[16] S. S. Srinivasa, P. Lancaster, J. Michalove, M. Schmittle, C. S. M. Rockett, J. R. Smith, S. Choudhury, C. Mavrogiannis, and F. Sadeghi, "MuSHR: A low-cost, open-source robotic racecar for education and research," arXiv:1908.08031, 2019.
[17] M. O'Kelly, V. Sukhil, H. Abbas, J. Harkins, C. Kao, Y. V. Pant, R. Mangharam, D. Agarwal, M. Behl, P. Burgio, et al., "F1/10: An open-source autonomous cyber-physical platform," arXiv:1901.08567, 2019.
[18] B. Goldfain, P. Drews, C. You, M. Barulic, O. Velev, P. Tsiotras, and J. M. Rehg, "AutoRally: An open platform for aggressive autonomous driving," IEEE Control Systems Magazine, vol. 39, pp. 26–55, 2019.
[19] F. Riedo, M. Chevalier, S. Magnenat, and F. Mondada, "Thymio II, a robot that grows wiser with children," in IEEE Workshop on Advanced Robotics and its Social Impacts, 2013.
[20] J. Lee, N. Chirkov, E. Ignasheva, Y. Pisarchyk, M. Shieh, F. Riccardi, R. Sarokin, A. Kulik, and M. Grundmann, "On-device neural net inference with mobile GPUs," arXiv:1907.01989, 2019.
[21] S. Owais, "Turn your phone into a robot," https://www.instructables.com/id/Turn-Your-Phone-into-a-Robot/, 2015, accessed: 2020-06-20.
[22] M. Rovai, "Hacking a RC car to control it using an Android device," https://www.hackster.io/mjrobot/hacking-a-rc-car-to-control-it-using-an-android-device-7d5b9a, 2016, accessed: 2020-06-20.
[23] C. Delaunay, "Botiful, social telepresence robot for Android," https://www.kickstarter.com/projects/1452620607/botiful-telepresence-robot-for-android, 2012, accessed: 2020-06-20.
[24] Romotive, "Romo - the smartphone robot for everyone," https://www.kickstarter.com/projects/peterseid/romo-the-smartphone-robot-for-everyone, 2012, accessed: 2020-06-20.
[25] xCraft, "Phonedrone ethos - a whole new dimension for your smartphone," https://www.kickstarter.com/projects/137596013/phonedrone-ethos-a-whole-new-dimension-for-your-sm, 2015, accessed: 2020-06-20.
[26] GCtronic, "Wheelphone," http://www.wheelphone.com, 2013, accessed: 2020-06-20.
[27] J. Yim, S. Chun, K. Jung, and C. D. Shaw, "Development of communication model for social robots based on mobile service," in International Conference on Social Computing, 2010.
[28] A. Setapen, "Creating robotic characters for long-term interaction," Ph.D. dissertation, MIT, 2012.
[29] Y. Cao, Z. Xu, F. Li, W. Zhong, K. Huo, and K. Ramani, "V.Ra: An in-situ visual authoring system for robot-IoT task planning with augmented reality," in DIS, 2019.
[30] N. Oros and J. L. Krichmar, "Smartphone based robotics: Powerful, flexible and inexpensive robots for hobbyists, educators, students and researchers," IEEE Robotics & Automation Magazine, vol. 1, p. 3, 2013.
[31] Tensorflow Object Detection Android Application, https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android, accessed: 2020-06-20.
[32] S. Karaman, A. Anders, M. Boulet, J. Connor, K. Gregson, W. Guerra, O. Guldner, M. Mohamoud, B. Plancher, R. Shin, et al., "Project-based, collaborative, algorithmic robotics for high school students: Programming self-driving race cars at MIT," in ISEC, 2017.
[33] W. Roscoe, "An opensource DIY self driving platform for small scale cars," https://www.donkeycar.com, accessed: 2020-06-20.
[34] M. Dekan, F. Duchoň, L. Jurišica, A. Vitko, and A. Babinec, "iRobot Create used in education," Journal of Mechanics Engineering and Automation, vol. 3, no. 4, pp. 197–202, 2013.
[35] M. Rubenstein, B. Cimino, R. Nagpal, and J. Werfel, "AERobot: An affordable one-robot-per-student system for early robotics education," in ICRA, 2015.
[36] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., "End to end learning for self-driving cars," arXiv:1604.07316, 2016.
[37] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv:1704.04861, 2017.
[38] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al., "Searching for MobileNetV3," in ICCV, 2019.
[39] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in ECCV, 2014.
[40] F. Codevilla, A. M. Lopez, V. Koltun, and A. Dosovitskiy, "On offline evaluation of vision-based driving models," in ECCV, 2018.