[Fig. 4: Software design. Our Android application is responsible for high-level computation on the smartphone: graphical user interface, state estimation, data logger (images, vehicle control, indicators, vehicle modes, IMU, GPS, etc.), audio feedback for the user, interface with a game controller (e.g. PS4/Xbox), running the neural network and processing its output, and serial communication. The Arduino program provides the low-level interface to the vehicle: PWM control of the motors, operating the indicator signals, measuring wheel speed, monitoring battery voltage, and serial communication.]

modes, logging, running a neural network, etc. We derive our graphical user interface from the Android Tensorflow Object Detection application [31] and extend it. Our GUI provides the camera feed and buttons to toggle data logging, control modes, and serial communication. It also allows switching between different neural networks to control the vehicle and provides relevant information such as image resolution, inference time and predicted controls. We also integrate voice feedback for operation via the game controller.

The Android ecosystem provides a unified interface to obtain sensor data from any Android smartphone. We build a data logger on top of that in order to collect datasets with the robots. Currently, we record readings from the following sensors: camera, gyroscope, accelerometer, magnetometer, ambient light sensor, and barometer. Using the Android API, we are able to obtain the following sensor readings: RGB images, angular speed, linear acceleration, gravity, magnetic field strength, light intensity, atmospheric pressure, latitude, longitude, altitude, bearing, and speed. In addition to the phone sensors, we also record body sensor readings (wheel odometry, free space estimate and battery voltage), which are transmitted via the serial link.
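To make the structure of such logs concrete, the sketch below shows one possible way to store synchronized, timestamped readings; the field names, CSV layout, and the log_reading helper are illustrative assumptions, not the actual OpenBot log format.

```python
import csv
import time

# Illustrative only: one flat record layout for synchronized sensor logs,
# keyed by a shared monotonic timestamp. The real logger may differ.
FIELDS = ["timestamp_ns", "sensor", "values"]

def log_reading(writer, sensor, values):
    """Write one timestamped reading, e.g. ("gyroscope", [wx, wy, wz])."""
    writer.writerow({
        "timestamp_ns": time.monotonic_ns(),  # single clock for all sensors
        "sensor": sensor,
        "values": " ".join(f"{v:.6f}" for v in values),
    })

with open("sensor_log.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    log_reading(writer, "accelerometer", [0.02, 0.01, 9.81])
    log_reading(writer, "gyroscope", [0.001, -0.002, 0.0005])
    log_reading(writer, "battery_voltage", [11.4])
```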
We leverage the computational power of the smartphone to process the sensory input and compute the robots' actions in real time. While there are many classic motion planning algorithms, we focus on learning-based approaches, which allow for a unified interface. In particular, we rely on the Tensorflow Lite infrastructure, which integrates seamlessly with smartphones [10], [20]. Our Android application features model definitions for object detection and autonomous navigation. These define the input and output properties of the neural network. We build on top of the Tensorflow Object Detection application [31] to detect people and perform visual servoing to follow them.
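As an illustration of the visual-servoing idea (not the exact OpenBot controller), a simple proportional rule that steers toward the detected person's bounding-box center could look as follows; the gains, the normalized box format, and the target-area heuristic are assumptions.

```python
def follow_person_controls(box, k_steer=1.0, target_area=0.25, k_speed=2.0):
    """Map a detected person's bounding box to (left, right) wheel commands.

    box: (xmin, ymin, xmax, ymax), normalized to [0, 1] in image coordinates.
    Returns wheel commands clipped to [-1, 1]. Illustrative sketch only.
    """
    xmin, ymin, xmax, ymax = box
    center_x = 0.5 * (xmin + xmax)           # horizontal position of the person
    area = (xmax - xmin) * (ymax - ymin)     # proxy for distance to the person

    steer = k_steer * (center_x - 0.5)       # positive -> person is to the right
    forward = k_speed * max(0.0, target_area - area)  # slow down when close

    left = max(-1.0, min(1.0, forward + steer))
    right = max(-1.0, min(1.0, forward - steer))
    return left, right

# Example: person slightly right of center and far away -> drive on, turn right.
print(follow_person_controls((0.55, 0.3, 0.75, 0.9)))
```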
We also integrate a model for autonomous navigation inspired by Conditional Imitation Learning [15]. The deployment process is simple. After training a model in Tensorflow, it is converted to a Tensorflow Lite model that can be directly deployed on the smartphone. Both neural networks only rely on the camera feed for their predictions. Neither wheel odometry nor sonar sensor readings are required to produce the raw vehicle controls.
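A minimal sketch of this conversion step, assuming a trained Keras model (the model and file names are placeholders):

```python
import tensorflow as tf

# Load a trained driving or detection model (placeholder path).
model = tf.keras.models.load_model("driving_policy.h5")

# Convert the TensorFlow model to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

# The resulting .tflite file can be bundled with the Android application.
with open("driving_policy.tflite", "wb") as f:
    f.write(tflite_model)
```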
Arduino program. We use an Arduino Nano microcontroller to act as a bridge between the vehicle body and the smartphone. Its main task is to handle the low-level control of the vehicle and to provide readings from vehicle-mounted sensors. The program components are shown on the right in Figure 4. The Arduino receives the vehicle controls and indicator signals via the serial connection. It converts the controls to PWM signals for the motor controller and toggles the LEDs according to the indicator signal. The Arduino also keeps track of the wheel rotations by counting the interrupts triggered by the optical sensors on the left and right front wheels. The sonar sensor emits a sound wave, and the time until the echo is received is measured and used to estimate the free space in front of the vehicle. The battery voltage is calculated by a scaled moving average of measurements at the voltage divider circuit. All measurements are sent back to the Android application through the serial link.
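For concreteness, the free space estimate follows from the sonar's time of flight $t$ as $d = c\,t/2$, where $c \approx 343\,\mathrm{m/s}$ is the speed of sound and the factor of two accounts for the round trip of the pulse. The battery voltage estimate has the form of a scaled moving average, $\bar{V}_k = (1-\alpha)\,\bar{V}_{k-1} + \alpha\, s\, v_k$, where $v_k$ is the raw reading at the voltage divider, $s$ is the divider's scale factor, and $\alpha \in (0,1)$ is a smoothing factor; the specific constants here are an assumption about the form rather than the firmware's exact values.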
C. Comparison to Other Wheeled Robots

We compare to existing robot platforms in Table II. In contrast to other robots, our platform has an abundance of processing power, communication interfaces, and sensors provided by the smartphone. Existing robots often rely on custom software ecosystems, which require dedicated lab personnel to maintain the code, implement new features, and write drivers for new sensors. In contrast, we use Android, one of the largest constantly evolving software ecosystems. All the low-level software for sensor integration and processing already exists and improves without any additional effort by the robotics community. All sensors are already synchronized on the same clock, obviating what is now a major challenge for many existing robots.

IV. EVALUATION

To demonstrate that smartphones are suitable to provide sensing, communication, and compute for interesting robotics applications, we evaluate the presented platform on two applications: person following and autonomous navigation. The experimental setups are described in Section IX and Section X, respectively. We conduct the evaluation using a variety of popular smartphones from the past two years with prices ranging from $120 to $750. The smartphones are carefully selected to cover different manufacturers, chipsets, and sensor suites. Detailed specifications and benchmark scores of the smartphones are provided in Section VIII. In the following, we describe our general evaluation setup and the procedure that ensures a fair comparison; we report the results in Section V.

A. Evaluation metrics

In order to streamline our evaluation while providing a comprehensive performance summary, we use three metrics: distance, success rate, and collisions. The distance is continuous and we report it as a percentage of the complete trajectory. The distance measurement stops if an intersection is missed, a collision occurs, or the goal is reached. The success rate is binary and indicates whether or not the goal was reached. We also count the number of collisions. All results are averaged across three trials.
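A small sketch of how these three metrics could be aggregated across trials; the per-trial record format is an assumption for illustration, not the paper's actual data layout.

```python
from statistics import mean, stdev

def aggregate(trials):
    """Aggregate per-trial metrics into mean +/- standard deviation.

    trials: list of dicts with keys "distance" (fraction of the route covered,
    in [0, 1]), "success" (bool), and "collisions" (int).
    """
    distance = [100.0 * t["distance"] for t in trials]       # percent of trajectory
    success = [100.0 * float(t["success"]) for t in trials]  # percent of successful trials
    collisions = [float(t["collisions"]) for t in trials]
    report = lambda xs: f"{mean(xs):.1f} ± {stdev(xs):.1f}"
    return {
        "Distance (%)": report(distance),
        "Success (%)": report(success),
        "Collisions": report(collisions),
    }

# Example: three trials on one route.
print(aggregate([
    {"distance": 1.00, "success": True,  "collisions": 0},
    {"distance": 0.95, "success": True,  "collisions": 0},
    {"distance": 0.90, "success": False, "collisions": 1},
]))
```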
Platform | Retail Cost | Setup Time [h] | Size [cm] | Weight [kg] | Speed [m/s] | Battery [min] | Actuation | Odometry | Camera | LiDAR | Sonar | IMU | GPS | WiFi | Bluetooth | 3G/4G/5G | Speaker | Microphone | Display | Compute | Ecosystem
AutoRally [18] | $10,000 | 100 | 100x60x40 | 22 | 25 | 20+ | BLDC+Servo | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Mini-ITX PC | ROS
F1/10 [17] | $3600 | 3 | 55x30x20 | 4.5 | 18 | 20+ | BLDC+Servo | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | Jetson TX2 | ROS
RACECAR [32] | $3400 | 10 | 55x30x20 | 4.5 | 18 | 20+ | BLDC+Servo | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | Jetson TX1 | ROS
BARC [14] | $1000 | 3 | 54x28x21 | 3.2 | – | 20+ | BLDC+Servo | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Odroid XU-4 | ROS
MuSHR [16] | $900 | 3 | 44x28x14 | 3 | 11 | 20+ | BLDC+Servo | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | Jetson Nano | ROS
DeepRacer [6] | $400 | 0.25 | – | – | 6 | 15+ | BDC+Servo | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Intel Atom | Custom
DonkeyCar [33] | $250 | 2 | 25x22x12 | 1.5 | 9 | 15+ | BDC+Servo | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | Raspberry Pi | Custom
Duckiebot [9] | $280 | 0.5 | – | – | – | – | 2xBDC | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Raspberry Pi | Custom
Pheeno [13] | $270 | – | 13x11 | – | 0.42 | 300+ | 2xBDC | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ARM Cortex-A7 | Custom
JetBot [8] | $250 | 1 | 20x13x13 | – | – | – | 2xBDC | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Nvidia Jetson | Custom
Create-2 [34] | $200 | – | 34x34x9 | 3.6 | – | – | 2xBDC | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | Custom
Thymio II [19] | $170 | – | 11x11x5 | 0.46 | 0.14 | – | 2xBDC | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | Microcontroller | Custom
AERobot [35] | $20 | 0.1 | 3x3x3 | 0.03 | – | – | 2xVibration | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | Microcontroller | Custom
OpenBot | $50* | 0.25 | 24x15x12 | 0.7 | 1.5 | 45+ | 4xBDC | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Smartphone | Android

TABLE II: Robots. Comparison of wheeled robotic platforms. Top: robots based on RC trucks. Bottom: navigation robots for deployment at scale and in education. "–" indicates that no information is available. *The cost of the smartphone is not included and varies.
TABLE VII: Smartphones. Specifications of the smartphones used in our experiments. We report the overall, graphics, and AI performance according to standard benchmarks. Top: six smartphones used to collect training data. Bottom: smartphones used to test cross-phone generalization. "–" indicates that the score is not available.
[Fig. 9: Training environment. The images depict the environment where the training data was collected.]

For the experiments in the paper, we collect a dataset with the Xiaomi Mi9 which consists of two data units from R1 and six data units from both R2 and R3. Half of the data on R1 is collected with noise and obstacles and the other half without. Half of the data on R2 and R3 is collected with noise and the other half without. The complete dataset contains approximately 45,000 frames, corresponding to 30 minutes worth of data.

H. Additional Experiments

For the following experiments, we collect multiple data units along route R1 in the training environment (Figure 8) with multiple robots and smartphones. We consider a total of four datasets; each dataset consists of 12 data units, or approximately 96 minutes of data, half of which is collected with noise and obstacles. Two datasets are used to investigate the impact of using different phones and the other two to investigate the impact of using different bodies.

Since these policies are trained on more data, we design a more difficult evaluation route, shown in Figure 11. The route contains the same types of maneuvers, but across two different intersections and divided into fewer segments. As a result, small errors are more likely to accumulate, leading to unsuccessful segments and a lower average success rate.
I. Learning from data collected with multiple smartphones

We investigate whether training on data from multiple phones helps generalization and robustness. We train two identical driving policies, one on data acquired with six different phones (Table VII, top) and another with the same amount of data from only one phone, the Xiaomi Mi9; we keep the robot body the same for this set of experiments. We evaluate both policies on the common training phone, the Mi9. We also evaluate both driving policies on three held-out test phones that were not used for data collection and differ in terms of camera sensor and manufacturer (Table VII, bottom). The P30 Lite has the same camera sensor as the Mi9, but is from a different manufacturer. The Pocofone F1 has a different camera sensor, but is from the same manufacturer. The Galaxy Note 10 differs in both aspects, manufacturer and camera sensor.

Evaluation     | Mi9                    | P30 Lite               | Pocofone F1            | Galaxy Note 10
Training       | All | Mi9 | ∆          | All | Mi9 | ∆          | All | Mi9 | ∆          | All | Mi9 | ∆
Distance (%) ↑ | 97±5 | 94±5 | 3        | 85±19 | 80±6 | 5       | 79±7 | 73±1 | 6        | 87±11 | 69±7 | 18
Success (%) ↑  | 92±14 | 83±14 | 9      | 75±25 | 50±0 | 25      | 42±14 | 42±14 | 0      | 67±14 | 42±14 | 25
Collisions ↓   | 0.0±0.0 | 0.0±0.0 | 0.0 | 1.0±1.0 | 0.0±0.0 | 1.0 | 0.3±0.6 | 1.3±0.6 | -1.0 | 1.7±0.6 | 1.3±0.6 | 0.4

TABLE VIII: Autonomous navigation: transfer across smartphones. We report the mean and standard deviation across three trials. Each trial consists of several segments with a total of 2 straights, 2 left turns, and 2 right turns.

The results are summarized in Table VIII. We find that the driving policy trained on data from multiple phones consistently outperforms the driving policy trained on data from a single phone. This effect becomes more noticeable when deploying the policy on phones from different manufacturers and with different camera sensors. However, the driving behavior is sometimes more abrupt, which is reflected by the higher number of collisions. This is probably due to the different fields of view and positions of the camera sensors making learning more difficult. We expect that this will be overcome with more training data.

We also performed some experiments using the low-end Nokia 2.2 phone, which costs about $100. It is able to run our autonomous navigation network at 10 frames per second. Qualitatively, the driving performance is similar to that of the other phones we evaluated. However, since it was unable to make predictions in real time, we did not include it in our main experiments, which were focused on the impact of camera sensor and manufacturer.

[Fig. 11: Evaluation Route 2: Double T-junction, with segments "right-left turn", "closed left turn", "double straight", and "open right" (segment lengths of 5-7 m). Our evaluation route consists of four segments with a total of two straights, two right turns, and two left turns across two intersections. We report mean and standard deviation across three trials.]

J. Learning from data collected with multiple robot bodies

We also investigate whether training on data from multiple robot bodies helps generalization and robustness. One policy is trained on data collected with three different bodies and another with the same amount of data from a single body; we keep the smartphone fixed for this set of experiments. We evaluate both policies on the common training body, B1, which was used during data collection. We also evaluate on a held-out test body, B4.

Evaluation     | Body 1                 | Body 4
Training       | B1-B3 | B1 | ∆         | B1-B3 | B1 | ∆
Distance (%) ↑ | 97±5 | 94±5 | 3        | 94±5 | 92±8 | 2
Success (%) ↑  | 92±14 | 83±14 | 9      | 83±14 | 75±25 | 8
Collisions ↓   | 0.0±0.0 | 0.0±0.0 | 0.0 | 0.0±0.0 | 0.7±0.6 | -0.7

TABLE IX: Autonomous navigation: transfer across robot bodies. We report the mean and standard deviation across three trials. Each trial consists of several segments with a total of 2 straights, 2 left turns, and 2 right turns.

The results are summarized in Table IX. We find that the driving policy that was trained on multiple robot bodies performs better, especially in terms of success rate, where small mistakes can lead to failure. The policy that was trained on a single body sways from side to side and even collides with the environment when deployed on the test body. The actuation of the bodies is noisy due to the cheap components. Every body responds slightly differently to the control signals. Most bodies have a bias to veer to the left or to the right due to imprecision in the assembly or the low-level controls. The policy trained on multiple bodies learns to be robust to these factors of variability, exhibiting stable learned behavior both on the training bodies and on the held-out test body.

Despite the learned robustness, the control policy is still somewhat vehicle-specific, e.g. tied to the differential drive setup and the general actuation model of the motors. An alternative would be to predict a desired trajectory instead and to use a low-level controller to produce vehicle-specific actions. This can further ease the learning process and lead to more general driving policies.
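As a sketch of that alternative (not part of the paper's implementation), a predicted waypoint could be turned into wheel commands for a differential drive roughly as follows; the gains, frame convention, and clipping are assumptions.

```python
import math

def waypoint_to_wheel_commands(x, y, k_v=1.0, k_w=2.0):
    """Convert a predicted waypoint (x forward, y left, meters, robot frame)
    into normalized (left, right) wheel commands for a differential drive.
    Purely illustrative; gains and clipping are assumed values.
    """
    distance = math.hypot(x, y)   # how far away the waypoint is
    heading = math.atan2(y, x)    # bearing to the waypoint
    v = k_v * distance            # drive faster when the waypoint is far
    w = k_w * heading             # turn toward the waypoint
    left = max(-1.0, min(1.0, v - w))
    right = max(-1.0, min(1.0, v + w))
    return left, right

# Example: waypoint 1 m ahead and 0.2 m to the left -> both wheels forward,
# right wheel faster, so the robot curves left.
print(waypoint_to_wheel_commands(1.0, 0.2))
```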
REFERENCES

[1] N. Kau, A. Schultz, N. Ferrante, and P. Slade, "Stanford doggo: An open-source, quasi-direct-drive quadruped," in ICRA, 2019.
[2] F. Grimminger, A. Meduri, M. Khadiv, J. Viereck, M. Wüthrich, M. Naveau, V. Berenz, S. Heim, F. Widmaier, J. Fiene, et al., "An open torque-controlled modular robot architecture for legged locomotion research," arXiv:1910.00093, 2019.
[3] B. Yang, J. Zhang, V. Pong, S. Levine, and D. Jayaraman, "Replab: A reproducible low-cost arm benchmark platform for robotic learning," arXiv:1905.07447, 2019.
[4] A. Gupta, A. Murali, D. P. Gandhi, and L. Pinto, "Robot learning in homes: Improving generalization and reducing dataset bias," in NeurIPS, 2018.
[5] D. V. Gealy, S. McKinley, B. Yi, P. Wu, P. R. Downey, G. Balke, A. Zhao, M. Guo, R. Thomasson, A. Sinclair, et al., "Quasi-direct drive for low-cost compliant robotic manipulation," in ICRA, 2019.
[6] B. Balaji, S. Mallya, S. Genc, S. Gupta, L. Dirac, V. Khare, G. Roy, T. Sun, Y. Tao, B. Townsend, et al., "Deepracer: Educational autonomous racing platform for experimentation with sim2real reinforcement learning," arXiv:1911.01562, 2019.
[7] DJI Robomaster S1, https://www.dji.com/robomaster-s1, accessed: 2020-06-20.
[8] Nvidia JetBot, https://github.com/nvidia-ai-iot/jetbot, accessed: 2020-06-20.
[9] L. Paull, J. Tani, H. Ahn, J. Alonso-Mora, L. Carlone, M. Cap, Y. F. Chen, C. Choi, J. Dusek, Y. Fang, et al., "Duckietown: An open, inexpensive and flexible platform for autonomy education and research," in ICRA, 2017.
[10] A. Ignatov, R. Timofte, A. Kulik, S. Yang, K. Wang, F. Baum, M. Wu, L. Xu, and L. Van Gool, "AI benchmark: All about deep learning on smartphones in 2019," in ICCV Workshops, 2019.
[11] M. Rubenstein, C. Ahler, and R. Nagpal, "Kilobot: A low cost scalable robot system for collective behaviors," in ICRA, 2012.
[12] J. McLurkin, A. McMullen, N. Robbins, G. Habibi, A. Becker, A. Chou, H. Li, M. John, N. Okeke, J. Rykowski, et al., "A robot system design for low-cost multi-robot manipulation," in IROS, 2014.
[13] S. Wilson, R. Gameros, M. Sheely, M. Lin, K. Dover, R. Gevorkyan, M. Haberland, A. Bertozzi, and S. Berman, "Pheeno, a versatile swarm robotic research and education platform," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 884–891, 2016.
[14] J. Gonzales, F. Zhang, K. Li, and F. Borrelli, "Autonomous drifting with onboard sensors," in Advanced Vehicle Control, 2016.
[15] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy, "End-to-end driving via conditional imitation learning," in ICRA, 2018.
[16] S. S. Srinivasa, P. Lancaster, J. Michalove, M. Schmittle, C. S. M. Rockett, J. R. Smith, S. Choudhury, C. Mavrogiannis, and F. Sadeghi, "MuSHR: A low-cost, open-source robotic racecar for education and research," arXiv:1908.08031, 2019.
[17] M. O'Kelly, V. Sukhil, H. Abbas, J. Harkins, C. Kao, Y. V. Pant, R. Mangharam, D. Agarwal, M. Behl, P. Burgio, et al., "F1/10: An open-source autonomous cyber-physical platform," arXiv:1901.08567, 2019.
[18] B. Goldfain, P. Drews, C. You, M. Barulic, O. Velev, P. Tsiotras, and J. M. Rehg, "AutoRally: An open platform for aggressive autonomous driving," IEEE Control Systems Magazine, vol. 39, pp. 26–55, 2019.
[19] F. Riedo, M. Chevalier, S. Magnenat, and F. Mondada, "Thymio II, a robot that grows wiser with children," in IEEE Workshop on Advanced Robotics and its Social Impacts, 2013.
[20] J. Lee, N. Chirkov, E. Ignasheva, Y. Pisarchyk, M. Shieh, F. Riccardi, R. Sarokin, A. Kulik, and M. Grundmann, "On-device neural net inference with mobile GPUs," arXiv:1907.01989, 2019.
[21] S. Owais, "Turn your phone into a robot," https://www.instructables.com/id/Turn-Your-Phone-into-a-Robot/, 2015, accessed: 2020-06-20.
[22] M. Rovai, "Hacking a RC car to control it using an Android device," https://www.hackster.io/mjrobot/hacking-a-rc-car-to-control-it-using-an-android-device-7d5b9a, 2016, accessed: 2020-06-20.
[23] C. Delaunay, "Botiful, social telepresence robot for Android," https://www.kickstarter.com/projects/1452620607/botiful-telepresence-robot-for-android, 2012, accessed: 2020-06-20.
[24] Romotive, "Romo - the smartphone robot for everyone," https://www.kickstarter.com/projects/peterseid/romo-the-smartphone-robot-for-everyone, 2012, accessed: 2020-06-20.
[25] xCraft, "Phonedrone ethos - a whole new dimension for your smartphone," https://www.kickstarter.com/projects/137596013/phonedrone-ethos-a-whole-new-dimension-for-your-sm, 2015, accessed: 2020-06-20.
[26] GCtronic, "Wheelphone," http://www.wheelphone.com, 2013, accessed: 2020-06-20.
[27] J. Yim, S. Chun, K. Jung, and C. D. Shaw, "Development of communication model for social robots based on mobile service," in International Conference on Social Computing, 2010.
[28] A. Setapen, "Creating robotic characters for long-term interaction," Ph.D. dissertation, MIT, 2012.
[29] Y. Cao, Z. Xu, F. Li, W. Zhong, K. Huo, and K. Ramani, "V.Ra: An in-situ visual authoring system for robot-IoT task planning with augmented reality," in DIS, 2019.
[30] N. Oros and J. L. Krichmar, "Smartphone based robotics: Powerful, flexible and inexpensive robots for hobbyists, educators, students and researchers," IEEE Robotics & Automation Magazine, vol. 1, p. 3, 2013.
[31] Tensorflow Object Detection Android Application, https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android, accessed: 2020-06-20.
[32] S. Karaman, A. Anders, M. Boulet, J. Connor, K. Gregson, W. Guerra, O. Guldner, M. Mohamoud, B. Plancher, R. Shin, et al., "Project-based, collaborative, algorithmic robotics for high school students: Programming self-driving race cars at MIT," in ISEC, 2017.
[33] W. Roscoe, "An opensource DIY self driving platform for small scale cars," https://www.donkeycar.com, accessed: 2020-06-20.
[34] M. Dekan, F. Duchoň, L. Jurišica, A. Vitko, and A. Babinec, "iRobot Create used in education," Journal of Mechanics Engineering and Automation, vol. 3, no. 4, pp. 197–202, 2013.
[35] M. Rubenstein, B. Cimino, R. Nagpal, and J. Werfel, "AERobot: An affordable one-robot-per-student system for early robotics education," in ICRA, 2015.
[36] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., "End to end learning for self-driving cars," arXiv:1604.07316, 2016.
[37] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv:1704.04861, 2017.
[38] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al., "Searching for MobileNetV3," in ICCV, 2019.
[39] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in ECCV, 2014.
[40] F. Codevilla, A. M. Lopez, V. Koltun, and A. Dosovitskiy, "On offline evaluation of vision-based driving models," in ECCV, 2018.