Robotic Computing On FPGAs


Robotic Computing on FPGAs: Current Progress, Research Challenges, and Opportunities


Zishen Wan1, Ashwin Lele1, Bo Yu2, Shaoshan Liu2, Yu Wang3, Vijay Janapa Reddi4, Cong Hao1, and Arijit Raychowdhury1
1 School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
2 PerceptIn, Fremont, CA, USA
3 Department of Electronic Engineering, Tsinghua University, Beijing, China
4 School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
{zishenwan, alele9, callie.hao}@gatech.edu, [email protected]
{bo.yu, shaoshan.liu}@perceptin.io, [email protected], [email protected]
arXiv:2205.07149v1 [cs.RO] 14 May 2022

Abstract: Robotic computing has reached a tipping point, with a myriad of robots (e.g., drones, self-driving cars, logistic robots) being widely applied in diverse scenarios. The continuous proliferation of robotics, however, critically depends on efficient computing substrates, driven by real-time requirements, robotic size-weight-and-power constraints, cybersecurity considerations, and dynamically changing scenarios. Among all platforms, FPGA is able to deliver both software and hardware solutions with low power, high performance, reconfigurability, reliability, and adaptivity, serving as a promising computing substrate for robotic applications. This paper highlights the current progress, design techniques, challenges, and open research opportunities in the domain of robotic computing on FPGAs.

I. INTRODUCTION

Robotic computing is on the rise. A myriad of robots such as drones, legged robots, and self-driving cars are on the verge of becoming an integral part of our life [1], [2]. Robotics is typically an art of system integration in both software and hardware (Fig. 1). The continuous proliferation of robots, however, faces computing challenges arising from higher performance requirements, resource constraints, miniaturization of machine form factors, dynamic operating scenarios, and cybersecurity considerations. Therefore, it is essential to choose a proper computing substrate for robotic systems that can meet real-time and power requirements and adapt to changing workloads.

CPUs and GPUs are two widely used computing platforms; however, their performance and efficiency are still inadequate for real-time computation in complex robots. Take motion planning as an example: a CPU typically takes a few seconds to find a collision-free trajectory [3], making it too slow for complex navigation tasks. GPUs can finish planning tasks in hundreds of milliseconds, which is still insufficient for many scenarios and comes at a cost of hundreds of watts [4]. ASICs have recently been developed for specific robotic workloads with low power and high performance [5]–[7], but their fixed architecture has difficulty adapting to rapidly evolving robotic algorithms and dynamic scenarios, and is vulnerable to cybersecurity threats.

As an alternative, we believe FPGA is a promising compute substrate for robotic applications. First, FPGA increases performance with massive parallelism and deeply pipelined datapaths, making it capable of meeting real-time requirements with high energy efficiency compared to CPUs and GPUs. Second, FPGA can adaptively generate custom architectures and be updated as robotic algorithms evolve, without the re-fabrication that an ASIC requires [8]. Third, FPGA is flexible in dealing with highly diverse robotic workloads, especially with partial reconfiguration allowing part of the operating board to be modified. Fourth, FPGA provides reliable design by leveraging reconfiguration to patch flaws, in contrast to the potential vulnerabilities detected in fixed architectures [9], which is especially essential in safety-critical scenarios [10]. Overall, FPGA has the potential to deliver high-performance, low-power, reconfigurable, adaptive, and secure features in robotic computing, and is booming in autonomous applications. However, several challenges, such as tedious development procedures, inefficient system support, and a huge design space, remain in FPGA-based robotic computing and impede the way ahead.

In this paper, we discuss the current progress, challenges, and opportunities for FPGA-based robotic computing. Section II introduces the cross-layer stack of the robotic system. Section III presents current FPGA accelerators and systems for robotic computing, with an emphasis on design techniques. Section IV discusses challenges and opportunities for FPGA-based robotic computing, and our view of the road ahead.

II. CROSS-LAYER ROBOTIC COMPUTING SYSTEMS

This section introduces the abstraction layers of the robotic computing stack. We traverse down Fig. 1 to explain robotic-specific algorithms and system building blocks.

A. Robotic-Computing Algorithm Layer

Fig. 2 illustrates the representative algorithm building blocks in robotic computing, including sense-plan-act (perception, localization, planning, control) and end-to-end learning.

Perception. The goal of perception is to sense the dynamic surroundings and build a reliable and detailed representation based on sensory data (e.g., camera, IMU, GPS, LiDAR). Perception usually includes feature extraction, stereo vision, object detection, scene understanding, etc. In feature extraction, key points are usually detected using FAST feature and ORB
descriptor [11]. Compared with operating on all image pixels, operating on feature points can improve robustness and compute efficiency. Stereo vision obtains 3D structure information of the scene through disparity calculation. Local, semi-global, and global stereo matching algorithms are proposed based on operational scenarios [12]. Recently, advances in deep learning have exposed robotic perception systems to more tasks.

Localization. The goal of localization is to calculate the position and orientation of the robot itself in a given frame of reference. Knowing the position fundamentally enables robots to plan the trajectory and navigate, and knowing the orientation further helps robots stabilize. Simultaneous localization and mapping (SLAM) is a commonly used algorithm where the robot simultaneously constructs a map of the environment while localizing itself [13], and one principled mathematical approach to solving SLAM is maximum a posteriori estimation. The filtering-based approach has recently been developed with Multi-State Constraint Kalman Filter (MSCKF)-based algorithms such as MSCKF VIO [14] and OpenVINS [15].

Motion planning and control. The goal of motion planning is to find the optimal collision-free trajectory from the start position to the goal position, which is invoked during a robot movement to adapt to environmental changes. Motion planning is usually followed by a control module that continuously tracks the differences between actual poses and poses on the pre-defined trajectory. Sampling-based solutions are widely used for motion planning, such as Probabilistic Roadmap (PRM) [16], Rapidly-exploring Random Tree (RRT) [17], and their variants, which generally contain three steps: roadmap construction, collision detection, and graph search.

End-to-End learning system. End-to-end algorithms enable skill learning directly from sensor input and perform all the following cognitive robotic tasks using a single neural network model. Maps or separate planning stages are not required in end-to-end learning. The neural network model can be trained using reinforcement learning [18] or supervised learning [19]. The challenges of end-to-end learning include alleviating the model simulation-to-reality performance gap, designing optimal reward functions, and improving model explainability and robustness, which are actively explored.

Fig. 2: Applications and algorithm building blocks in robotic systems. [Figure not reproduced. Functional blocks: Perception (feature extraction, stereo vision, object detection, scene understanding); Localization (Kalman filtering, pose estimation, map generation, object tracking); Planning & Control (path planning, action prediction, obstacle avoidance, feedback control); End-to-End Learning. Representative algorithms: FAST, ORB, ELAS; SLAM, VIO, Registration; RRT, PRM, RRT*, RRT-C, PID; neural network.]

B. Robotic-Computing System Layer

Robot Operating System (ROS). ROS is a commonly used operating system that provides tools, libraries, and package management for robotics development. It is a distributed framework of processes that enables executables to be individually designed and loosely coupled at runtime. Conceptually, the peer-to-peer network of ROS processes is called the computation graph. The basic ROS computation graph includes nodes, topics, services, and the master, all of which provide data to the graph in different ways (Fig. 1). Each ROS node is a process used to perform a task. ROS nodes communicate with each other via topics or services. Topics allow one node to publish messages that multiple other nodes can subscribe to. Services create one-to-one communication between a service node and a client node. The ROS master is responsible for storing operating parameters and managing other nodes.

Fig. 1: Cross-layer stack of the robotic computing system, requirements, research challenges, and open opportunities. [Figure not reproduced. Layers: Algorithm (Section 2): perception, localization, planning & control, or end-to-end learning; System (Section 2): ROS nodes connected through topics (publish/subscribe) and services, execution run-time; Hardware (Section 3): sensors (IMU, camera, GPS) and compute hardware (CPU, GPU, FPGA, ASIC). Requirements: real-time, energy-efficient, adaptive, reliable & secure, predictable, reconfigurable. Challenges and the corresponding solutions and research opportunities (Section 4):
Dynamic changing workloads: 1) Reconfiguring robotic computing at run time
Unoptimized general solutions: 2) Modularizing robotic computing kernels design
Diverse hardware components: 3) Mapping robotic computing on heterogeneous platforms
Inefficient ROS support: 4) Connecting FPGA to ROS ecosystem
Large #algorithms and #hardware: 5) Benchmarking robotic computing kernels
Tedious development procedure: 6) Automating robotic computing design flow; 7) Building customized computing with the open-source framework
Inaccurate performance evaluation: 8) Integrating robotic computing hardware in a simulation loop]

III. CURRENT PROGRESS AND DESIGN TECHNIQUES

This section presents our designs and current progress for FPGA-based robotic computing, with an emphasis on the design traits and techniques.

Perception on FPGAs. Perception typically contributes significantly to the end-to-end latency of robotic applications. Take the ORB perception module as an example: it usually accounts for 50%-80% of the compute latency of the whole localization scenario. To alleviate that, [20], [21] accelerate ORB-based perception on FPGA for both aerial and ground robots. The key design principles are to exploit task-level parallelism by frame-multiplexing feature extraction, and to customize on-chip memories to suit different types of data reuse. Another series of illustrative works accelerates stereo matching, which is the bottleneck of the stereo vision system. [22] implements local stereo matching algorithms on FPGA with characterized hardware-software partitions. [23] proposes a parallel 3D graph cut algorithm for accelerating global stereo matching, achieving a 166× speedup over CPU. FP-Stereo [24] presents a streaming architecture and a sampling-insensitive disparity algorithm on FPGA to accelerate semi-global stereo
matching. Recently, the Bayesian approach with generative probabilistic models facilitates efficient dense matching. An appealing example is iELAS [25], a hardware-friendly large-scale stereo algorithm implemented on FPGA. iELAS reforms the computation-intensive and irregular triangulation modules in a regular manner with intelligent point interpolation. Additionally, FPGA has been widely used in accelerating neural networks for robotic perception and end-to-end learning. Several techniques, including quantization, loop optimization, array partitioning, data reuse, and memory optimization, are proposed. Interested readers are pointed to [26] for details.

Localization on FPGAs. The backbone of SLAM is a complex non-linear optimization problem, bundle adjustment (BA), which consumes a significant amount of time and power. π-BA [27] designs a co-observation optimization technique to accelerate BA based on the key observation that not all 3D points appear on all images in a BA problem. Pisces [28] co-optimizes SLAM power consumption and latency by exploiting inherent SLAM sparsity. By orchestrating sparse data, Pisces aligns correlated data and enables direct, parallel, and deterministic memory access. Going beyond point solutions, Archytas [29] presents a template hardware synthesis solution that automatically generates a SLAM accelerator given the hardware template and algorithm data-flow graph. To make the design adaptable to various environments, [30] dynamically optimizes the SLAM accelerator with an offline-constructed lookup table and clock gating, without sending new bitstreams to the FPGA at run time. Typically, SLAM is suitable for unknown indoor environments, while Registration is used for known indoor environments and visual-inertial odometry (VIO) functions well for outdoor environments. No single localization algorithm fits all scenarios. Interestingly, these algorithms share fundamental computation kernels amenable to matrix blocking. Eudoxus [21] implements SLAM, Registration, and VIO on FPGA by accelerating common matrix operations, with a lightweight runtime scheduler reducing variation.

Motion planning and control on FPGAs. Within the motion planning pipeline, the computation of collision detection is usually the bottleneck. Take RRT as an example: when it runs on CPU, 99% of the instructions are executed for collision detection, taking up 90% of total computation time. Recent efforts have proposed to accelerate motion planning kernels through algorithm-hardware co-design on FPGAs. [31] constructs robot-specific circuitry and architecture with roadmap pre-computation and massive path search parallelism, which is able to solve a motion planning query in 16 µs. [32] further presents a programmable dataflow architecture with a low-cost interconnection network, reducing the latency to 2.3 µs.

Several key design and optimization techniques are leveraged in FPGA-accelerated perception, localization, planning, and control. From the software aspect, hardware-friendly algorithms and data structures are proposed to promote parallelism with reduced intrinsic recursion and algorithm complexity. From the hardware aspect, robot-specific architecture, data sparsity, locality, parallelism, optimized interconnection networks, and reduced data movement contribute to high-performance and flexible motion planning design. These techniques can be generalized to other implementations, serving as a guide for future works.

Multi-robot collaboration on FPGAs. Going beyond single-robot applications, swarm robotics has been increasingly deployed in real-life scenarios where a team of robots collaboratively finishes a task. Multi-robot workloads typically present unique compute challenges. Several algorithm kernels may need to process data at the same time, leading to hardware resource conflicts. Therefore, the FPGA accelerator should support multi-threading and dynamic scheduling. An intriguing example is INCAME [33], a single-core multi-robot exploration framework that supports dynamic multi-task scheduling with a virtual-instruction-based interrupt method. The perception and control tasks are assigned high priorities, while long-term decisions and optimization have low priorities. We envision that multi-core multi-tasking FPGA accelerators will further improve the performance of multi-robot systems.

ROS on FPGAs. ROS-compliant FPGAs have recently been developed as ROS becomes increasingly common in robotics. ROS-compliant FPGAs must consider four functions: encapsulation of FPGA circuits, the interface between ROS software and FPGA hardware, the subscribe interface, and the publish interface to ROS topics. Typically, large communication latency between ROS components is the bottleneck of offloading computing to FPGAs. [34] reduces the latency by implementing the publish and subscribe messaging of ROS as hardware circuits, making direct ROS-FPGA communication possible and efficient. Recently, [35], [36] propose tools and frameworks to offload and accelerate the ROS computational graph on FPGA. However, the ecosystem of ROS on FPGAs is still in its infancy; better interfaces, automated tools, and whole-ROS acceleration are yet to be developed.

IV. RESEARCH CHALLENGES AND FUTURE DIRECTIONS

This section discusses the research opportunities for FPGA-based robotic computing, and our view of the road ahead (Fig. 1).

1) Reconfiguring robotic computing at run time. Robots usually operate in highly dynamic environments; thus, designing runtime-reconfigurable compute platforms is critical and can enable robots to be adaptive in various scenarios. Partial reconfiguration (PR) is a key feature of FPGA. Using PR, part of an FPGA can be reconfigured at runtime without compromising the integrity of the applications running on those parts of the device that are not being reconfigured. Therefore, PR can allow various robotic computing kernels to time-share part of an FPGA, leading to high performance and energy efficiency, and making FPGA a more suitable computing platform for dynamic and complex robotic workloads.

2) Modularizing robotic computing kernels design. The number of robotic algorithms is booming, but many algorithm variants share similar key computation blocks. It is thereby imperative to modularize the robotic computing kernel design. We can build optimized hardware acceleration blocks for these kernels as libraries or packages, while exploring their inherent
task-specific features such as sparsity, data flow, and memory access patterns. During the design phase, robotics practitioners can directly import these robotics-specific libraries and building blocks to build their FPGA design without delving into hardware engineering, which will greatly ease the design process. Modularizing the robotic algorithm design can help roboticists create custom accelerators for a kernel without hardware expertise.

3) Mapping robotic computing on heterogeneous platforms. One of the key technical challenges of designing robotic compute systems is to develop a suitable computer architecture, along with a software stack that allows computational flexibility. To improve the overall performance, FPGA-based System-on-Chip (SoC) solutions for robotic computing would be of the essence [8]; they holistically integrate various computing technologies, including CPU, GPU, FPGA, and accelerators. The OpenCL framework can be used for programming and executing programs across heterogeneous platforms, and accelerator-level parallelism is expected to be explored [37]. By doing so, the SoCs are equipped with both software and hardware programmability, giving them the capability to deliver high-performance, low-power, adaptive, and reliable robotic computing.

4) Connecting FPGA to ROS ecosystem. With ROS increasingly utilized in robotics applications of all scales, robotic FPGA platforms need to be able to efficiently map ROS computational graphs onto silicon. Going beyond the current work on accelerating specific ROS libraries, the inter-process and intra-process communication between ROS nodes also needs to be accelerated [38]. It is worth noting that hardware acceleration must be directly integrated into the ROS ecosystem to provide a seamless user experience for roboticists. A better interface between ROS and FPGA is expected to be delivered. Furthermore, through dynamically and efficiently mapping ROS to heterogeneous compute platforms, holistic hardware acceleration for robotic computing on ROS applications is expected to be achieved.

5) Benchmarking robotic computing kernels. Given the proliferation of robotic kernels and the rapid advances of hardware platforms, benchmarking these robotic algorithms and systems in a comparable, quantitative, and validatable manner is imperative. Such benchmarking is twofold: benchmarking a robotic algorithm across various hardware platforms, and benchmarking various robotic algorithms on the same hardware [39]. In particular, benchmarks should consider the interactions of ROS and its computational graph. Benchmarking robotic computing will guide robotics and hardware researchers in investigating the trade-offs in accuracy, performance, and energy efficiency of various robotic algorithms, and in implementing (or selecting) algorithms on FPGAs and other platforms in a performance-portable way.

6) Automating robotic computing design flow. Given the increasing complexity of robotic algorithms and the cross-stack nature, the development of robotic computing systems is becoming slow and tedious. Thus, building a push-button flow with robotic task requirements as input to automatically generate robotic accelerator designs is critical [40]–[42]. We envision that the agile framework will intelligently search the huge design space and automatically choose the optimal algorithm-hardware parameters with the help of modular kernels, benchmarking, and machine-learning-assisted methods. New robotics-centric electronic design automation (EDA) tools need to be developed to convert the design to an FPGA implementation. Automating the design flow will greatly facilitate FPGA-based robotic computing development, and make FPGAs an ideal platform for fast prototyping and commercialization.

7) Building customized robotic computing with the open-source framework. The field of robotic computing is still in its infancy and fast-changing, and numerous opportunities still exist in task-specific acceleration. An open-source design framework with iterative deployment, profiling, and optimization has recently been developed for machine learning applications [43], but it is still to be explored for robotic computing applications. Designers can build their custom specialized and optimized processors based on the RISC-V instruction set architecture (ISA). Defining and building an open-source FPGA-based RISC-V robotics-on-chip processor with open-source frameworks would considerably facilitate the design process and allow us to adapt to the rapidly changing landscape of robotic computing algorithms and accelerators.

8) Integrating robotic computing hardware in a simulation loop. FPGA-accelerated kernels are usually part of the whole autonomy computing pipeline. The correlation among compute stages and other robotic cyber-physical components will impact the final robotic system performance and lead to inaccurate hardware evaluation [44], [45]. Thus, instead of isolated hardware development, adopting the hardware-in-the-loop (HIL) method is critical [46]. HIL requires plugging the hardware platforms into the simulation to understand how robots respond to stimuli on FPGA or other compute substrates. HIL can help designers quantify the FPGA real-time performance within the whole system and enable robust evaluation without risking real robots. In particular, HIL can alleviate the FPGA hardware-induced gaps between training and deployment in learning-based systems. To perform faster performance evaluation at an earlier design stage, a closed-loop co-simulation framework of both FPGA architectural behavior (e.g., FireSim [47], SystemModeler [48]) and a robotic environment simulator (e.g., AirSim [49]) is necessary.

The abundance of challenges raised above provides plentiful opportunities for research development at all levels. Endeavoring to solve these problems requires interdisciplinary approaches across all layers of the computing stack, from algorithm and system to architecture, micro-architecture, and circuits.

V. CONCLUSION

Robotic computing is a rising area and critically depends on efficient, adaptive, and reliable compute substrates. This paper presents the cross-layer robotic computing stack and illustrates the current progress, along with FPGA design techniques. We conclude the paper by discussing the challenges, research opportunities, and roadmap for the next-generation FPGA-based robotic computing systems.
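
Illustrative sketch. The three steps of sampling-based motion planning named in Section II-A (roadmap construction, collision detection, graph search) can be made concrete with a minimal PRM-style planner in Python. This is only a software sketch under stated assumptions, not the method of any cited accelerator: the 2D unit-square world, the single circular obstacle in `OBSTACLES`, and all function names and parameters (`point_free`, `edge_free`, `build_roadmap`, `shortest_path`, `plan`, `n_samples`, `radius`) are hypothetical and chosen for illustration.

```python
import heapq
import math
import random

# Hypothetical 2D world: unit square with circular obstacles (cx, cy, radius).
OBSTACLES = [(0.5, 0.5, 0.2)]

def point_free(p):
    """Collision detection for a single point."""
    return all(math.hypot(p[0] - cx, p[1] - cy) > r for cx, cy, r in OBSTACLES)

def edge_free(a, b, steps=20):
    """Collision detection for a straight segment, checked at sampled points."""
    return all(
        point_free((a[0] + (b[0] - a[0]) * t / steps,
                    a[1] + (b[1] - a[1]) * t / steps))
        for t in range(steps + 1)
    )

def build_roadmap(start, goal, n_samples=200, radius=0.2, seed=0):
    """Roadmap construction; every candidate node and edge is collision-checked."""
    rng = random.Random(seed)
    nodes = [start, goal]  # index 0 is the start, index 1 is the goal
    while len(nodes) < n_samples + 2:
        p = (rng.random(), rng.random())
        if point_free(p):
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i], nodes[j])
            if d <= radius and edge_free(nodes[i], nodes[j]):
                edges[i].append((j, d))
                edges[j].append((i, d))
    return nodes, edges

def shortest_path(edges, src=0, dst=1):
    """Graph search: Dijkstra over the roadmap; returns node indices or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in edges[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None

def plan(start=(0.1, 0.1), goal=(0.9, 0.9)):
    """Run all three steps and return the path as a list of points, or None."""
    nodes, edges = build_roadmap(start, goal)
    path = shortest_path(edges)
    return None if path is None else [nodes[i] for i in path]
```

In this sketch, `edge_free` is called inside the doubly nested loop of `build_roadmap`, so collision checks dominate the work, which is in line with the bottleneck discussed in Section III. The checks are independent of one another, which is the kind of structure that parallel hardware can exploit.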
REFERENCES

[1] Z. Wan et al., “A survey of fpga-based robotic computing,” IEEE Circuits and Systems Magazine, vol. 21, no. 2, pp. 48–74, 2021.
[2] S. Liu et al., “Robotic computing on fpgas,” Synthesis Lectures on Computer Architecture, vol. 16, no. 1, pp. 1–218, 2021.
[3] K. Hauser, “Lazy collision checking in asymptotically-optimal motion planning,” in 2015 IEEE international conference on robotics and automation (ICRA), pp. 2951–2957, IEEE, 2015.
[4] J. Pan and D. Manocha, “Gpu-based parallel collision detection for fast motion planning,” The International Journal of Robotics Research, vol. 31, no. 2, pp. 187–200, 2012.
[5] A. Suleiman et al., “Navion: A 2-mw fully integrated real-time visual-inertial odometry accelerator for autonomous navigation of nano drones,” IEEE Journal of Solid-State Circuits, pp. 1106–1119, 2019.
[6] J.-H. Yoon and A. Raychowdhury, “Neuroslam: A 65-nm 7.25-to-8.79-tops/w mixed-signal oscillator-based slam accelerator for edge robotics,” IEEE Journal of Solid-State Circuits, vol. 56, no. 1, pp. 66–78, 2020.
[7] Z. Wan et al., “Circuit and system technologies for energy-efficient edge robotics,” in 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 275–280, IEEE, 2022.
[8] V. Mayoral-Vilches and G. Corradi, “Adaptive computing in robotics, leveraging ros 2 to enable software-defined hardware for fpgas,” arXiv preprint arXiv:2109.03276, 2021.
[9] P. Kocher et al., “Spectre attacks: Exploiting speculative execution,” in 2019 IEEE Symposium on Security and Privacy (SP), pp. 1–19, 2019.
[10] Z. Wan et al., “Analyzing and improving fault tolerance of learning-based navigation systems,” in 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 841–846, IEEE, 2021.
[11] E. Rublee et al., “Orb: An efficient alternative to sift or surf,” in 2011 International conference on computer vision, pp. 2564–2571, 2011.
[12] Z. Lu et al., “A resource-efficient pipelined architecture for real-time semi-global stereo matching,” IEEE Transactions on Circuits and Systems for Video Technology, 2021.
[13] R. Mur-Artal and J. D. Tardós, “Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras,” IEEE transactions on robotics, vol. 33, no. 5, pp. 1255–1262, 2017.
[14] K. Sun et al., “Robust stereo visual inertial odometry for fast autonomous flight,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 965–972, 2018.
[15] P. Geneva et al., “Openvins: A research platform for visual-inertial estimation,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 4666–4672, IEEE, 2020.
[16] B. Ichter et al., “Learned critical probabilistic roadmaps for robotic motion planning,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 9535–9541, IEEE, 2020.
[17] S. M. LaValle et al., “Rapidly-exploring random trees: Progress and prospects,” Algorithmic and computational robotics: new directions, vol. 5, pp. 293–308, 2001.
[18] A. Anwar and A. Raychowdhury, “Autonomous navigation via deep reinforcement learning for resource constraint edge nodes using transfer learning,” IEEE Access, vol. 8, pp. 26549–26560, 2020.
[19] A. Loquercio et al., “Dronet: Learning to fly by driving,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1088–1095, 2018.
[20] Z. Wan et al., “An energy-efficient quad-camera visual system for autonomous machines on fpga platform,” in 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 1–4, IEEE, 2021.
[21] Y. Gan et al., “Eudoxus: Characterizing and accelerating localization in autonomous machines industry track paper,” in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 827–840, IEEE, 2021.
[22] S. Perri et al., “Stereo vision architecture for heterogeneous systems-on-chip,” Journal of Real-Time Image Processing, pp. 393–415, 2020.
[23] R. Kamasaka et al., “An fpga-oriented graph cut algorithm for accelerating stereo vision,” in 2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig), pp. 1–6, IEEE, 2018.
[24] J. Zhao et al., “Fp-stereo: Hardware-efficient stereo vision for embedded applications,” in 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), pp. 269–276, IEEE, 2020.
[25] T. Gao et al., “Ielas: An elas-based energy-efficient accelerator for real-time stereo matching on fpga platform,” in 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 1–4, IEEE, 2021.
[26] K. Abdelouahab et al., “Accelerating cnn inference on fpgas: A survey,” arXiv preprint arXiv:1806.01683, 2018.
[27] Q. Liu et al., “π-ba: Bundle adjustment hardware accelerator based on distribution of 3d-point observations,” IEEE Transactions on Computers, vol. 69, no. 7, pp. 1083–1095, 2020.
[28] B. Asgari et al., “Pisces: power-aware implementation of slam by customizing efficient sparse algebra,” in 2020 57th ACM/IEEE Design Automation Conference (DAC), pp. 1–6, IEEE, 2020.
[29] W. Liu et al., “Archytas: A framework for synthesizing and dynamically optimizing accelerators for robotic localization,” in MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 479–493, 2021.
[30] Q. Liu et al., “An energy-efficient and runtime-reconfigurable fpga-based accelerator for robotic localization systems,” in 2022 IEEE Custom Integrated Circuits Conference (CICC), IEEE, 2022.
[31] S. Murray et al., “The microarchitecture of a real-time robot motion planning accelerator,” in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1–12, IEEE, 2016.
[32] S. Murray et al., “A programmable architecture for robot motion planning acceleration,” in 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), vol. 2160, pp. 185–188, IEEE, 2019.
[33] J. Yu et al., “Incame: Interruptible cnn accelerator for multi-robot exploration,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021.
[34] Y. Sugata et al., “Acceleration of publish/subscribe messaging in ros-compliant fpga component,” in Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, pp. 1–6, 2017.
[35] D. P. Leal et al., “Automated integration of high-level synthesis fpga modules with ros2 systems,” in 2020 International Conference on Field-Programmable Technology (ICFPT), pp. 292–293, IEEE, 2020.
[36] C. Lienen et al., “Reconros: Flexible hardware acceleration for ros2 applications,” in 2020 International Conference on Field-Programmable Technology (ICFPT), pp. 268–276, IEEE, 2020.
[37] M. D. Hill and V. J. Reddi, “Accelerator-level parallelism,” Communications of the ACM, vol. 64, no. 12, pp. 36–38, 2021.
[38] C. Lienen and M. Platzner, “Reconros executor: Event-driven programming of fpga-accelerated ros 2 applications,” arXiv preprint arXiv:2201.07454, 2022.
[39] S. M. Neuman et al., “Benchmarking and workload analysis of robot dynamics algorithms,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5235–5242, IEEE, 2019.
[40] S. Krishnan et al., “Autopilot: Automating soc design space exploration for swap constrained autonomous uavs,” arXiv preprint arXiv:2102.02988, 2021.
[41] S. M. Neuman et al., “Robomorphic computing: a design methodology for domain-specific accelerators parameterized by robot morphology,” in 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 674–686, 2021.
[42] S. Krishnan et al., “Autosoc: Automating algorithm-soc co-design for aerial robots,” arXiv preprint arXiv:2109.05683, 2021.
[43] S. Prakash et al., “Cfu playground: Full-stack open-source framework for tiny machine learning (tinyml) acceleration on fpgas,” arXiv preprint arXiv:2201.01863, 2022.
[44] S. Krishnan et al., “The sky is not the limit: A visual performance model for cyber-physical co-design in autonomous machines,” IEEE Computer Architecture Letters, vol. 19, no. 1, pp. 38–42, 2020.
[45] S. Krishnan et al., “Roofline model for uavs: A bottleneck analysis tool for onboard compute characterization of autonomous unmanned aerial vehicles,” in 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, 2022.
[46] B. Boroujerdian et al., “Mavbench: Micro aerial vehicle benchmarking,” in 2018 51st annual IEEE/ACM international symposium on microarchitecture (MICRO), pp. 894–907, IEEE, 2018.
[47] S. Karandikar et al., “Firesim: Fpga-accelerated cycle-exact scale-out system simulation in the public cloud,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 29–42, IEEE, 2018.
[48] M. Acevedo, “Fpga-based hardware-in-the-loop co-simulator platform for systemmodeler,” 2016.
[49] S. Shah et al., “Airsim: High-fidelity visual and physical simulation for autonomous vehicles,” in Field and service robotics, pp. 621–635, Springer, 2018.
