Heriot-Watt University
Research Gateway
FPGAs for Domain Experts
Citation for published version:
Vanderbauwhede, W, Scholz, S-B & Margala, M 2020, 'FPGAs for Domain Experts', International Journal of
Reconfigurable Computing, vol. 2020, 2725809. https://doi.org/10.1155/2020/2725809
Digital Object Identifier (DOI):
10.1155/2020/2725809
Document Version:
Publisher's PDF, also known as Version of record
Published In:
International Journal of Reconfigurable Computing
Publisher Rights Statement:
© 2020 Wim Vanderbauwhede et al.
Hindawi
International Journal of Reconfigurable Computing
Volume 2020, Article ID 2725809, 2 pages
https://doi.org/10.1155/2020/2725809
Editorial
FPGAs for Domain Experts
Wim Vanderbauwhede (1), Sven-Bodo Scholz (2), and Martin Margala (3)
(1) University of Glasgow, Glasgow, UK
(2) Heriot-Watt University, Edinburgh, UK
(3) University of Massachusetts Lowell, Lowell, MA, USA
Correspondence should be addressed to Wim Vanderbauwhede.
Received 4 May 2020; Accepted 16 September 2020; Published 27 October 2020
Copyright © 2020 Wim Vanderbauwhede et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Field-Programmable Gate Arrays (FPGAs) have recently
gained a lot of attention through demonstrated superior
performance over off-the-shelf architectures, not only with
respect to energy efficiency but also with respect to wall-clock runtimes. For a long time, FPGAs were used
primarily as prototyping devices or in embedded systems but
are now increasingly accepted as first-class computing
devices on desktops and servers. This change has been driven
by a combination of ever larger and more resource-rich
FPGAs and the wider availability of mature and stable high-level
FPGA programming tools.
The application areas span many domains, from
high finance to advanced machine learning. Despite the
availability of many tools for high-level synthesis and the increasing ease of access to FPGA-based computing nodes
(e.g., via Amazon Web Services), domain experts are still far
from using FPGAs to gain processing performance
unless preconfigured systems for their particular applications are readily available. Manycore CPUs and
GPUs are still generally considered the only viable options
for domain experts looking to accelerate their applications.
Against this background, there has been considerable
research in recent years on making FPGAs accessible to
domain experts. With this special issue, we bring
together work that aims to break down this barrier and broaden the
applicability of FPGAs.
This special issue combines contributions from researchers and practitioners that share the vision of enabling
domain experts to benefit from the performance opportunities of FPGAs.
We hope that you enjoy this special issue and that this paper
collection as a whole introduces readers to the varied and
challenging area of FPGA computing, presenting several
state-of-the-art solutions from diverse perspectives. All
accepted papers present relevant and interesting research
techniques, models, and work directly applied to
scientific FPGA programming.
Finally, we would like to thank all the authors for their
submissions to this special issue and also the reviewers
for dedicating their time to provide detailed comments and
suggestions that helped to improve the quality of this special
issue.
The first paper, “Automatic Pipelining and Vectorization
of Scientific Code for FPGAs,” focuses on FPGA compilation
of legacy scientific code in Fortran. There is a very large body
of legacy scientific code still in use today, and much new
scientific code is still being written in Fortran-77. Many of
these codes would benefit from acceleration on GPUs and
FPGAs. Translating such legacy code into parallel code
for GPUs or FPGAs requires considerable manual effort.
This is a major barrier to wider adoption of FPGAs. The
authors of this paper have been developing an automated
optimizing compiler to lower this barrier. Their aim is to
compile legacy Fortran code automatically to FPGAs, without
any need for rewriting or inserting pragmas. The compiler
applies suitable optimizations based on static code analysis.
The paper focuses on two key optimizations, automatic
pipelining and vectorization. The compiler identifies portions of the legacy code that can be pipelined and vectorized.
The backend generates coarse-grained pipelines and
automatically vectorizes both the memory access and the
data path based on a cost model, generating an OpenCL-HDL hybrid solution for FPGA targets on the Amazon
cloud. The results show a performance improvement of
up to four times over baseline OpenCL code.
The second paper, “Dimension Reduction Using
Quantum Wavelet Transform on a High-Performance
Reconfigurable Computer,” introduces a very interesting
and exciting new field, the use of FPGAs for the acceleration
of quantum computing simulations. Simulation is a crucial
step in the development of quantum computers and algorithms, and FPGAs have huge potential to accelerate this
type of simulation. The paper proposes combining dimension reduction techniques with quantum information
processing for application in domains that generate large
volumes of data, such as high-energy physics (HEP). It focuses on using the quantum wavelet transform (QWT) [1] to
reduce the dimensionality of high spatial resolution data.
The quantum wavelet transform takes advantage of quantum
superposition to reduce computing time for the processing
of exponentially larger amounts of information. The authors
present a new emulation architecture to perform QWT and
its inverse on high-resolution data, and a prototype of an
FPGA-based quantum emulator. Experiments using high-resolution image data on a state-of-the-art multinode high-performance reconfigurable computer show that the proposed concepts represent a feasible approach to reducing the
dimensionality of high spatial resolution data generated by
applications in HEP.
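The editorial does not spell out the transform itself, so as a rough illustration only, the sketch below shows the classical (non-quantum) single-level Haar wavelet transform, assuming the Haar basis as the wavelet family; it is not the authors' emulator, and the function names are hypothetical. Each pass maps pairs of samples to a scaled average (approximation) and a scaled difference (detail), so keeping only the approximation coefficients halves the number of values per level. In the quantum setting, a vector of 2^n values can be held in the amplitudes of an n-qubit state, which is where the exponential advantage mentioned above comes from; the sketch only shows the underlying arithmetic.

```python
import math


def haar_level(signal):
    """One level of the orthonormal Haar transform (hypothetical helper).

    Returns (approx, detail), each half the length of the input.
    The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)  # normalisation keeps the transform orthonormal
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def inverse_haar_level(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append(s * (a + d))
        out.append(s * (a - d))
    return out


def reduce_dimension(signal, levels):
    """Keep only the approximation coefficients after `levels` passes.

    Each pass halves the data size; the discarded detail coefficients
    carry the fine-grained (high-frequency) information.
    """
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_level(approx)
    return approx
```

For 2D image data one would typically apply `haar_level` along the rows and then along the columns; the 1D version is kept here for brevity.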
The third paper, “Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain
Expertise—Designing a Computer Architecture via HLS),”
provides an in-depth description of a high-level synthesis
workflow around the Vivado HLS tools. It comprises tools on
both sides of HLS: tools for design space exploration prior
to running HLS, named COTSon and MYDSE, as well as
tools for targeting custom-built hardware, the AXIOM
board. The article provides a good overview of the tools and
the overall workflow through the HLS tool. The abstract
description of the workflow is substantiated by an in-depth
presentation of example applications including the design
of a system for distributed computation across multiple
FPGA boards. Finally, the authors present some empirical evidence for the
predictive capabilities of the tool chain.
Overall, this contribution not only nicely demonstrates the
challenges involved in designing complex systems with
HLS at their core but also presents
custom-made tooling that can be used by the wider
community.
The fourth paper, “An FPGA-Based Hardware Accelerator for CNNs Using On-Chip Memories Only: Design
and Benchmarking with Intel Movidius Neural Compute
Stick,” presents a full on-chip FPGA hardware accelerator
for a separable convolutional neural network designed for a
keyword spotting application. This is a quantized neural
network realized exclusively using on-chip memories. The
design is benchmarked against the Intel Movidius Neural Compute Stick,
a device which deploys a custom
accelerator, the Intel Movidius Myriad X Vision Processing
Unit (VPU) [2]. The results show that shorter inference time
and lower energy per inference can be obtained with
comparable accuracy. This is a striking result, as the VPU is a
dedicated accelerator touting ultralow power and high
performance and serves to showcase the potential of
quantized CNNs on FPGAs.
The final paper, “Implementing and Evaluating an
Heterogeneous, Scalable, Tridiagonal Linear System
Solver with OpenCL to Target FPGAs, GPUs, and CPUs,”
addresses the problem of solving linear systems, a very
common problem in scientific computing and HPC. As
indicated by the title, the paper focuses in particular on
diagonally dominant tridiagonal linear systems using the
truncated-SPIKE algorithm [3] and presents a numerically stable optimised FPGA implementation using the
open standard OpenCL [4]. The paper compares implementations of the algorithm on CPU, GPU, and FPGA and
also compares them against an optimised
implementation of the TDMA (tridiagonal matrix algorithm) solver [5]. The FPGA
implementation is shown to have better performance per
watt than the CPU and GPU, and the truncated-SPIKE
algorithm outperforms the TDMA algorithm on FPGA
and CPU. The paper also demonstrates the potential of
utilising FPGAs, GPUs, and CPUs concurrently in a
heterogeneous computing environment to solve linear
systems.
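For readers unfamiliar with the TDMA baseline, a minimal sequential sketch of the Thomas algorithm follows, assuming plain Python lists and no pivoting (which is safe for the diagonally dominant systems discussed above). The function name and argument layout are illustrative choices, not taken from the paper. Both the forward elimination and the back substitution are serial recurrences, which is one reason partition-based methods such as truncated SPIKE can expose more parallelism on FPGAs and GPUs.

```python
from typing import List


def thomas_solve(a: List[float], b: List[float], c: List[float], d: List[float]) -> List[float]:
    """Solve A x = d for tridiagonal A using the Thomas algorithm (TDMA).

    a: sub-diagonal (a[0] is ignored)
    b: main diagonal
    c: super-diagonal (c[n-1] is ignored)
    d: right-hand side
    All lists have length n. No pivoting is performed, so A is assumed
    to be (for example) strictly diagonally dominant.
    """
    n = len(b)
    if not (len(a) == len(c) == len(d) == n):
        raise ValueError("all inputs must have the same length")
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    # Forward sweep: eliminate the sub-diagonal row by row.
    c_prime[0] = c[0] / b[0]
    d_prime[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom
    # Back substitution: each x[i] depends on x[i + 1].
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

For example, with `a = c = [1.0] * 4`, `b = [4.0] * 4` and `d = [6.0, 12.0, 18.0, 19.0]`, the solver returns values within rounding error of `[1.0, 2.0, 3.0, 4.0]`.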
Conflicts of Interest
The editors declare that they have no conflicts of interest.
Acknowledgments
The editors wish to acknowledge the collaborative funding
support from the UK EPSRC under grant EP/L00058X/1.
Wim Vanderbauwhede
Sven-Bodo Scholz
Martin Margala
References
[1] A. Fijany and C. P. Williams, “Quantum wavelet transforms:
fast algorithms and complete circuits,” in Proceedings of the
NASA International Conference on Quantum Computing and
Quantum Communications, pp. 10–33, Springer, Palm Springs,
CA, USA, February 1998.
[2] S. Rivas-Gomez, A. J. Peña, D. Moloney, E. Laure, and
S. Markidis, “Exploring the vision processing unit as co-processor for inference,” in Proceedings of the IEEE International
Parallel and Distributed Processing Symposium Workshops
(IPDPSW), pp. 589–598, IEEE, Vancouver, Canada, May 2018.
[3] C. C. K. Mikkelsen and M. Manguoglu, “Analysis of the
truncated SPIKE algorithm,” SIAM Journal on Matrix Analysis
and Applications, vol. 30, no. 4, pp. 1500–1519, 2009.
[4] A. Munshi, “The OpenCL specification,” in Proceedings of the
IEEE Hot Chips 21 Symposium (HCS), pp. 1–314, IEEE,
Stanford, CA, USA, August 2009.
[5] D. J. Warne, N. A. Kelson, and R. F. Hayward, “Comparison of
high level FPGA hardware design for solving tri-diagonal linear
systems,” Procedia Computer Science, vol. 29, pp. 95–101, 2014.