Editorial FPGAs for Domain Experts

Sven-Bodo Scholz

2020

Heriot-Watt University Research Gateway

FPGAs for Domain Experts

Citation for published version: Vanderbauwhede, W, Scholz, S-B & Margala, M 2020, 'FPGAs for Domain Experts', International Journal of Reconfigurable Computing, vol. 2020, 2725809. https://doi.org/10.1155/2020/2725809

Digital Object Identifier (DOI): 10.1155/2020/2725809
Document Version: Publisher's PDF, also known as Version of record
Published In: International Journal of Reconfigurable Computing
Publisher Rights Statement: © 2020 Wim Vanderbauwhede et al.

Hindawi
International Journal of Reconfigurable Computing
Volume 2020, Article ID 2725809, 2 pages
https://doi.org/10.1155/2020/2725809

Editorial
FPGAs for Domain Experts

Wim Vanderbauwhede,1 Sven-Bodo Scholz,2 and Martin Margala3

1 University of Glasgow, Glasgow, UK
2 Heriot-Watt University, Edinburgh, UK
3 University of Massachusetts Lowell, Lowell, MA, USA

Correspondence should be addressed to Wim Vanderbauwhede; [email protected]

Received 4 May 2020; Accepted 16 September 2020; Published 27 October 2020

Copyright © 2020 Wim Vanderbauwhede et al.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Field-Programmable Gate Arrays (FPGAs) have recently gained a lot of attention through demonstrated superior performance over off-the-shelf architectures, not only with respect to energy efficiency but also with respect to wall-clock runtimes. For a long time, FPGAs were used primarily as prototyping devices or in embedded systems, but they are now increasingly accepted as first-order computing devices on desktops and servers. This change has been driven by a combination of increasingly large and resource-rich FPGAs and wider availability of mature and stable high-level FPGA programming tools. The application areas range across many domains, from high finance to advanced machine learning.

Despite the availability of many tools for high-level synthesis and increasing ease of access to FPGA-based computing nodes (e.g., via Amazon Web Services), domain experts are still far from utilising FPGAs to gain processing performance unless preconfigured systems for their particular applications exist in readily available form. Manycore CPUs and GPUs are still generally considered the only viable options for domain experts looking to accelerate their applications.

Against this background, there has been considerable research in recent years on making FPGAs accessible to domain experts. With this special issue, we bring together work that aims to break down this barrier to a wider applicability of FPGAs. The special issue combines contributions from researchers and practitioners who share the vision of enabling domain experts to benefit from the performance opportunities of FPGAs.
We hope that you enjoy this special issue and that the paper collection as a whole introduces readers to the varied and challenging area of FPGA computing, presenting several state-of-the-art solutions from diverse perspectives. All accepted papers provide relevant and interesting research techniques, models, and work directly applied to the area of scientific FPGA programming. Finally, we would like to thank all the authors for their submissions to this special issue and also the reviewers for dedicating their time to provide detailed comments and suggestions that helped to improve the quality of this special issue.

The first paper, “Automatic Pipelining and Vectorization of Scientific Code for FPGAs,” focuses on FPGA compilation of legacy scientific code in Fortran. There is a very large body of legacy scientific code still in use today, and much new scientific code is still being written in Fortran-77. Many of these codes would benefit from acceleration on GPUs and FPGAs. Translating such legacy code into parallel code for GPUs or FPGAs requires considerable manual effort, which is a major barrier to wider adoption of FPGAs. The authors of this paper have been developing an automated optimizing compiler to lower this barrier. Their aim is to compile legacy Fortran code automatically to FPGA, without any need for rewriting or insertion of pragmas. The compiler applies suitable optimizations based on static code analysis. The paper focuses on two key optimizations, automatic pipelining and vectorization. The compiler identifies portions of the legacy code that can be pipelined and vectorized. The backend generates coarse-grained pipelines and automatically vectorizes both the memory access and the data path based on a cost model, generating an OpenCL-HDL hybrid solution for FPGA targets on the Amazon cloud. The results show a performance improvement of up to four times over baseline OpenCL code.
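To make the idea of vectorizing a data path concrete, the following is a minimal Python sketch. It is not the compiler described in the paper and does not reproduce its cost model; the function names, the `lanes` parameter, and the scale-and-add kernel are all hypothetical, chosen only for illustration. It shows a scalar loop next to a version that consumes a fixed number of elements per step, which is roughly what widening memory access and data path by a vectorization factor amounts to.

```python
def scale_add_scalar(a, b, k):
    """Baseline loop: one element is processed per iteration."""
    out = []
    for x, y in zip(a, b):
        out.append(k * x + y)
    return out


def scale_add_vectorized(a, b, k, lanes=4):
    """Same computation, but `lanes` elements are consumed per step.

    `lanes` is a hypothetical vectorization factor; on an FPGA it would
    correspond to replicating the data path and widening memory accesses.
    """
    assert len(a) == len(b)
    out = []
    for i in range(0, len(a), lanes):
        chunk_a = a[i:i + lanes]  # one wide read of up to `lanes` elements
        chunk_b = b[i:i + lanes]
        out.extend(k * x + y for x, y in zip(chunk_a, chunk_b))
    return out


def iterations(n, lanes):
    """Number of loop steps needed for n elements (ceiling division)."""
    return -(-n // lanes)
```

Both functions return the same values; only the number of loop steps changes, from `n` to `iterations(n, lanes)`. Any real speedup depends on available memory bandwidth and logic resources, a trade-off that the paper's compiler resolves with its cost model.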
The second paper, “Dimension Reduction Using Quantum Wavelet Transform on a High-Performance Reconfigurable Computer,” introduces a very interesting and exciting new field, the use of FPGAs for the acceleration of quantum computing simulations. Simulation is a crucial step in the development of quantum computers and algorithms, and FPGAs have huge potential to accelerate this type of simulation. The paper proposes to combine dimension reduction techniques with quantum information processing for application in domains that generate large volumes of data, such as high-energy physics (HEP). It focuses on using the quantum wavelet transform (QWT) [1] to reduce the dimensionality of high spatial resolution data. The quantum wavelet transform takes advantage of quantum superposition to reduce computing time for the processing of exponentially larger amounts of information. The authors present a new emulation architecture to perform QWT and its inverse on high-resolution data, and a prototype of an FPGA-based quantum emulator. Experiments using high-resolution image data on a state-of-the-art multinode high-performance reconfigurable computer show that the proposed concepts represent a feasible approach to reducing the dimensionality of high spatial resolution data generated by applications in HEP.

The third paper, “Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain Expertise—Designing a Computer Architecture via HLS),” provides an in-depth description of a high-level synthesis workflow around the Vivado HLS tools. It comprises tools on both sides of HLS: tools for design space exploration prior to running HLS, named COTSon and MYDSE, as well as tools for targeting custom-built hardware, the AXIOM board. The article provides a good overview of the tools and the overall workflow through the HLS tool.
The abstract description of the workflow is substantiated by an in-depth presentation of example applications, including the design of a system for distributed computation across multiple FPGA boards. Finally, some empirical evidence for the predictive capabilities of the tool chain is presented. Overall, this contribution not only nicely demonstrates the challenges involved in designing complex systems with HLS at the core but also presents custom-made tooling that can be used by the wider community.

The fourth paper, “An FPGA-Based Hardware Accelerator for CNNs Using On-Chip Memories Only: Design and Benchmarking with Intel Movidius Neural Compute Stick,” presents a full on-chip FPGA hardware accelerator for a separable convolutional neural network designed for a keyword spotting application. This is a quantized neural network realized exclusively using on-chip memories. The design is based on the Intel Movidius Neural Compute Stick and compares against this device, which deploys a custom accelerator, the Intel Movidius Myriad X Vision Processing Unit (VPU) [2]. The results show that better inference time and energy per inference can be obtained with comparable accuracy. This is a striking result, as the VPU is a dedicated accelerator touting ultralow power and high performance, and it serves to showcase the potential of quantized CNNs on FPGAs.

The final paper, “Implementing and Evaluating an Heterogeneous, Scalable, Tridiagonal Linear System Solver with OpenCL to Target FPGAs, GPUs, and CPUs,” addresses the problem of solving linear systems, a very common problem in scientific computing and HPC. As indicated by the title, the paper focuses in particular on diagonally dominant tridiagonal linear systems using the truncated-SPIKE algorithm [3] and presents a numerically stable, optimised FPGA implementation using the open standard OpenCL [4].
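For readers unfamiliar with the problem class, a diagonally dominant tridiagonal system can be solved sequentially with the classic Thomas algorithm, also known as the tridiagonal matrix algorithm (TDMA). The Python sketch below is that textbook sequential method only. It is neither the truncated-SPIKE algorithm nor any implementation from the paper, and the function name and argument layout are assumptions made for illustration.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (TDMA).

    All four lists have length n. lower[i] is the sub-diagonal entry of
    row i (lower[0] is ignored) and upper[i] is the super-diagonal entry
    of row i (upper[n - 1] is ignored). No pivoting is performed, so the
    matrix is assumed to be diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: recover the unknowns from the last row upwards.
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Each step of the forward sweep depends on the previous row, so this method is inherently sequential. That dependency chain is the usual motivation for partitioned solvers such as those in the SPIKE family on parallel hardware.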
The paper compares implementations of the algorithm on CPU, GPU, and FPGA and provides a comparison against an optimised implementation of the TDMA solver [5]. The FPGA implementation is shown to have better performance per watt than the CPU and GPU, and the truncated-SPIKE algorithm outperforms the TDMA algorithm on FPGA and CPU. The paper also demonstrates the potential of utilising FPGAs, GPUs, and CPUs concurrently in a heterogeneous computing environment to solve linear systems.

Conflicts of Interest

The editors declare that they have no conflicts of interest.

Acknowledgments

The editors wish to acknowledge the collaborative funding support from the UK EPSRC under grant P/L00058X/1.

Wim Vanderbauwhede
Sven-Bodo Scholz
Martin Margala

References

[1] A. Fijany and C. P. Williams, “Quantum wavelet transforms: fast algorithms and complete circuits,” in Proceedings of the NASA International Conference on Quantum Computing and Quantum Communications, pp. 10–33, Springer, Palm Springs, CA, USA, February 1998.
[2] S. Rivas-Gomez, A. J. Pena, D. Moloney, E. Laure, and S. Markidis, “Exploring the vision processing unit as co-processor for inference,” in Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 589–598, IEEE, Vancouver, Canada, May 2018.
[3] C. C. K. Mikkelsen and M. Manguoglu, “Analysis of the truncated SPIKE algorithm,” SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 4, pp. 1500–1519, 2009.
[4] A. Munshi, “The OpenCL specification,” in Proceedings of the IEEE Hot Chips 21 Symposium (HCS), pp. 1–314, IEEE, Stanford, CA, USA, August 2009.
[5] D. J. Warne, N. A. Kelson, and R. F. Hayward, “Comparison of high level FPGA hardware design for solving tri-diagonal linear systems,” Procedia Computer Science, vol. 29, pp. 95–101, 2014.
