A Hardware-Oriented Echo State Network and Its FPGA Implementation


Journal of Robotics, Networking and Artificial Life

Vol. 7(1); June (2020), pp. 58–62


DOI: https://doi.org/10.2991/jrnal.k.200512.012; ISSN 2405-9021; eISSN 2352-6386
https://www.atlantis-press.com/journals/jrnal

Research Article
A Hardware-Oriented Echo State Network and its
FPGA Implementation

Kentaro Honda, Hakaru Tamukoh*


Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, 2-4 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka 808-0196, Japan

*Corresponding author. Email: [email protected]

ARTICLE INFO

Article History
Received 11 November 2019
Accepted 04 March 2020

Keywords
Reservoir computing
echo state network
field programmable gate array

ABSTRACT

This paper proposes an implementation of an Echo State Network (ESN) on a Field Programmable Gate Array (FPGA). The proposed method reduces hardware resources by using fixed-point operations, quantization of the weights used in the accumulate operations, and efficient dataflow modules. The performance of the designed circuit is verified via experiments, including the prediction of sine and cosine waves. Experimental results show that the proposed circuit operates at a frequency of 200 MHz and computes the ESN algorithm faster than a central processing unit.

© 2020 The Authors. Published by Atlantis Press SARL.
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

Neural networks are widely expected to be applied to embedded systems such as robots and automobiles. However, Deep Neural Networks (DNNs) [1] require high computational power because they execute a large number of accumulate operations. Generally, graphics processing units are used to accelerate these computations; however, their power consumption is high, which makes them difficult to use in embedded systems with tight power budgets. To mitigate this problem, we have implemented DNNs in hardware such as Field Programmable Gate Arrays (FPGAs), realizing high-speed calculation with low power consumption.

In this paper, we implement an Echo State Network (ESN) [2], a kind of Reservoir Computing (RC), on an FPGA. RC is a Recurrent Neural Network (RNN) model in which only the weights of the output layer are trained. ESNs can learn time-series data faster than general RNNs such as Long Short-Term Memory (LSTM). An ESN executes a large number of accumulate operations on input data and weights, while FPGA resources such as Look-Up Tables (LUTs), Flip-Flops (FFs) and Digital Signal Processors (DSPs) are limited. We therefore modify the algorithms and architectures of the ESN. Furthermore, we implement the proposed hardware-oriented algorithms on an FPGA and show the effectiveness of the proposed methods by comparing the proposed circuit with a conventional one.

2. ECHO STATE NETWORK

The ESN is a type of RC which consists of three layers: an input layer, a reservoir layer and an output layer, as shown in Figure 1. The neurons of the reservoir layer are randomly connected to each other.

Figure 1 | Echo state network.

The ESN is described by Equations (1) and (2),

x(t) = f\left((1 - \delta)\,x(t-1) + \delta\,(w_{in}\,u(t) + w_{res}\,x(t-1))\right)   (1)

z(t) = w_{out} \cdot x(t)   (2)

where x(t) and z(t) are the outputs of the reservoir and the output layer at time t, respectively, u(t) is the input data, and \delta is the leak rate, which sets the balance between the previous state x(t-1) and the new input drive in x(t). w_{in}, w_{res} and w_{out} are the weights of the input, reservoir and output layers, respectively. The activation function f is the hyperbolic tangent. The reservoir layer must satisfy the Echo State Property (ESP) [3], and its weights are initialized by the following steps:

1. All weights of the reservoir layer are generated from a normal distribution.
2. The spectral radius (the largest absolute eigenvalue of the weight matrix) is calculated and all the generated weights are divided by it.
3. All weights are multiplied by a constant value.

In standard RNNs, all weights are updated with the backpropagation-through-time algorithm [4]. In the ESN, by contrast, only the weights of the output layer are updated, in one shot, through ridge regression:

w_{out} = (X^T X + \lambda I)^{-1} X^T Y   (3)

where X is the matrix that stacks x(t) over all time steps, Y is the matrix of the supervised signal over all time steps, and \lambda is the regularization term.
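For concreteness, the following NumPy sketch follows Equations (1)-(3) and the three initialization steps. The layer sizes are those used in Section 5; the seed, the leak rate, the scaling constant 0.9 in step 3 and the regularization value are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_out = 2, 100, 2   # layer sizes from Section 5
leak = 0.1                       # leak rate (delta); assumed value
lam = 1e-6                       # ridge parameter (lambda); assumed value

# Steps 1-3: normal weights, divided by the spectral radius,
# then multiplied by a constant (0.9 is an assumed choice).
w_in = rng.normal(size=(n_res, n_in))
w_res = rng.normal(size=(n_res, n_res))
w_res = w_res / np.max(np.abs(np.linalg.eigvals(w_res))) * 0.9

def reservoir_step(x_prev, u):
    # Equation (1): leaky reservoir update with tanh activation
    return np.tanh((1.0 - leak) * x_prev
                   + leak * (w_in @ u + w_res @ x_prev))

def train_readout(U, Y):
    # Run the reservoir over the input series, stack the states into X,
    # then solve Equation (3) in closed form.
    x = np.zeros(n_res)
    X = np.empty((len(U), n_res))
    for t, u in enumerate(U):
        x = reservoir_step(x, u)
        X[t] = x
    w_out = np.linalg.solve(X.T @ X + lam * np.eye(n_res), X.T @ Y)
    return w_out   # readout: z(t) = x(t) @ w_out, Equation (2)
```

Because only w_{out} is solved for, training is a single linear solve rather than an iterative gradient descent, which is what makes the ESN attractive for hardware.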

3. HARDWARE-ORIENTED ESN

FPGA resources are limited; therefore, we have to modify the algorithms to make them suitable for FPGA implementation. In this paper, we design a circuit for the ESN using the following three methods.

3.1. Quantization

One way to reduce the complexity of a circuit is to use quantized values, which simplify the computation while maintaining accuracy [5]. We therefore calculate the outputs of the reservoir layer [Equation (1)] using quantized weights. Generally, the weights of the input and reservoir layers are real numbers, so several DSPs are needed to compute the real-number multiplications. We therefore transform the real-valued weights into ternary values: 0 or ±1. The accuracy with this quantization is maintained in both training and prediction modes.

The circuit of the neuron is shown in Figure 2, where n is the number of reservoir neurons, u_n and w_n are the inputs and weights of the input and reservoir layers, respectively, and m is the bit width of the input data. With ternary weights, the circuit can carry out the accumulate operations using only AND and OR operations.

Figure 2 | Circuit of neuron.

We verified the accuracy of the quantized model against the conventional model. The task used to evaluate their performance was NARMA10 [6], defined as follows:

y_{k+1} = \alpha\,y_k + \beta\,y_k \sum_{i=0}^{9} y_{k-i} + \gamma\,u_k\,u_{k-9} + \delta   (4)

u_k = \mu + \sigma\,v_k   (5)

where u_k and y_k are the input and output at time k, and \alpha, \beta, \gamma, \delta, \mu, \sigma are hyperparameters, which we set to (0.3, 0.05, 1.5, 0.1, 1, 0.5). v_k is a random number between 0 and 1. The training data contains 4000 time steps and the test data contains 300 time steps, of which only the last 200 time steps were used.

Figure 3 shows the prediction over time steps 100–200 of the quantized model with 1000 reservoir neurons. The black line represents the supervised signal and the blue line the output of the quantized model. The quantized model was able to reproduce NARMA10.

Figure 3 | Output of quantized model.

Figure 4 shows the MSE between the supervised signal and the outputs of each model while varying the number of neurons in the reservoir. The accuracy of the quantized model was similar to that of the conventional model.

Figure 4 | MSE of each model.
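The paper does not spell out how real-valued weights are mapped to {0, ±1}, so the thresholding rule below is an assumed scheme, continuing the sketch from Section 2. What it illustrates is the hardware payoff stated above: with ternary weights, each product in Equation (1) degenerates to passing, negating or dropping an input, so no real multiplier (DSP) is needed.

```python
def ternarize(w, ratio=0.7):
    # Assumed rule: zero out small weights, keep only the sign of the rest.
    # The threshold 0.7 * std(w) is illustrative, not from the paper.
    thresh = ratio * np.std(w)
    return np.where(w > thresh, 1.0, np.where(w < -thresh, -1.0, 0.0))

w_in_q = ternarize(w_in)
w_res_q = ternarize(w_res)

def ternary_mac(w_row, u):
    # With weights in {0, +1, -1}, a "multiplication" is select/negate/skip.
    acc = 0.0
    for w, x in zip(w_row, u):
        if w == 1.0:
            acc += x      # pass the input through
        elif w == -1.0:
            acc -= x      # negate the input
        # w == 0.0 contributes nothing
    return acc
```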

3.2. Fixed Point

Computations are generally conducted with floating-point numbers, an exponential representation that can cover a wide range of values. A circuit using floating-point numbers is complex and requires many FPGA resources. In contrast, although the fixed-point representation covers only a narrow range of values, its circuit is far less complex than a floating-point one.
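As a software analogue of the fixed-point data path, a minimal sketch follows. The paper reports a 32-bit data width but not the integer/fraction split, so the Q8.24 format below is an assumption.

```python
FRAC_BITS = 24                      # assumed fractional bits of the 32-bit word

def to_fixed(x):
    # Quantize a float (or array) to a Q8.24 two's-complement integer.
    return np.round(np.asarray(x) * (1 << FRAC_BITS)).astype(np.int64)

def to_float(x_fix):
    return x_fix / float(1 << FRAC_BITS)

def fixed_mul(a_fix, b_fix):
    # The raw product carries 2 * FRAC_BITS fractional bits;
    # shift right to renormalize to Q8.24.
    return (a_fix * b_fix) >> FRAC_BITS

# Example: 0.75 * (-1.5) computed in fixed point
print(to_float(fixed_mul(to_fixed(0.75), to_fixed(-1.5))))   # -1.125
```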

3.3. Sequence Product–Sum of Output Layer

As shown in Figure 5, a general product–sum operation can be represented by a tree structure. With this representation, the number of adders and multipliers grows with the number of neurons.

Figure 5 | Tree structure of the product–sum circuit.
Therefore, in this research, we compute the product–sum of the output layer sequentially. Figure 6 illustrates the product–sum operation of the proposed method, where A_i is an intermediate variable that temporarily stores the accumulated value. As this method consists of only a single adder, multiplier, and register per neuron in the output layer, the complexity of the circuit is reduced.

Figure 6 | Sequence structure of the product–sum circuit.
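In software terms, the loop below mirrors Figure 6, with `acc` playing the role of the intermediate register A_i (the function names are ours, not the paper's):

```python
def sequential_dot(w_row, x):
    # One output neuron: a single multiplier and adder reused every cycle.
    acc = 0.0                 # the A_i register in Figure 6
    for w, xi in zip(w_row, x):
        acc += w * xi         # one product folded in per clock cycle
    return acc

def output_layer(w_out, x):
    # Equation (2): one sequential accumulator per output neuron.
    return [sequential_dot(w_col, x) for w_col in np.asarray(w_out).T]
```

The trade is latency for area: the tree needs on the order of n multipliers and O(log n) adder stages, whereas the sequential form needs a single multiplier and O(n) cycles.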
4. FPGA IMPLEMENTATION

As shown in Figure 7, the conventional model uses two pipelines. In processes 1 and 2, the reservoir module calculates the state of a single neuron in the reservoir layer and stores it in memory. In process 3, the reservoir module repeats processes 1 and 2 for the rest of the reservoir neurons. In process 4, an output module calculates the output of a single neuron in the output layer using the tree structure. In process 5, the output module repeats process 4 for the rest of the output neurons.

Figure 7 | Circuit architecture of the conventional model.

Figure 8 illustrates the circuit architecture of the proposed model. We implement the sequential product–sum circuit (as in Figure 6) in parallel for the output layer. The proposed circuit can therefore calculate a single neuron of the reservoir layer and of the output layer simultaneously in process 4. As a result, the proposed model processes data more efficiently than the conventional model. Table 1 compares the conventional and proposed models.

Figure 8 | Circuit architecture of the proposed model.

Table 1 | Details of each circuit

              Operation       Weights of input and reservoir layers  Module of output layer
Conventional  Floating point  Real values                            Tree structure
Proposed      Fixed point     Ternary values                         Sequence structure
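A software analogue of Figure 8's scheduling may make the overlap clearer: as soon as a reservoir neuron's new state is ready, it is folded into every output neuron's running sum, so the output layer finishes essentially together with the reservoir sweep. The function below is our illustration (reusing names from the earlier sketches), not the authors' circuit description.

```python
def proposed_step(u, x_prev, w_in_q, w_res_q, w_out):
    x_new = np.empty(n_res)
    acc = np.zeros(n_out)          # one sequential accumulator per output neuron
    for i in range(n_res):
        # Processes 1-3: state of reservoir neuron i (row i of Equation (1))
        x_new[i] = np.tanh((1.0 - leak) * x_prev[i]
                           + leak * (w_in_q[i] @ u + w_res_q[i] @ x_prev))
        # Process 4, overlapped: fold the fresh state into the output sums
        acc += w_out[i] * x_new[i]
    return x_new, acc              # acc equals z(t) when the loop ends
```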
5. EXPERIMENT

In the experiment, we created two types of circuits in order to verify the effectiveness of the proposed circuit, and compared its calculation speed with those of other devices. The task used to evaluate their performance is the prediction of sine and cosine waves. The numbers of neurons in the input, reservoir and output layers were 2, 100, and 2, respectively, and the prediction was computed on an FPGA. The target device is a Zynq UltraScale+ MPSoC ZCU102 [7]. The experiment was conducted at an operating frequency of 200 MHz with a data width of 32 bits. Table 2 shows the experimental conditions.

Table 2 | Experimental conditions

Tool             SDSoC 2018.3
Target device    Zynq UltraScale+ MPSoC ZCU102
Clock frequency  200 MHz
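The paper states the task only as "prediction of sine and cosine waves"; the snippet below is one plausible setup (one-step-ahead prediction with an assumed sampling step), reusing the training sketch from Section 2.

```python
dt = 0.05                                            # assumed sampling step
t = np.arange(0, 4001) * dt
series = np.stack([np.sin(t), np.cos(t)], axis=1)    # 2 inputs, 2 outputs

U, Y = series[:-1], series[1:]                       # predict the next sample
w_out = train_readout(U, Y)

# Drive the reservoir with the inputs and read out the predictions z(t)
x = np.zeros(n_res)
z = []
for u in U[:200]:
    x = reservoir_step(x, u)
    z.append(x @ w_out)
```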
6. RESULTS

Figure 9 shows the predictions of the conventional and proposed circuits. The black, blue, and red lines represent the supervised signal, the prediction of the conventional circuit, and the prediction of the proposed circuit, respectively. Both circuits were able to reproduce the sine and cosine waves. Tables 3 and 4 show the resource utilization of the conventional and proposed circuits, respectively. The proposed method reduced overall resource use by approximately 50%. Table 5 compares the electric energy of the conventional and proposed circuits. The proposed method reduced electric energy consumption by approximately 80% compared with the conventional one. Table 6 compares the computation speed of the FPGA with that of other devices. The proposed circuit was approximately 25 and 340 times faster than a desktop CPU and an embedded CPU, respectively.
Figure 9 | Output of circuits.

Table 3 | Utilization of resources for the conventional circuit

          Used    Total    Utilization (%)
BRAM_18k  106     912      11.57
DSP_48E   519     2,520    20.48
LUT       60,557  274,080  22.07
FF        96,556  548,160  17.55

Table 4 | Utilization of resources for the proposed circuit

          Used    Total    Utilization (%)
BRAM_18k  48      912      5.26
DSP_48E   20      2,520    0.79
LUT       28,933  274,080  10.56
FF        44,021  548,160  8.03

Table 5 | Electric energy consumption of each circuit

              Latency (ms)  Power (W)  Electric energy (W·ms)
Conventional  0.43          1.46       0.63
Proposed      0.20          0.67       0.13

Table 6 | Computation speed of devices

Platform                                    Latency (ms)
CPU 3.2 GHz (i7-8700)                       5.215
Embedded CPU 1.2 GHz (Quad Arm Cortex-A53)  68.123
FPGA 200 MHz (XCZU9EG-2FFVBG1156E)          0.200

7. CONCLUSION

We successfully adapted the circuit to enhance ESN computation on the FPGA. As a result, high-speed computation became possible while the circuit resources were reduced. To achieve this, fixed-point computation, quantization of weights, and sequential product–sum computation techniques were used. In the future, we expect to apply the proposed circuit and methods to embedded systems such as automobiles and robots.

CONFLICTS OF INTEREST

The authors declare they have no conflicts of interest.

ACKNOWLEDGMENT

This research is supported by JSPS KAKENHI grant number 17H01798.

REFERENCES

[1] G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (2006), 1527–1554.
[2] H. Jaeger, The "echo state" approach to analysing and training recurrent neural networks – with an Erratum note, German National Research Center for Information Technology GMD, Bonn, Germany, Technical Report 148 (2001), 13.
[3] I.B. Yildiz, H. Jaeger, S.J. Kiebel, Re-visiting the echo state property, Neural Netw. 35 (2012), 1–9.
[4] P.J. Werbos, Backpropagation through time: what it does and how to do it, Proc. IEEE 78 (1990), 1550–1560.
[5] Y. Aratani, Y.Y. Jye, A. Suzuki, D. Shuto, T. Morie, H. Tamukoh, Multi-valued quantization neural networks toward hardware implementation, IEEE International Conference on Artificial Life and Robotics (ICAROB), 22 (2017), 132–135.
[6] A.F. Atiya, A.G. Parlos, New results on recurrent network training: unifying the algorithms and accelerating convergence, IEEE Trans. Neural Netw. 11 (2000), 697–709.
[7] XILINX, Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit, available from: https://www.xilinx.com/products/boards-and-kits/ek-u1-zcu102-g.html (accessed December 1, 2019).

AUTHORS INTRODUCTION

Mr. Kentaro Honda

He received the Master of Engineering degree from Kyushu Institute of Technology in 2019. His research interests include neural networks and digital circuits.

Associate Prof. Hakaru Tamukoh

He received the B.Eng. degree from Miyazaki University, Japan, in 2001, and the M.Eng. and PhD degrees from Kyushu Institute of Technology, Japan, in 2003 and 2006, respectively. He was a postdoctoral research fellow of the 21st Century Center of Excellence Program at Kyushu Institute of Technology from April 2006 to September 2007, and an Assistant Professor at Tokyo University of Agriculture and Technology from October 2007 to January 2013. He is currently an Associate Professor in the Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Japan. His research interests include hardware/software complex systems, digital hardware design, neural networks, soft computing and home service robots. He is a member of IEICE, SOFT, JNNS, IEEE, JSAI and RSJ.
