Improving the Accuracy and Hardware Efficiency of Neural Networks Using Approximate Multipliers
Abstract—Improving the accuracy of a neural network (NN) usually requires using larger hardware that consumes more energy. However, the error tolerance of NNs and their applications allows approximate computing techniques to be applied to reduce implementation costs. Given that multiplication is the most resource-intensive and power-hungry operation in NNs, more economical approximate multipliers (AMs) can significantly reduce hardware costs. In this article, we show that using AMs can also improve the NN accuracy by introducing noise. We consider two categories of AMs: 1) deliberately designed and 2) Cartesian genetic programing (CGP)-based AMs. The exact multipliers in two representative NNs, a multilayer perceptron (MLP) and a convolutional NN (CNN), are replaced with approximate designs to evaluate their effect on the classification accuracy of the Mixed National Institute of Standards and Technology (MNIST) and Street View House Numbers (SVHN) data sets, respectively. Interestingly, up to 0.63% improvement in the classification accuracy is achieved with reductions of 71.45% and 61.55% in the energy consumption and area, respectively. Finally, the features of an AM that tend to make one design outperform others with respect to NN accuracy are identified. Those features are then used to train a predictor that indicates how well an AM is likely to work in an NN.

Index Terms—Approximate multipliers (AMs), Cartesian genetic programing (CGP), convolutional NN (CNN), multi-layer perceptron (MLP), neural networks (NNs).

Manuscript received June 2, 2019; revised August 4, 2019; accepted September 3, 2019. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Project RES0018685 and Project RES0025211, and in part by the INTER-COST under Project LTC18053. (Corresponding author: Mohammad Saeed Ansari.)
M. S. Ansari, B. F. Cockburn, and J. Han are with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada (e-mail: [email protected]; [email protected]; [email protected]).
V. Mrazek, L. Sekanina, and Z. Vasicek are with the IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic (e-mail: [email protected]; [email protected]; [email protected]).
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2019.2940943

I. INTRODUCTION

THE increasing energy consumption of computer systems still remains a serious challenge in spite of advances in energy-efficient design techniques. Today's computing systems are increasingly used to process huge amounts of data, and they are also expected to present computationally demanding natural human interfaces. For example, pattern recognition, data mining, and neural network (NN)-based classifiers are especially demanding of computational resources. Approximate computing is an emerging design paradigm that can reduce the system cost without reducing the system effectiveness. It leverages the inherent error tolerance of many applications, such as machine learning, multimedia processing, pattern recognition, and computer vision, to allow some accuracy to be traded off to save hardware cost [1]. NNs are now recognized as providing the most effective solutions to many challenging pattern recognition and machine learning tasks such as image classification [2]. Due to their intrinsic error tolerance characteristics and high computation and implementation costs, there is increasing interest in using approximation in NNs. Approximation in the memories, where the synaptic weights are stored [3], approximation in the computation, such as using approximate multipliers (AMs) [4], [5], and approximation in neurons [6], [7] are all strategies that have already been reported in the literature.

Given that multipliers are the main bottleneck of NNs [8]–[10], this article focuses on the use of AMs in NNs. The work in [11] showed that using approximate adders (with reasonable area and power savings) has an unacceptable negative impact on the performance of NNs, so only exact adders are used in this article.

Several AMs have been proposed in the literature that decrease the hardware cost while maintaining acceptably high accuracy. We divide the AMs into two main categories: 1) deliberately designed multipliers, which include designs that are obtained by making some changes in the truth table of the exact designs [12], and 2) Cartesian genetic programing (CGP)-based multipliers, which are designs that are generated automatically using the CGP heuristic algorithm [13]. Note that there are other classes of AMs that are based on analog mixed-signal processing [14], [15]. However, they are not considered in this article since our focus is on digital designs, which are more flexible in implementation than analog-/mixed-signal-based designs.

There is a tradeoff between the accuracy and the hardware cost, and there is no single best design for all applications. Thus, selecting the appropriate AM for any specific application is a complex question that typically requires careful consideration of multiple alternative designs. In this article, the objective is to find the AMs that improve the performance of an NN, i.e., by reducing the hardware cost while preserving
an acceptable output accuracy. To the best of our knowledge, this article is the first that attempts to find the critical features of an AM that make it superior to others for use in an NN.

Our benchmark multipliers, including 500 CGP-based AMs and 100 variants of deliberately designed multipliers, are evaluated for two standard NNs: a multi-layer perceptron (MLP) that classifies the MNIST data set [16] and a convolutional NN (CNN), LeNet-5 [17], that classifies the SVHN data set [18]. After each network is trained using double-precision floating-point exact multipliers, the exact multipliers are replaced with one approximate design (selected from the set of benchmark multipliers), and then five steps of retraining are performed. This process is repeated for each of the benchmark multipliers, resulting in 600 variants for each of the two considered NNs. The retraining is done for each AM only once. Then, inference is performed to evaluate the accuracy. Since the simulations always start from the same point, i.e., we run the retraining steps on the pre-trained network (with exact multipliers), there is no randomness, and therefore the results will be consistent if the simulation is repeated.

The rest of this article is organized as follows. Section II specifies the considered networks and the different types of AMs. Section III evaluates the considered multipliers from two perspectives: 1) application-independent metrics and 2) application-dependent metrics, and discusses the implications of the results. Section IV is devoted to feature selection and describes how the most critical features of an AM can be identified. Section V discusses the error and hardware characteristics of the AMs and recommends the five best AMs. For further performance analysis, these five multipliers are then used to implement an artificial neuron. Finally, Section VI summarizes and concludes this article.

II. PRELIMINARIES

This section provides background information on the two benchmark NNs and describes the considered AMs.

A. Employed Neural Networks and Data Sets

MNIST (Mixed National Institute of Standards and Technology) is a data set of handwritten digits that consists of a training set of 60 000 and a test set of 10 000 28 × 28 images and their labels [16]. We used an MLP network with 784 input neurons (one for each pixel of the monochrome image), 300 neurons in the hidden layer, and ten output neurons, whose outputs are interpreted as the probabilities of classification into the ten target classes (digits 0 to 9) [16]. This MLP uses the sigmoid activation function (AF). An AF introduces nonlinearity into the neuron's output and maps the resulting values onto either the interval [−1, 1] or [0, 1] [19]. Using the sigmoid AF, the neuron j in layer l, where 0 < l ≤ l_max, computes an AF of the weighted sum of its inputs, x_{j,l}, as given by

x_{j,l} = \frac{1}{1 + e^{-\mathrm{sum}_{j,l}}}, \qquad \mathrm{sum}_{j,l} = \sum_{i=1}^{N} x_{i,l-1} \times w_{ij,l-1} \tag{1}

where N denotes the number of neurons in layer l − 1 and w_{ij,l−1} denotes the connection weight between neuron i in layer l − 1 and neuron j in layer l [2].
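As a concrete illustration of (1), the following minimal NumPy sketch (our own naming, not code from the article) computes the output of one sigmoid neuron; the optional `mul(a, b)` hook is an assumption of ours that routes every multiplication through a replaceable multiplier model, so an approximate multiplier can be substituted for the exact product:

```python
import numpy as np

def sigmoid(s):
    """Sigmoid AF: maps the weighted sum onto the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def neuron_output(x_prev, w_j, mul=None):
    """Output x_{j,l} of neuron j in layer l, as in (1).

    x_prev : activations x_{i,l-1} of the N neurons in layer l-1
    w_j    : weights w_{ij,l-1} feeding neuron j
    mul    : optional scalar multiplier model; exact '*' if None
    """
    if mul is None:
        sum_jl = float(np.dot(x_prev, w_j))                   # exact products
    else:
        sum_jl = sum(mul(x, w) for x, w in zip(x_prev, w_j))  # approximate products
    return sigmoid(sum_jl)

# Example: one neuron with three inputs and exact multiplication.
print(neuron_output(np.array([0.5, 0.1, 0.9]), np.array([0.2, -0.4, 0.7])))
```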
SVHN is a data set of house digit images taken from Google Street View images [18]. The data set contains 73 257 images for training and 26 032 images for testing. Each digit is represented as a pair of a 32 × 32 RGB image and its label. We used LeNet-5 [17] to classify this data set. This CNN consists of two sets of convolutional and average pooling layers, followed by a third convolutional layer, and then a fully-connected layer. It also uses the ReLU AF, which simply implements max(0, x). The convolutional and fully connected layers account for 98% of all the multiplications [13]; therefore, approximation is applied only to these layers. In order to reduce the complexity, we converted the original 32 × 32 RGB images to 32 × 32 grayscale images using the standard "luma" mapping [13]

Y = 0.299 × R + 0.587 × G + 0.114 × B \tag{2}

where R, G, and B denote the intensities of the red, green, and blue additive primaries, respectively.
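A one-line NumPy sketch of the luma mapping in (2) follows; the H × W × 3 array layout and the function name are our own assumptions, not the article's code:

```python
import numpy as np

def rgb_to_luma(img_rgb):
    """Convert an HxWx3 RGB image to an HxW grayscale image using (2)."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

# Example on a random 32x32 RGB image.
gray = rgb_to_luma(np.random.rand(32, 32, 3))
print(gray.shape)  # (32, 32)
```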
To train an NN, the synaptic weights are initialized to random values. Then, the network is trained by using the standard backpropagation-based supervised learning method. During the training process, the weights are adjusted to reduce the error. Instead of starting the training with random initial weights, one can use the weights of a previously trained network. Initializing the weights in this way is referred to as using a pre-trained network [2]. Note that a pre-trained network can be retrained and used to perform a different task on a different data set. Usually, only a few steps of retraining are required to fine-tune the pre-trained network.

B. Approximate Multipliers

Through comprehensive simulations, we confirmed that 8-bit multipliers are just wide enough to provide reasonable accuracies in NNs [10], [20]. Therefore, only 8-bit versions of the approximate multipliers were evaluated in this article.
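Because an 8-bit AM is a fixed function of two 8-bit operands, its behavior can be captured exhaustively in a 256 × 256 lookup table and dropped into an NN simulation. The sketch below is our own illustration of that idea (it ignores operand signs and quantization details, which a real simulation must handle):

```python
import numpy as np

def build_lut(approx_mul):
    """Tabulate an 8-bit multiplier model over its whole input space.
    approx_mul(a, b) is any scalar model of the design (placeholder)."""
    vals = range(256)
    return np.array([[approx_mul(a, b) for b in vals] for a in vals],
                    dtype=np.int64)

def approx_dot(acts, weights, lut):
    """Dot product of unsigned 8-bit vectors in which every product is
    read from the approximate multiplier's lookup table."""
    return int(np.sum(lut[acts, weights]))

# Example with an exact 8-bit multiplier as the placeholder model.
lut = build_lut(lambda a, b: a * b)
acts = np.random.randint(0, 256, size=16)
wts = np.random.randint(0, 256, size=16)
print(approx_dot(acts, wts, lut) == int(np.dot(acts, wts)))  # True
```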
1) Deliberately Designed Approximate Multipliers: Deliberately designed AMs are obtained by making carefully chosen simplifying changes in the truth table of the exact multiplier. In general, there are three ways of generating AMs [12], [21]: 1) approximation in generating the partial products, such as the under-designed multiplier (UDM) [22]; 2) approximation in the partial product tree, such as the broken-array multiplier (BAM) [23] and the error-tolerant multiplier (ETM) [24]; and 3) approximation in the accumulation of partial products, such as the inaccurate multiplier (ICM) [25], the approximate compressor-based multiplier (ACM) [26], the AM [27], and the truncated AM (TAM) [28]. The other type of deliberately designed AM that is considered in this article is the recently proposed alphabet set multiplier (ASM) [10].

Here, we briefly review the design of the deliberately designed AMs.

The UDM [22] is designed based on an approximate 2 × 2 multiplier.
This approximate 2 × 2 multiplier produces 111₂, instead of 1001₂, to save one output bit when both of the inputs are 11₂.
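A behavioral sketch of that 2 × 2 building block follows (our own code, written directly from the description above; the function name is ours):

```python
def udm_2x2(a, b):
    """Approximate 2x2 multiplier used as the UDM building block.

    Exact for every input pair except 3 x 3 (11_2 x 11_2), where it returns
    7 (111_2) instead of 9 (1001_2) so that the result fits in 3 output bits.
    """
    assert 0 <= a <= 3 and 0 <= b <= 3
    if a == 3 and b == 3:
        return 0b111          # the single deliberately introduced error
    return a * b              # the other 15 input pairs are exact

# Only 1 of the 16 input combinations is in error.
errors = sum(udm_2x2(a, b) != a * b for a in range(4) for b in range(4))
print(errors)  # 1
```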
The BAM [23] omits the carry-save adders for the least
significant bits (LSBs) in an array multiplier in both the
horizontal and vertical directions. In other words, it truncates
the LSBs of the inputs to permit a smaller multiplier to be
used for the remaining bits.
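A first-order behavioral model of that truncation is sketched below, assuming the omitted cells are equivalent to simply dropping the k least significant bits of each 8-bit operand; the real BAM removes carry-save adder cells inside the array, so its exact error pattern differs in detail:

```python
def bam_behavioral(a, b, k):
    """Rough behavioral model of BAM-style truncation for 8-bit operands:
    zero out the k least significant bits of each input and multiply the
    remaining bits exactly."""
    mask = 0xFF & ~((1 << k) - 1)
    return (a & mask) * (b & mask)

print(bam_behavioral(200, 150, 2), 200 * 150)  # approximate vs. exact product
```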
The ETM [24] divides the inputs into separate MSB and
LSB parts that do not necessarily have equal widths. Every
bit position in the LSB part is checked from left to right and
if at least one of the two operands is 1, checking is stopped
and all of the remaining bits from that position onward are set
to 1. On the other hand, normal multiplication is performed
for the MSB part.
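The checking step for the LSB part can be sketched as follows. This is our own code for just that step, written from the sentence above; how the complete ETM selects between the two parts and assembles the final product is not shown here:

```python
def etm_lsb_part(a_lo, b_lo, k):
    """ETM-style approximation of the k-bit LSB part: scan bit positions
    from the most significant bit of the LSB part downward; at the first
    position where either operand has a 1, that bit and every lower bit
    of the result are set to 1."""
    for i in range(k - 1, -1, -1):
        if ((a_lo >> i) & 1) or ((b_lo >> i) & 1):
            return (1 << (i + 1)) - 1    # bits i..0 all set to 1
    return 0                             # both LSB parts were zero

print(bin(etm_lsb_part(0b0010, 0b0001, 4)))  # 0b11 (first 1 found at bit 1)
```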
The ICM [25] uses an approximate (4:2) counter to build AMs. The approximate 4-bit multiplier is then used to construct larger multipliers.

The ACM [26] is designed by using approximate 4:2 compressors. The two proposed approximate 4:2 compressors (AC1 and AC2) are used in a Dadda multiplier with four different schemes.

The AM [27] uses a novel approximate adder that generates a sum bit and an error bit. The error of the multiplier is then alleviated by using the error bits. The truncated version of the AM multiplier is called the TAM [28].

The ASM [10] decomposes the multiplicand into short bit sequences (alphabets) that are multiplied by the multiplier. Instead of multiplying the multiplier with the multiplicand, some lower-order multiples of the multiplier are first calculated (by shift and add operations) and then some of those multiples are added in the output stage of the ASM [10]. It should be noted that the ASM design was optimized for use in NNs, and so it is not directly comparable to the other AMs considered in this article when used in other applications.

Based on these main designs, variants were obtained by changing the configurable parameter in each design, forming a set of 100 deliberately designed approximate multipliers. For example, removing different carry-save adders from the BAM multiplier results in different designs; also, the widths of the MSB and LSB parts in the ETM multiplier can be varied to yield different multipliers.

2) CGP-Based Approximate Multipliers: Unlike the deliberately designed AMs, the CGP-based designs are generated automatically using CGP [13]. Although several heuristic approaches have been proposed in the literature for approximating a digital circuit, we used CGP, since it is intrinsically multi-objective and has been successfully used to generate other high-quality approximate circuits [29].

A candidate circuit in CGP is modeled as a 2-D array of programmable nodes. The nodes in this problem are 2-input Boolean functions, i.e., AND, OR, XOR, and others. The initial population P of CGP circuits includes several designs of exact multipliers and a few circuits that are generated by performing mutations on accurate designs. Single mutations (by randomly modifying the gate function, gate input connections, and/or primary output connections) are used to generate more candidate solutions. More details are provided in [13] and [29].
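A highly simplified sketch of that representation and of the single-mutation operator is given below. It is our own linear-form illustration, not the tool used in [13] and [29]; the real CGP flow also evaluates the error and hardware cost of every candidate and keeps only those that meet the target error bound:

```python
import random

# Node i computes FUNCS[f](x, y), where its two inputs are taken from the
# primary inputs (indices 0..num_inputs-1) or from earlier nodes.
FUNCS = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "XOR":  lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

def mutate(nodes, outputs, num_inputs):
    """One single-point mutation: change a node's function, rewire one of
    its input connections, or rewire one primary-output connection."""
    nodes = [list(n) for n in nodes]
    outputs = list(outputs)
    r = random.random()
    if r < 1 / 3:                                   # gate function
        nodes[random.randrange(len(nodes))][0] = random.choice(list(FUNCS))
    elif r < 2 / 3:                                 # gate input connection
        i = random.randrange(len(nodes))
        nodes[i][random.choice((1, 2))] = random.randrange(num_inputs + i)
    else:                                           # primary output connection
        j = random.randrange(len(outputs))
        outputs[j] = random.randrange(num_inputs + len(nodes))
    return [tuple(n) for n in nodes], outputs

# A tiny 2-input seed circuit: one AND node and one XOR node.
seed_nodes = [("AND", 0, 1), ("XOR", 0, 1)]
seed_outputs = [2, 3]          # node indices 2 and 3 drive the two outputs
print(mutate(seed_nodes, seed_outputs, num_inputs=2))
```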
TABLE I
CONSIDERED FEATURES OF THE ERROR FUNCTION

III. EVALUATION OF APPROXIMATE MULTIPLIERS IN NEURAL NETWORKS

This section considers both application-dependent and application-independent metrics to evaluate the effects of AMs in NNs.

A. Application-Independent Metrics

Application-independent metrics measure the design features that do not change from one application to another. Given that AMs are digital circuits, these metrics can be either error or hardware metrics. Error function metrics are required for the feature selection analysis.

The main four error metrics are the error rate (ER), the error distance (ED), the absolute ED (AED), and the relative ED (RED). We evaluated all 600 multiplier designs using the nine features extracted from these four main metrics, as given in Table I. All of the considered multipliers were implemented in MATLAB and simulated over their entire input space, i.e., for all 256 × 256 = 65536 combinations.

The definitions for most of these features are given in

\mathrm{ED} = E - A
\mathrm{RED} = 1 - \frac{A}{E}
\mathrm{AED} = |E - A|
\mathrm{RMSED} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (A_i - E_i)^2} \tag{3}
\mathrm{VarED} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{ED}_i - \frac{1}{N} \sum_{i=1}^{N} \mathrm{ED}_i \right)^2.

Those that are not given in (3) are evident from the description. Note that E and A in (3) refer to the exact and approximate multiplication results, respectively. Also, note that the mean-/variance-related features in Table I are measured over the entire output domain of the multipliers (N = 65536), i.e., 256 × 256 = 65536 cases for the employed eight-bit multipliers.

Note that the variance and the root mean square (RMS) are distinct metrics, as specified in (3). Specifically, the variance measures the spread of the data around the mean, while the RMS measures the spread of the data around the best fit. In the case of error metrics, the best possible fit is zero.
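This exhaustive error characterization can be reproduced with a few lines of NumPy. The sketch below is our own code (the article used MATLAB, and the full list of nine features in Table I is not reproduced here); it computes the quantities defined in (3), plus the error rate, over all 65536 input pairs of an 8-bit design given as a lookup table:

```python
import numpy as np

def error_features(approx_lut):
    """Error features of an 8-bit multiplier given its 256x256 table of
    approximate products, evaluated over all 256*256 = 65536 input pairs."""
    a = np.arange(256)
    exact = np.outer(a, a)                       # E for every input pair
    ed = exact - approx_lut                      # ED  = E - A
    aed = np.abs(ed)                             # AED = |E - A|
    feats = {
        "ER":       float(np.mean(ed != 0)),     # error rate
        "Mean-ED":  ed.mean(),
        "Var-ED":   ed.var(),                    # spread of ED around its mean
        "RMS-ED":   float(np.sqrt(np.mean(ed.astype(float) ** 2))),  # spread around zero
        "Mean-AED": aed.mean(),
        "Var-AED":  aed.var(),
    }
    nz = exact != 0                              # RED = 1 - A/E needs E != 0
    feats["Mean-RED"] = float(np.mean(1.0 - approx_lut[nz] / exact[nz]))
    return feats

# Example: a multiplier that truncates the 2 LSBs of each operand.
a = np.arange(256)
trunc_lut = np.outer(a & ~3, a & ~3)
print(error_features(trunc_lut))
```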
A. Feature Selection
Feature selection is a statistical way of removing less
relevant features that are not as important to achieving accurate
classification performance. There are many potential benefits
to feature selection including facilitating data understanding
and space dimensionality reduction [40], [41]. In this article,
feature selection algorithms are used to select a subset of mul-
tipliers’ error function features that are most useful for building
a good predictor. This predictor anticipates the behavior of an
AM in an NN.

Scikit-learn is a free machine learning tool that is widely used for feature selection [42]. It accepts an input data array and the corresponding labels to build an estimator that implements a fitting method. We used three classifiers: recursive feature elimination (RFE) [43], mutual information (MI) [44], and Extra-Tree [45].

The RFE classifier iteratively prunes the least important features from the current set of features until the desired number of features is reached. The ith output of the RFE corresponds to the ranking position of the ith feature, such that the selected (i.e., the estimated best) features are assigned a rank of 1. Note that in RFE, the nested feature subsets contain complementary features and are not necessarily individually the most relevant features [43]. MI is another useful feature selection technique that relies on nonparametric methods based on entropy estimation from the K-nearest neighbor distances, as described in [44]. Each feature is assigned a score, where higher scores indicate more important features. Finally, tree-based estimators can also be used to compute feature importance and thereby discard less relevant features. Extra-Tree, an extremely randomized tree classifier, is a practical classifier that is widely used for feature selection [45]. Similar to MI, the ith output of this classifier identifies the importance of the ith feature, such that the higher the output score, the more important the feature is.
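The three rankings can be obtained with a few calls to scikit-learn. In this sketch, the feature matrix X and the labels y are random placeholders, and the logistic-regression base estimator for RFE is our own choice (the article does not state which estimator was used):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# X: one row of error-function features per AM; y: 1 for class 1 AMs
# (better NN accuracy than the exact multiplier), 0 otherwise.
rng = np.random.default_rng(0)
X = rng.random((600, 9))
y = rng.integers(0, 2, 600)

# 1) Recursive feature elimination: selected features get ranking 1.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
print("RFE ranking:      ", rfe.ranking_)

# 2) Mutual information: a higher score means a more informative feature.
print("MI scores:        ", mutual_info_classif(X, y, random_state=0))

# 3) Extra-Trees: impurity-based feature importances.
et = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
print("Extra-Tree scores:", et.feature_importances_)
```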
The results of each of the three aforementioned feature selection algorithms are provided in Table II. The results in Table II show that Var-ED is the most important feature according to all three classifiers. RMS-ED is another important metric: it is the most important metric according to RFE, the second-most critical feature for MI, and the third-most significant metric for the Extra-Tree classifier. Our simulation results show that the average values of the Var-ED and RMS-ED features for class 0 multipliers are 20.21× and 6.42× greater than those of the class 1 AMs, respectively.

Other important features that have a good ranking in the three classifiers are Mean-AED and Var-AED. We also observed that the multipliers that produced better accuracies in an NN than the exact multiplier (class 1 multipliers) all have double-sided error functions. Thus, they overestimate the actual multiplication product for some input combinations and underestimate it for others. Having double-sided EDs seems to be a necessary, but not a sufficient, condition for better accuracy.

Given that class 1 AMs tend to have smaller Var-ED and RMS-ED values and the observation that double-sided errors are necessary for a good AM, the differences in the error magnitudes should be small to meet the RMS-ED requirement, i.e., to keep the RMS-ED small. Moreover, since the errors should be double-sided to have a small variance, these errors should be distributed around zero.

B. Training the Classifier

Now, having found the most important features of the error function of an AM, we can use them to predict how well a given AM would work in an NN. In this section, we explain how to build a classifier that has the error features of an AM as inputs and predicts whether it belongs to class 1 or class 0.

1) NN-Based Classifier: The error features of 500 randomly selected multipliers were used to train the NN-based classifier, and those of the 100 remaining multipliers were used as the test samples to obtain the classification accuracy of the trained model. We designed a three-layer MLP with 20 neurons in the hidden layer and two neurons in the output layer (since we have two classes of multipliers). The number of neurons in the input layer equals the number of features that are considered for classification. The number of multiplier error features that were used as inputs to the NN-based classifier was varied from 1 up to 9 (for the nine features in total, see Table I). The resulting classification accuracies, plotted in Fig. 3, reflect how well the classifier classifies AMs into class 1 or class 0.

Note that when fewer than nine features are selected, the combination of features giving the highest accuracy is reported in Fig. 3. The combination of features is selected according to the results in Table II and is given in Table III. To choose two features, for example, the candidate features are selected from the top-ranked ones in Table II: 1) Var-ED and Mean-AED (by Extra-Tree); 2) Var-ED and RMS-ED (by MI); and 3) Mean-ED, Var-ED, and RMS-ED (by RFE). For these four features (i.e., Mean-ED, Var-ED, RMS-ED, and Mean-AED), we consider all six possible combinations and report the results for the combination that gives the highest accuracy. Using the same process as in this example, the feature combinations for which the accuracy is maximized were found, and are provided in Table III.
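A scikit-learn sketch of such a classifier is shown below, again with placeholder data. Note that MLPClassifier models a two-class problem with a single logistic output unit rather than the two explicit output neurons described above, so it is only an approximation of the article's network:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder error features and class labels for the 600 AMs.
rng = np.random.default_rng(1)
X = rng.random((600, 9))
y = rng.integers(0, 2, 600)

# 500 randomly selected AMs for training, the remaining 100 for testing.
idx = rng.permutation(600)
train, test = idx[:500], idx[500:]

clf = make_pipeline(
    StandardScaler(),                              # scale features first
    MLPClassifier(hidden_layer_sizes=(20,),        # 20 hidden neurons
                  max_iter=2000, random_state=0),
)
clf.fit(X[train], y[train])
print("classification accuracy:", clf.score(X[test], y[test]))
```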
TABLE III
FEATURE COMBINATIONS THAT GIVE THE HIGHEST MULTIPLIER CLASSIFICATION ACCURACY

The performance of a multiplier is application dependent. To illustrate this claim, we have plotted the Pareto-optimal designs in power-delay product (PDP) for the SVHN data set using all 600 AMs in Fig. 4(a). Fig. 4(b) shows the performance, on the MNIST data set, of the multipliers that are Pareto-optimal in PDP for the SVHN data set.

Fig. 4. NN accuracy using the same AMs for different data sets. (a) Pareto-optimal designs in PDP for the SVHN. (b) Behavior of SVHN Pareto-optimal multipliers for the MNIST.
A. Error Analysis
Fig. 5 compares class 0 and class 1 multipliers with respect
to four important error features: Var-ED, RMS-ED, Mean-
AED, and Var-AED. This plot shows how the class 1 and
class 0 multipliers measure differently for the considered
features. As shown in Fig. 5, class 1 multipliers generally have
smaller Mean-AED, Var-ED, Var-AED, and RMS-ED values,
when compared to class 0 multipliers. It also shows, in the
zoomed-in insets, that some class 0 multipliers having smaller
Var-AED, RMS-ED, Mean-AED, and/or Var-ED values than
some class 1 multipliers is the reason why some multipliers
are misclassified by the classifiers.
B. Hardware Analysis
To further understand the quality of AMs, we performed a
hardware analysis. The main hardware metrics of a multiplier,
i.e., power consumption, area, critical path delay, and PDP,
are considered in this analysis. Note that all of the considered
multipliers in this article are pure combinational circuits for
which the throughput is inversely proportional to the critical
path delay.
Fig. 6 shows two scatter plots that best distinguish the two
classes of AMs, which are area versus delay (see Fig. 6(a))
and power consumption versus delay (see Fig. 6(b)). Note that
only the results for the SVHN data set are shown as the results
for the MNIST are almost the same.
As the results in Fig. 6 show, unlike for the error metrics,
there is no clear general trend in the hardware metrics.
However, the designs with small delay and power consumption
are preferred for NN applications, as discussed next.
As AMs are obtained by simplifying the design of an exact
multiplier, more aggressive approximations can be used to
further reduce the hardware cost and energy consumption.
As previously discussed, some multipliers have almost similar accuracies while, as shown in Fig. 4, they have different hardware measures. The main reasons are as follows: 1) the hardware cost of a digital circuit depends entirely on how it is implemented in hardware; e.g., array and Wallace multipliers are both exact designs, and therefore they have the same classification accuracy, but they have different hardware costs; and 2) the classification accuracy of NNs is application dependent: it depends on the network type, the data set, the learning algorithm, and the number of training iterations.
Fig. 6. Hardware comparison between class 0 and class 1 AMs. (a) Area
versus delay for class 1 and class 0 AMs. (b) Power versus delay for class 1
and class 0 AMs.
REFERENCES

[5] M. Courbariaux, Y. Bengio, and J.-P. David, "Training deep neural networks with low precision multiplications," 2014, arXiv:1412.7024. [Online]. Available: https://arxiv.org/abs/1412.7024
[6] S. Venkataramani, A. Ranjan, K. Roy, and A. Raghunathan, "AxNN: Energy-efficient neuromorphic systems using approximate computing," in Proc. Int. Symp. Low Power Electron. Design, 2014, pp. 27–32.
[7] Q. Zhang, T. Wang, Y. Tian, F. Yuan, and Q. Xu, "ApproxANN: An approximate computing framework for artificial neural network," in Proc. Design, Autom. Test Eur. Conf. Exhib., 2015, pp. 701–706.
[8] M. Marchesi, G. Orlandi, F. Piazza, and A. Uncini, "Fast neural networks without multipliers," IEEE Trans. Neural Netw., vol. 4, no. 1, pp. 53–62, Jan. 1993.
[9] Z. Lin, M. Courbariaux, R. Memisevic, and Y. Bengio, "Neural networks with few multiplications," 2015, arXiv:1510.03009. [Online]. Available: https://arxiv.org/abs/1510.03009
[10] S. S. Sarwar, S. Venkataramani, A. Ankit, A. Raghunathan, and K. Roy, "Energy-efficient neural computing with approximate multipliers," ACM J. Emerg. Technol. Comput. Syst., vol. 14, no. 2, 2018, Art. no. 16.
[11] H. R. Mahdiani, M. H. S. Javadi, and S. M. Fakhraie, "Efficient utilization of imprecise computational blocks for hardware implementation of imprecision tolerant applications," Microelectron. J., vol. 61, pp. 57–66, Mar. 2017.
[12] H. Jiang, C. Liu, L. Liu, F. Lombardi, and J. Han, "A review, classification, and comparative evaluation of approximate arithmetic circuits," ACM J. Emerg. Technol. Comput. Syst., vol. 13, no. 4, p. 60, Aug. 2017.
[13] V. Mrazek, S. S. Sarwar, L. Sekanina, Z. Vasicek, and K. Roy, "Design of power-efficient approximate multipliers for approximate artificial neural networks," in Proc. 35th Int. Conf. Comput.-Aided Design, 2016, pp. 1–7.
[14] E. H. Lee and S. S. Wong, "Analysis and design of a passive switched-capacitor matrix multiplier for approximate computing," IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 261–271, Jan. 2017.
[15] S. Gopal et al., "A spatial multi-bit sub-1-V time-domain matrix multiplier interface for approximate computing in 65-nm CMOS," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 3, pp. 506–518, Sep. 2018.
[16] Y. LeCun, C. Cortes, and C. Burges. (2010). MNIST handwritten digit database. AT&T Labs. [Online]. Available: http://yann.lecun.com/exdb/mnist
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[18] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in Proc. NIPS Workshop Deep Learn. Unsupervised Feature Learn., 2011, p. 5.
[19] R. J. Schalkoff, Artificial Neural Networks, vol. 1. New York, NY, USA: McGraw-Hill, 1997.
[20] N. P. Jouppi et al., "In-datacenter performance analysis of a tensor processing unit," in Proc. ACM/IEEE 44th Annu. Int. Symp. Comput. Archit. (ISCA), 2017, pp. 1–12.
[21] M. S. Ansari, H. Jiang, B. F. Cockburn, and J. Han, "Low-power approximate multipliers using encoded partial products and approximate compressors," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 3, pp. 404–416, Sep. 2018.
[22] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," in Proc. 24th Int. Conf. VLSI Design, 2011, pp. 346–351.
[23] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.
[24] K. Y. Kyaw, W. L. Goh, and K. S. Yeo, "Low-power high-speed multiplier for error-tolerant application," in Proc. Int. Conf. Electron Devices Solid-State Circuits, 2010, pp. 1–4.
[25] C.-H. Lin and I.-C. Lin, "High accuracy approximate multiplier with error correction," in Proc. 31st Int. Conf. Comput. Design, Oct. 2013, pp. 33–38.
[26] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication," IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
[27] C. Liu, J. Han, and F. Lombardi, "A low-power, high-performance approximate multiplier with configurable partial error recovery," in Proc. Design, Autom. Test Eur. Conf. Exhib., 2014, pp. 1–4.
[28] H. Jiang, J. Han, F. Qiao, and F. Lombardi, "Approximate radix-8 Booth multipliers for low-power and high-performance operation," IEEE Trans. Comput., vol. 65, no. 8, pp. 2638–2644, Aug. 2016.
[29] Z. Vasicek and L. Sekanina, "Evolutionary approach to approximate digital circuits design," IEEE Trans. Evol. Comput., vol. 19, no. 3, pp. 432–444, Jun. 2015.
[30] (2016). EvoApprox8b—Approximate Adders and Multipliers Library. [Online]. Available: http://www.fit.vutbr.cz/research/groups/ehw/approxlib/
[31] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[32] C. S. Leung, H.-J. Wang, and J. Sum, "On the selection of weight decay parameter for faulty networks," IEEE Trans. Neural Netw., vol. 21, no. 8, pp. 1232–1244, Aug. 2010.
[33] Y. Shao, G. N. Taff, and S. J. Walsh, "Comparison of early stopping criteria for neural-network-based subpixel classification," IEEE Geosci. Remote Sens. Lett., vol. 8, no. 1, pp. 113–117, Jan. 2011.
[34] Y. Luo and F. Yang. (2014). Deep Learning With Noise. [Online]. Available: http://www.andrew.cmu.edu/user/fanyang1/deep-learning-with-noise.pdf
[35] N. Nagabushan, N. Satish, and S. Raghuram, "Effect of injected noise in deep neural networks," in Proc. Int. Conf. Comput. Intell. Comput. Res., 2016, pp. 1–5.
[36] T. He, Y. Zhang, J. Droppo, and K. Yu, "On training bi-directional neural network language model with noise contrastive estimation," in Proc. 10th Int. Symp. Chin. Spoken Lang. Process., 2016, pp. 1–5.
[37] A. F. Murray and P. J. Edwards, "Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training," IEEE Trans. Neural Netw., vol. 5, no. 5, pp. 792–802, Sep. 1994.
[38] J. Sum, C.-S. Leung, and K. Ho, "Convergence analyses on on-line weight noise injection-based training algorithms for MLPs," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1827–1840, Nov. 2012.
[39] K. Ho, C.-S. Leung, and J. Sum, "Objective functions of online weight noise injection training algorithms for MLPs," IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 317–323, Feb. 2011.
[40] I. Guyon, S. Gunn, A. Ben-Hur, and G. Dror, "Result analysis of the NIPS 2003 feature selection challenge," in Proc. Adv. Neural Inf. Process. Syst., 2005, pp. 545–552.
[41] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157–1182, Jan. 2003.
[42] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[43] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Mach. Learn., vol. 46, nos. 1–3, pp. 389–422, 2002.
[44] A. Kraskov, H. Stögbauer, and P. Grassberger, "Estimating mutual information," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 69, no. 6, 2004, Art. no. 066138.
[45] P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees," Mach. Learn., vol. 63, no. 1, pp. 3–42, 2006.
[46] MathWorks. MATLAB Classification Learner App. Accessed: Oct. 1, 2019. [Online]. Available: https://www.mathworks.com/help/stats/classificationlearner-app.html
[47] (2015). ImageNet Large Scale Visual Recognition Challenge (ILSVRC). [Online]. Available: http://www.image-net.org/challenges/LSVRC/
[48] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.

Mohammad Saeed Ansari (S'16) received the B.Sc. and M.Sc. degrees in electrical and electronic engineering from Iran University of Science and Technology, Tehran, Iran, in 2013 and 2015, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering at the University of Alberta, Edmonton, AB, Canada.
His current research interests include approximate computing, design of computing hardware for emerging machine learning applications, multilayer perceptrons (MLPs), convolutional NNs (CNNs) in particular, and reliability and fault tolerance.
Vojtech Mrazek (M'18) received the Ing. and Ph.D. degrees in information technology from the Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic, in 2014 and 2018, respectively.
He is currently a Researcher with the Evolvable Hardware Group, Faculty of Information Technology, Brno University of Technology. He is also a Visiting Postdoctoral Researcher with the Department of Informatics, Institute of Computer Engineering, Technische Universität Wien (TU Wien), Vienna, Austria. He has authored or coauthored over 30 conference/journal papers focused on approximate computing and evolvable hardware. His current research interests include approximate computing, genetic programming, and machine learning.
Dr. Mrazek received several awards for his research in approximate computing, including the Joseph Fourier Award for research in computer science and engineering in 2018.

Zdenek Vasicek received the Ing. and Ph.D. degrees in electrical engineering and computer science from the Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic, in 2006 and 2012, respectively.
He is currently an Associate Professor with the Faculty of Information Technology, Brno University of Technology. His current research interests include evolutionary design and optimization of complex digital circuits and systems.
Dr. Vasicek received the Silver and Gold medals at HUMIES, in 2011 and 2015, respectively.