Overcoming Computational Errors in Sensing Platforms Through Embedded Machine-Learning Kernels
I. INTRODUCTION
Manuscript received May 23, 2013; revised December 21, 2013 and
June 13, 2014; accepted July 21, 2014. Date of publication August 8, 2014;
date of current version July 22, 2015. This work was supported in part
by Semiconductor Research Corporation, in part by the National Science
Foundation under Grant CCF-1253670, in part by MARCO, and in part by the
Defense Advanced Research Projects Agency, through the Center for Future
Architectures Research and Systems on Nanoscale Information Fabrics.
Z. Wang and N. Verma are with the Department of Electrical
Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail:
[email protected]; [email protected]).
K. H. Lee was with the Department of Electrical Engineering, Princeton
University, Princeton, NJ 08544 USA, and now with Samsung Research
America, Dallas, TX, USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2014.2342153
1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 8, AUGUST 2015
WANG et al.: OVERCOMING COMPUTATIONAL ERRORS IN SENSING PLATFORMS THROUGH EMBEDDED ML KERNELS
Fig. 2. SVM decision function and effective decision boundary in the feature
space using (a) linear kernel and (b) nonlinear kernel to increase the flexibility
of the effective decision boundary.
Fig. 3. Architecture where a small kernel of fault-protected hardware
(shown shaded) enables resilient system-level performance; estimates of
block gate counts from our system are shown (MCU used is MSP430 from
OpenCores [26]).
Fig. 5. Feature vectors for a seizure-detection system (shown in two dimensions via PCA for visualization). (a) Distribution variances are only due
to the application signals and are modeled by the original decision boundary.
(b) Variances are also due to computational errors, making a new decision
boundary necessary by training on the error-affected data.
error-free feature extraction and classification. For this classification, a model can be used that addresses the expected
statistics generated due to the application signals alone; such
a model is generic across instances of the system and can be
derived a priori since it is not impacted by the errors caused by
a particular instance of a fault-affected processor. Though the
auxiliary system does not provide perfect labels, it provides
labels that are accurate up to the classification performance
that is achievable by an error-free system. As described in
Section V-A, we find that the estimated labels enable performance very close to that of perfect labels. As mentioned above,
implementing an error-free feature extractor on the MCU
incurs a high energy cost; thus, this is done only during one-time or infrequent training to generate the estimated labels,
allowing the energy to be amortized over the real-time operation of the fault-affected processor. Note that, because this paper
considers permanent faults, training is performed only once.
In general, various metrics can be considered for detecting a
change in the data statistics to trigger retraining; for instance,
changes in the rate with which feature vectors fall near the
classification boundary can be monitored (i.e., by computing
histograms of the marginal distance [39]). However, such
approaches for triggering retraining are not covered in this
paper.
As shown in Fig. 6, the auxiliary system performs classification on input data in parallel with processing through the
fault-affected processor. The training labels and training data
are thus provided by the auxiliary system and fault-affected
processor, respectively, to an embedded trainer that can operate
at low speed to construct an error-aware model. In the
demonstrated systems, the auxiliary system and the trainer are
implemented by the fault-protected MCU; in voltage-scaled
systems (such as [15]), they can instead be implemented via a low-error mode.
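The training flow described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the fault is modeled as a fixed, hypothetical distortion of one feature (`faulty_extractor`), the reference model `W_REF`, `B_REF` is invented for illustration, and a nearest-centroid linear classifier stands in for the SVM trainer so that the example has a closed-form solution. Labels come from the error-free auxiliary path, while the training data comes from the fault-affected path.

```python
import random

def linear_decision(w, b, x):
    """Sign of the linear decision function w.x + b (+1 or -1)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

def faulty_extractor(x_clean):
    """Hypothetical static fault: feature 0 is negated and offset."""
    return [3.0 - x_clean[0], x_clean[1]]

def train_centroid_model(xs, ys):
    """Nearest-centroid linear model (a stand-in for the SVM trainer)."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    mu_p = [sum(c) / len(pos) for c in zip(*pos)]
    mu_n = [sum(c) / len(neg) for c in zip(*neg)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    b = -0.5 * (sum(p * p for p in mu_p) - sum(n * n for n in mu_n))
    return w, b

def accuracy(w, b, xs, ys):
    """Fraction of vectors whose predicted sign matches the given label."""
    hits = sum(1 for x, y in zip(xs, ys) if linear_decision(w, b, x) == y)
    return hits / len(xs)

rng = random.Random(0)
W_REF, B_REF = [1.0, 0.0], 0.0  # assumed error-free (a priori) model

# Synthetic clean feature vectors: two classes centered at x0 = +2 and x0 = -2.
clean = [[rng.gauss(c, 0.5), rng.gauss(0.0, 0.5)]
         for c in (2.0, -2.0) for _ in range(100)]

# Auxiliary system: error-free features plus the a priori model give estimated labels.
labels = [linear_decision(W_REF, B_REF, x) for x in clean]

# Fault-affected processor: supplies the (distorted) training data.
faulty = [faulty_extractor(x) for x in clean]

# Embedded trainer: builds an error-aware model from faulty data and estimated labels.
w_new, b_new = train_centroid_model(faulty, labels)

acc_old = accuracy(W_REF, B_REF, faulty, labels)  # original model on faulty data
acc_new = accuracy(w_new, b_new, faulty, labels)  # error-aware model on faulty data
```

In this toy setting the original model performs near chance on the distorted features, while the retrained model recovers agreement with the estimated labels. Accuracy here is measured on the training set against the estimated labels, which is enough for illustration but is not an evaluation protocol.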
In addition to the need for infrequent training, active learning is used within the architecture to further minimize the
duty cycle and hardware complexity of the auxiliary system.
Active learning [40] is an approach where the optimal training
data is selected from an available pool to reduce the training
and labeling effort required. A common metric used to select
training instances is the marginal distance from an initial
decision boundary [39]. This metric is implicitly computed
by the classification kernels shown in Fig. 2 (the marginal
distance thus computed is in fact compared with a threshold to make classification decisions). In this paper, we also
Fig. 8. Feature vector distributions for (a) baseline case without errors and
(b) case with errors, where the MI in the resulting data is degraded.
Fig. 7. (a) Active-learning system, where the permanent, fault-affected
system chooses data to be used for training. (b) Error-affected feature vectors
selected during one iteration using a marginal-distance criterion. Note that
though other feature vectors appear closer to the decision boundary, this is in
fact an artifact of PCA.
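A marginal-distance selection criterion of the kind shown in Fig. 7(b) can be sketched as follows. This is a minimal illustration assuming a linear decision function; the names `marginal_distance` and `select_for_labeling`, the pool values, and the batch size `k` are hypothetical and not taken from the paper. With a nonlinear kernel, the magnitude of the kernel decision function would play the same role.

```python
import math

def marginal_distance(w, b, x):
    """Distance of feature vector x from the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def select_for_labeling(w, b, pool, k):
    """Indices of the k pool vectors closest to the current decision boundary."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: marginal_distance(w, b, pool[i]))
    return ranked[:k]

# Hypothetical pool of error-affected feature vectors and an initial model.
pool = [[0.1, 2.0], [3.0, -1.0], [-0.2, 0.5], [-4.0, 0.0], [1.0, 1.0]]
w_init, b_init = [1.0, 0.0], 0.0

# Only the selected vectors would be passed to the auxiliary system for labeling.
chosen = select_for_labeling(w_init, b_init, pool, k=2)
```

Because only the selected instances need labels, the auxiliary system is invoked for a small fraction of the pool, which is the duty-cycle reduction the text describes.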
$$H(Y|X) = \sum_{x \in X} p(x) \sum_{y=-1,1} p(y|x) \log_2 \frac{1}{p(y|x)} \tag{3}$$
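The conditional entropy in (3) can be estimated from empirical counts when the feature data is discretized. The sketch below assumes that X takes a small number of discrete values (for example, histogram bin indices) and that labels are in {-1, +1}; the function names are hypothetical, and the MI is computed through the standard identity I(X;Y) = H(Y) - H(Y|X), in bits.

```python
import math
from collections import Counter

def entropy_bits(probs):
    """Entropy in bits of a discrete distribution (zero-probability terms skipped)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def conditional_entropy(xs, ys):
    """Empirical H(Y|X): sum over x of p(x) times the entropy of p(y|x)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    h = 0.0
    for x, nx in x_counts.items():
        cond = [joint[(x, y)] / nx for y in (-1, 1)]
        h += (nx / n) * entropy_bits(cond)
    return h

def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(Y) - H(Y|X), in bits."""
    y_counts = Counter(ys)
    h_y = entropy_bits([c / len(ys) for c in y_counts.values()])
    return h_y - conditional_entropy(xs, ys)

# Features that fully determine the label: H(Y|X) = 0 bits, MI = 1 bit.
mi_informative = mutual_information([0, 0, 1, 1], [-1, -1, 1, 1])

# Features that carry no information about the label: H(Y|X) = 1 bit, MI = 0 bits.
mi_useless = mutual_information([0, 0, 0, 0], [-1, 1, -1, 1])
```

The two cases bracket the situations in Fig. 8: errors that leave the classes separable preserve the MI, while errors that make the classes overlap drive it toward zero.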
Fig. 9. FPGA-based experimentation and demonstration flow to enable
controllable scaling of fault rate as well as multiple gate-level implementations
for randomized fault locations within the circuit.
Fig. 11. FPGA test setup (with Xilinx ML509 Virtex 5 boards) for the DUT
system and an Ethernet transceiver (enabling data exchange with PC).
by editing the netlist to introduce multiplexers on randomly selected output nodes. As shown in Fig. 10, the multiplexers
can be configured via the faultCtrl signal to assert a logic 0/1
at the output, depending on the faultVal signal; stuck-at-0/1
faults are thus represented. Following the introduction of
multiplexers, a fault-control module is introduced to enable
control of the faultCtrl and faultVal signals following FPGA
mapping. Using this approach, faults can be injected at a
controllable rate, and multiple instances of the circuit can be
tested at each fault rate to consider the impact of various fault
locations within the circuit. In our experiments, we perform
tests on five instances of the fault-affected circuit for each fault
rate. The faults are static for each test, set by configuring static
values for the faultCtrl and faultVal signals. The final netlist,
with circuits for injecting faults (multiplexers and fault-control
module), is then implemented on an FPGA, designated as
the device-under-test (DUT) FPGA. A second FPGA serves
simply as an Ethernet transceiver to stream input data from
a host PC and to capture output data to the host PC for post-processing
and analysis. Fig. 11 shows the setup, wherein two
Xilinx Virtex 5 FPGAs are used.
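The multiplexer-based stuck-at injection can be mimicked in software with a small behavioral model. This is only a sketch of the mechanism, not the netlist-editing flow itself: the node count, fault count, and seeds are hypothetical, and `make_fault_config` is an invented helper that plays the role of the fault-control module by producing static faultCtrl and faultVal settings.

```python
import random

def mux_output(node_value, fault_ctrl, fault_val):
    """2:1 multiplexer: pass the node value unless faultCtrl is asserted."""
    return fault_val if fault_ctrl else node_value

def make_fault_config(num_nodes, num_faults, seed):
    """Static faultCtrl/faultVal vectors with faults at randomly selected nodes."""
    rng = random.Random(seed)
    faulty = set(rng.sample(range(num_nodes), num_faults))
    fault_ctrl = [1 if i in faulty else 0 for i in range(num_nodes)]
    # Each faulty node is stuck at 0 or 1; faultVal is unused on fault-free nodes.
    fault_val = [rng.randint(0, 1) if i in faulty else 0
                 for i in range(num_nodes)]
    return fault_ctrl, fault_val

def apply_faults(node_values, fault_ctrl, fault_val):
    """Node outputs as seen downstream of the injected multiplexers."""
    return [mux_output(v, c, f)
            for v, c, f in zip(node_values, fault_ctrl, fault_val)]

# Five instances at one fault level (64 nodes and 12 faults are assumed values).
configs = [make_fault_config(num_nodes=64, num_faults=12, seed=s)
           for s in range(5)]

# Small hand-checked case: node 1 is stuck at 1, node 2 is stuck at 0.
observed = apply_faults([1, 0, 1], [0, 1, 1], [0, 1, 0])
```

Varying the seed changes the fault locations while the fault count stays fixed, which mirrors testing several circuit instances at each fault rate.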
V. EXPERIMENTAL DEMONSTRATIONS AND RESULTS
This section presents the applications and systems used to
demonstrate the DDHR concept. It then presents results from
the hardware experiments. Finally, the results are analyzed
using the metric of MI.
A. Applications for Demonstration
Two application systems for sensor-data classification are
used to demonstrate the DDHR concept: EEG-based seizure
detection and ECG-based cardiac-arrhythmia detection. The
architectural and technical details are presented below.
1) EEG-Based Seizure Detector: The processor for the
seizure-detection system is shown in Fig. 12(a). The system is based on a state-of-the-art detection algorithm
Fig. 13. Bit-error statistics of the computed feature vectors, showing the
BERs for different fault levels and the RMS of the feature errors (normalized
to the RMS of the true feature values) for (a) and (b) seizure-detection system
and (c) and (d) arrhythmia-detection system.
Fig. 12. Processors for (a) EEG-based seizure detection and
(b) ECG-based cardiac arrhythmia detection used in our demonstrations (for
the seizure detector, two of the shown channels are used). Both designs are
implemented and synthesized from RTL, with the shaded blocks protected
during fault injection (these account for 7% for the seizure detector and
31% for the cardiac arrhythmia detector).
Fig. 14. Performance of the systems with respect to the fault rates, for the cases with and without DDHR (five instances of the system are tested at each
fault rate). (a)–(c) Seizure detector performance. (d) and (e) Cardiac-arrhythmia detector performance (where the true-negative rate is set at 95% for all test
cases). DDHR consistently restores system performance up to a fault level of 20 nodes for the seizure detector and 480 nodes for the cardiac-arrhythmia
detector.
Fig. 15. Output histograms from the SVM classifier for seizure-detector
test case. (a) Case of the baseline detector without errors. (b) Case of a
detector with errors (due to twelve faults), but without DDHR. (c) Case with
errors (due to twelve faults) but with DDHR. The errors initially degrade the
separation between the class distributions, but DDHR restores the separation
enabling classification.
Fig. 17. Scatter plots of TP rate versus MI and PMI, achieved through DDHR
within the (a) and (b) seizure-detection system and (c) and (d) arrhythmia-detection system (note that parameters are intentionally set for a true negative
rate of 98% for the seizure detector and 95% for cardiac-arrhythmia detector
to facilitate comparison across all cases). A threshold of 0.02 bits and 0.5 bits
can be used, respectively, to indicate when DDHR will successfully restore
system performance.