Skip to main content

ABHIROOP BHATTACHARJEE

BITS Pilani, ELECTRICAL AND ELECTRONICS ENGINEERING, Undergraduate

Followers

0

Following

0

Public Views

InterestsView All (6)

Uploads

Papers by ABHIROOP BHATTACHARJEE

DeepCAM: A Fully CAM-based Inference Accelerator with Variable Hash Lengths for Energy-efficient Deep Neural Networks

2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)

SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

XPert: Peripheral Circuit & Neural Architecture Co-search for Area and Energy-efficient Xbar-based Computing

arXiv (Cornell University), Mar 30, 2023

XploreNAS: Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars

arXiv (Cornell University), Feb 15, 2023

Examining the Role and Limits of Batchnorm Optimization to Mitigate Diverse Hardware-noise in In-memory Computing

Proceedings of the Great Lakes Symposium on VLSI 2023

In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate... more In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area-& compute-efficiencies. However, the intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs. In addition to quantization errors, most frequently encountered non-idealities during inference include crossbar circuit-level parasitic resistances and device-level non-idealities such as stochastic read noise and temporal drift. In this work, our goal is to closely examine the distortions caused by these nonidealities on the dot-product operations in analog crossbars and explore the feasibility of a nearly training-less solution via crossbaraware fine-tuning of batchnorm parameters in real-time to mitigate the impact of the non-idealities. This enables reduction in hardware costs in terms of memory and training energy for IMC noise-aware retraining of the DNN weights on crossbars. CCS CONCEPTS • Hardware → Emerging architectures.

DeepCAM: A Fully CAM-based Inference Accelerator with Variable Hash Lengths for Energy-efficient Deep Neural Networks

arXiv (Cornell University), Feb 9, 2023

With ever increasing depth and width in deep neural networks to achieve state-of-the-art performa... more With ever increasing depth and width in deep neural networks to achieve state-of-the-art performance, deep learning computation has significantly grown, and dot-products remain dominant in overall computation time. Most prior works are built on conventional dot-product where weighted input summation is used to represent the neuron operation. However, another implementation of dot-product based on the notion of angles and magnitudes in the Euclidean space has attracted limited attention. This paper proposes DeepCAM, an inference accelerator built on two critical innovations to alleviate the computation time bottleneck of convolutional neural networks. The first innovation is an approximate dot-product built on computations in the Euclidean space that can replace addition and multiplication with simple bit-wise operations. The second innovation is a dynamic size content addressable memory-based (CAM-based) accelerator to perform bit-wise operations and accelerate the CNNs with a lower computation time. Our experiments on benchmark image recognition datasets demonstrate that DeepCAM is up to 523× and 3498× faster than Eyeriss and traditional CPUs like Intel Skylake, respectively. Furthermore, the energy consumed by our DeepCAM approach is 2.16× to 109× less compared to Eyeriss.

XploreNAS : Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars

ACM Transactions on Embedded Computing Systems

Compute In-Memory platforms such as memristive crossbars are gaining focus as they facilitate acc... more Compute In-Memory platforms such as memristive crossbars are gaining focus as they facilitate acceleration of Deep Neural Networks (DNNs) with high area and compute-efficiencies. However, the intrinsic non-idealities associated with the analog nature of computing in crossbars limits the performance of the deployed DNNs. Furthermore, DNNs are shown to be vulnerable to adversarial attacks leading to severe security threats in their large-scale deployment. Thus, finding adversarially robust DNN architectures for non-ideal crossbars is critical to the safe and secure deployment of DNNs on the edge. This work proposes a two-phase algorithm-hardware co-optimization approach called XploreNAS that searches for hardware-efficient & adversarially robust neural architectures for non-ideal crossbar platforms. We use the one-shot Neural Architecture Search (NAS) approach to train a large Supernet with crossbar-awareness and sample adversarially robust Subnets therefrom, maintaining competitive h...

SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks

Cornell University - arXiv, Oct 23, 2022

Spiking Neural Networks (SNNs) are an active research domain towards energy efficient machine int... more Spiking Neural Networks (SNNs) are an active research domain towards energy efficient machine intelligence. Compared to conventional artificial neural networks (ANNs), SNNs use temporal spike data and bio-plausible neuronal activation functions such as Leaky-Integrate Fire/Integrate Fire (LIF/IF) for data processing. However, SNNs incur significant dot-product operations causing high memory and computation overhead in standard von-Neumann computing platforms. To this end, In-Memory Computing (IMC) architectures have been proposed to alleviate the "memory-wall bottleneck" prevalent in von-Neumann architectures. Although recent works have proposed IMC-based SNN hardware accelerators, the following key implementation aspects have been overlooked 1) the adverse effects of crossbar non-ideality on SNN performance due to repeated analog dot-product operations over multiple time-steps 2) hardware overheads of essential SNN-specific components such as the LIF/IF and data communication modules. To this end, we propose SpikeSim, a tool that can perform realistic performance, energy, latency and area evaluation of IMCmapped SNNs. SpikeSim consists of a practical monolithic IMC architecture called SpikeFlow for mapping SNNs. Additionally, the non-ideality computation engine (NICE) and energy-latencyarea (ELA) engine performs hardware-realistic evaluation of SpikeFlow-mapped SNNs. Based on 65nm CMOS implementation and experiments on CIFAR10, CIFAR100 and TinyImagenet datasets, we find that the LIF/IF neuronal module has significant area contribution (> 11% of the total hardware area). To this end, we propose SNN topological modifications that leads to 1.24× and 10× reduction in the neuronal module's area and the overall energy-delay-product value, respectively. Furthermore, in this work, we perform a holistic comparison between IMC implemented ANN and SNNs and conclude that lower number of time-steps are the key to achieve higher throughput and energyefficiency for SNNs compared to 4-bit ANNs. The code repository for the SpikeSim tool will be made available in this Github link.

Proceedings of the 59th ACM/IEEE Design Automation Conference

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and en... more Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neuralnetwork parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on a systolic-array hardware. Our experiments with benchmark datasets (child tasks)-CIFAR10, CI-FAR100, and Fashion-MNIST, show that MIME achieves ∼ 3.48× memory-efficiency and ∼ 2.4 − 3.1× energy-savings compared to conventional multi-task inference in Pipelined task mode.

Examining the Robustness of Spiking Neural Networks on Non-ideal Memristive Crossbars

ACM/IEEE International Symposium on Low Power Electronics and Design

Spiking Neural Networks (SNNs) have recently emerged as the low-power alternative to Artificial N... more Spiking Neural Networks (SNNs) have recently emerged as the low-power alternative to Artificial Neural Networks (ANNs) owing to their asynchronous, sparse, and binary information processing. To improve the energy-efficiency and throughput, SNNs can be implemented on memristive crossbars where Multiply-and-Accumulate (MAC) operations are realized in the analog domain using emerging Non-Volatile-Memory (NVM) devices. Despite the compatibility of SNNs with memristive crossbars, there is little attention to study on the effect of intrinsic crossbar non-idealities and stochasticity on the performance of SNNs. In this paper, we conduct a comprehensive analysis of the robustness of SNNs on non-ideal crossbars. We examine SNNs trained via learning algorithms such as, surrogate gradient and ANN-SNN conversion. Our results show that repetitive crossbar computations across multiple timesteps induce error accumulation, resulting in a huge performance drop during SNN inference. We further show that SNNs trained with a smaller number of time-steps achieve better accuracy when deployed on memristive crossbars.

Rate Coding Or Direct Coding: Which One Is Better For Accurate, Robust, And Energy-Efficient Spiking Neural Networks?

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recent Spiking Neural Networks (SNNs) works focus on an image classification task, therefore vari... more Recent Spiking Neural Networks (SNNs) works focus on an image classification task, therefore various coding techniques have been proposed to convert an image into temporal binary spikes. Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system as they show state-of-the-art performance on largescale datasets. Despite their usage, there is little attention to comparing these two coding schemes in a fair manner. In this paper, we conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency. First, we compare the performance of two coding techniques with various architectures and datasets. Then, we measure the robustness of the coding techniques on two adversarial attack methods. Finally, we compare the energy-efficiency of two coding schemes on a digital hardware platform. Our results show that direct coding can achieve better accuracy especially for a small number of timesteps. In contrast, rate coding shows better robustness to adversarial attacks owing to the non-differentiable spike generation process. Rate coding also yields higher energy-efficiency than direct coding which requires multi-bit precision for the first layer. Our study explores the characteristics of two codings, which is an important design consideration for building SNNs 1 .

SATA: Sparsity-Aware Training Accelerator for Spiking Neural Networks

ArXiv, 2022

Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alterna... more Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their inherent high-sparsity activation. Recently, SNNs with backpropagation through time (BPTT) have achieved a higher accuracy result on image recognition tasks compared to other SNN training algorithms. Despite the success on the algorithm perspective, prior works neglect the evaluation of the hardware energy overheads of BPTT, due to the lack of a hardware evaluation platform for SNN training algorithm design. Moreover, although SNNs have been long seen as an energy-efficient counterpart of ANNs, a quantitative comparison between the training cost of SNNs and ANNs is missing. To address the above-mentioned issues, in this work, we introduce SATA (Sparsity-Aware Training Accelerator), a BPTT-based training accelerator for SNNs. The proposed SATA provides a simple and re-configurable accelerator architecture for the general-purpose hardware evaluation platform, which makes it easier to analyze the training energy for SNN training algorithms. Based on SATA, we show quantitative analyses on the energy efficiency of SNN training and make a comparison between the training cost of SNNs and ANNs. The results show that SNNs consume 1.27× more total energy with considering sparsity (spikes, gradient of firing function, and gradient of membrane potential) when compared to ANNs. We find that such high training energy cost is from time-repetitive convolution operations and data movements during backpropagation. Moreover, to guide the future SNN training algorithm design, we provide several observations on energy efficiency with respect to different SNN-specific training parameters.

A low power, charge-sensitive preamplifier integrated with a silicon nanowire biosensor

2021 IEEE Latin America Electron Devices Conference (LAEDC)

Charge amplifiers, characterized by their charge-sensitivity or charge gain, are essential compon... more Charge amplifiers, characterized by their charge-sensitivity or charge gain, are essential components of a transducer-interfacing system that amplify charge signals emerging from various sensors and convert them into voltage signals. Today, with increased scaling of MOSFETs, it becomes challenging to obtain the charge amplifiers with high charge sensitivity and low power. Furthermore, if the charge amplifier is to be operated at lower bandwidth, then obtaining low noise at higher charge sensitivity is also difficult. In this paper, the authors propose a novel design of a charge-sensitive preamplifier in 90 nm CMOS technology that can be operated at the frequency range of 10 Hz-10 kHz, suitable for biosignals at lower frequencies. The opamp in the preamplifier is designed to have a folded-cascode structure with composite cascoding at its input transistors for low power operation. A feedback resistance of 185.20 GΩ for the preamplifier is actively realized using long-channel cascode MOSFET stage in the opamp, thereby eliminating the need for a large value of passive resistance on-chip. The preamplifier has a high charge sensitivity of 8.875 mV/fC at a low power consumption of 214.32 nW with input-referred noise of 256.89 μV for the bandwidth 10 Hz-10 kHz. The preamplifier operation is verified by interfacing the model of the preamplifier with the small-signal equivalent of a SPICE model of a Silicon Nanowire Field-effect Transistor (SiNW-FET) based biosensor which was proposed for impedimetric sensing of biomolecules.

MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and en... more Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neural-network parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on a systolic-array hardware. Our experiments with benchmark datasets (child tasks)-CIFAR10, CIFAR100, and Fashion-MNIST, show that MIME achieves ∼ 3.48× memory-efficiency and ∼ 2.4−3.1× energy-savings compared to conventional multitask inference in Pipelined task mode.

Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks

2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021

As neural networks gain widespread adoption in embedded devices, there is a growing need for mode... more As neural networks gain widespread adoption in embedded devices, there is a growing need for model compression techniques to facilitate seamless deployment in resourceconstrained environments. Quantization is one of the go-to methods yielding state-of-the-art model compression. Most quantization approaches take a fully trained model, then apply different heuristics to determine the optimal bit-precision for different layers of the network, and finally retrain the network to regain any drop in accuracy. Based on Activation Density-the proportion of non-zero activations in a layer-we propose a novel intraining quantization method. Our method calculates optimal bitwidth/precision for each layer during training yielding an energyefficient mixed precision model with competitive accuracy. Since we train lower precision models progressively during training, our approach yields the final quantized model at lower training complexity and also eliminates the need for retraining. We run experiments on benchmark datasets like CIFAR-10, CIFAR-100, TinyImagenet on VGG19/ResNet18 architectures and report the accuracy and energy estimates for the same. We achieve up to 4.5× benefit in terms of estimated multiply-and-accumulate (MAC) reduction while reducing the training complexity by 50% in our experiments. To further evaluate the energy benefits of our proposed method, we develop a mixed-precision scalable Process In Memory (PIM) hardware accelerator platform. The hardware platform incorporates shift-add functionality for handling multibit precision neural network models. Evaluating the quantized models obtained with our proposed method on the PIM platform yields about 5× energy reduction compared to baseline 16-bit models. Additionally, we find that integrating activation density based quantization with activation density based pruning (both conducted during training) yields up to ∼198× and ∼44× energy reductions for VGG19 and ResNet18 architectures respectively on PIM platform compared to baseline 16-bit precision, unpruned models.

Efficiency-driven Hardware Optimization for Adversarially Robust Neural Networks

2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021

With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) er... more With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) era, secure hardware implementation of Deep Neural Networks (DNNs) has become imperative. We will focus on how to address adversarial robustness for DNNs through efficiency-driven hardware optimizations. Since memory (specifically, dot-product operations) is a key energyspending component for DNNs, hardware approaches in the past have focused on optimizing the memory. One such approach is approximate digital CMOS memories with hybrid 6T-8T SRAM cells that enable supply voltage (Vdd) scaling yielding low-power operation, without significantly affecting the performance due to read/write failures incurred in the 6T cells. In this paper, we show how the bit-errors in the 6T cells of hybrid 6T-8T memories minimize the adversarial perturbations in a DNN. Essentially, we find that for different configurations of 8T-6T ratios and scaled Vdd operation, noise incurred in the hybrid memory architectures is bound within specific limits. This hardware noise can potentially interfere in the creation of adversarial attacks in DNNs yielding robustness. Another memory optimization approach involves using analog memristive crossbars that perform Matrix-Vector-Multiplications (MVMs) efficiently with low energy and area requirements. However, crossbars generally suffer from intrinsic non-idealities that cause errors in performing MVMs, leading to degradation in the accuracy of the DNNs. We will show how the intrinsic hardware variations manifested through crossbar non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization.

SwitchX- Gmin-Gmax Switching for Energy-Efficient and Robust Implementation of Binary Neural Networks on Memristive Xbars

ArXiv, 2020

Memristive crossbars can efficiently implement Binarized Neural Networks (BNNs) wherein the weigh... more Memristive crossbars can efficiently implement Binarized Neural Networks (BNNs) wherein the weights are stored in high-resistance states (HRS) and low-resistance states (LRS) of the synapses. We propose SwitchX mapping of weights onto crossbars such that the power consumed by the crossbars and the impact of crossbar non-idealities, that lead to degradation in computational accuracy, are minimized. Essentially, SwitchX maps the binary weights in such manner that the crossbar comprises of more HRS than LRS synapses. Increased HRS in a crossbar will decrease the overall output dot-product current and thus lead to power savings. Interestingly, BNNs mapped onto crossbars with SwitchX also exhibit better robustness against adversarial attacks than the corresponding software BNN baseline as well as the standard crossbar mapped BNNs. Finally, we combine SwitchX with state-aware training (that further increases the feasibility of HRS states during weight mapping) to boost the robustness and ...

Comprehensive Understanding of Silicon-Nanowire Field-Effect Transistor Impedimetric Readout for Biomolecular Sensing

Micromachines, 2020

Impedance sensing with silicon nanowire field-effect transistors (SiNW-FETs) shows considerable p... more Impedance sensing with silicon nanowire field-effect transistors (SiNW-FETs) shows considerable potential for label-free detection of biomolecules. With this technique, it might be possible to overcome the Debye-screening limitation, a major problem of the classical potentiometric readout. We employed an electronic circuit model in Simulation Program with Integrated Circuit Emphasis (SPICE) for SiNW-FETs to perform impedimetric measurements through SPICE simulations and quantitatively evaluate influences of various device parameters to the transfer function of the devices. Furthermore, we investigated how biomolecule binding to the surface of SiNW-FETs is influencing the impedance spectra. Based on mathematical analysis and simulation results, we proposed methods that could improve the impedimetric readout of SiNW-FET biosensors and make it more explicable.

Rethinking Non-idealities in Memristive Crossbars for Adversarial Robustness in Neural Networks

ArXiv, 2020

Deep Neural Networks (DNNs) have been shown to be prone to adversarial attacks. Memristive crossb... more Deep Neural Networks (DNNs) have been shown to be prone to adversarial attacks. Memristive crossbars, being able to perform Matrix-Vector-Multiplications (MVMs) efficiently, are used to realize DNNs on hardware. However, crossbar nonidealities have always been devalued since they cause errors in performing MVMs, leading to computational accuracy losses in DNNs. Several software-based defenses have been proposed to make DNNs adversarially robust. However, no previous work has demonstrated the advantage conferred by the crossbar nonidealities in unleashing adversarial robustness. We show that the intrinsic hardware non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization. We evaluate the adversarial resilience of state-of-the-art DNNs (VGG8 & VGG16 networks) using benchmark datasets (CIFAR-10, CIFAR-100 & Tiny Imagenet) across various crossbar sizes. We find that crossbar non-idealities unleash significantly greater adversarial robustness (> 10−20%) in crossbar-mapped DNNs than baseline software DNNs. We further assess the performance of our approach with other state-of-the-art efficiency-driven adversarial defenses and find that our approach performs significantly well in terms of reducing adversarial loss.

Examining and Mitigating the Impact of Crossbar Non-idealities for Accurate Implementation of Sparse Deep Neural Networks

Recently several structured pruning techniques have been introduced for energy-efficient implemen... more Recently several structured pruning techniques have been introduced for energy-efficient implementation of Deep Neural Networks (DNNs) with lesser number of crossbars. Although, these techniques have claimed to preserve the accuracy of the sparse DNNs on crossbars, none have studied the impact of the inexorable crossbar non-idealities on the actual performance of the pruned networks. To this end, we perform a comprehensive study to show how highly sparse DNNs, that result in significant crossbar-compression-rate, can lead to severe accuracy losses compared to unpruned DNNs mapped onto non-ideal crossbars. We perform experiments with multiple structured-pruning approaches (such as, C/F pruning, XCS and XRS) on VGG11 and VGG16 DNNs with benchmark datasets (CIFAR10 and CIFAR100). We propose two mitigation approaches - Crossbar column rearrangement and Weight-Constrained-Training (WCT) - that can be integrated with the crossbar-mapping of the sparse DNNs to minimize accuracy losses incu...

DeepCAM: A Fully CAM-based Inference Accelerator with Variable Hash Lengths for Energy-efficient Deep Neural Networks

2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)

SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

XPert: Peripheral Circuit & Neural Architecture Co-search for Area and Energy-efficient Xbar-based Computing

arXiv (Cornell University), Mar 30, 2023

XploreNAS: Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars

arXiv (Cornell University), Feb 15, 2023

Examining the Role and Limits of Batchnorm Optimization to Mitigate Diverse Hardware-noise in In-memory Computing

Proceedings of the Great Lakes Symposium on VLSI 2023

In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate... more In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area-& compute-efficiencies. However, the intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs. In addition to quantization errors, most frequently encountered non-idealities during inference include crossbar circuit-level parasitic resistances and device-level non-idealities such as stochastic read noise and temporal drift. In this work, our goal is to closely examine the distortions caused by these nonidealities on the dot-product operations in analog crossbars and explore the feasibility of a nearly training-less solution via crossbaraware fine-tuning of batchnorm parameters in real-time to mitigate the impact of the non-idealities. This enables reduction in hardware costs in terms of memory and training energy for IMC noise-aware retraining of the DNN weights on crossbars. CCS CONCEPTS • Hardware → Emerging architectures.

DeepCAM: A Fully CAM-based Inference Accelerator with Variable Hash Lengths for Energy-efficient Deep Neural Networks

arXiv (Cornell University), Feb 9, 2023

With ever increasing depth and width in deep neural networks to achieve state-of-the-art performa... more With ever increasing depth and width in deep neural networks to achieve state-of-the-art performance, deep learning computation has significantly grown, and dot-products remain dominant in overall computation time. Most prior works are built on conventional dot-product where weighted input summation is used to represent the neuron operation. However, another implementation of dot-product based on the notion of angles and magnitudes in the Euclidean space has attracted limited attention. This paper proposes DeepCAM, an inference accelerator built on two critical innovations to alleviate the computation time bottleneck of convolutional neural networks. The first innovation is an approximate dot-product built on computations in the Euclidean space that can replace addition and multiplication with simple bit-wise operations. The second innovation is a dynamic size content addressable memory-based (CAM-based) accelerator to perform bit-wise operations and accelerate the CNNs with a lower computation time. Our experiments on benchmark image recognition datasets demonstrate that DeepCAM is up to 523× and 3498× faster than Eyeriss and traditional CPUs like Intel Skylake, respectively. Furthermore, the energy consumed by our DeepCAM approach is 2.16× to 109× less compared to Eyeriss.

XploreNAS : Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars

ACM Transactions on Embedded Computing Systems

Compute In-Memory platforms such as memristive crossbars are gaining focus as they facilitate acc... more Compute In-Memory platforms such as memristive crossbars are gaining focus as they facilitate acceleration of Deep Neural Networks (DNNs) with high area and compute-efficiencies. However, the intrinsic non-idealities associated with the analog nature of computing in crossbars limits the performance of the deployed DNNs. Furthermore, DNNs are shown to be vulnerable to adversarial attacks leading to severe security threats in their large-scale deployment. Thus, finding adversarially robust DNN architectures for non-ideal crossbars is critical to the safe and secure deployment of DNNs on the edge. This work proposes a two-phase algorithm-hardware co-optimization approach called XploreNAS that searches for hardware-efficient & adversarially robust neural architectures for non-ideal crossbar platforms. We use the one-shot Neural Architecture Search (NAS) approach to train a large Supernet with crossbar-awareness and sample adversarially robust Subnets therefrom, maintaining competitive h...

SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks

Cornell University - arXiv, Oct 23, 2022

Spiking Neural Networks (SNNs) are an active research domain towards energy efficient machine int... more Spiking Neural Networks (SNNs) are an active research domain towards energy efficient machine intelligence. Compared to conventional artificial neural networks (ANNs), SNNs use temporal spike data and bio-plausible neuronal activation functions such as Leaky-Integrate Fire/Integrate Fire (LIF/IF) for data processing. However, SNNs incur significant dot-product operations causing high memory and computation overhead in standard von-Neumann computing platforms. To this end, In-Memory Computing (IMC) architectures have been proposed to alleviate the "memory-wall bottleneck" prevalent in von-Neumann architectures. Although recent works have proposed IMC-based SNN hardware accelerators, the following key implementation aspects have been overlooked 1) the adverse effects of crossbar non-ideality on SNN performance due to repeated analog dot-product operations over multiple time-steps 2) hardware overheads of essential SNN-specific components such as the LIF/IF and data communication modules. To this end, we propose SpikeSim, a tool that can perform realistic performance, energy, latency and area evaluation of IMCmapped SNNs. SpikeSim consists of a practical monolithic IMC architecture called SpikeFlow for mapping SNNs. Additionally, the non-ideality computation engine (NICE) and energy-latencyarea (ELA) engine performs hardware-realistic evaluation of SpikeFlow-mapped SNNs. Based on 65nm CMOS implementation and experiments on CIFAR10, CIFAR100 and TinyImagenet datasets, we find that the LIF/IF neuronal module has significant area contribution (> 11% of the total hardware area). To this end, we propose SNN topological modifications that leads to 1.24× and 10× reduction in the neuronal module's area and the overall energy-delay-product value, respectively. Furthermore, in this work, we perform a holistic comparison between IMC implemented ANN and SNNs and conclude that lower number of time-steps are the key to achieve higher throughput and energyefficiency for SNNs compared to 4-bit ANNs. The code repository for the SpikeSim tool will be made available in this Github link.

Proceedings of the 59th ACM/IEEE Design Automation Conference

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and en... more Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neuralnetwork parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on a systolic-array hardware. Our experiments with benchmark datasets (child tasks)-CIFAR10, CI-FAR100, and Fashion-MNIST, show that MIME achieves ∼ 3.48× memory-efficiency and ∼ 2.4 − 3.1× energy-savings compared to conventional multi-task inference in Pipelined task mode.

Examining the Robustness of Spiking Neural Networks on Non-ideal Memristive Crossbars

ACM/IEEE International Symposium on Low Power Electronics and Design

Spiking Neural Networks (SNNs) have recently emerged as the low-power alternative to Artificial N... more Spiking Neural Networks (SNNs) have recently emerged as the low-power alternative to Artificial Neural Networks (ANNs) owing to their asynchronous, sparse, and binary information processing. To improve the energy-efficiency and throughput, SNNs can be implemented on memristive crossbars where Multiply-and-Accumulate (MAC) operations are realized in the analog domain using emerging Non-Volatile-Memory (NVM) devices. Despite the compatibility of SNNs with memristive crossbars, there is little attention to study on the effect of intrinsic crossbar non-idealities and stochasticity on the performance of SNNs. In this paper, we conduct a comprehensive analysis of the robustness of SNNs on non-ideal crossbars. We examine SNNs trained via learning algorithms such as, surrogate gradient and ANN-SNN conversion. Our results show that repetitive crossbar computations across multiple timesteps induce error accumulation, resulting in a huge performance drop during SNN inference. We further show that SNNs trained with a smaller number of time-steps achieve better accuracy when deployed on memristive crossbars.

Rate Coding Or Direct Coding: Which One Is Better For Accurate, Robust, And Energy-Efficient Spiking Neural Networks?

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recent Spiking Neural Networks (SNNs) works focus on an image classification task, therefore vari... more Recent Spiking Neural Networks (SNNs) works focus on an image classification task, therefore various coding techniques have been proposed to convert an image into temporal binary spikes. Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system as they show state-of-the-art performance on largescale datasets. Despite their usage, there is little attention to comparing these two coding schemes in a fair manner. In this paper, we conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency. First, we compare the performance of two coding techniques with various architectures and datasets. Then, we measure the robustness of the coding techniques on two adversarial attack methods. Finally, we compare the energy-efficiency of two coding schemes on a digital hardware platform. Our results show that direct coding can achieve better accuracy especially for a small number of timesteps. In contrast, rate coding shows better robustness to adversarial attacks owing to the non-differentiable spike generation process. Rate coding also yields higher energy-efficiency than direct coding which requires multi-bit precision for the first layer. Our study explores the characteristics of two codings, which is an important design consideration for building SNNs 1 .

SATA: Sparsity-Aware Training Accelerator for Spiking Neural Networks

ArXiv, 2022

Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alterna... more Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their inherent high-sparsity activation. Recently, SNNs with backpropagation through time (BPTT) have achieved a higher accuracy result on image recognition tasks compared to other SNN training algorithms. Despite the success on the algorithm perspective, prior works neglect the evaluation of the hardware energy overheads of BPTT, due to the lack of a hardware evaluation platform for SNN training algorithm design. Moreover, although SNNs have been long seen as an energy-efficient counterpart of ANNs, a quantitative comparison between the training cost of SNNs and ANNs is missing. To address the above-mentioned issues, in this work, we introduce SATA (Sparsity-Aware Training Accelerator), a BPTT-based training accelerator for SNNs. The proposed SATA provides a simple and re-configurable accelerator architecture for the general-purpose hardware evaluation platform, which makes it easier to analyze the training energy for SNN training algorithms. Based on SATA, we show quantitative analyses on the energy efficiency of SNN training and make a comparison between the training cost of SNNs and ANNs. The results show that SNNs consume 1.27× more total energy with considering sparsity (spikes, gradient of firing function, and gradient of membrane potential) when compared to ANNs. We find that such high training energy cost is from time-repetitive convolution operations and data movements during backpropagation. Moreover, to guide the future SNN training algorithm design, we provide several observations on energy efficiency with respect to different SNN-specific training parameters.

A low power, charge-sensitive preamplifier integrated with a silicon nanowire biosensor

2021 IEEE Latin America Electron Devices Conference (LAEDC)

Charge amplifiers, characterized by their charge-sensitivity or charge gain, are essential compon... more Charge amplifiers, characterized by their charge-sensitivity or charge gain, are essential components of a transducer-interfacing system that amplify charge signals emerging from various sensors and convert them into voltage signals. Today, with increased scaling of MOSFETs, it becomes challenging to obtain the charge amplifiers with high charge sensitivity and low power. Furthermore, if the charge amplifier is to be operated at lower bandwidth, then obtaining low noise at higher charge sensitivity is also difficult. In this paper, the authors propose a novel design of a charge-sensitive preamplifier in 90 nm CMOS technology that can be operated at the frequency range of 10 Hz-10 kHz, suitable for biosignals at lower frequencies. The opamp in the preamplifier is designed to have a folded-cascode structure with composite cascoding at its input transistors for low power operation. A feedback resistance of 185.20 GΩ for the preamplifier is actively realized using long-channel cascode MOSFET stage in the opamp, thereby eliminating the need for a large value of passive resistance on-chip. The preamplifier has a high charge sensitivity of 8.875 mV/fC at a low power consumption of 214.32 nW with input-referred noise of 256.89 μV for the bandwidth 10 Hz-10 kHz. The preamplifier operation is verified by interfacing the model of the preamplifier with the small-signal equivalent of a SPICE model of a Silicon Nanowire Field-effect Transistor (SiNW-FET) based biosensor which was proposed for impedimetric sensing of biomolecules.

MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and en... more Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neural-network parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on a systolic-array hardware. Our experiments with benchmark datasets (child tasks)-CIFAR10, CIFAR100, and Fashion-MNIST, show that MIME achieves ∼ 3.48× memory-efficiency and ∼ 2.4−3.1× energy-savings compared to conventional multitask inference in Pipelined task mode.

Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks

2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021

As neural networks gain widespread adoption in embedded devices, there is a growing need for mode... more As neural networks gain widespread adoption in embedded devices, there is a growing need for model compression techniques to facilitate seamless deployment in resourceconstrained environments. Quantization is one of the go-to methods yielding state-of-the-art model compression. Most quantization approaches take a fully trained model, then apply different heuristics to determine the optimal bit-precision for different layers of the network, and finally retrain the network to regain any drop in accuracy. Based on Activation Density-the proportion of non-zero activations in a layer-we propose a novel intraining quantization method. Our method calculates optimal bitwidth/precision for each layer during training yielding an energyefficient mixed precision model with competitive accuracy. Since we train lower precision models progressively during training, our approach yields the final quantized model at lower training complexity and also eliminates the need for retraining. We run experiments on benchmark datasets like CIFAR-10, CIFAR-100, TinyImagenet on VGG19/ResNet18 architectures and report the accuracy and energy estimates for the same. We achieve up to 4.5× benefit in terms of estimated multiply-and-accumulate (MAC) reduction while reducing the training complexity by 50% in our experiments. To further evaluate the energy benefits of our proposed method, we develop a mixed-precision scalable Process In Memory (PIM) hardware accelerator platform. The hardware platform incorporates shift-add functionality for handling multibit precision neural network models. Evaluating the quantized models obtained with our proposed method on the PIM platform yields about 5× energy reduction compared to baseline 16-bit models. Additionally, we find that integrating activation density based quantization with activation density based pruning (both conducted during training) yields up to ∼198× and ∼44× energy reductions for VGG19 and ResNet18 architectures respectively on PIM platform compared to baseline 16-bit precision, unpruned models.

Efficiency-driven Hardware Optimization for Adversarially Robust Neural Networks

2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021

With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) er... more With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) era, secure hardware implementation of Deep Neural Networks (DNNs) has become imperative. We will focus on how to address adversarial robustness for DNNs through efficiency-driven hardware optimizations. Since memory (specifically, dot-product operations) is a key energyspending component for DNNs, hardware approaches in the past have focused on optimizing the memory. One such approach is approximate digital CMOS memories with hybrid 6T-8T SRAM cells that enable supply voltage (Vdd) scaling yielding low-power operation, without significantly affecting the performance due to read/write failures incurred in the 6T cells. In this paper, we show how the bit-errors in the 6T cells of hybrid 6T-8T memories minimize the adversarial perturbations in a DNN. Essentially, we find that for different configurations of 8T-6T ratios and scaled Vdd operation, noise incurred in the hybrid memory architectures is bound within specific limits. This hardware noise can potentially interfere in the creation of adversarial attacks in DNNs yielding robustness. Another memory optimization approach involves using analog memristive crossbars that perform Matrix-Vector-Multiplications (MVMs) efficiently with low energy and area requirements. However, crossbars generally suffer from intrinsic non-idealities that cause errors in performing MVMs, leading to degradation in the accuracy of the DNNs. We will show how the intrinsic hardware variations manifested through crossbar non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization.

SwitchX- Gmin-Gmax Switching for Energy-Efficient and Robust Implementation of Binary Neural Networks on Memristive Xbars

ArXiv, 2020

Memristive crossbars can efficiently implement Binarized Neural Networks (BNNs) wherein the weigh... more Memristive crossbars can efficiently implement Binarized Neural Networks (BNNs) wherein the weights are stored in high-resistance states (HRS) and low-resistance states (LRS) of the synapses. We propose SwitchX mapping of weights onto crossbars such that the power consumed by the crossbars and the impact of crossbar non-idealities, that lead to degradation in computational accuracy, are minimized. Essentially, SwitchX maps the binary weights in such manner that the crossbar comprises of more HRS than LRS synapses. Increased HRS in a crossbar will decrease the overall output dot-product current and thus lead to power savings. Interestingly, BNNs mapped onto crossbars with SwitchX also exhibit better robustness against adversarial attacks than the corresponding software BNN baseline as well as the standard crossbar mapped BNNs. Finally, we combine SwitchX with state-aware training (that further increases the feasibility of HRS states during weight mapping) to boost the robustness and ...

Comprehensive Understanding of Silicon-Nanowire Field-Effect Transistor Impedimetric Readout for Biomolecular Sensing

Micromachines, 2020

Impedance sensing with silicon nanowire field-effect transistors (SiNW-FETs) shows considerable p... more Impedance sensing with silicon nanowire field-effect transistors (SiNW-FETs) shows considerable potential for label-free detection of biomolecules. With this technique, it might be possible to overcome the Debye-screening limitation, a major problem of the classical potentiometric readout. We employed an electronic circuit model in Simulation Program with Integrated Circuit Emphasis (SPICE) for SiNW-FETs to perform impedimetric measurements through SPICE simulations and quantitatively evaluate influences of various device parameters to the transfer function of the devices. Furthermore, we investigated how biomolecule binding to the surface of SiNW-FETs is influencing the impedance spectra. Based on mathematical analysis and simulation results, we proposed methods that could improve the impedimetric readout of SiNW-FET biosensors and make it more explicable.

Rethinking Non-idealities in Memristive Crossbars for Adversarial Robustness in Neural Networks

ArXiv, 2020

Deep Neural Networks (DNNs) have been shown to be prone to adversarial attacks. Memristive crossb... more Deep Neural Networks (DNNs) have been shown to be prone to adversarial attacks. Memristive crossbars, being able to perform Matrix-Vector-Multiplications (MVMs) efficiently, are used to realize DNNs on hardware. However, crossbar nonidealities have always been devalued since they cause errors in performing MVMs, leading to computational accuracy losses in DNNs. Several software-based defenses have been proposed to make DNNs adversarially robust. However, no previous work has demonstrated the advantage conferred by the crossbar nonidealities in unleashing adversarial robustness. We show that the intrinsic hardware non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization. We evaluate the adversarial resilience of state-of-the-art DNNs (VGG8 & VGG16 networks) using benchmark datasets (CIFAR-10, CIFAR-100 & Tiny Imagenet) across various crossbar sizes. We find that crossbar non-idealities unleash significantly greater adversarial robustness (> 10−20%) in crossbar-mapped DNNs than baseline software DNNs. We further assess the performance of our approach with other state-of-the-art efficiency-driven adversarial defenses and find that our approach performs significantly well in terms of reducing adversarial loss.

Examining and Mitigating the Impact of Crossbar Non-idealities for Accurate Implementation of Sparse Deep Neural Networks

Recently several structured pruning techniques have been introduced for energy-efficient implemen... more Recently several structured pruning techniques have been introduced for energy-efficient implementation of Deep Neural Networks (DNNs) with lesser number of crossbars. Although, these techniques have claimed to preserve the accuracy of the sparse DNNs on crossbars, none have studied the impact of the inexorable crossbar non-idealities on the actual performance of the pruned networks. To this end, we perform a comprehensive study to show how highly sparse DNNs, that result in significant crossbar-compression-rate, can lead to severe accuracy losses compared to unpruned DNNs mapped onto non-ideal crossbars. We perform experiments with multiple structured-pruning approaches (such as, C/F pruning, XCS and XRS) on VGG11 and VGG16 DNNs with benchmark datasets (CIFAR10 and CIFAR100). We propose two mitigation approaches - Crossbar column rearrangement and Weight-Constrained-Training (WCT) - that can be integrated with the crossbar-mapping of the sparse DNNs to minimize accuracy losses incu...