Papers by Richard Dorrance
IEEE Electron Device Letters, Jun 2013
This letter presents a diode-magnetic tunnel junction (MTJ) magnetic random access memory cell in a 65-nm complementary metal-oxide-semiconductor compatible process. A voltage-controlled magnetic anisotropy switching mechanism, in addition to spin-transfer torque (STT), allows for a unipolar set/reset write scheme, where voltage pulses of the same polarity, but different amplitudes, are used to switch the MTJs. A small crossbar array is constructed from 65-nm MTJs fabricated on a silicon wafer, with switching voltages ∼1 V and thermal stability greater than 10 years, with discrete germanium diodes as access devices to allow for read/write operations. The crossbar architecture can be extended to multiple layers to create a 3-D stackable, nonvolatile memory with a sub-1F² effective cell size.
IEEE Transactions on Electron Devices, Apr 2012
We present a design-space feasibility region, as a function of magnetic tunnel junction (MTJ) characteristics and target memory specifications, to explore the design margin of a one-transistor–one-magnetic-tunnel-junction (1T-1MTJ) memory cell for spin-transfer torque random access memories (STT-RAMs). Data from measured devices are used to model the statistical variation of an MTJ’s critical switching current and resistance. The sensitivity of the design space to different design parameters is also analyzed for the scaling of both the MTJ and the underlying transistor technology. A design flow, using a sensitivity-based analysis and an MTJ switching model based on the Landau–Lifshitz–Gilbert equation, is proposed to optimize design margins for gigabit-scale memories. Design points for improved yield, density, and memory performance are extracted from MTJ-compatible complementary metal–oxide–semiconductor (CMOS) technologies for 90-, 65-, 45-, and 32-nm processes. Predictive technology models are used to explore the future scalability of STT-RAMs in upcoming 22- and 16-nm technology nodes. Our analysis shows that, to achieve Flash-like densities (< 6F²) in advanced CMOS technologies, aggressive scaling of the critical switching current density will be required.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'14), Feb 2014
Sparse Matrix-Vector Multiplication (SpMxV) is a widely used mathematical operation in many high-performance scientific and engineering applications. In recent years, tuned software libraries for multi-core microprocessors (CPUs) and graphics processing units (GPUs) have become the status quo for computing SpMxV. However, the computational throughput of these libraries for sparse matrices tends to be significantly lower than that of dense matrices, mostly because the compression formats required to efficiently store sparse matrices are a poor match for traditional computing architectures. This paper describes an FPGA-based SpMxV kernel that is scalable to efficiently utilize the available memory bandwidth and computing resources. Benchmarking on a Virtex-5 SX95T FPGA demonstrates an average computational efficiency of 91.85%. The kernel achieves a peak computational efficiency of 99.8%, a >50x improvement over two Intel Core i7 processors (i7-2600 and i7-4770) and a >300x improvement over two NVIDIA GPUs (GTX 660 and GTX Titan), when running the MKL and cuSPARSE sparse-BLAS libraries, respectively. In addition, the SpMxV FPGA kernel is able to achieve higher performance than its CPU and GPU counterparts, while using only 64 single-precision processing elements, with an overall 38-50x improvement in energy efficiency.
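As a rough illustration of why compressed sparse formats sit awkwardly on conventional architectures, the sketch below multiplies a matrix stored in compressed sparse row (CSR) form by a dense vector. CSR is assumed here only because it is a common format; the abstract does not say which format the FPGA kernel uses, and the function names `dense_to_csr` and `csr_spmv` are hypothetical. The point to notice is the indirect read `x[col_idx[k]]`, whose address depends on the matrix data.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(dense: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # keep only the nonzero entries
                col_idx.append(j)  # remember which column each one came from
        row_ptr.append(len(values))  # index where the next row starts
    return values, col_idx, row_ptr


def csr_spmv(values: Sequence[float], col_idx: Sequence[int],
             row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A given in CSR form."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect, data-dependent read of x: hard to prefetch or vectorize.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
    vals, cols, ptrs = dense_to_csr(A)
    print(csr_spmv(vals, cols, ptrs, [1, 2, 3]))
```

Each row performs a different, irregular number of multiply-accumulates, and the vector is accessed in an order dictated by the column indices, which is one plausible reading of the mismatch the abstract refers to.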
Proceedings of the 23rd International Conference on Field-Programmable Logic and Applications (FPL'13), Sep 2013
Compressive sensing (CS) is a promising technology for low-power and cost-effective data acquisition in wireless healthcare systems. However, its efficient real-time signal reconstruction is still challenging, and there is a clear demand for hardware acceleration. In this paper, we present the first single-precision floating-point CS reconstruction engine implemented on a Kintex-7 FPGA using the orthogonal matching pursuit (OMP) algorithm. In order to achieve high performance with maximum hardware utilization, we propose a highly parallel architecture that shares the computing resources among different tasks of OMP by using configurable processing elements (PEs). By fully utilizing the FPGA resources, our implementation has 128 PEs in parallel and operates at 53.7 MHz. In addition, it can support 2x larger problem size and 10x more sparse coefficients than prior work, which enables higher reconstruction accuracy by adding finer details to the recovered signal. Hardware results from the ECG reconstruction tests show the same level of accuracy as the double-precision C program. Compared to the execution time of a 2.27 GHz CPU, the FPGA reconstruction achieves an average speed-up of 41x.
Proceedings of the International Electron Devices Meeting (IEDM'12), Feb 2013
We demonstrate voltage-induced (non-STT) switching of nanoscale, high-resistance voltage-controlled magnetic tunnel junctions (VMTJs) with pulses down to 10 ns. We show ~10x reduction in switching energies (compared to STT) with leakage currents < 10⁵ A/cm². Switching dynamics, from quasistatic to the nanosecond regime, are studied in detail. Finally, a strategy for eliminating the need for external magnetic fields, where switching is performed by set/reset voltages of different amplitudes but the same polarity, is proposed and verified experimentally.
Proceedings of the 13th International Symposium on Quality Electronic Design (ISQED'12), Mar 2012
With scaling of CMOS and Magnetic Tunnel Junction (MTJ) devices, conventional low-current reading techniques for STT-RAMs face challenges in achieving the reliability and performance improvements that are expected from scaled devices. The challenges arise from the increasing variability of the CMOS sensing current and the reduction in MTJ switching current. This paper proposes a short-pulse reading circuit, based on a body-voltage sensing scheme, to mitigate the scaling issues. Compared to existing sensing techniques, our technique shows substantially higher read margin (RM) despite a much shorter sensing time. A narrow current pulse applied to an MTJ significantly reduces the probability of read disturbance. The RM analysis is validated by Monte-Carlo simulations in a 65-nm CMOS technology with both CMOS and MTJ variations considered. Simulation results show that our technique is able to provide over 300 mV RM at a GHz frequency across process-voltage-temperature (PVT) variations, while the reference designs require 4.3 ns and 2.3 ns sensing time for a 200 mV RM, respectively. The effective read energy per bit required by the proposed sensing circuit is around 195 fJ in the nominal case.
Proceedings of the ACM/IEEE International Symposium on Nanoscale Architectures (NANOARCH'11), Jun 2011
The density of STT-RAMs is limited by the area cost and width of the access device in a cell, since it needs to support the programming currents. This paper explores a cell structure that shares each cell’s access transistor with multiple MTJ memory elements. The feasibility and limitations of such a cell structure are explored for both reading and writing of the memory. The analytical and simulation results indicate that only a small amount of sharing is possible and that MTJs that can handle a high read current without disturbing the cell are needed.
Proceedings of the ACM/IEEE International Symposium on Nanoscale Architectures (NANOARCH'11), Jun 2011
This paper introduces a design-space feasibility region as a function of MTJ characteristics and memory target specifications. The sensitivity of the design space is analyzed for scaling of both MTJ and underlying transistor technology. Design points for improved yield, density, and memory performance can be extracted for 90nm down to 32nm processes based on measured MTJ devices. To achieve flash-like densities in upcoming 22nm and 16nm technology nodes, scaling of the critical switching current density is required.