# 3D stacked image sensor featuring low noise inductive coupling channels

Masayuki Ikebe<sup>1</sup>, Tetsuya Asai<sup>1</sup>, Masafumi Mori<sup>1</sup>, Toshiyuki Itou<sup>1</sup>, Daisuke Uchida<sup>1</sup>, Yasuhiro Take<sup>2</sup>, Tadahiro Kuroda<sup>2</sup>, and Masato Motomura<sup>1</sup>

<sup>1</sup> Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Sapporo, Hokkaido, 060-0814 Japan

<sup>2</sup> Department of Electrical Engineering, Keio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8521 Japan

E-mail: ikebe@ist.hokudai.ac.jp

**Abstract**

This paper proposes a 3D stacked module consisting of image sensor and digital logic dies connected through inductive coupling channels. Evaluation of a prototype module revealed that radiation noise from the inductive coils to the image sensor is less than 0.4 LSB across ADC codes, i.e., negligible. Aiming at a high-frame-rate image sensing/processing module that exploits this attractive off-die interface, we assembled and tested a 1,000 fps motion-vector (MV) estimation and classification engine for high-speed computational imaging in a 3D stacked module. The module implements a cognitive classification scheme applied to MV patterns, enabling the classification of moving objects, which is not possible with conventional proposals.

Keywords: ThruChip Interfaces (TCI), 3D stacked imager, radiation noise, MV estimation/classification

## 1. Introduction

Computational imaging is a state-of-the-art digital imaging technology that captures and processes numerous image snapshots to create perceptually meaningful representations of our visual world (Fig. 1). A difficult challenge in computational imaging is achieving both high-speed imaging and low-power image processing.
In this paper, we propose a 3D stacked module for such high-speed computational imaging applications. The module consists of our low-power CMOS imager [1] and an image-processor die; image snapshots are transferred to the low-power image processor via high-speed ThruChip Interfaces (TCIs) [2], which use inductive coupling between numerous coils on each die.

## 2. Circuit Configuration

Figure 2 shows a micrograph of our chip fabricated in a 0.18-µm 1P4M CIS process. The chip is divided into two halves. The left half (Tx side) consists of an image sensor, an asynchronous parallel-to-serial converter (P2S) that serializes 12-bit parallel pixel data, and a TCI-Tx. In the right half (Rx side), the serial data received at the TCI-Rx are converted back into 12-bit parallel pixel data by an asynchronous serial-to-parallel converter (S2P). The TCI-Tx includes two sets of coils/drivers (TCI Tx cir) for transferring data and clock, respectively, whereas the TCI-Rx includes the corresponding coils/amplifiers (TCI Rx cir).

Figure 3 shows photographs of the MV estimation chip and the 3D stacked board, together with the performance table. We evaluated the chip and the 3D module at a 100 MHz transfer clock (187 fps with 200×200 pixels; 100 MHz / 14 bits ≈ 7.1 MHz system clock).

Our key proposal is to reduce the computational cost of block-matching (BM) MV estimation in the image processor by exploiting high-speed imaging and high-bandwidth image-data transfer between the imager and the processor via TCIs. This is based on the observation that, at high frame rates, the frame-to-frame movement of real-world subjects on the image sensor tends to be limited to within 1 pixel. With a 1-pixel search range, the BM cost is minimized by using a minimum macro block of 3×3 pixels and a search area of 5×5 pixels, which leaves 3×3 = 9 candidate block positions per MV. Estimated MVs are transferred to our motion classification subsystem (Figs. 4 and 5).
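The 1-pixel block-matching step described in Section 2 can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' hardware: it assumes grayscale frames stored as 2D arrays and a sum-of-absolute-differences (SAD) matching criterion, which the text does not specify, and the function name `block_matching_1px` is hypothetical.

```python
import numpy as np

def block_matching_1px(prev, curr, y, x, block=3, radius=1):
    """Estimate the motion vector (dy, dx) of the block centred at (y, x).

    The 3x3 reference block is taken from `prev`; candidate blocks in `curr`
    are centred at (y + dy, x + dx) with |dy|, |dx| <= radius, so with the
    defaults they all lie inside a 5x5 search area. The candidate with the
    smallest SAD wins; (0, 0) is evaluated first and keeps ties, otherwise the
    earliest candidate in scan order is kept. The caller must keep (y, x) at
    least block // 2 + radius pixels away from the frame border.
    """
    h = block // 2
    # Widen to int32 so uint8 pixel differences cannot wrap around.
    ref = prev[y - h:y + h + 1, x - h:x + h + 1].astype(np.int32)

    def sad(dy, dx):
        cy, cx = y + dy, x + dx
        cand = curr[cy - h:cy + h + 1, cx - h:cx + h + 1].astype(np.int32)
        return int(np.abs(ref - cand).sum())

    best, best_sad = (0, 0), sad(0, 0)  # start from "no motion"
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            s = sad(dy, dx)
            if s < best_sad:
                best, best_sad = (dy, dx), s
    return best


# Example: frame content shifted one pixel to the right between frames.
prev = np.arange(64, dtype=np.uint8).reshape(8, 8)
curr = np.roll(prev, 1, axis=1)
print(block_matching_1px(prev, curr, 3, 3))  # (0, 1)
```

In this sketch each MV costs 9 candidates × 9 absolute differences = 81 operations; widening the search to ±R pixels would require (2R+1)² candidates, which illustrates why confining motion to 1 pixel at high frame rates keeps the BM cost low.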
## 3. Experimental results

Figure 6 shows the measured TCI radiation-noise characteristics. Because radiation noise decreases in inverse proportion to the cube of the distance, any interference component should depend on (A) the ADC code, (B) the Tx power, and (C) the coil locations. The measured results show no such dependencies, meaning that no radiation-noise interference was detected.

Figure 7 presents integrated MV outputs of the 3D module. Motion sequences were captured by our imager [1] (top chip) and transferred via TCI to the processor (bottom chip), which produced the vector outputs. The left panel of Fig. 8 shows six examples of motion patterns that our system was able to recognize, and the right panel compares the proposed classifier with state-of-the-art VLSIs [3][4], indicating two major advantages: the classification target (image sequences rather than static images) and power consumption.

## 4. Conclusion

This paper demonstrated that high-fps image snapshots enable area- and power-efficient motion-vector estimation and classification systems. We also showed that TCIs can be employed in high-speed imagers, since their noise interference is negligible when the coils are placed appropriately. We therefore conclude that 3D stacking of an imager and a processor using TCIs, which require only metal coils instead of costly TSVs, can be an attractive solution for high-speed computational imaging applications.

## References

- [1] M. Ikebe, et al., VLSI Circuits 2015, pp. 82-83.
- [2] D. Ditzel, et al., "ThruChip wireless connections," Hot Chips 2014.
- [3] G. Kim, et al., ISSCC 2014, pp. 182-184.
- [4] M. Altaf, et al., ISSCC 2015, pp. 394-396.

Fig. 1 Overall concept of the proposed imager/processor 3D stacked module.

Fig. 2 Fabricated chip and TCI.

Fig. 3 Chip micrograph, 3D stacked module (board) snapshot, and system specification.

Fig. 4 Feature extraction (FE) scheme and its block components for the proposed motion classifier.

Fig. 5 Architecture of the proposed motion-classification subsystem implementing on-/off-chip feature extractors (FEs).

Fig. 6 Measured characteristics of TCI noise.

Fig. 7 Motion-vector (MV) examples estimated by our 3D stacked module.

[Fig. 8 left panel: approaching, moving to right, CCW rotation, moving upstairs, anomaly (1), anomaly (2)]

| | ISSCC 2014 [3] | ISSCC 2015 [4] | This work |
|---|---|---|---|
| Process | 65nm 1P8M CMOS | 180nm 1P6M CMOS | 180nm 1P6M CMOS + FPGA |
| Target | Static Image (HMD Apps) | Static Image (Seizure) | Image Seq. (Motion) |
| Power | <778 mW | N/A | <7.2 mW / <497 mW |
| Gate count | 8.32M | N/A | 32k (NAND2) / 29k (ALUT) |
| # of inputs | 16 | 16 | 37 / 3,700 |
| Classifier | Multi-layer Perceptron | Linear SVM | Linear SVM |

Fig. 8 Proposed motion-classification system: demonstration (left) and comparison with the latest neural-net-based hardware classifiers (right).
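The comparison table lists a linear SVM as the classifier of this work, fed by a feature vector of 37 / 3,700 inputs. Because the feature extraction of Fig. 4 is not detailed in the text, the following is only a minimal sketch of how a one-vs-rest linear SVM turns MV-derived features into a class decision. It assumes a hypothetical 9-bin histogram of 1-pixel MV directions as the feature vector and uses toy, hand-set weights; the names `MV_BINS`, `mv_histogram`, and `linear_svm_predict` are illustrative, not the authors' FE or trained model.

```python
import numpy as np

# Hypothetical feature: how often each 1-pixel MV (dy, dx) occurs in a frame.
MV_BINS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def mv_histogram(mvs):
    """Return the normalized histogram of (dy, dx) motion vectors over 9 bins."""
    counts = np.zeros(len(MV_BINS))
    for mv in mvs:
        counts[MV_BINS.index(tuple(mv))] += 1
    return counts / max(len(mvs), 1)

def linear_svm_predict(x, W, b, labels):
    """One-vs-rest linear SVM: return the label with the largest w_k . x + b_k."""
    scores = W @ x + b
    return labels[int(np.argmax(scores))]

# Toy, hand-set weights (hypothetical; NOT trained and NOT from the paper).
labels = ["static", "moving to right"]
W = np.zeros((2, len(MV_BINS)))
W[0, MV_BINS.index((0, 0))] = 1.0  # "static" favours zero vectors
W[1, MV_BINS.index((0, 1))] = 1.0  # "moving to right" favours (0, +1)
b = np.zeros(2)

print(linear_svm_predict(mv_histogram([(0, 1)] * 5), W, b, labels))  # moving to right
```

A global direction histogram like this one discards where each vector occurs, so it cannot separate patterns such as CCW rotation or approaching, whose vectors point in all directions; that spatial information is what a dedicated feature-extraction stage has to supply.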