Microelectronics Journal: Seyed Mohamad Taghi Adl, Siamak Mohammadi


Microelectronics Journal 76 (2018) 69–80

Contents lists available at ScienceDirect

Microelectronics Journal
journal homepage: www.elsevier.com/locate/mejo

A high performance dual clock elastic FIFO network interface for GALS NoC
Seyed Mohamad Taghi Adl a, Siamak Mohammadi a,b,*
a Dependable System Design Lab, School of ECE, University of Tehran, Tehran, Iran
b School of Computer Science, Institute of Fundamental Sciences (IPM), Tehran, Iran

ARTICLE INFO

Keywords:
Dual clock FIFO
GALS NoC interface
Performance evaluation
Process variation

ABSTRACT

A dual clock, register based elastic First-In First-Out (FIFO) architecture is presented for Globally Asynchronous Locally Synchronous (GALS) Network on Chip (NoC) interfaces. The FIFO is designed using synchronous elastic methods, which facilitates its synthesis with commercial CAD tools. It supports arbitrary phase and frequency for read and write operations and provides safe data transmission between different clock domains. The presented structure can easily be used as an interface between synchronous or asynchronous GALS modules. The FIFO is simulated and analyzed with the 32 nm PTM library in HSPICE. Metastability, process variation, throughput, power, area, delay and maximum frequency are analyzed. Results show that the power delay product (PDP) of the elastic FIFO is 23% less than that of similar synchronous FIFOs. The proposed elastic FIFO has double the capacity while the area is almost the same. It tolerates high variability better and, in the presence of variation, preserves its functionality on average 5% more than the DSPIN synchronous FIFO.

1. Introduction

The increased use of large Networks on Chip (NoCs) and the spread of complicated Systems on Chip (SoCs) have made clock distribution a major concern in digital system design [1]. Designers are therefore forced to use the globally asynchronous locally synchronous (GALS) NoC structure [2] for large designs because of its particular characteristics. In GALS structures, each module can operate with its own clock frequency, so the clock distribution problem is largely resolved. However, communication between two modules in different clock domains is not as simple as communication in a synchronous design.

Data transmission between different clock domains may cause metastability [3]. As neighboring modules have no knowledge of each other's clock edges, they cannot know when data is stable and ready to be captured. A FIFO is one of the most basic and common methods to solve this data transfer problem in GALS NoCs [4]. FIFOs are responsible for metastability resolution, and they do not degrade the throughput of the overall system much. Various FIFO-based designs have been presented for GALS network interfaces. All of these designs are comparable in terms of area, power consumption, delay, throughput and robustness against process and environmental variations.

We can categorize the presented FIFO-based GALS network interfaces into two main groups, asynchronous FIFOs and synchronous dual clock FIFOs. Both asynchronous and synchronous designs, in addition to their circuit features, profit from the intrinsic characteristics of their design method. Asynchronous FIFOs tolerate variation better [5]; however, the handshaking delay of each transaction may degrade the overall system performance [6]. Synchronous FIFOs are CAD compatible and can easily be designed and tested together with all the modules of the GALS NoC [7]. Moreover, synchronous modules connect directly to a synchronous FIFO, whereas asynchronous FIFOs need an extra wrapper to convert control protocols [8].

Meanwhile, a new circuit design paradigm, called elastic, has been presented that can benefit from the advantages of asynchronous circuits in the presence of a clock signal, which makes it compatible with general CAD tools [9]. The accuracy of the clock signal in elastic design is not as important as in synchronous structures [10], and elastic circuits tolerate clock skew well. The elasticity concept appears under various names in the literature, such as latency insensitive [11] or synchronous elastic circuits [12].

In this paper we propose a high performance dual clock elastic FIFO to be used in a GALS NoC network interface. We mainly focus on the synchronization function of the network interface and do not consider standard interface protocols as in Refs. [13] or [14]. The presented FIFO has a simple and efficient design that uses the capabilities of elastic circuits. The FIFO has double capacity, and its dual clock is well suited to GALS structures and works correctly with different phases and frequencies of the read/write clocks. Two designs for this FIFO are proposed. One is suited for a

* Corresponding author. Dependable System Design Lab, School of ECE, University of Tehran, Tehran, Iran.
E-mail addresses: [email protected] (S.M.T. Adl), [email protected] (S. Mohammadi).

https://doi.org/10.1016/j.mejo.2018.04.014
Received 4 December 2017; Received in revised form 25 April 2018; Accepted 25 April 2018

0026-2692/© 2018 Elsevier Ltd. All rights reserved.


S.M.T. Adl, S. Mohammadi Microelectronics Journal 76 (2018) 69–80

Fig. 1. Three different SELF protocol states [15].

faster writing than reading and the other is for when the reading is faster. Our dual clock elastic FIFO connects easily to synchronous, asynchronous and elastic circuit interfaces. The presented FIFO is compatible with general CAD tools and there is no need for custom cell designs. The obtained results demonstrate that the presented dual clock elastic FIFO consumes less power, while it has double the data capacity of similar structures. The FIFO can be used at higher frequencies and shows better resilience in high variation situations.

From the structural point of view, the main contributions of this paper can be expressed as follows:

- The elastic FIFO is simple and has double data capacity in less area than the most similar synchronous dual clock FIFO designs.
- The presented structure benefits from a fast, accurate, and efficient full/empty detection mechanism.
- Our FIFO does not need multi-bit synchronization, which is complicated and unreliable, but instead uses a simple single bit brute-force synchronizer.
- The presented dual clock elastic FIFO can easily connect to synchronous, asynchronous, and elastic interfaces.

In the following section, some related works are explained. In Section 3 we present the new FIFO structure. In Section 4, the evaluation results are presented and finally the paper is concluded in Section 5.

2. Related work

In this section, some basic information and definitions necessary for the following sections are given. Elasticity concepts are described first. Afterward, GALS synchronization challenges are explored and some main dual clock FIFO structures are explained.

2.1. Elastic circuits

According to Cortadella's studies, the notion of elasticity covers a range of circuit designs that can tolerate variable latency inputs and still preserve functionality [16]. In this range, synchronous systems are the least elastic designs and delay insensitive asynchronous circuits are the most elastic structures. The new method, called synchronous elastic design, sits somewhere in the middle of this range. Many properties of synchronous elastic circuits have been stated for computational circuits [17]. In this study we evaluate synchronous elastic circuits for GALS NoC interfaces.

Elastic circuits preserve functionality while their inputs have arbitrary delays. The storage structure in elastic circuits, called an Elastic Buffer (EB), unlike a flip flop in synchronous design, can hold two distinct data items in its two adjacent latches. Since elastic circuits operate with a clock, they do not need a special interface to communicate with the synchronous modules of a GALS NoC.

Different structures have been presented for the elastic concept [18,19]. In this paper, we use the synchronous elastic structure presented by Cortadella [15]. Valid and stop signals, along with the clock, form an elastic channel that controls the flow of data using the SELF protocol. The valid control signal is sent in the same direction as the data and indicates its validity. The stop control signal is transmitted in the opposite direction of the data flow. An inactive stop signal means the consumer can accept new data.
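As a small illustration of the valid/stop handshake, the following Python sketch classifies each clock cycle of an elastic channel into the three SELF protocol states of [15]: Transfer (valid and not stop), Idle (not valid) and Retry (valid and stop). The names `SelfState`, `classify` and `count_transfers` are illustrative assumptions and do not come from the paper.

```python
from enum import Enum


class SelfState(Enum):
    TRANSFER = "transfer"  # V and not S: data moves to the consumer
    IDLE = "idle"          # not V: nothing to send
    RETRY = "retry"        # V and S: valid data waits for stop to clear


def classify(valid, stop):
    """Classify one clock cycle of an elastic channel from its valid/stop bits."""
    if not valid:
        return SelfState.IDLE
    return SelfState.RETRY if stop else SelfState.TRANSFER


def count_transfers(trace):
    """Count the cycles of a (valid, stop) trace in which data is handed over."""
    return sum(classify(v, s) is SelfState.TRANSFER for v, s in trace)
```

For example, in the trace `[(1, 0), (1, 1), (1, 0), (0, 0)]` only the first and third cycles hand data over; the second is a Retry cycle and the fourth is Idle.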

Fig. 2. Elastic buffer [16]. (a) Structure. (b) Controller Structure. (c) Three possible states of buffer.
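The three buffer states named in Fig. 2(c) can be explored with a purely behavioural Python model. This is a sketch under assumptions, not the latch-level design of the figure: the buffer is treated as a two-slot queue, V2 is taken to mean "at least one item stored" and S1 to mean "both slots occupied". The class and method names are hypothetical.

```python
from collections import deque

STATE_NAMES = {0: "Empty", 1: "Half-full", 2: "Full"}


class ElasticBuffer:
    """Behavioural model of a two-slot elastic buffer (not a gate-level design)."""

    def __init__(self):
        self.slots = deque()  # at most two items, oldest first

    @property
    def v_out(self):  # V2: valid towards the consumer
        return len(self.slots) > 0

    @property
    def s_in(self):  # S1: stop towards the producer
        return len(self.slots) == 2

    @property
    def state(self):
        return STATE_NAMES[len(self.slots)]

    def clock(self, v_in, data_in, s_out):
        """Advance one cycle; return the item handed to the consumer, if any."""
        accept = v_in and not self.s_in      # producer side is in Transfer
        deliver = self.v_out and not s_out   # consumer side is in Transfer
        out = self.slots.popleft() if deliver else None
        if accept:
            self.slots.append(data_in)
        return out
```

With the consumer never stopping, the model accepts one item and delivers one item per cycle and stays Half-full, like a flip flop. When the consumer asserts stop while the producer keeps writing, the second slot fills, the state becomes Full and `s_in` goes high, so further writes are refused until the consumer drains the buffer.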


Fig. 3. Token ring. (a) Structure. (b) Output with one bit encoding.

According to the SELF handshaking protocol, depending on the valid (V) and stop (S) signals, the circuit can be in one of three modes: Transfer, Idle, or Retry. The condition for each mode is defined as follows:

- Transfer: (V ∧ ¬S)
- Idle: (¬V)
- Retry: (V ∧ S)

These modes are depicted in Fig. 1. In Transfer mode, data is forwarded as in ordinary flip flops. In Retry mode, valid data is waiting for the stop condition to disappear. This means that the producer has to wait before sending new data until the consumer is ready to receive it.

A great deal of research has been done on synchronous elastic circuits. Different asynchronous modules such as mutex, fork, and join have special elastic designs [20,21]. Moreover, performance improvement techniques used in synchronous circuits have also been implemented and evaluated with the elastic concept [22]. Methods such as retiming, recycling, and speculation have been mapped to elastic design [23–25] to be used in design automation [26].

Many structures have been introduced for EBs [27]. We use the EB design presented in Ref. [17], as shown in Fig. 2. In Fig. 2a, the two latches, L and H, play roles similar to the master and slave latches of an ordinary flip flop (FF). These latches are enabled by the control circuit depicted in Fig. 2b, based on the SELF protocol.

According to the valid and stop signals, the EB can enter three states: Full, Half-full, and Empty. The finite state machine expressing the different states of the EB is depicted in Fig. 2c. When there is no valid data in the buffer (V1 = 0, V2 = 0), the state is Empty. When valid data is being sent with no restriction (S2 = 0), the buffer is Half-full and operates like a flip flop. But when the signal S2 is asserted, the buffer holds its current data and can still save one new data item, and thus stores two different data items. The EB then becomes Full and propagates the stop signal (S1) backward to prevent data from being overwritten.

At first glance, an EB seems more complex than a FF, imposing power and area overhead. Further, as the EB gates the clock signal, it needs timing engineering, and thus the circuit may become subject to process variation. However, we will show that in large circuit designs the power consumption and area of elastic circuits are not much greater than those of synchronous circuits, because elastic circuits do not need the extra control modules necessary for synchronous circuits. Besides, elastic circuits naturally tolerate variability better than synchronous designs.

2.2. GALS synchronization

Each data transmission between two different clock domains may face metastability, as the consumer does not know when data is stable enough to be captured on the clock edge. To deal with metastability, two main classes of solutions have been presented, called value-safe and time-safe [28]. Value-safe methods, such as pausible clock techniques [29] or stretching the clock [28], wait for metastability to resolve. Time-safe techniques allocate a fixed period of time for metastability to be resolved. In a GALS structure, since any transaction between adjacent modules must pass through these synchronization interfaces, the whole network performance strongly depends on synchronization performance [30].

Two simple solutions are available for the time-safe method: either using a series of flip flops called a synchronizer, or using FIFOs. Using synchronizers degrades the system's throughput considerably [31]. Moreover, using synchronizers for multi-bit signals is unreliable and the synchronized output may not be valid. Synchronizers can be used only when multi-bit signals change in one bit between consecutive values [32].

The second time-safe synchronization solution is using FIFOs. Although FIFOs have a more complicated design and some area overhead, they do not impact the network throughput very much, and are suitable for multi-bit synchronization. The FIFO's delay is comparable with that of a synchronizer. Practically, we can consider a FIFO as a set of pipelined synchronizers. Thus, a FIFO seems to be a better choice for GALS NoC communication. Synchronous and asynchronous FIFOs are the two options available for GALS NoC module interfaces.

2.3. FIFOs in GALS

In the GALS research area, different FIFOs have been presented. Asynchronous FIFOs [33,34] provide a high degree of robustness and resilience against environmental variation [5]. However, their data transfer rate is limited by handshaking delay [35].

Some improvements have been suggested to decrease the delay by using parallel structures or tree design FIFOs [36,37]. However, it is not easy to exploit them as an interface between synchronous modules, since they need an extra wrapper interface to communicate reliably between synchronous and asynchronous circuits [8]. Moreover, there is no mature and complete Computer Aided Design (CAD) tool to design and test asynchronous and synchronous circuits together [7]. Thus, designers tend to deploy dual clock synchronous FIFOs for the GALS NoC network interface, where read and write operations are synchronous with clock

Table 1
Different FIFO structures overview.

FIFO | Storage structure | Synchronizer used | Access storage cells by | Full/empty detector based on | Design method | CAD tool compatibility
Cumming FIFO [32] | Ram based | Multi bit | Binary counter | Gray code | Synchronous | Yes
Chelcea and Nowick FIFO [39] | Register based | Single bit | Combinational | Status register | Sync/Async | No (needs custom cells)
Apperson FIFO [38] | Ram based | Multi bit | Binary counter | Gray code | Synchronous | Yes
DSPIN FIFO [40] | Register based | Multi bit | Token ring, two hot encoding | Bubble encoding | Synchronous | Yes
Our presented FIFO | Register based | Single bit | Token ring, one hot encoding | Elastic control signals | Elastic | Yes


but have different phases and frequencies.

Dual clock FIFO structures mainly differ in their data storage mechanism. Data management and storage structures for FIFOs can be ram_based or register_based. Ram_based structures are more scalable and suitable for deep FIFOs, while register_based FIFOs are usually small and their cells are accessed through the linear or circular mechanism used in the FIFO structure. In linear FIFOs, data enters from one side and exits from the other, operating like a pipeline and imposing a high delay.

Circular dual clock FIFOs and ram_based structures use pointers to indicate the read and write positions. Pointers in ram_based structures are usually generated by counters [32,38], whereas in register_based structures, pointers are generated by token rings [39,40]. A token ring is a sequence of cascaded flip flops circulating one or more tokens with a clock, where different registers take turns on read/write. A simple token ring structure and its output are depicted in Fig. 3.

Based on research on GALS NoC implementations [3,41], it is not necessary to have very deep FIFO interfaces, and usually a FIFO with less than 10-word depth can provide full throughput communication [42]. Therefore, we choose a circular register_based dual clock FIFO structure for the GALS NoC interface.

2.3.1. Dual clock FIFO structures

Some of the principal designs are summarized in Table 1. These designs are introduced in the following.

In Ref. [32] a dual clock ram_based FIFO has been presented with binary code addressing. Binary addresses are converted to a special gray code to be synchronized with the corresponding clock domain and used for full/empty detection. This structure is composed of 5 modules: memory, read pointer generator and empty detector, write pointer generator and full detector, read pointer synchronizers and write pointer synchronizers.

Gray codes are generated from binary addresses with one extra bit. As the gray code is a mirror code, the less significant bits are repeated and show the addresses, while the most significant bit indicates the rotation of the code, that is, whether the access is on the next round. This structure has some drawbacks. First, the design complexity and code conversion cause power and area overhead. Second, because of the gray code structure, this FIFO can only have a capacity that is a power of two, which is considered a limitation. Moreover, recent studies have claimed that it is impossible to use and synchronize gray codes between different clock domains when multiple queues are implemented. To avoid this synchronization problem, the authors of [43] have proposed a loosely-coupled solution, which decouples shared buffering and synchronization. This method imposes additional data latency, which leads to a larger minimum buffer capacity for full throughput operation. A multiple queue scheme has been presented in Ref. [44] to fulfil dynamic virtual channel needs.

A ram_based dual clock FIFO using the pausible clock technique has been presented in Ref. [38]. Its structure is similar to that of [32]. The ram is accessed through a binary code. The full/empty detector operates on gray code because of multi-bit synchronization needs. The authors have considered the timing path between the FIFO and the producer/consumer. Therefore, an adaptive circuit is designed to assert full/empty some clock cycles sooner, based on this timing path.

Chelcea and Nowick presented a register_based dual clock FIFO for a GALS structure [39]. This FIFO has 5 main modules: data cells, full/empty detectors, and put/get controllers. An SR flip flop is used to indicate the data cell status, that is, whether its data is valid or not. The full detector investigates whether any adjacent cells are full, in which case the full signal is asserted. Similarly, the empty detector checks all adjacent cells to see if they are empty.

This FIFO is the basis of much research but has some limitations:

- The full/empty detector needs some custom cell designs and is not compatible with standard design tools, although it has an optimal area.
- If the read clock runs three times faster than the write clock, the FIFO's functionality will fail, since the SR latches work asynchronously, assert their output before the data is written, and deassert the empty signal simultaneously. If the read operation is then performed at high speed, the FIFO will try to read data that is not yet written. Conversely, if the producer is fast, the same situation arises and the FIFO cell is overwritten.
- This FIFO is exposed to glitch failures, as a glitch can change the state of a register.

An improvement proposed in Ref. [45] uses standard cells instead of custom ones.

Apperson in Ref. [38] states that when the FIFO is used in a real situation, some clock cycles may be needed to reach the producer/consumer in large GALS NoCs, so this distance should be considered when generating full/empty signals. Other designs do not consider this restriction and assume that full/empty signals reach the producer/consumer immediately.

Another dual clock register_based FIFO is presented in Ref. [40] and used as an interface in the DSPIN NoC [41]. The FIFO contains 5 main modules: read/write token rings, full/empty detectors, and data buffers. The FIFO uses read/write token rings to access FIFO cells. Full/empty is detected based on the read and write tokens. Since these tokens are generated in different clock domains, there is a need for multi-bit synchronization. For correct multi-bit synchronization, the token rings use bubble encoding and some extra modules are needed to control synchronizer output validity.

With bubble encoding, two consecutive tokens (ones) circulate. For the write token ring the first one indicates the write position, and for the read token ring the first place after the second one represents the read position. To detect a full FIFO, the detector checks whether the read and write pointers have reached the same position. Because of the synchronizer delay, the full condition is reported two cells before the real full situation. This structure guarantees that no data is overwritten if the FIFO is exposed to a burst write. However, the FIFO utilization in usual read/write operations is degraded, as the nominal FIFO capacity is not used. The designers of this FIFO believe that detecting an empty FIFO is more important than detecting a full one. Hence, they have designed a more complex empty detector: if the full signal is asserted sooner, only the FIFO utilization is degraded with no impact on the functionality, whereas an incorrect empty signal destroys the functionality.

Based on [40], a dual clock FIFO has been designed [42] and used for interfacing different clock domains. The main claim of the authors is a smaller area and less delay in mesochronous usage. A reconfigurable dual clock FIFO structure based on [40] is presented in Ref. [46] for adaptive voltage/frequency domains.
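The pointer mechanism of the register based designs above can be illustrated with a short Python sketch of a token ring. It models the ring as a list of flip flop outputs that rotates by one position per clock edge; one token gives a one hot ring and two consecutive tokens give the pattern used by bubble encoding. The function names and the `enable` argument are illustrative assumptions, not taken from the cited designs.

```python
def make_ring(depth, tokens=1):
    """Ring of `depth` flip flops holding `tokens` consecutive ones from position 0."""
    if not 0 < tokens < depth:
        raise ValueError("need 0 < tokens < depth")
    return [1] * tokens + [0] * (depth - tokens)


def step(ring, enable=True):
    """One clock edge: rotate the tokens by one position when enabled, else hold."""
    return ring[-1:] + ring[:-1] if enable else list(ring)
```

For a depth of 4, `make_ring(4)` starts as `[1, 0, 0, 0]` and returns to that pattern after four enabled steps, while `make_ring(4, tokens=2)` circulates the two adjacent ones of a bubble-encoded ring.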

3. Elastic dual clock FIFO

We present an elastic dual clock FIFO, used as a GALS NoC interface, that is responsible for synchronization when crossing clock domains. It can store twice as much data in a comparable or slightly smaller area than its synchronous counterpart. Its power consumption is lower, although it suffers from slightly more delay.

Fig. 4. Modification module.

This dual clock FIFO is register_based and uses token rings to generate


write and read grants with one hot encoding. The storage structure of this FIFO is made of elastic buffers. The control circuit of an elastic buffer imposes power and area overhead; however, it has been shown that this overhead fades as the bandwidth increases.

Despite the complexity of the storage structure in the new FIFO, the read/write controllers are simpler and the full/empty detectors perform accurately. To meet the synchronization requirements of GALS structures, this FIFO can operate with arbitrary phase and frequency.

This elastic FIFO has N elastic buffers. It can store 2N data items and accesses them with N bit read/write token rings. For better FIFO performance, the elastic buffers operate with the faster clock. This comes in two flavours: a FIFO running with a faster write clock, where the buffers run with the write clock, or a FIFO running with a faster read clock.

3.1. Elastic FIFO structure

The FIFO is formed by six main modules: read/write token rings, data storage cells, full/empty detectors, and the modification module.

3.1.1. Token rings

Token rings that use one hot encoding provide the read/write positions for the FIFO. The write token is sent as a valid signal to the corresponding elastic buffer so that it stores new data. The read token is the inverted stop signal. When the read token is asserted, the stop signal is low and data can be read. Whenever the read token is zero, the stop signal is asserted; data is held in the buffer, and one more data item can still be written into that buffer.

When a token ring is disabled, it holds its state. This means that a persistent enable is exerted on the current FIFO cell, and the read or write operation is performed continuously on the same cell. As this could lead to a malfunction, the token ring's output is gated by the token ring's enable signal.

3.1.2. Data storage and modification module

Elastic buffers are used as data storage cells. Each elastic buffer has two input control signals, as depicted in Fig. 2. One is V1 from the producer side, which shows the validity of the written data. The other is S2, which is sent by the consumer and is used for reading from the FIFO. V1 is in the write clock domain; it is generated by the write token ring and is asserted for one write clock period for each data item. Similarly, S2 is in the read clock domain. Data storage cells operate with either the read or the write clock. As one of the V1 or S2 signals is then longer than one clock period of the data storage cells, a malfunction may be caused and data may be written or read incorrectly. For this reason, it is necessary to shorten the signal activation time to one clock period using the modification module depicted in Fig. 4.

To achieve better performance, it is recommended that the data storage cells run with the faster of the read and write clocks. Therefore, two FIFO structures have been presented. These structures are similar and differ only in the clock of the data storage cells and the place of the modification module.

When the producer is faster than the consumer, the elastic buffers should work with the write clock. We call this design the fast_wr elastic FIFO; its modification circuit is added to the S2 path to shorten the S2 activation time to one write clock period. When the consumer runs faster, the storage cells should operate with the read clock and the V1 signal is modified to shrink its assertion to one read clock period. We call this design the fast_rd elastic FIFO.

Fig. 5. Dual clock elastic FIFO structure. (a) fast_wr structure. (b) fast_rd structure.
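The full/empty detection described in Section 3.1.3 can be sketched in Python from the per-buffer rule stated there: a buffer is empty when V2 is zero, holds one item when V2 is one and S1 is zero, and holds two items when both are asserted. The aggregation rule used below (the FIFO is full when every buffer is full, empty when every buffer is empty) is an assumption for illustration; reserved cells and clock-domain synchronization of the flags are not modelled. Function names are hypothetical.

```python
def eb_occupancy(v2, s1):
    """Items held by one elastic buffer, decoded from its V2 and S1 outputs."""
    if not v2:
        return 0           # empty
    return 2 if s1 else 1  # full, or holding a single item


def fifo_status(buffers):
    """Summarise a FIFO given one (v2, s1) pair per elastic buffer."""
    counts = [eb_occupancy(v2, s1) for v2, s1 in buffers]
    return {
        "stored": sum(counts),
        "empty": all(c == 0 for c in counts),
        "full": all(c == 2 for c in counts),
    }
```

With N = 4 buffers that are all full, `fifo_status([(1, 1)] * 4)` reports 8 stored items, which matches the 2N capacity stated for the elastic FIFO.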


Table 2
Area comparison of various dual clock FIFO designs. Percentages give the gate count normalized to the proposed design.

Design | Technology | 4 × 16 area (μm²) | 4 × 16 gate count | 4 × 32 area (μm²) | 4 × 32 gate count | 8 × 16 area (μm²) | 8 × 16 gate count | 8 × 32 area (μm²) | 8 × 32 gate count
Proposed | 90 nm | 2863 | 1145 (100%) | 4945 | 1978 (100%) | 5778 | 2311 (100%) | 10113 | 4045 (100%)
Nguyen [42] | 180 nm | 13296 | 1178 (102.88%) | 24008 | 2126 (107.48%) | 25385 | 2348 (101.60%) | 48404 | 4287 (105.98%)
Rahmani [46] | 90 nm | 3635 | 1454 (126.98%) | 6546 | 2618 (132.35%) | 7187 | 2875 (124.40%) | 13009 | 5204 (128.65%)
DSPIN FIFO [40] | 90 nm | 3600 | 1440 (125.76%) | 6511 | 2604 (131.64%) | 7117 | 2847 (123.19%) | 12939 | 5176 (127.96%)
Cumming [32] | 90 nm | 3226 | 1291 (112.75%) | 5779 | 2312 (116.88%) | 6237 | 2495 (107.96%) | 11298 | 4520 (111.74%)
Ono [45] | 90 nm | 4266 | 1706 (148.99%) | 6866 | 2746 (138.82%) | 7004 | 2802 (121.24%) | 10906 | 4288 (106.00%)

3.1.3. Full/empty detector

The elastic FIFO is governed by four control signals, V1, V2, S1 and S2. The state of an EB can be specified as follows:

- The buffer is empty when V2 is zero.
- The buffer holds only one data item if V2 is 1 and S1 is zero.
- The buffer holds 2 data items and is full if V2 and S1 are both asserted.

A simple combinational circuit can determine how many EBs are full and consequently whether the FIFO is full or, conversely, which EBs are empty and thus whether the FIFO is empty. It is also possible to design a special custom cell circuit, which consumes less area.

The only thing that must be considered is that the empty signal has to be synchronized with the read clock in the fast_wr structure and the full signal has to be synchronized with the write clock in the fast_rd design.

As discussed in Ref. [38], there may be some clock distance between the FIFO interface and the producer/consumer. Therefore, some FIFO cells should be reserved for the full/empty propagation time.

For correct FIFO functionality, the write pointer ($wr_{ptr}$) should always be ahead of the read pointer ($rd_{ptr}$). That means that in a circular buffer we always have to observe:

$$ rd_{ptr} \le wr_{ptr} \le rd_{ptr} + N \tag{1} $$

where N is the FIFO's depth.

The number of settled data items ($stl_{data}$) in the FIFO can be calculated as the difference between the write and read pointers. $stl_{data}$ expresses the valid data in the buffer.

$$ wr_{ptr} - rd_{ptr} \overset{N}{=} stl_{data} \tag{2} $$

To account for the full/empty propagation time, some reserved cells should be subtracted from the FIFO capacity. This way the full/empty state can be detected in constrained situations.

For empty, according to the propagation time between the FIFO and the consumer, the FIFO's capacity should assume some reserved cells ($rsrv_{rd}$) fewer than N. The FIFO asserts the empty signal when:

$$ stl_{data} < N - rsrv_{rd} \tag{3} $$

For full, based on the propagation time from the FIFO to the producer, some cells should be reserved ($rsrv_{wr}$) and the full signal is asserted sooner:

$$ stl_{data} > N - rsrv_{wr} \tag{4} $$

These considerations apply only in burst mode. When the FIFO does not face a burst situation, it is not necessary to assume reserved cells, as they degrade the FIFO utilization.

3.2. Elastic FIFO functionality

We briefly explain the functionality of the fast_wr and fast_rd elastic FIFOs.

3.2.1. Fast_wr elastic FIFO functionality

The fast_wr dual clock elastic FIFO structure, where the write clock is faster than the read clock, is depicted in Fig. 5a. In this structure the data storage cells operate with the write clock.

The Valid_in signal can be considered a write command for new data entering the FIFO. The Valid_in command is not necessarily synchronized with the write clock. The FIFO is checked to see whether it is full. If not, the enable command for the write token ring (En_tk_wr) is sent and the write tokens for the EBs (V1 signals) are generated.

Similarly for reading, after the stop_in signal is deasserted, the FIFO makes sure that it is not empty; then the read token ring circulates and the read grants (Rd[i]) are generated. Read signals, which are created by the read clock, last longer than one write clock period. Hence, the read signal duration is shortened and the S2 signal is generated for the elastic buffer. The delay of modifying the read grants to generate the S2 signal does not impact the critical path.

The elastic buffers run with the write clock, so it is necessary to synchronize their output with the read clock. Therefore, as in other designs [32,39,40], the output data is placed on a bus that is controlled by the read clock through a simple output buffer.

3.2.2. Fast_rd elastic FIFO functionality

The fast_rd elastic FIFO structure is depicted in Fig. 5b. As in the fast_wr design, the write grant (Wr[i]) is generated. This write signal cannot be used directly and needs to be shortened. The V1 control signal is generated by modifying the write token. This modification increases the critical path delay and degrades the maximum operating frequency of the elastic FIFO. The modification circuit delay is at most one operating clock period.

If the stop_in command is received, the read token ring is disabled and all data cells are stopped and wait for the stop condition to clear. This stop signal propagates backward if all FIFO cells become full. Whenever stop_in is deasserted and the FIFO is not empty, the read grant (Rd[i]) is generated. The read signal is inverted to form the S2 signal for the elastic buffers.

Read tokens let each cell put its data on the output bus through an output buffer. The output buffer and bus are controlled by the read clock, like the elastic buffers of the FIFO.

4. Simulation and results

In this section, we explore the area, latency, and throughput of our novel dual clock elastic FIFO. We compare the presented structure with some other synchronous dual clock FIFOs [32,40,42,45,46]. As for the comparison of our design in terms of process variation and power


Table 3
Transistor count collation of Elastic buffer and Flip Flop.

Storage cell | Total | Latches | Control circuit
1 bit Elastic Buffer | 156 | 38 | 118
1 bit Flip flop | 42 | 38 | 4
30 bit elastic buffer | 1258 | 1140 | 118
30 bit Flip flop | 1260 | 1140 | 120 (4*30)

Table 4
Minimum required FIFO depth for 100% and 50% throughput.

Minimum FIFO depth | for 50% throughput | for 100% throughput
Elastic FIFO | 3 | 4
Nguyen [42] | 4 | 7
Rahmani [46] | 4 | 5
DSPIN FIFO [40] | 5 | 6
Ono [45] | 3 | 6
Cumming [32] | 3 | 6
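The entries of Table 3 can be reproduced with a few lines of Python. The sketch assumes, as the table suggests, that an elastic buffer cell needs one 118-transistor control circuit regardless of width plus two 19-transistor latches per bit, while a flip flop register needs two latches plus 4 control transistors per bit. Function and constant names are illustrative.

```python
LATCH_TR = 19      # transistors per D latch
EB_CTRL_TR = 118   # elastic buffer control circuit, shared by all bits of a cell
FF_CTRL_TR = 4     # control transistors per flip flop bit


def elastic_buffer_transistors(bits):
    """One control circuit plus two latches per stored bit."""
    return EB_CTRL_TR + 2 * LATCH_TR * bits


def flip_flop_transistors(bits):
    """Two latches plus per-bit control, i.e. 42 transistors per flip flop."""
    return (2 * LATCH_TR + FF_CTRL_TR) * bits
```

Evaluating both functions for 1 and 30 bits gives the totals listed in Table 3; the fixed control cost of the elastic buffer is spread over more bits as the cell gets wider.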

consumption, we have evaluated our design against the most similar synchronous design called DSPIN FIFO [40], which is a register based FIFO, and accessed by read/write token rings.

DSPIN FIFO is the basis of Nguyen [42] and Rahmani [46]. Nguyen [42] has modified DSPIN FIFO's synchronizers in order to improve performance and decrease the delay in mesochronous mode. Rahmani [46] has made DSPIN FIFO reconfigurable by simply modifying the full/empty detection modules. They have also added some modules to their design so it can be used in a dynamic voltage/frequency structure. These changes can be applied easily to our presented elastic design.

4.1. Area analysis

We have simulated the area of our presented design with various depths and widths under 90 nm technology compared with other structures in Table 2. Since Nguyen [42] has reported their results under 180 nm technology, and used gate count for comparison, we have also added the same parameter to the table. In Table 2, we have normalized gate count values to our structure. As shown, our design is smaller than other designs for various depths and data widths.

Our design has smaller area because the presented elastic FIFO does not need multi-bit synchronizers. Moreover, the data storage, which occupies an important area, is smaller in the elastic design. To explore data storage area, the transistor counts of both elastic buffer and synchronous register are provided in Table 3. An elastic buffer, similar to a flip flop, consists of two latches. The control circuit of a flip flop is much simpler than that of an elastic buffer. According to Table 3, the control circuit overhead of a 1-bit elastic buffer is about 30 times bigger than that of a single flip flop. However, control signals in an elastic structure are sent via a channel along with data. Only one control channel is used for a data path with arbitrary width. Therefore, in elastic buffers the control circuit overhead remains identical for all data widths, while in flip flops the control overhead repeats for every bit in the data path. Thereby, the area overhead of flip flops increases proportionally to their bandwidth. Hence, for data width larger than 30 bits, an elastic buffer area will be smaller than a flip flop. According to flip flop [47] and elastic buffer areas, a simple calculation shows why elastic data storage is smaller than synchronous register.

$$\begin{aligned}
\text{Sync reg} &= ((D_{ff} + 3 \cdot nand) \cdot Bandwidth) \cdot depth \\
&= ((42 + 3 \cdot 4) \cdot Bandwidth) \cdot depth \\
&= 54 \cdot Bandwidth \cdot depth
\end{aligned} \tag{5}$$

$$\begin{aligned}
\text{Elastic buffer} &= (ctrl + (2 \cdot latch) \cdot Bandwidth) \cdot depth \\
&= (118 + 38 \cdot Bandwidth) \cdot depth
\end{aligned} \tag{6}$$

(nand = 4 tr.; D-ff = 42 tr.; D-latch = 19 tr., where tr. denotes transistors)

It can be calculated that in a 4-deep FIFO, for bandwidth larger than 8 bits, elastic FIFO data storage cells will be smaller than synchronous FIFO ones.

4.2. Latency analysis

We have analyzed the fast write structure latency in this section; a similar discussion can be expressed for the fast read structure. As the dual clock FIFOs work with two different clock domains, the latency of these FIFOs depends on when the write and the read commands are asserted.

In our proposed dual clock elastic FIFO, as soon as the write command (Valid_in) is asserted, in the same clock, the new data can be stored in the first available elastic buffer of the FIFO, thus the write latency of a new data is as follows:

$$write_{latency} = \Delta\tau_{wr}, \qquad 85\ \text{ps} < \Delta\tau_{wr} < clk_{wr} \tag{7}$$

where Δτ_wr is the time distance between write command assertion and the next rising edge of the write clock. 85 ps is the minimum required time for the write command to be available as a valid signal for elastic buffers.

For reading, the data will be available on the output of the FIFO one clock after the read command has been asserted (Stop_in deasserted). Therefore, the latency of reading is:

$$read_{latency} = \Delta\tau_{rd} + clk_{rd}, \qquad 150\ \text{ps} < \Delta\tau_{rd} < clk_{rd} \tag{8}$$

where Δτ_rd is the time between read command assertion and the next rising edge of the read clock. 150 ps is the minimum required time for the read command to be generated for elastic buffers.

As an example, Fig. 6 shows the signals waveform for data write and read. When the write command is asserted, the input data ('01') is written in the EB1 in the same cycle. When the read command is asserted (stop_in deasserted) the data in the EB1 ('01') will be available on the output at the end of next cycle.
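The latency expressions of Eqs. (7) and (8) can be illustrated with a small sketch. The function and constant names below are hypothetical and not part of the presented design; the range check simply encodes the bounds stated with each equation, and all times are assumed to be in picoseconds.

```python
# Minimum command-to-edge margins quoted with Eqs. (7) and (8), in picoseconds.
MIN_WRITE_MARGIN_PS = 85
MIN_READ_MARGIN_PS = 150


def write_latency_ps(delta_wr_ps: float, clk_wr_ps: float) -> float:
    """Eq. (7): write latency is the wait until the next rising write-clock edge."""
    if not MIN_WRITE_MARGIN_PS < delta_wr_ps < clk_wr_ps:
        raise ValueError("delta_wr_ps must lie between 85 ps and the write clock period")
    return delta_wr_ps


def read_latency_ps(delta_rd_ps: float, clk_rd_ps: float) -> float:
    """Eq. (8): wait for the next rising read-clock edge, then one read cycle."""
    if not MIN_READ_MARGIN_PS < delta_rd_ps < clk_rd_ps:
        raise ValueError("delta_rd_ps must lie between 150 ps and the read clock period")
    return delta_rd_ps + clk_rd_ps


if __name__ == "__main__":
    # Illustrative values only: 1200 ps write clock, 1500 ps read clock.
    print(write_latency_ps(400, 1200))
    print(read_latency_ps(600, 1500))
```

With these assumed inputs, a write command asserted 400 ps before the write-clock edge is stored after 400 ps, while a read command asserted 600 ps before the read-clock edge delivers data one full read period later.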

Fig. 6. Signals waveform for read and write operations.
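The storage-area expressions of Eqs. (5) and (6) in Section 4.1 can also be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the per-primitive transistor counts are the ones quoted with Eq. (6) and in Table 3.

```python
from typing import Optional

# Transistor counts per primitive, as quoted with Eq. (6) and in Table 3.
NAND_TR = 4
DFF_TR = 42
LATCH_TR = 19
EB_CTRL_TR = 118  # control circuit of one elastic buffer


def sync_register_transistors(bandwidth: int, depth: int) -> int:
    """Eq. (5): every bit needs a D flip-flop plus three NAND gates."""
    return (DFF_TR + 3 * NAND_TR) * bandwidth * depth


def elastic_buffer_transistors(bandwidth: int, depth: int) -> int:
    """Eq. (6): one shared control circuit plus two latches per bit."""
    return (EB_CTRL_TR + 2 * LATCH_TR * bandwidth) * depth


def break_even_bandwidth(depth: int = 4, max_bits: int = 128) -> Optional[int]:
    """Smallest bandwidth at which the elastic storage is strictly smaller."""
    for bits in range(1, max_bits + 1):
        if elastic_buffer_transistors(bits, depth) < sync_register_transistors(bits, depth):
            return bits
    return None


if __name__ == "__main__":
    print(break_even_bandwidth())
    print(sync_register_transistors(34, 4), elastic_buffer_transistors(34, 4))
```

Because depth multiplies both expressions, the crossover depends only on the bandwidth; under these formulas it falls at about 8 bits, in line with the statement in Section 4.1.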


Table 5
Minimum read and write clock period (ps) for Elastic FIFO (fast_Wr) and DSPIN FIFO for different control gate size (W/L).

| | W/L | 4 = 128/32 | 8 = 256/32 | 16 = 512/32 | 32 = 1024/32 |
|---|---|---|---|---|---|
| DSPIN FIFO | Write period | 850 | 450 | 300 | 250 |
| | Read period | 1050 | 600 | 350 | 300 |
| Elastic FIFO | Write period | 1150 | 650 | 350 | 300 |
| | Read period | 1450 | 700 | 400 | 350 |

Table 6
Presented Elastic FIFO and DSPIN FIFO Delay, Power, PDP and EDP comparison with 4-deep, 34 bit-width.

(a) Where the write frequency is 833 MHz and the read frequency is 666 MHz for different W/L ratio

| | W/L | 4 = 128/32 | 8 = 256/32 | 16 = 512/32 | 32 = 1024/32 | 64 = 2048/32 |
|---|---|---|---|---|---|---|
| DSPIN FIFO | Delay (ps) | 3251 | 3251 | 3251 | 3251 | 3251 |
| | Power (mW) | 1.456 | 1.308 | 1.267 | 1.25 | 1.309 |
| | PDP | 4733 | 4252 | 4119 | 4063 | 4255 |
| | EDP | 15388465 | 13824253 | 13390924 | 13211251 | 13834822 |
| Elastic FIFO | Delay (ps) | 3091 | 2815 | 2680 | 2624 | 2613 |
| | Power (mW) | 1.172 | 1.025 | 1.12 | 1.55 | 2.516 |
| | PDP | 3622 | 2885 | 3001 | 4067.2 | 6574 |
| | EDP | 11197617 | 8122331 | 8044288 | 10672333 | 17178667 |

(b) With control circuit W/L = 512/32 ratio for different frequencies

| | Wr/rd period | 400/500 | 500/600 | 600/700 | 700/800 | 900/1000 | 1100/1200 | 1200/1500 |
|---|---|---|---|---|---|---|---|---|
| DSPIN FIFO | Delay (ps) | 1051 | 1051 | 1151 | 1301 | 2001 | 2310 | 3251 |
| | Power (mW) | 2.799 | 2.799 | 2.147 | 1.848 | 1.609 | 1.369 | 1.267 |
| | PDP | 2941.74 | 2941.7 | 2471.1 | 2404.2 | 3219.6 | 3162.3 | 4119.0 |
| | EDP | 3.09E06 | 3.09E06 | 2.84E06 | 3.12E06 | 6.44E06 | 7.30E06 | 1.33E07 |
| Elastic FIFO | Delay (ps) | 1180 | 1331 | 1480 | 1631 | 1931 | 2231 | 2680 |
| | Power (mW) | 1.581 | 1.461 | 1.351 | 1.28 | 1.24 | 1.155 | 1.12 |
| | PDP | 1865.5 | 1944.5 | 1999.4 | 2087.6 | 2394.4 | 2576.8 | 3001.6 |
| | EDP | 2.20E06 | 2.58E06 | 2.95E06 | 3.40E06 | 4.62E06 | 5.74E06 | 8.04E06 |

4.3. Throughput analysis

The throughput of dual clock FIFOs can be reported as a function of FIFO depth. As the dual clock FIFOs impose a delay to the data path, the throughput of the network could possibly be degraded. Deeper FIFOs do not decrease the throughput, as buffering data could cover the latency, and data can be read from the FIFO in every cycle [48]. Simulations show the minimum buffer requirement of our presented design for having 50% and 100% throughput is 5 and 7 buffers, respectively.

Our presented FIFO uses elastic buffers as storage elements. Each elastic buffer can store double data; therefore, our design only requires 3 and 4 elastic buffers to have 50% and 100% throughput, respectively. Table 4 compares buffer requirements of different designs. The minimum depth of our proposed structure for 100% throughput is less than the other structures, and for 50% throughput the minimum depth is less than DSPIN FIFO based structures [40,42,46], and similar to Cummings [32] and Ono [45] structures. Note that for a dual clock FIFO to reach 100% throughput, it is necessary to have the same clock frequency for read and write clock.

4.4. Power and variation analysis

We have implemented our presented structure using HSPICE to study power consumption as well as variation. Since the proposed structure opens up a new design space, other presented ideas for custom designs such as [42,46] can be based on our elastic dual clock FIFO. Moreover, to analyze elastic design power consumption and variation dependability, we have compared our design with the most similar synchronous design, namely DSPIN FIFO [40]. For a comprehensive evaluation, we have designed both FIFOs with 4-word depth and 34 bits data width in HSPICE using 32 nm PTM library [49]. We have chosen 34 bits because of the original design of DSPIN FIFO. Also, we have chosen the 32 nm library for simulation due to variation requirements.

Expanding the bandwidth can change the FIFO's delay, as the control circuits should drive more buffers. Thus, we have modified the control circuit gates, making them stronger by changing the W/L ratio. Increasing the W/L ratio for the control circuit changes the maximum operating frequency of FIFOs. The minimum range for read/write clock period with 50 ps approximation error for fast_wr elastic FIFO and DSPIN FIFO has been reported in Table 5.

For the same W/L size and 34-bit data width, DSPIN FIFO can run with nearly 19% higher frequency than the elastic FIFO on average. The reason is the prominent role of the control circuit in the elastic FIFO. However, for smaller bandwidth, simulated for 5 bit width, the elastic FIFO runs twice as fast as DSPIN FIFO.

To explore W/L effect on power consumption and FIFO delays, we have compared both FIFOs with a fixed frequency and specific simulation time with two different test scenarios. Scenarios have been designed in a way that FIFOs serve the same data count. Table 6a reports delay, power consumption, power delay product (PDP) and energy delay product (EDP) for DSPIN and elastic FIFOs for 20 ns simulation time, where the write clock period is 1200 ps and the read clock period is 1500 ps

Fig. 7. Comparison of synchronous DSPIN FIFO and elastic FIFO with fixed read clock period (1500 ps ~666 MHz) and write clock period (1200 ps ~ 833 MHz) but
different control circuit size. (a) Power consumption comparison. (b) PDP comparison.

Fig. 8. Elastic FIFO and DSPIN FIFO comparison based on frequency degradation. (a) Delay trend. (b) Power trend. (c) PDP comparison. (d) EDP comparison.

for 4-deep 34-bit width FIFOs.

The DSPIN synchronous FIFO delay does not depend on the control circuit size, whereas the elastic FIFO delay decreases with control gates enlargement. However, increasing the gate size causes more power consumption as depicted in Fig. 7a.

We have calculated the PDPs, and compared them in Fig. 7b. Based on PDP results, the elastic FIFO shows better performance with W = 256 nm and W = 512 nm, whereas DSPIN FIFO has its best performance with W = 512 nm and W = 1024 nm; therefore we have chosen W = 512 nm for both designs in our later simulations.

To determine the frequency impact on delay and power consumption, we have compared both FIFOs with a fixed size and in fair conditions. We have assumed L = 32 nm and W = 512 nm for control circuits, and have simulated FIFOs for 20 ns. Results for different write and read frequencies have been presented in Table 6b. DSPIN and elastic FIFOs are compared with fixed frequency and control circuit size in terms of delay, power, PDP, and EDP. Results show that decreasing the read/write frequency increases the FIFO delay, while it decreases the power consumption. Fig. 8 depicts the trends of the different parameters.

The delays are almost the same; however, for high frequency operations, DSPIN FIFO imposes less delay, while for frequencies under 1 GHz the elastic FIFO operates faster and exhibits less delay. The elastic FIFO consumes less power. Power reduction trends based on frequency degradation have been compared in Fig. 8b. Synchronous DSPIN FIFO consumes 28% more power on average than the elastic FIFO. For a better comparison, we have considered the PDP parameter, where that of DSPIN FIFO is 23% larger than the elastic structure's. These results demonstrate that the elastic circuit is a better choice for GALS NoC interface structure.

EDP has been calculated as well and depicted in Fig. 8d. Results confirm the elastic FIFO usage instead of DSPIN FIFO for GALS NoC

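Table 7
Simulation parameters for variation exploration.

(a) Circuit level parameters

| Parameter | Value | Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|---|---|
| Technology | 32 nm | Vdd | 1 V | Leff | 12.6 nm | Simulation length | 10 ns |
| Process | TT | Vth | 0.16 V | Wn = Wp/2 | 128 nm | Control circuits Wn = Wp/2 | 512 nm |
| Temperature | 25 °C | Tox | 1 nm | Ln = Lp | 32 nm | | |

(b) Various read/write clock domain (ps)

| | | Maximum (nominal) | +10% | +20% | wr33%-rd42% |
|---|---|---|---|---|---|
| DSPIN FIFO | Write period | 300 | 330 | 360 | 400 |
| | Read period | 350 | 385 | 420 | 500 |
| Elastic FIFO | Write period | 350 | 385 | 420 | 468 |
| | Read period | 400 | 440 | 480 | 572 |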


Fig. 9. Comparison of elastic and DSPIN FIFOs performance under Vth variation. (a) Power, Delay, and PDP comparison in presence of 30% Vth variation based on different frequencies (10%, 20%, 33%). (b) Distribution of PDP comparison based on different variation situations (20%, 30%, 40%) where FIFOs run 10% slower than their maximum frequency.

interface. All along these discussions, the important point has been that the elastic FIFO capacity is twice that of DSPIN FIFO. In Network on chip performance research area, the FIFO capacity plays an undeniable role for the congestion control [50].

As a result of technology shrinkage, unpredictable design changes may occur due to process variability. The process variation modifies delay characteristics of circuits after manufacturing. Thereby, variation has become one of the most important challenges to be considered in circuit design as feature sizes scale down from 65 nm to 16 nm [51].

We have explored Die-to-Die process variation impact on the FIFOs' performance and functionality. According to studies conducted in Refs. [51,52], the most effective variation parameter in deep submicron technology is the threshold voltage (Vth), which has a direct impact on performance and functionality. Other parameters such as transistors' width and length, or gate oxide thickness come second in process variation.

Variation of threshold voltage can be as large as 80 mV in 20 nm technology [53]. Although we have simulated our design in 32 nm technology, we have explored Vth variations up to 40% of the nominal amount for today's technology needs [54].

For this purpose, we have simulated and analyzed two 34-bit 4-deep FIFOs near nominal maximum read/write frequencies. Simulation parameters are stated in Table 7a.

We have chosen Wn = 512 nm for the control circuit's gate size, so that the maximum frequency for DSPIN FIFO corresponds to a 300 ps write clock period and a 350 ps read clock period. The elastic FIFO can operate with at least a 350 ps write clock period and a 400 ps read clock period, as stated in Table 5.

FIFO performance parameters have been studied when the clocks run 10%, 20%, and 33% slower than the maximum nominal frequency in presence of Vth variation. Table 7b expresses different read and write clock periods in test scenarios.

We have simulated both FIFOs in nine different scenarios where Vth, as variation parameter, fluctuates 20%, 30%, and 40% from the nominal amount in three different operating frequencies. Monte Carlo simulations are performed with 256 iterations for different FIFOs. We have analyzed power consumptions, FIFO delays, and PDP changes.

Results show the elastic FIFO can preserve functionality 5% more than DSPIN synchronous FIFO on average in presence of variation. The FIFO's functionality is verified based on the input scenario. The output data is checked to see whether the output values are in the intended order. For example, if 2(010), 3(011), 4(100) enter the FIFO in this order, the output order should be the same and the signal transitions must be checked for correct output data. For instance when Vth variation is 30%


DSPIN FIFO preserves its functionality in 81.5% of cases while the elastic FIFO works well in 87.5%. Results show increased variation of Vth degrades the synchronous DSPIN FIFO functionality more than elastic FIFO, whereas the DSPIN FIFO functionality is slightly better when the circuit suffers less variation.

To study Vth variation effect on FIFOs performance and compare designs, the Coefficient of Variation (Cv) parameter is used. Cv shows how a parameter varies based on variation [55]. Coefficient of variation is calculated as explained below:

$$\mu(x) = \frac{\sum_{i=1}^{n} x_i}{n} \tag{9}$$

$$\mathrm{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \mu(x))^2}{n - 1} \tag{10}$$

$$\sigma(x) = \sqrt{\mathrm{var}(x)} \tag{11}$$

$$C_v = \frac{\sigma(x)}{\mu(x)} \tag{12}$$

We have obtained vast results from the evaluation of delay, power and PDP of FIFOs in presence of variation. We have compared the FIFOs' performance where both FIFOs have correct functionality. Results show that generally under variation, the mean value (μ) for the elastic FIFO's power is less than that of DSPIN synchronous FIFO, while the mean delay of DSPIN FIFO is less than that of the presented elastic FIFO. These are mainly because of the frequency difference, where the elastic FIFO is examined with slower frequency. We have thoroughly studied Vth variation impact on performance. A small amount of results has been reported here. When the threshold voltage variation changes by 30% of the nominal amount, the elastic FIFO performance variation becomes less than DSPIN FIFO. Power variation increases with frequency degradation, while delay variation decreases. PDP is a function of power and delay, and thereby because of more power growth and less delay decrease, PDP increases while the frequency decreases. Trends for delay, power and PDP variations based on different frequencies have been shown in Fig. 9a. Also, for 40% Vth variation, similar to the 30% Vth variation case, the performance variation of the elastic FIFO is less than that of DSPIN FIFO. However, for 20% Vth variation DSPIN FIFO shows almost the same variability compared to the elastic FIFO.

From another point of view, we can observe how higher Vth variation impacts the FIFO's performance, when both FIFOs run 10% slower than their maximum frequency. Results in Fig. 9b, which show PDP distribution, confirm the elastic FIFO performs better in higher variation situations.

5. Conclusion

A dual clock elastic FIFO has been presented, whose read and write operations can run with arbitrary phase and frequency. This FIFO is suitable to be used as GALS NoC interface. This FIFO easily connects to any synchronous or asynchronous interface, and can be designed with the commercial CAD tools.

The presented elastic FIFO can store double data with less area overhead compared to other FIFOs. It has a simple structure, and does not need multi-bit synchronization or special coding. Full/empty detection in this FIFO is performed more accurately compared to other FIFOs, because of its simple combinational logic, which has small delay overhead. Latency and throughput of the design have been explored. Minimum buffer requirement of the proposed design for 100% throughput is less than the other structures.

Two structures for the presented elastic FIFO are simulated using HSPICE. Delay, power consumption, PDP and EDP have been studied. Moreover, the threshold voltage variation impact on functionality and performance has been analyzed for 34-bit 4-deep FIFO structure. Results are compared with the most similar synchronous design, called DSPIN FIFO. They confirm the presented FIFO is a better choice for GALS structure. The elastic FIFO consumes less power, while delays are nearly the same. The elastic design has more delay in high frequencies (1 GHz–2 GHz); however, in lower frequencies the synchronous design delay increases dramatically, while the elastic design keeps its smooth increase trend. The PDP parameter proves the efficiency of the elastic FIFO.

Acknowledgment

This research was in part supported by a grant from Institute for Research in Fundamental Sciences (IPM) (No. 3-1396-9).

References

[1] L. Benini, G. De Micheli, Networks on chips: a new SoC paradigm, Computer (Long. Beach. Calif) 35 (1) (2002) 70–78.
[2] D.M. Chapiro, Globally-asynchronous Locally-synchronous Systems, Stanford University, 1984.
[3] E. Beigne, P. Vivet, Design of on-chip and off-chip interfaces for a GALS NoC architecture, in: Proceedings - International Symposium on Asynchronous Circuits and Systems, vol. 2006, 2006, pp. 172–181.
[4] A. Strano, D. Ludovici, D. Bertozzi, A library of dual-clock FIFOs for cost-effective and flexible MPSoC design, in: 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, 2010, pp. 20–27.
[5] M. Jhamb, R.K. Sharma, A.K. Gupta, A novel FIFO design for data transfer in mixed timing systems, Int. J. Electr. Comput. Energy Electron. Commun. Eng. 8 (3) (2014) 609–614.
[6] A. Chakraborty, M.R. Greenstreet, Efficient self-timed interfaces for crossing clock domains, in: Ninth International Symposium on Asynchronous Circuits and Systems, 2003. Proceedings, 2003, pp. 78–88.
[7] A. Yakovlev, P. Vivet, M. Renaudin, Advances in asynchronous logic: from principles to GALS & NoC, recent industry applications, and commercial CAD tools, in: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, 2013, pp. 1715–1724.
[8] H. Han, K.S. Stevens, Clocked and asynchronous FIFO characterization and comparison, in: 2009 17th IFIP International Conference on Very Large Scale Integration (VLSI-SoC), 2009, pp. 101–108.
[9] J. Cortadella, M. Kishinevsky, B. Grundmann, Synthesis of synchronous elastic architectures, in: 2006 43rd ACM/IEEE Design Automation Conference, 2006, pp. 657–662.
[10] J. Cortadella, L. Lavagno, D. Amiri, J. Casanova, C. Macian, F. Martorell, J.A. Moya, L. Necchi, D. Sokolov, E. Tuncer, Narrowing the margins with elastic clocks, in: 2010 IEEE International Conference on Integrated Circuit Design and Technology, 2010, pp. 146–150.
[11] L.P. Carloni, K.L. McMillan, A. Saldanha, A.L. Sangiovanni-Vincentelli, A methodology for correct-by-construction latency insensitive design, in: IEEE/ACM International Conference on Computer-aided Design. Digest of Technical Papers (Cat. No.99CH37051), 1999, pp. 309–315.
[12] S. Krstic, J. Cortadella, M. Kishinevsky, J. O'Leary, Synchronous elastic networks, in: 2006 Formal Methods in Computer Aided Design, vol. 2, 2006, pp. 19–30.
[13] V.S. Vij, R.P. Gudla, K.S. Stevens, Interfacing synchronous and asynchronous domains for open core protocol, in: 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems, 2014, pp. 282–287.
[14] K. Swaminathan, G. Lakshminarayanan, F. Lang, M. Fahmi, S.-B. Ko, Design of a low power network interface for Network on chip, in: 2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), vol. 2, 2013, pp. 1–4 no. December 2014.
[15] J. Cortadella, M. Kishinevsky, B. Grundmann, SELF: specification and design of a synchronous elastic architecture for DSM systems, Int. Work. Timing Issues Specif. Synth. Digit. Syst. (2006).
[16] J. Cortadella, M. Galceran-Oms, M. Kishinevsky, Elastic systems, in: Eighth ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2010), 2010, pp. 149–158.
[17] J. Carmona, J. Cortadella, M. Kishinevsky, A. Taubin, Elastic circuits, IEEE Trans. Comput. Des. Integr. Circuits Syst. 28 (10) (Oct. 2009) 1437–1455.
[18] M.R. Casu, L. Macchiarulo, Adaptive latency-insensitive protocols, IEEE Des. Test Comput. 24 (5) (Sep. 2007) 442–452.
[19] L.P. Carloni, A.L. Sangiovanni-Vincentelli, Coping with latency in SOC design, IEEE Micro 22 (5) (Sep. 2002) 24–35.
[20] M. Galceran-Oms, J. Cortadella, D. Bufistov, M. Kishinevsky, Automatic microarchitectural pipelining, in: 2010 Des. Autom. Test Eur. Conf. Exhib. (DATE 2010), Mar. 2010, pp. 961–964.
[21] T. Kam, M. Kishinevsky, J. Cortadella, M. Galceran-Oms, Correct-by-construction microarchitectural pipelining, in: 2008 IEEE/ACM Int. Conf. Comput. Des, vol. 3, Nov. 2008, pp. 434–441.
[22] J. You, Y. Xu, H. Han, K.S. Stevens, Performance evaluation of elastic GALS interfaces and network fabric, Electron. Notes Theor. Comput. Sci. 200 (1) (Feb. 2008) 17–32.


[23] M. Galceran-Oms, J. Cortadella, M. Kishinevsky, Speculation in elastic systems, in: Proceedings of the 46th Annual Design Automation Conference on ZZZ - DAC '09, vol. 1, 2009, p. 292.
[24] D.E. Bufistov, J. Cortadella, M. Galceran-oms, J. Júlvez, M. Kishinevsky, Retiming and recycling for elastic systems with early evaluation, in: Design Automation Conference, 2009. DAC '09. 46th ACM/IEEE, vol. 1, 2009, pp. 288–291.
[25] M. Galceran-Oms, A. Gotmanov, J. Cortadella, M. Kishinevsky, Microarchitectural transformations using elasticity, ACM J. Emerg. Technol. Comput. Syst. 7 (4) (Dec. 2011) 1–24.
[26] M. Galceran-oms, Automatic Pipelining of Elastic Systems, Universitat Politecnica De Catalunya, 2011.
[27] G. Dimitrakopoulos, A. Psarras, I. Seitanidis, Microarchitecture of Network-on-chip Routers, Springer New York, New York, NY, 2015.
[28] R. Mullins, S. Moore, Demystifying data-driven and pausible clocking schemes, in: 13th IEEE Int. Symp. Asynchronous Circuits Syst, Mar. 2007, pp. 175–185.
[29] K.Y. Yun, R.P. Donohue, Pausible clocking: a first step toward heterogeneous systems, in: Proceedings International Conference on Computer Design. VLSI in Computers and Processors, 1996, pp. 118–123.
[30] R. Ginosar, Fourteen ways to fool your synchronizer, in: Proc. - Int. Symp. Asynchronous Circuits Syst, 2003, pp. 89–96.
[31] P. Teehan, M. Greenstreet, G. Lemieux, A survey and taxonomy of GALS design styles, IEEE Des. Test Comput. 24 (5) (Sep. 2007) 418–428.
[32] P. Alfke, C.E. Cummings, Simulation and synthesis techniques for asynchronous FIFO design with asynchronous pointer comparisons, in: Snug-2002, 2002, pp. 1–17.
[33] A.V. Yakovlev, A.M. Koelmans, L. Lavagno, High-level modeling and design of asynchronous interface logic, IEEE Des. Test Comput. 12 (1) (Jan. 1995) 32–40.
[34] E. Brunvand, Low latency self-timed flow-through FIFOs, in: Advanced Research in VLSI, 1995. Proceedings., Sixteenth Conference on, 1995, pp. 76–90.
[35] A. Chakraborty, M.R. Greenstreet, Efficient self-timed interfaces for crossing clock domains, in: Ninth International Symposium on Asynchronous Circuits and Systems, 2003. Proceedings, 2003, pp. 78–88.
[36] J. Ebergen, Squaring the FIFO in GasP, in: Proc. - Int. Symp. Asynchronous Circuits Syst, 2001, pp. 194–199 no. 2.
[37] J.T. Yantchev, C.G. Huang, M.B. Josephs, I.M. Nedelchev, Low–latency asynchronous FIFO buffers, in: Proceedings Second Working Conference on Asynchronous Design Methodologies, vol. 19, IEEE Comput. Soc. Press, 2013, pp. 24–31 no. 1.
[38] R.W. Apperson, Z. Yu, M.J. Meeuwsen, T. Mohsenin, B.M. Baas, A scalable dual-clock FIFO for data transfers between arbitrary and Haltable clock domains, IEEE Trans. Very Large Scale Integr. Syst. 15 (10) (Oct. 2007) 1125–1134.
[39] T. Chelcea, S.M. Nowick, Robust interfaces for mixed-timing systems, IEEE Trans. Very Large Scale Integr. Syst. 12 (8) (Aug. 2004) 857–873.
[40] I. Miro Panades, A. Greiner, Bi-synchronous FIFO for synchronous circuit communication well suited for network-on-chip in GALS architectures, in: First International Symposium on Networks-on-chip (NOCS'07), 2007, pp. 83–94.
[41] I. Miro-Panades, F. Clermidy, P. Vivet, A. Greiner, Physical implementation of the DSPIN network-on-chip in the FAUST architecture, in: Second ACM/IEEE International Symposium on Networks-on-chip (nocs 2008), 2008, pp. 139–148.
[42] T.-T. Nguyen, X.-T. Tran, A novel asynchronous first-in-first-out adapting to multi-synchronous network-on-chips, in: 2014 International Conference on Advanced Technologies for Communications (ATC 2014), vol. 2015–Febru, 2014, pp. 365–370.
[43] M. Paschou, A. Psarras, C. Nicopoulos, G. Dimitrakopoulos, CrossOver: clock domain crossing under virtual-channel flow control, in: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016.
[44] A. Psarras, M. Paschou, C. Nicopoulos, G. Dimitrakopoulos, A dual-clock multiple-queue shared buffer, IEEE Trans. Comput. 66 (10) (Oct. 2017) 1809–1815.
[45] T. Ono, M. Greenstreet, A modular synchronizing FIFO for NoCs, in: 2009 3rd ACM/IEEE International Symposium on Networks-on-chip, 2009, pp. 224–233.
[46] A. Rahmani, P. Liljeberg, J. Plosila, H. Tenhunen, Design and implementation of reconfigurable FIFOs for voltage/frequency island-based, Microprocess. Microsyst. 37 (4–5) (2013) 432–445.
[47] M.M. Mano, COMPUTER SYSTEM Computer System, Prentice Hall, 1992.
[48] I. Miro Panades, Design and Implementation a Micro-network on Chip with Service Guarantee, Pierre-and-Marie-Curie University, 2008.
[49] Predictive Technology Model, 2016 [Online]. Available: http://ptm.asu.edu/.
[50] Q. Liu, R.D. Russell, RGBCC: a new congestion control mechanism for InfiniBand, in: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-based Processing (PDP), 2016, pp. 91–100.
[51] C. Hernandez, A. Roca, F. Silla, J. Flich, J. Duato, Improving the performance of GALS-based NoCs in the presence of process variation, in: 2010 Fourth ACM/IEEE International Symposium on Networks-on-chip, 2010, pp. 35–42.
[52] M. Mirzaei, M. Mosaffa, S. Mohammadi, Variation-aware approaches with power improvement in digital circuits, Integrat. VLSI J. 48 (1) (Jan. 2015) 83–100.
[53] D. Moon, J. Song, O. Kim, Effect of source/drain doping gradient on threshold voltage variation in double-gate fin field effect transistors as determined by discrete random doping, Jpn. J. Appl. Phys. 49 (104301) (2010).
[54] A.V. Kauppila, Analysis of Parameter Variation Impact on the Single Event Response in Sub-100nm CMOS Storage Cells, Vanderbilt, 2012.
[55] M. Alioto, S. Member, G. Palumbo, M. Pennisi, Understanding the effect of process variations on the delay of static and domino logic, IEEE Trans. Very Large Scale Integr. Syst. 18 (5) (2010) 697–710.

Seyed Mohamad Taghi Adl received his BSc. and MSc Degree in computer engineering from Shahid Beheshti University and Isfahan University of technology in 2009 and 2011 respectively. He is currently a Ph.D. student at University of Tehran. His research interests include low power and asynchronous system design, on chip interconnection in GALS NoC, process variation and elastic design.

Siamak Mohammadi received his BSc, MSc and Ph.D. degrees from the University of Paris Sud Orsay, France in 1990, 1992 and 1996, respectively, all in electrical engineering. During his Ph.D. he was supported by a grant from the Ministry of Education of France. From 1997 to 1999 he was a Research Associate with the Department of Computer Science, University of Manchester, England. In 1999 he moved to Canada and worked at Cogency Semiconductor Inc. in Toronto until 2003 and then at ATI Technologies Inc. until 2005. Currently he is an Assistant Professor in School of Electrical and Computer engineering, at the University of Tehran, Iran. He has over 15 years experience in VLSI digital area, ASIC design and verification, asynchronous design, and on-chip interconnects in GALS NoCs. He has contributed to the design of the first asynchronous ARM microprocessor, as well as several PowerLine networking and RFID chips.
