IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 10, OCTOBER 2007
1091
Applying CDMA Technique to Network-on-Chip
Xin Wang, Tapani Ahonen, and Jari Nurmi
Abstract—The issues of applying the code-division multiple
access (CDMA) technique to an on-chip packet switched communication network are discussed in this paper. A packet switched
network-on-chip (NoC) that applies the CDMA technique is realized at the register-transfer level (RTL) using VHDL. The realized
CDMA NoC supports the globally-asynchronous locally-synchronous (GALS) communication scheme by applying both
synchronous and asynchronous designs. In a packet switched
NoC, which applies a point-to-point connection scheme, e.g., a
ring topology NoC, data transfer latency varies widely if the
packets are transferred to different destinations or to the same
destination through different routes in the network. The CDMA
NoC can eliminate the data transfer latency variations by sharing
the data communication media among multiple users concurrently. A six-node GALS CDMA on-chip network is modeled and
simulated. The characteristics of the CDMA NoC are examined
by comparing them with the characteristics of an on-chip bidirectional ring topology network. The simulation results reveal that
the data transfer latency in the CDMA NoC is a constant value for
a certain length of packet and is equivalent to the best case data
transfer latency in the bidirectional ring network when data path
width is set to 32 bits.
Index Terms—Code-division multiple access (CDMA), integrated circuit (IC) design, network-on-chip (NoC).
I. INTRODUCTION
AS MORE and more components are integrated into an
on-chip system, communication issues become complicated. Network-on-chip (NoC) is proposed to solve the on-chip
communication problem by separating the concerns of communication from computation. The idea of NoC is to construct
an on-chip communication network to perform data transfers among a large number of system components. The NoC
structures that have been proposed can be roughly sorted into
two categories, circuit switched network and packet switched
network, according to their data switching modes. SoCBUS
architecture [1], a mesh on-chip network, is an example of a
circuit switched network that uses a packet connected circuit
scheme to allocate time or space slices on the switch links
among the terminals in the network. Æthereal NoC [2] and
Proteo NoC [3] are examples of the packet switched category.
Æthereal NoC applies the combined guaranteed service and
best-effort routers to transfer data packets in the network.
In Proteo NoC, the components in the system are connected
through network nodes and hubs. The network topology and
data links in Proteo NoC can be customized and optimized for
a specific application. Circuit-switched networks will face the
problem of scalability and parallelism if they are applied in a
future on-chip system which contains hundreds of functional
intellectual property (IP) blocks. The packet switched network
can overcome the shortcomings of the circuit switched network
by dividing data streams into packets and routing packets to
their destinations node by node. However, in a packet switched
network that applies multihop point-to-point (PTP) connection
scheme as in [2] and [3], the packet transfer latency will vary
largely when data packets are transferred to different destinations or to the same destination via different routes in the
network. Hence, the upper bound of the packet transfer latency
is determined by the worst case scenario.
Manuscript received May 30, 2006; revised April 15, 2007.
The authors are with the Institute of Digital and Computer Systems, Tampere
University of Technology, 33101 Tampere, Finland.
Digital Object Identifier 10.1109/TVLSI.2007.903914
In order to eliminate variance of data transfer latency and
complexity incurred by routing issues in a PTP connected NoC,
an on-chip network which applies a code-division multiple access (CDMA) technique is introduced in this paper. As one
of the spread-spectrum techniques, the CDMA technique [4]
has been widely used in wireless communication systems because it has great bandwidth efficiency and multiple access capability. The CDMA technique applies a set of orthogonal codes
to encode the data from different users before transmission in
a shared communication media. Therefore, it permits multiple
users to use the communication media concurrently by separating data from different users in the code domain. Hence, the
CDMA NoC proposed in this paper can transfer data packets
from different sources to their destinations directly and concurrently. Consequently, the large variance of data transfer latencies in a PTP connected NoC is eliminated in the CDMA
NoC. The constant data transfer latency in the CDMA NoC is
helpful for providing a guaranteed communication service for
an on-chip system.
The rest of this paper is arranged as follows. In Section II,
issues with applying CDMA technique into an on-chip network
will be discussed. Section III presents the structure of the
CDMA NoC. The realization of the basic components in the
CDMA NoC is presented in Section IV. A six-node CDMA
NoC is presented in Section V in order to examine characteristics of the CDMA NoC by comparing it with a PTP connected
NoC. Finally, conclusions are drawn in Section VI.
II. APPLYING CDMA TECHNIQUE TO NOC
The principle of the CDMA technique is illustrated in Fig. 1.
At the sending end, the data from different senders are encoded
using a set of orthogonal spreading codes. The encoded data
from different senders are added together for transmission
without interfering with each other because of the orthogonal
property of spreading codes. The orthogonal property means
that the normalized autocorrelation value and the cross-correlation value of spreading codes are 1 and 0, respectively.
Autocorrelation of spreading codes refers to the sum of the
products of a spreading code with itself, while cross-correlation
refers to the sum of the products of two different spreading
codes. Because of the orthogonal property, at the receiving
end, the data can be decoded from the received sum signals by
multiplying the received signals with the spreading code used
for encoding. The following three subsections will discuss the
issues related to applying the CDMA technique in an NoC.
Fig. 1. CDMA technique principle.
Fig. 2. Digital CDMA encoding scheme.
Fig. 3. Data encoding example.
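As a minimal sketch of the orthogonal property described above, the following Python fragment computes the normalized correlation of two spreading codes. It assumes the codes are written in bipolar form (+1/-1), which is a common convention and not something the text specifies; the two 8-chip codes and the helper name `correlation` are chosen here for illustration only.

```python
def correlation(code_a, code_b):
    """Normalized sum of the chip-wise products of two bipolar (+1/-1) codes."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a * b for a, b in zip(code_a, code_b)) / len(code_a)


# Two 8-chip Walsh codes in bipolar form (assumed example values).
code_1 = [+1, -1, +1, -1, +1, -1, +1, -1]
code_2 = [+1, +1, -1, -1, +1, +1, -1, -1]

print(correlation(code_1, code_1))  # autocorrelation of code_1
print(correlation(code_1, code_2))  # cross-correlation of code_1 and code_2
```

For this pair the autocorrelation evaluates to 1 and the cross-correlation to 0, which is the behavior the orthogonal property requires.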
A. Digital Encoding and Decoding Scheme
Several on-chip bus schemes that apply the CDMA technique
have been presented in [5]–[8]. Those schemes are implemented
by analog circuits, namely, the encoded data are represented
by the continuous voltage or capacitance value of the circuits.
Therefore, the data transfers in the analog bus are challenged
by the coupling noise, clock skew, and the variations of capacitance and resistance caused by circuit implementation [8]. In
order to avoid the challenges faced by the analog circuit implementation, digital encoding and decoding schemes developed
for the CDMA NoC are illustrated in Figs. 2 and 4, respectively.
In the encoding scheme illustrated in Fig. 2, data from different
senders are fed into the encoder bit by bit. Each data bit will be
spread into S bits by XOR logic operations with a unique S-bit
spreading code as illustrated in Fig. 2. Each bit of the S-bit encoded data generated by XOR operations is called a data chip.
Then, the data chips which come from different senders are
added together arithmetically according to their bit positions in
the S-bit sequences. Namely, all the first data chips from different senders are added together and all the second data chips
from different senders are added together, and so on. Therefore,
after the add operations, we will get S sum values of S-bit encoded data. Finally, as proposed in [9], binary equivalents of the
S sum values are transferred to the receiving end. An example of
encoding two data bits from two senders is illustrated in Fig. 3 in
order to illustrate the proposed encoding scheme in more detail.
Fig. 3(a) illustrates two original data bits from different senders
and two 8-bit spreading codes. The top two figures in Fig. 3(b)
illustrate the results after data encoding (XOR operations) for the
original data bits. The bottom figure in Fig. 3(b) presents the
eight sum values after add operations. Then the binary equivalents of each sum value will be transferred to the receiving end.
In this case, two binary bits are enough to represent the three
possible different decimal sum values, “0,” “1,” and “2.” For
example, if a decimal sum value “2” needs to be transferred, we
need to transfer two binary digits “10.”
Fig. 4. Digital CDMA decoding scheme.
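The encoding steps described above (XOR spreading, chip-wise addition, binary equivalents) can be sketched in Python as follows. This is an illustration under stated assumptions, not the RTL realization: the function names `encode` and `to_binary` are hypothetical, and the two 8-bit codes are example Walsh codes that are not necessarily those of Fig. 3.

```python
def encode(data_bits, spreading_codes):
    """Spread one data bit per sender by XOR, then add the chips position by position."""
    code_length = len(spreading_codes[0])
    sums = [0] * code_length
    for bit, code in zip(data_bits, spreading_codes):
        for position, code_bit in enumerate(code):
            sums[position] += bit ^ code_bit  # one data chip of this sender
    return sums


def to_binary(sum_values, sender_count):
    """Binary equivalents of the sum values; each sum lies in 0..sender_count."""
    width = sender_count.bit_length()
    return [format(value, f"0{width}b") for value in sum_values]


# Assumed example: sender 0 sends "1", sender 1 sends "0".
codes = [[0, 1, 0, 1, 0, 1, 0, 1],
         [0, 0, 1, 1, 0, 0, 1, 1]]
sums = encode([1, 0], codes)
print(sums)
print(to_binary(sums, sender_count=2))
```

With two senders every sum value is 0, 1, or 2, so the sketch emits two binary digits per sum value, in line with the two-sender example in the text.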
The digital decoding scheme applied in the CDMA NoC is
depicted in Fig. 4. The decoding scheme accumulates the received sum values into two separate parts, a positive part and a
negative part, according to the bit value of the spreading code
used for decoding. For instance, as illustrated in Fig. 4, the received first sum value will be put into the positive accumulator if
the first bit of the spreading code for decoding is “0,” otherwise,
it will be put into the negative accumulator. The same selection
and accumulation operations are also performed on the other received sum values. The principle of this decoding scheme can be
explained as follows. If the original data bit to be transferred is
“1,” after the XOR operations in the encoding scheme illustrated
in Fig. 2, it can only contribute nonzero value to the sums of data
chips when a bit of spreading code is “0.” Similarly, the 0-value
original data bit can only contribute nonzero value to the sums
of data chips when a bit of spreading code is “1.” Therefore,
after accumulating the sum values according to the bit values
of the spreading code, either the positive part or negative part is
larger than the other if the spreading codes are orthogonal and
balanced. Hence, the original data bit can be decoded by comparing the values between the two accumulators. Namely, if the
value of the positive accumulator is larger than the value in the
negative accumulator, the original data bit is “1”; otherwise, the
original data bit is “0.”
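The accumulate-and-compare decoding described above can be sketched in Python as follows. The function name `decode` and the example codes are assumptions made for illustration; the sum values are produced with the XOR-and-add encoding of Section II-A so that the fragment is self-contained.

```python
def decode(sum_values, spreading_code):
    """Recover one data bit by comparing the positive and negative accumulators."""
    # Sum values at positions where the code bit is 0 go to the positive part.
    positive = sum(v for v, c in zip(sum_values, spreading_code) if c == 0)
    # Sum values at positions where the code bit is 1 go to the negative part.
    negative = sum(v for v, c in zip(sum_values, spreading_code) if c == 1)
    return 1 if positive > negative else 0


# Assumed example: two orthogonal, balanced 8-bit codes and one data bit per sender.
codes = [[0, 1, 0, 1, 0, 1, 0, 1],
         [0, 0, 1, 1, 0, 0, 1, 1]]
sent = [1, 0]

# Encoding: XOR each data bit with its code, then add the chips position by position.
sums = [sum(bit ^ code[k] for bit, code in zip(sent, codes)) for k in range(8)]

recovered = [decode(sums, code) for code in codes]
print(recovered)
```

Each receiver uses only the code of its own sender; the chips of the other sender contribute equally to both accumulators and therefore cancel in the comparison.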
B. Spreading Code Selection
As discussed in Section II-A, the proposed decoding scheme
requires the spreading codes used in the CDMA NoC to have
both the orthogonal and balance properties. The orthogonal
property has been explained in the first paragraph of Section II.
The balance property means that the numbers of “1” bits and
“0” bits in a spreading code should be equal. Several types of
spreading codes have been proposed for CDMA communication, such as Walsh code, M-sequence, Gold sequence, and
Kasami sequence [10]. However, only Walsh code [10]
has the required orthogonal and balance properties. Therefore,
Walsh code family is chosen as the spreading code library
for the CDMA NoC. In an S-bit (S = 2^N, integer N) length Walsh code set, there are S − 1
sequences that have both the orthogonal and balance properties. Hence, the proposed
CDMA NoC can have at most S − 1 network nodes. The
length of applied Walsh code set should be kept as small as
possible according to the number of network nodes. The purpose
is to reduce the number of data chips generated during data
encoding operations as illustrated in Fig. 2. For example, if
there are six nodes in the CDMA NoC, the 8-bit Walsh code set
should be used instead of a longer Walsh code set.
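The code selection rule can be illustrated with a short Python sketch. It relies on the standard recursive (Sylvester) construction of Walsh codes, in which a length-S set contains S sequences and the all-zero sequence is not balanced; the function names are hypothetical and the fragment is a sketch, not part of the realized design.

```python
def walsh_codes(length):
    """Rows of the Sylvester-type Walsh-Hadamard matrix as 0/1 sequences."""
    if length < 2 or length & (length - 1):
        raise ValueError("length must be a power of two, at least 2")
    codes = [[0]]
    while len(codes) < length:
        # Double the set: repeat each row, and append each row with its complement.
        codes = ([row + row for row in codes]
                 + [row + [1 - b for b in row] for row in codes])
    return codes


def balanced_codes(length):
    """Walsh codes with as many 1 bits as 0 bits (the all-zero code is excluded)."""
    return [code for code in walsh_codes(length) if 2 * sum(code) == length]


def shortest_code_length(node_count):
    """Smallest power-of-two length whose balanced codes cover all nodes."""
    length = 2
    while length - 1 < node_count:
        length *= 2
    return length


print(len(balanced_codes(8)))
print(shortest_code_length(6))
```

For six nodes the sketch selects the 8-bit set, since a 4-bit set offers only three balanced codes.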
C. Spreading Code Protocol
In a CDMA network, if multiple users use the same spreading
code to encode their data packets for transmission simultaneously, the data to be transferred will interfere with each other
because of the loss of orthogonal property among the spreading
codes. This situation is called spreading code conflict, which
should be avoided. Spreading code protocol is a policy used to
decide how to assign and use the spreading codes in a CDMA
network in order to eliminate or reduce the possible spreading
code conflicts during the communication processes. Several
spreading code protocols have been presented for CDMA
packet radio networks [11], [12] and are briefly introduced
in the following six items.
1) Common Code Protocol (C protocol): All users in the network use the same spreading code to encode their data
packets to be transferred.
2) Receiver-Based Protocol (R protocol): Each user in the network is assigned a unique spreading code used by the other
users who want to send data to that user.
3) Transmitter-Based Protocol (T protocol): The unique
spreading code allocated to each user is used by that user
to transfer data to others.
4) Common-Transmitter-Based Protocol (C-T protocol): The
destination address portion of a data packet is encoded
using C protocol, whereas, the data portion of a packet is
encoded using T protocol.
5) Receiver-Transmitter-Based Protocol (R-T protocol): It is
the same as the C-T protocol except that the destination address portion of a data packet is encoded using R protocol.
6) Transmitter-Receiver-Based Protocol (T-R protocol): Two
unique spreading codes are assigned to each user in the network, and then a user will generate a new spreading code
from the assigned two unique codes for its data encoding.
Fig. 5. Proposed CDMA NoC structure.
Among the introduced spreading code protocols, only T protocol and T-R protocol are conflict-free if the users in the network send data to each other randomly. Because the T-R protocol has the drawback of using a large number of spreading
codes and a complicated decoding scheme, T protocol is preferred
in the CDMA NoC. However, if T protocol is applied in the network, a receiver cannot choose the proper spreading code for
decoding because it cannot know who is sending data to it. In
order to solve this problem, an arbiter-based T protocol (A-T
protocol) is developed for the CDMA NoC. In a CDMA NoC
which applies A-T protocol, each user is assigned a unique
spreading code for data transfer. When a user wants to send
data to another user, it will send the destination information of
the data packet to the arbiter before starting data transmission.
Then, the arbiter will inform the requested receiver to prepare
the spreading code that corresponds to the sender for data decoding.
After the arbiter has received the acknowledge signal
from the receiver, it will send an acknowledge signal back to the
sender to grant its data transmission. If there is more than one
user who wants to send data to the same receiver, the arbiter will
grant only one sender to send data at a time. Therefore, by applying the proposed A-T protocol, spreading code conflicts in
the CDMA NoC can be eliminated.
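The request and grant sequence of the A-T protocol can be sketched as a toy Python model. The class name `ATArbiter`, its methods, and the 4-bit example codes are hypothetical; the receiver's acknowledge signal is treated as implicit, and a refused sender is simply expected to wait, which simplifies the behavior described above.

```python
class ATArbiter:
    """Toy model of the A-T protocol: at most one granted sender per receiver."""

    def __init__(self, codes):
        self.codes = codes        # sender name -> unique spreading code
        self.decoding_code = {}   # receiver name -> code it currently decodes with

    def request(self, sender, receiver):
        """Return True (grant) if the receiver is free, otherwise False."""
        if receiver in self.decoding_code:
            return False          # another sender already holds this receiver
        # The receiver prepares the sender's code; its acknowledge is implied here.
        self.decoding_code[receiver] = self.codes[sender]
        return True

    def release(self, receiver):
        """Free the receiver once the packet has been delivered."""
        self.decoding_code.pop(receiver, None)


arbiter = ATArbiter({"A": [0, 1, 0, 1], "B": [0, 0, 1, 1]})
print(arbiter.request("A", "C"))  # first request for receiver C
print(arbiter.request("B", "C"))  # receiver C is already in use
print(arbiter.request("B", "D"))  # a different receiver is unaffected
```

Because every sender encodes with its own code and each receiver decodes with exactly one code at a time, two transfers never share a spreading code in this model.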
III. CDMA NOC STRUCTURE
The proposed CDMA NoC is a packet switched network
that consists of “Network Node,” “CDMA Transmitter,” and
“Network Arbiter” blocks as illustrated in Fig. 5. The functional
IP blocks (functional hosts) are connected to the CDMA NoC
through individual “Network Node” blocks. The CDMA communications in the network are performed by “CDMA Transmitter” and “Network Arbiter” blocks. Because the different
functional hosts may work at different clock frequencies as illustrated in Fig. 5, coordinating the data transfers among different
clock domains would be a problem. A globally-asynchronous
locally-synchronous (GALS) scheme [13] has been proposed
as a solution for this problem. Applying the GALS scheme to
the CDMA NoC means that the communications between each
functional host and its network node use local clock frequency,
while the communications between network nodes through the
CDMA network are asynchronous. In order to support the GALS
scheme, both synchronous and asynchronous circuits are applied
in the design. The three types of components in the CDMA
NoC will be presented in the following three subsections.
Fig. 7. Bit-synchronous transfer scheme.
Fig. 6. Block diagram of the network node in CDMA NoC.
A. Network Node
The block diagram of the “Network Node” in the CDMA
NoC is illustrated in Fig. 6, where the arrows represent the flows
of data packets. In Fig. 6, the “Network IF” block, which belongs to the functional host, is an interface block for connecting
a functional host with a “Network Node” through VCI [14] or
OCP interface standard [15]. GALS scheme is realized in “Network Node” block by using synchronous design in the “Node
IF” subblock and using asynchronous design in the other subblocks. The function of the subblocks in a “Network Node” will
be described in the following four paragraphs.
1) Node IF: This block is used to receive data from the
“Network IF” block of a functional host through the
applied VCI or OCP standard. Then it will assemble the
received data into packet format and send the packet to
“Tx Packet Buffer,” or disassemble the received packet
from “Rx Packet Buffer” and send the extracted data to
the functional host.
2) Tx/Rx Packet Buffer: These two blocks are buffers that
consist of the asynchronous first-input–first-output (FIFO)
presented in [16]. “Tx Packet Buffer” is used to store the
data packets from “Node IF” block, and then deliver the
packets to “Packet Sender” block. The “Rx Packet Buffer”
stores and delivers the received packets from “Packet Receiver” to “Node IF.”
3) Packet Sender: If “Tx Packet Buffer” is not empty, “Packet
Sender” will fetch a data packet from the buffer by an asynchronous handshake protocol. Then it will extract the destination information from the fetched packet and send the
destination address to “Network Arbiter.” After “Packet
Sender” gets the grant signal from the arbiter, it will start
to send the data packet to “CDMA Transmitter.”
4) Packet Receiver: After system reset, this block will wait
for the sender information from “Network Arbiter” to
select the proper spreading code for decoding. After the
spreading code for decoding is ready, the receiver will
send an acknowledge signal back to “Network Arbiter”
and wait to receive and decode the data from “CDMA
Transmitter,” and then send the decoded data to “Rx Packet
Buffer” in packet format.
B. Network Arbiter
“Network Arbiter” block is the core component to implement the A-T spreading code protocol presented in Section II-C.
By applying A-T spreading code protocol, every sender node
cannot start to send data packets to “CDMA Transmitter” until
it gets the grant signal from “Network Arbiter.” “Network Arbiter” takes charge of informing the requested receiver node to
prepare the proper spreading code for decoding and sending a
grant signal back to the sender node. If more than one
sender node requests to send data to the same
receiver node, simultaneously or at different times, the arbiter
will apply a “round-robin” arbitration scheme or the “first-come
first-served” principle, respectively, to guarantee that there is
only one sender sending data to one specific receiver at a time.
However, if different sender nodes request to send data to different receiver nodes, these requests would not block each other
and will be handled in parallel in the “Network Arbiter.” The
“Network Arbiter” in the CDMA NoC is different from the arbiter used in a conventional bus. The reason is that the “Network Arbiter” here is only used to set up spreading codes for
receiving and it handles the requests in parallel in the time domain. However, a conventional bus arbiter is used to allocate the
usage of the common communication media among the users in
the time-division manner.
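The handling of simultaneous requests can be sketched in Python as follows. The function `arbitrate`, its arguments, and the integer node numbering are assumptions made for illustration; the sketch covers only requests that arrive at the same moment and does not model the first-come first-served case.

```python
def arbitrate(requests, last_granted, node_count):
    """Pick one sender per requested receiver in round-robin order.

    requests:     {sender_id: receiver_id} for requests arriving together
    last_granted: {receiver_id: sender_id} granted in the previous round
    Returns {receiver_id: granted_sender_id}.
    """
    grants = {}
    for receiver in set(requests.values()):
        contenders = {s for s, r in requests.items() if r == receiver}
        # Start the search just after the sender granted last time.
        start = last_granted.get(receiver, -1) + 1
        for offset in range(node_count):
            candidate = (start + offset) % node_count
            if candidate in contenders:
                grants[receiver] = candidate
                break
    return grants


# Senders 0 and 1 both target receiver 3; sender 2 targets receiver 4.
first = arbitrate({0: 3, 1: 3, 2: 4}, {}, node_count=6)
second = arbitrate({0: 3, 1: 3}, first, node_count=6)
print(first, second)
```

Requests for different receivers are resolved independently of each other, which mirrors the parallel handling described above, while competing requests for one receiver take turns.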
C. CDMA Transmitter
The “CDMA Transmitter” block takes care of receiving
data packets from network nodes and encoding the data to be
transferred with the corresponding unique spreading code of
the sender node. Although this block is realized using asynchronous circuits, it applies a bit-synchronous transfer scheme.
It means that the data from different nodes will be encoded
and transmitted synchronously in terms of data bits rather
than any clock signals. In Fig. 7, the principle of the referred
bit-synchronous transfer scheme is illustrated by a situation
that network nodes “A” and “B” send data packets to “CDMA
Transmitter” simultaneously and node “C” sends a data packet
later than “A” and “B.” In this situation, the data packet from
node “A” will be encoded and transmitted together with the
data packet from node “B” synchronously in terms of each data
bit. When the data packet from node “C” arrives at a later time
point, the transmitter will handle the data bit of “Packet C”
together with the data bits of packet “A” and “B” at the next
start point of the time slot for bit encoding and transmitting
processes. The dot-line frame at the head of the “Packet C” in
Fig. 7 is used to illustrate the waiting duration if the “Packet C”
arrived in the middle of the time slot for handling the previous
data bit. The time slot for handling a data bit is formed by a
four-phase handshake process. The bit-synchronous transfer
scheme can avoid the interferences caused by the phase offsets
among the orthogonal spreading codes if the data bits from different nodes are encoded and transmitted asynchronously with
each other. Because the nodes in the network can request data
transfer randomly and independently of each other, “CDMA
Transmitter” applies the “first come, first served” mechanism to
ensure that the data encoding and transmission are performed
as soon as there is a data transfer request.
TABLE I
AREA COST OF CDMA NOC COMPONENTS
Fig. 8. C-element control pipeline.
Fig. 9. Micropipeline control logic.
IV. REALIZATION
Two issues related to realizing the CDMA NoC are addressed in this section. One issue is the asynchronous design
realization. The other is the configuration of the data path in the
CDMA NoC.
A. Asynchronous Design
As illustrated in Fig. 5 and addressed in Section III-A, the
asynchronous blocks in the CDMA NoC include the “CDMA
Transmitter,” “Network Arbiter,” “Tx/Rx Packet Buffer,” and
“Packet Receiver/Sender” blocks. The important part of the
asynchronous design of these blocks is the control logic. Since
the “CDMA Transmitter” and “Network Arbiter” blocks are
data-path centric blocks, the control logic used in these blocks
is composed of a straightforward C-element pipeline as illustrated in Fig. 8. Each stage in the C-element pipeline is enabled
by the enable signals generated from data completion detection
circuits. The control token will be passed from one stage to
the next one through each C-element in the pipeline. The
control logic used in the “Tx/Rx Packet Buffer” and “Packet
Receiver/Sender” blocks is based on the micropipeline control
logic presented in [17] and illustrated in Fig. 9. The principle
of micropipeline control logic is to use the output from the
current stage to enable or disable the input of the previous stage.
The “delay” components illustrated in Fig. 9 are realized by
logic gates that generate or receive four-phase handshake
signals for control tasks in the asynchronous blocks in the
CDMA NoC. An example with more details about applying
micropipeline control logic to asynchronous designs can be
found in [18].
In order to suit the conventional synchronous design tools
and other synchronous designs in the CDMA NoC, all the asynchronous blocks of the CDMA NoC are realized in RTL using
VHDL together with the synchronous blocks. The basic principle is to model the basic components (C-elements, latches, and
combinational logic gates) in RTL using VHDL, and then build
the asynchronous circuits using these RTL component models
in a hierarchical way.
B. Data Path Configuration
Figs. 2 and 4 illustrate the principle of data encoding and
decoding schemes used in the CDMA NoC by an example
of processing and delivering one data chip of encoded data
from the sender to the receiver at one time. Since one original
data bit will be spread into S bits after encoding, the degree
of data transfer parallelism between the “CDMA Transmitter”
and “Packet Sender/Receiver” blocks strongly affects the data transfer
latency in the CDMA NoC. Namely, increasing the
number of data bits encoded and delivered via “CDMA Transmitter” at one time can reduce the data transfer latency in the
CDMA NoC and vice versa. However, increasing the data
processing and delivering parallelism will incur larger area
cost. Hence, in order to characterize the tradeoff between
the parallelism and the area cost, the “Packet Sender,” “CDMA
Transmitter,” and “Packet Receiver” blocks have been realized
with four different data path configurations. According to the
number of data bits transferred from a “Packet Sender” to a
“Packet Receiver” through “CDMA Transmitter” at one time, the configurations are named the 1-, 8-, 16-, and 32-bit schemes.
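The effect of the data path width can be made concrete with a small Python calculation. It is only a sketch under stated assumptions: a 32-bit packet cell and 8-bit spreading codes, as used elsewhere in this paper, and one handshake slot moving exactly as many data bits as the data path is wide. The function names are hypothetical, and the numbers say nothing about measured latency or area.

```python
def slots_per_cell(cell_bits, path_bits):
    """Handshake slots needed to move one packet cell (ceiling division)."""
    return -(-cell_bits // path_bits)


def chips_per_slot(path_bits, code_length):
    """Data chips produced per slot: each data bit is spread into code_length chips."""
    return path_bits * code_length


# Assumed parameters: 32-bit packet cell, 8-bit spreading codes.
for path_bits in (1, 8, 16, 32):
    print(path_bits, slots_per_cell(32, path_bits), chips_per_slot(path_bits, 8))
```

A wider data path needs fewer slots per cell but handles proportionally more chips in each slot, which is the source of the tradeoff between parallelism and area.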
C. Synthesis Results
The components of the CDMA NoC are synthesized using
a 0.18-μm standard cell library. The Basic VCI (BVCI) interface standard [14] is applied in the realization of “Node IF”
block. The data width and buffer depth in the “Tx/Rx Packet
Buffer” blocks are set to 32 bits and 4 packets, respectively. In
order to facilitate the simulation work later on, six network nodes
and 8-bit Walsh codes are applied for synthesizing the “CDMA
Transmitter,” “Network Arbiter,” and “Packet Sender/Receiver”
blocks. The area costs of the components of the CDMA NoC
under different data path configurations are listed in Table I. The
area cost figures in Table I are presented as the number of equivalent gates. A density of 85 K gates/mm² is used to calculate the number of
equivalent gates for the 0.18-μm standard cell library.
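The conversion from synthesized area to equivalent gates is a single multiplication, sketched below in Python. The gate density is the one quoted above, read as gates per square millimeter; the function name and the example area of 0.1 mm² are hypothetical and do not correspond to any entry of Table I.

```python
GATES_PER_MM2 = 85_000  # density quoted in the text for the 0.18-um library


def equivalent_gates(area_mm2):
    """Convert a synthesized cell area in mm^2 to a number of equivalent gates."""
    return round(area_mm2 * GATES_PER_MM2)


# Hypothetical area value, used only to show the arithmetic.
print(equivalent_gates(0.1))
```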
From Table I we can see that when the data path width is increased from 1 to 32 bits, the area cost of “Packet Receiver” and
“CDMA Transmitter” becomes 13 and 17 times larger. The area
increase is due to the duplications of the encoding and decoding
logic in the “CDMA Transmitter” and “Packet Receiver” blocks
for increasing the data path width. Compared with the 32-fold
increase in data path width, this increase in area cost
is reasonable. Note in Table I that the area
cost of the 32-bit version of the “Packet Sender” block is smaller
than that of the other versions. The reason is that the data width of the output of
“Tx Packet Buffer” block is 32 bits, thus the “Packet Sender”
block needs some control logic to adjust the fetched packet cells
to be sent out according to the data path width if it is smaller
than 32 bits. However, when the data path width is increased to
32 bits, the output data width adjusting logic is not needed in the
“Packet Sender” block. The initiator type “Node IF” has larger
area than the target type because it needs a buffer to store the
header cell of received packets for supporting the split-transaction
feature in the BVCI standard.
Fig. 10. Six-node CDMA NoC simulation network.
Fig. 11. Data packet format specification.
Fig. 12. Six-node PTP NoC simulation network.
V. COMPARING WITH A PTP NOC
In order to examine the characteristics and performance of
the CDMA NoC thoroughly, a simulation network that applies
the CDMA NoC scheme is built and compared with a PTP NoC
presented in [19].
A. Simulation Network Setup
The simulation network that applies the CDMA NoC is illustrated in Fig. 10. It contains six network nodes that work at
different clock frequencies. The BVCI
interface standard is applied in the network. Three hosts act as
initiators and the other three act as targets, as denoted by the labels “I” and “T,” respectively, in the “Network IF” blocks. The
initiator hosts can generate requests to any target hosts, while
the target hosts can generate responses only for the received requests passively. The network nodes are connected to each other
through “CDMA Transmitter” and “Network Arbiter” blocks.
The spreading codes used in the network are six 8-bit Walsh
codes. The basic data unit transferred in the network is a data
packet composed of one header cell and several data cells, as
illustrated in Fig. 11. The number of data cells in a packet varies
from one to three, while the width of each packet cell is fixed
at 32 bits. The “functional host” blocks and their “Network IF”
blocks are not realized with any real IP blocks; they are simulated by adding stimulus signals on each “Network Node” block
according to the BVCI standard. A four-phase dual-rail handshake protocol is applied in the CDMA network to transfer data
between network nodes. The PTP network illustrated in Fig. 12
has the same network configuration as the CDMA
network, except that the network nodes in the PTP network are
connected to each other through a bidirectional ring topology.
Therefore, the characteristics of the CDMA NoC can be examined more clearly by comparing the two networks in different
aspects in the following four subsections.
B. Comparison of Data Transfer Principles
In the PTP connected network illustrated in Fig. 12, the data
traffic load is distributed into the links among the network nodes.
This distributed traffic scheme has the benefits of flexibility
and scalability, whereas the main disadvantage is that the data
transfer latency between two network nodes can be largely different when data are transferred to different destinations or to
the same destination via different routes.
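The route dependence described above can be illustrated by counting hops in a bidirectional ring. The Python sketch below assumes nodes numbered 0 to 5 and uses the hop count as a rough proxy for latency; the function names are hypothetical and the fragment does not reproduce the simulated network or its measured latencies.

```python
def ring_hops(source, destination, node_count):
    """Hops along the shorter direction of a bidirectional ring."""
    clockwise = (destination - source) % node_count
    return min(clockwise, node_count - clockwise)


def longer_route_hops(source, destination, node_count):
    """Hops if the packet travels in the other direction instead."""
    clockwise = (destination - source) % node_count
    return max(clockwise, node_count - clockwise)


# Six-node ring, packets sent from node 0 to every other node.
shortest = sorted({ring_hops(0, d, 6) for d in range(1, 6)})
longest = sorted({longer_route_hops(0, d, 6) for d in range(1, 6)})
print(shortest, longest)
```

Even along the shorter direction, the hop count from one node ranges from one to three in a six-node ring, and the opposite direction takes up to five hops; a one-hop scheme does not have this spread.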
Although data transfers in the PTP network can be parallel
if they take place in different links among the network nodes,
concurrent data transfers over a single link are impossible in the
PTP NoC because a link between two network nodes is shared
in a time-division manner. Therefore, by applying CDMA technique, the main advantage of the CDMA NoC is the feature of
concurrent data transfers. Hence, the data transfer latency in the
CDMA NoC is a constant value which in turn helps the CDMA
NoC to provide a guaranteed service for the on-chip system.
Another advantage of the CDMA NoC is that it can easily
support multicast data transfers by requesting multiple receiver
nodes to use the same spreading code for receiving. In the PTP
NoC, the multicast transfer can be realized only by sending multiple copies of a data packet to its multiple destinations, unless
extra logic is added in each network node to copy the multicast
packet to both the functional host and the output link to the next
node. This would increase the traffic load in the PTP network,
or complicate the network implementation. One more benefit
of applying the CDMA NoC is that the header cell in a packet
need not be transferred in the network after a sending node
gets the grant signal from the “Network Arbiter,” since the receiving node already knows the sender information through the
A-T protocol presented in Section II-C. However, in the PTP
NoC, the header cell in a packet needs to be transferred in the
network for packet routing.
TABLE III
SYNCHRONOUS TRANSFER LATENCY
Fig. 13. Network node structure of the PTP NoC.
TABLE II
DATA TRANSACTION SPECIFICATION
Fig. 14. ATL portions of the CDMA NoC.
C. Comparison of Network Node Structures
The network node structure of the PTP NoC presented in [19]
is illustrated in Fig. 13. It contains two identical “Communication
Layer” blocks for supporting the bidirectional ring topology. A
comparison with the network node illustrated in Fig. 6 shows that the network node of the CDMA NoC is less complex. The main
reason is that the network node of the CDMA NoC does not need
to handle any bypass packets or the packet routing issues because of its one-hop data transfer scheme. Therefore, the “Communication Controller” and “Packet Distributor” blocks illustrated in Fig. 13 are not needed in the node of the CDMA NoC.
Since the CDMA NoC applies centralized traffic scheme, its
network node does not need multiple “Communication Layer”
blocks and “Layer MUX” block in the node of the PTP NoC
illustrated in Fig. 13. When the data transfer parallelism needs
to be increased in the PTP NoC, more “Communication Layer”
blocks in a network node are needed in order to set up more
links with other nodes, whereas the network node structure in
the CDMA NoC does not need to change in this situation because of the parallel data transfer scheme.
D. Comparison of Data Transfer Latencies
The CDMA network illustrated in Fig. 10 and the PTP
NoC illustrated in Fig. 12 are both synthesized using the
same 0.18-μm technology library. Gate-level simulations are
performed on both simulation networks. The data transactions
performed during the simulations are listed in Table II. Each
data transaction consists of one request packet from an initiator
host to a target host and one corresponding response packet
from the target host to the initiator host.
Because the GALS scheme is applied both in the CDMA
network and the PTP network, the data transfer latency in the
two simulation networks can be separated into two parts, synchronous transfer latency (STL) and asynchronous transfer latency (ATL). The STL refers to the data transfer latency between
a functional host and the network node attached to it. STL depends on the local clock and the type of interface. The measured
STL values of the CDMA network are listed in Table III. The
constant values in Table III are caused by the handshakes in the
asynchronous domain. They are independent of the local clock
rate but belong to the synchronous transfer processes. Therefore, they are counted as a part of STL. From Table III, we can
see that an initiator type of network node takes more clock cycles for local data transfers. The reason is that the initiator node
needs to store or read the header cell to or from a buffer as mentioned in Section IV-C. Since the same “Node IF” block design
is applied in both simulation networks, the STL values of the PTP network are the same as those listed in Table III.
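The STL description above, a clocked part that scales with the local clock plus a constant caused by the asynchronous handshakes, can be sketched as a small calculation. The function name, the units, and the numeric inputs below are assumptions for illustration and are not values from Table III.

```python
def stl_ns(clock_cycles, clock_mhz, handshake_ns):
    """Estimate STL in ns: clocked part plus a clock-independent constant.

    clock_cycles : local clock cycles spent on the host/node transfer
    clock_mhz    : local clock frequency of the functional host in MHz
    handshake_ns : constant contributed by asynchronous handshakes
    """
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    clocked_part_ns = clock_cycles * 1000.0 / clock_mhz
    return clocked_part_ns + handshake_ns


# Placeholder inputs (hypothetical, not measured values).
slow_host = stl_ns(clock_cycles=4, clock_mhz=100, handshake_ns=5.0)
fast_host = stl_ns(clock_cycles=4, clock_mhz=200, handshake_ns=5.0)
```

Doubling the local clock rate halves only the clocked part, while the handshake constant is unchanged.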
The ATL refers to the latency of transferring
data packets from one network node to another node through
a NoC structure using asynchronous handshake protocols. The
ATL values in the PTP and CDMA networks consist of different
portions which will be discussed separately in the following
subsections.
1) ATL in the CDMA NoC: The ATL of the CDMA network consists of three portions: packet loading latency (PLL),
packet transfer latency (PTL), and packet storing latency (PSL).
The concept of those ATL portions is illustrated in Fig. 14 with
an example where “Network Node 0” sends one data packet
to “Network Node 2.” The black arrows in Fig. 14 represent
the packet transfer direction. The different portions of ATL are
marked by grey arrows in Fig. 14 and explained in the following
three paragraphs.
a) PLL: This is the time used by the “Packet Sender” block to
fetch a data packet from “Tx Packet Buffer” and prepare
to send the packet to “CDMA Transmitter.”
b) PTL: This latency refers to the time used to transfer one
data packet from the “Packet Sender” of the sender node
to the “Packet Receiver” of the receiver node through
the “CDMA Transmitter” and “Network Arbiter” blocks
using a handshake protocol.
c) PSL: After the receiver node receives a data packet, it
will spend a certain amount of time to store the received
data packet into “Rx Packet Buffer.” This time duration is
measured as PSL.
TABLE IV
ATL PORTION VALUES OF THE CDMA NOC
TABLE VI
EQUIVALENT NUMBER OF INTERMEDIATE NODES IN THE PTP NOC
Fig. 15. ATL portions of the PTP NoC.
TABLE V
ATL PORTION VALUES OF THE PTP NOC
The measured values of ATL portions of the CDMA NoC
under different data path configurations are listed in Table IV.
The ATL value of the CDMA NoC can be calculated by directly
adding the three portions under the same configuration.
2) ATL in the PTP NoC: The concept of the ATL portions
of the PTP NoC is illustrated in Fig. 15 with an example in which
“Network Node 0” sends one packet to “Network Node 2” via
“Network Node 1.” The black and grey arrows in Fig. 15 have the same meanings as the arrows in Fig. 14. The meaning
of the ATL portions is explained briefly in the following four
paragraphs.
a) PLL: This is the time used to load one “local packet” into “Tx
Packet Buffer” in the “Packet Sender” block as illustrated
in Fig. 13.
b) PTL: This latency refers to the time used to transfer one
data packet from the “Packet Sender”
of a network node to the “Packet Receiver” of an adjacent
node using a handshake protocol.
c) PBL: After a network node receives a packet from another
node, it checks the packet's destination address. If the packet is a “bypass
packet,” it is delivered into “Tx Packet Buffer.” The
time spent on this process is called PBL.
d) PSL: This is the time spent on storing one “incoming packet”
into the “Rx Packet Buffer” block.
The ATL portion values of the PTP NoC are listed in Table V.
The formula for calculating the ATL of transferring one packet in
the PTP NoC is given in (1)
$$ATL_{PTP} = PLL + PTL \times (N + 1) + PBL \times N + PSL \tag{1}$$
where $N$ refers to the number of intermediate nodes between the source node and destination node of a
packet. If a packet is transferred between two adjacent network
nodes, then $N$ is 0.
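The two ATL compositions can be written out as a short Python sketch. It assumes, following the portion definitions above, that a PTP transfer accumulates one PLL, one PSL, one PTL per hop, and one PBL per intermediate node, while a CDMA transfer is the plain sum of PLL, PTL, and PSL. The class name, the function names, and all numeric values are placeholders for illustration and are not taken from Tables IV and V.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtlPortions:
    """ATL portions in an arbitrary but consistent time unit."""
    pll: float        # packet loading latency
    ptl: float        # packet transfer latency (per hop in the PTP NoC)
    psl: float        # packet storing latency
    pbl: float = 0.0  # packet bypass latency (PTP NoC only)


def atl_cdma(p):
    """One-hop transfer: the three portions are simply added."""
    return p.pll + p.ptl + p.psl


def atl_ptp(p, intermediate_nodes):
    """Multi-hop transfer with N intermediate nodes, hence N + 1 hops."""
    if intermediate_nodes < 0:
        raise ValueError("number of intermediate nodes cannot be negative")
    hops = intermediate_nodes + 1
    return p.pll + hops * p.ptl + intermediate_nodes * p.pbl + p.psl


# Placeholder portion values (hypothetical, not measured).
example = AtlPortions(pll=2.0, ptl=10.0, psl=3.0, pbl=4.0)
adjacent = atl_ptp(example, 0)      # two adjacent nodes
via_one_node = atl_ptp(example, 1)  # one intermediate node
one_hop = atl_cdma(example)
```

With these placeholder values, each additional intermediate node adds one PTL and one PBL to the PTP latency, whereas the CDMA figure does not depend on the destination.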
3) Comparing the ATL Values: In Tables IV and V, we can
see that the PTL values of the CDMA NoC and the PTP NoC increase as the packet length increases. This is because the data
cells in a packet are sent in a serial manner in the two networks.
Thus, more data cells need more transmission time. In contrast,
the PLL and PSL values of the CDMA NoC and the PTP NoC
are almost unaffected by the packet length. The reason is that
the data cells in a packet are loaded or stored in a parallel manner
in both networks.
The main difference between the ATL values of the two NoCs
is that the ATL value of the CDMA NoC is a constant value for
a certain data packet length, whereas the ATL value in the PTP
NoC is a variable depending on the packet traffic route. The ATL
portion PBL of the PTP NoC does not exist in the ATL of the
CDMA NoC because the data packets in the CDMA NoC are
transferred directly from their source nodes to their destination
nodes. The stable ATL value is an advantage of the CDMA NoC
since it is very helpful for providing guaranteed service in the
network.
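The route dependence of the PTP latency can be made concrete by counting intermediate nodes on a bidirectional ring. The sketch below assumes that a packet always takes the shorter direction around the ring, which is an assumption of this illustration rather than a statement about the routing policy of the PTP NoC; the function name is likewise hypothetical.

```python
def intermediate_nodes(src, dst, ring_size):
    """Nodes strictly between src and dst on the shorter ring direction."""
    if src == dst:
        raise ValueError("source and destination must differ")
    clockwise = (dst - src) % ring_size
    shortest = min(clockwise, ring_size - clockwise)
    return shortest - 1


# Six-node ring, packets sent from node 0 to every other node.
counts = [intermediate_nodes(0, dst, 6) for dst in range(1, 6)]
```

Even under shortest-path routing, the count ranges from 0 to 2 on a six-node ring, so the PBL and PTL contributions vary with the destination, while a one-hop transfer has no such term.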
The PTL values listed in Table IV show that the data path
width configuration affects the ATL of the CDMA NoC in a
linear manner. For instance, the PTL value of transferring a
three-data-cell packet is reduced by a factor of around 30 when the data
path width is increased from 1 to 32 bits. Since the data path
width in the PTP network illustrated in Fig. 12 is realized as
32 bits, only the ATL value of the CDMA NoC with 32-bit
data path width is comparable with the ATL value of the PTP
NoC. However, in order to compare the data transfer latency
characteristics of the two NoCs thoroughly, Table VI lists the
equivalent number of intermediate network nodes that a data packet
could pass through in the PTP NoC within the ATL needed to
transfer a packet of the same size in the CDMA NoC under different data path configurations. From Table VI, we can see that
when the data path widths in the CDMA NoC and the PTP NoC
are both 32 bits, the ATL of delivering a two-data-cell packet
in the CDMA NoC is equivalent to transferring the same packet
between two adjacent network nodes in the PTP NoC, which
means that the ATL of the CDMA NoC equals the best case
ATL value in the PTP NoC. When transferring a one-data-cell
packet, the ATL in the CDMA NoC is even smaller than the
best case ATL in the PTP NoC, as denoted by the negative value
in Table VI. The latency caused by the data encoding
and decoding scheme in the CDMA NoC is compensated by
its one-hop data transfer scheme. Hence, the CDMA NoC can
transfer data packets with an ATL equivalent to the best case ATL of the
PTP NoC when the data path width is set to 32 bits.
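An equivalent number of intermediate nodes can be obtained by equating a CDMA ATL with the PTP ATL and solving for the node count. The sketch below assumes the PTP composition of one PLL, one PSL, one PTL per hop, and one PBL per intermediate node; whether and how the resulting value is rounded in Table VI is not stated here, so the function returns the unrounded value. The function name and the numeric inputs are placeholders.

```python
def equivalent_intermediate_nodes(atl_cdma, pll, ptl, pbl, psl):
    """Solve pll + (n + 1) * ptl + n * pbl + psl == atl_cdma for n.

    pll, ptl, pbl, psl are PTP NoC portions; atl_cdma is the total
    CDMA NoC latency for a packet of the same size. The result may be
    fractional or negative.
    """
    per_node_cost = ptl + pbl
    if per_node_cost <= 0:
        raise ValueError("PTL + PBL must be positive")
    best_case_ptp = pll + ptl + psl  # transfer between adjacent nodes
    return (atl_cdma - best_case_ptp) / per_node_cost


# Placeholder PTP portions (hypothetical): best-case PTP ATL is 15.
same_as_adjacent = equivalent_intermediate_nodes(15.0, 2.0, 10.0, 4.0, 3.0)
one_node = equivalent_intermediate_nodes(29.0, 2.0, 10.0, 4.0, 3.0)
faster_than_best = equivalent_intermediate_nodes(8.0, 2.0, 10.0, 4.0, 3.0)
```

A result of zero corresponds to the best case of the PTP NoC, and a negative result means the CDMA transfer is faster than a PTP transfer between adjacent nodes.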
TABLE VII
AREA AND POWER COSTS OF THE TWO NETWORKS
E. Comparison of Area and Power Costs
The two simulation networks illustrated in Figs. 10 and 12 are
synthesized using a 0.18-μm technology library. The area costs
of the two simulation networks with different data path widths
are listed in Table VII for comparison. The dynamic power
costs during the gate-level simulations of the data transactions
listed in Table II, and the energy costs of transferring 32 data
bits in the CDMA NoC and the PTP NoC, are also listed in
Table VII.
From the figures in Table VII, we can see that when the data
path width is increased from 1 to 32 bits in the CDMA NoC, the
area cost of the CDMA network becomes 2.4 times larger because more logic is used to perform parallel data encoding and
decoding. With 16- and 32-bit data path widths, the CDMA NoC
loses its area cost advantage over the PTP NoC.
In terms of the dynamic power costs listed in Table VII, a 1-bit
CDMA NoC should not be applied because its power
consumption is much larger than that of the CDMA NoCs under other
data path width configurations and of the PTP NoC. The reason
for the large power cost of the 1-bit CDMA NoC is that it
needs much more switching activity than
the other versions of the CDMA NoC due to its over-serialized data transfer scheme. Note in Table VII that
the 16- and 32-bit CDMA NoCs have almost the same power
consumption. The reason is that the power consumption increase caused by the wider data path is compensated
by the reduced control logic for data output adjust operations
in each “Packet Sender” in the 32-bit CDMA NoC, as explained
in Section IV-C. By comparing the dynamic power costs and
the energy costs of transferring 32 bits in Table VII, we can see
that the PTP NoC has a dynamic power cost similar to that of the 16- and 32-bit CDMA NoCs, while its energy figure is slightly
larger than those of the two CDMA NoCs. This is because
the PTP NoC takes more time to perform the data transactions
listed in Table II due to its multiple-hop data routing scheme,
whereas the CDMA NoC can perform the same data transactions in a shorter time because of its one-hop concurrent data transfer
scheme. Therefore, the average energy spent on transferring 32
data bits in the CDMA NoC, except the 1-bit CDMA NoC, is
smaller than the energy cost in the PTP NoC.
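The relation between dynamic power, transaction time, and energy per 32 transferred bits can be sketched as follows. The normalization used here, total energy divided by the number of 32-bit words in the payload, is an assumption about how such a figure could be derived, and the function name and all numbers are placeholders rather than values from Table VII.

```python
def energy_per_32_bits_nj(power_mw, duration_us, payload_bits):
    """Average energy in nJ spent per 32 transferred data bits.

    power_mw     : average dynamic power during the transactions (mW)
    duration_us  : time needed to complete the transactions (us)
    payload_bits : total number of data bits transferred
    """
    if payload_bits <= 0:
        raise ValueError("payload must contain at least one bit")
    total_nj = power_mw * duration_us  # 1 mW * 1 us = 1 nJ
    return total_nj * 32 / payload_bits


# Two networks with equal power but different completion times (hypothetical).
ptp_energy = energy_per_32_bits_nj(power_mw=10.0, duration_us=2.0, payload_bits=640)
cdma_energy = energy_per_32_bits_nj(power_mw=10.0, duration_us=1.5, payload_bits=640)
```

With equal power, the network that finishes the same transactions sooner spends proportionally less energy per 32 bits.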
VI. CONCLUSION
An on-chip packet switched communication network that
applies the CDMA technique and supports the GALS communication scheme was presented. The presented CDMA NoC
uses an asynchronous scheme to perform the global data transfers between network nodes, and uses a synchronous scheme
to deal with the local data transfers between a functional host
and the network node attached to it. A CDMA encoding and
decoding scheme which suits digital-circuit implementation
was presented. The main advantage of the presented CDMA
NoC is that it can perform data transfers concurrently by applying the CDMA technique in the network. Therefore, the large
data transfer latency variance caused by the packet routing in a
PTP NoC is eliminated in the CDMA NoC. The constant data
transfer latency in the CDMA NoC is helpful for providing
guaranteed communication services to an on-chip system.
Another advantage of the CDMA NoC is that it can perform
multicast data transfers easily by utilizing the multiple access
feature of the CDMA technique.
Both the asynchronous and synchronous circuits of the
CDMA NoC with different data path widths are realized in
RTL using VHDL in order to suit the conventional synchronous
design flow and tools. Two six-node on-chip networks were
constructed to compare the CDMA NoC with a PTP NoC. One
network applies the CDMA NoC, while the other applies a
bidirectional ring PTP NoC. The two networks were simulated
and compared against each other. The simulation results reveal
that when the data path width of the two simulation networks is
set to 32 bits, the asynchronous transfer latency in the CDMA
NoC is equivalent to the best case data transfer latency in the
PTP NoC. The best case data transfer in the PTP NoC means that
packets are transferred between two adjacent nodes. This result indicates
that the data transfers between any two network nodes in the CDMA
NoC can be performed as quickly as transferring the same data
packets between two adjacent nodes in the PTP NoC.
Considering the tradeoff between the transfer latency performance listed in Table VI and the costs listed in Table VII, a 16-bit
CDMA NoC is a good option for replacing the PTP NoC in an
on-chip system where uniform data transfer latency is
required. With a 16-bit data path width, the data transfer latency of the CDMA NoC is close to the best case transfer latency
in the PTP NoC while the area and dynamic power costs remain
similar. If the area and power costs have higher priority, the 8-bit
CDMA NoC can be applied because its area is 16.2% smaller
than that of the PTP NoC while its energy cost of transferring 32 bits
is 21.0% smaller than the cost in the PTP NoC.
REFERENCES
[1] D. Wiklund and D. Liu, “SoCBUS: Switched network on chip for
hard real time systems,” in Proc. Int. Parallel Distrib. Process. Symp.
(IPDPS), 2003, p. 8.
[2] K. Goossens, J. Dielissen, and A. Radulescu, “Æthereal network on
chip: Concepts, architectures, and implementations,” IEEE Des. Test
Comput., vol. 22, no. 5, pp. 414–421, Sep./Oct. 2005.
[3] D. Sigüenza-Tortosa, T. Ahonen, and J. Nurmi, “Issues in the development of a practical NoC: The proteo concept,” Integr., VLSI J., vol. 38,
no. 1, pp. 95–105, 2004.
[4] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communications. Reading, MA: Addison-Wesley, 1995.
[5] R. Yoshimura, T. B. Keat, T. Ogawa, S. Hatanaka, T. Matsuoka, and
K. Taniguchi, “DS-CDMA wired bus with simple interconnection
topology for parallel processing system LSIs,” in Dig. Tech. Papers
IEEE Int. Solid-State Circuits Conf., 2000, pp. 370–371.
[6] T. B. Keat, R. Yoshimura, T. Matsuoka, and K. Taniguchi, “A novel
dynamically programmable arithmetic array using code division multiple access bus,” in Proc. 8th IEEE Int. Conf. Electron., Circuits Syst.,
2001, pp. 913–916.
[7] S. Shimizu, T. Matsuoka, and K. Taniguchi, “Parallel bus systems using
code-division multiple access technique,” in Proc. Int. Symp. Circuits
Syst., 2003, pp. 240–243.
[8] M. Takahashi, T. B. Keat, H. Iwamura, T. Matsuoka, and K. Taniguchi,
“A study of robustness and coupling-noise immunity on simultaneous
data transfer CDMA bus interface,” in Proc. IEEE Int. Symp. Circuits
Syst., 2002, pp. 611–614.
[9] R. H. Bell, Jr., K. Y. Chang, L. John, and E. E. Swartzlander, Jr.,
“CDMA as a multiprocessor interconnect strategy,” in Conf. Record
35th Asilomar Conf. Signals, Syst. Comput., 2001, pp. 1246–1250.
[10] E. H. Dinan and B. Jabbari, “Spreading codes for direct sequence
CDMA and wideband CDMA cellular networks,” IEEE Commun.
Mag., vol. 36, no. 9, pp. 48–54, Sep. 1998.
[11] E. S. Sousa and J. A. Silvester, “Spreading code protocols for
distributed spread-spectrum packet radio networks,” IEEE Trans.
Commun., vol. 36, no. 3, pp. 272–281, Mar. 1988.
[12] D. D. Lin and T. J. Lim, “Subspace-based active user identification for
a collision-free slotted ad hoc network,” IEEE Trans. Commun., vol.
52, no. 4, pp. 612–621, Apr. 2004.
[13] D. M. Chapiro, “Globally-asynchronous locally-synchronous systems,” Ph.D. dissertation, Dept. Comput. Sci., Stanford University,
Stanford, CA, 1984.
[14] VSI Alliance, Wakefield, MA, “Virtual component interface standard
version 2,” 2001. [Online]. Available: http://www.vsi.org/
[15] OCP-International Partnership, Beaverton, OR, “Open core protocol
specification,” 2001. [Online]. Available: http://www.ocpip.org/
[16] X. Wang and J. Nurmi, “A RTL asynchronous FIFO design using
modified micropipeline,” in Proc. 10th Biennial Baltic Electron. Conf.
(BEC), 2006, pp. 1–4.
[17] I. E. Sutherland, “Micropipelines,” Commun. ACM, vol. 32, no. 6, pp.
720–738, 1989.
[18] X. Wang, T. Ahonen, and J. Nurmi, “Prototyping a globally asynchronous locally synchronous network-on-chip on a conventional
FPGA device using synchronous design tools,” in Proc. Int. Conf.
Field Program. Logic Appl., 2006, pp. 1–6.
[19] X. Wang, D. Sigüenza-Tortosa, T. Ahonen, and J. Nurmi, “Asynchronous network node design for network-on-chip,” in Proc. Int.
Symp. Signal, Circuits, Syst., 2005, pp. 55–58.
Xin Wang received the M.Sc. degree in electronic
circuits and systems from Northwestern Polytechnical University, Xi’an, China in 2002.
Currently, he is a Researcher with the Institute of
Digital and Computer Systems, Tampere University
of Technology, Tampere, Finland. His research interests are focused on on-chip communication networks
and asynchronous circuits design.
Tapani Ahonen received the M.Sc. degree in electrical engineering and the Ph.D. degree in information technology from Tampere University of Technology, Tampere, Finland.
He is a Senior Research Scientist with Tampere
University of Technology. His research interests are
focused on varying aspects of system-on-chip design.
Jari Nurmi received the Ph.D. degree from Tampere University of Technology (TUT), Tampere, Finland, in 1994.
He is a Professor of digital and computer systems with TUT. He has held various research,
education, and management positions at TUT and
in the industry since 1987. His current research
interests include system-on-chip integration, on-chip
communication, embedded and application-specific
processor architectures, and circuit implementations
of digital communication, positioning, and DSP
systems. He is leading a group of about 25 researchers at TUT. He is the author
or coauthor of about 160 international papers, the editor of Processor Design:
System-on-Chip Computing for ASICs and FPGAs (Springer, 2007), coeditor
of Interconnect-Centric Design For Advanced SoC and NoC (Kluwer, 2004),
and has supervised more than 90 M.Sc., Licentiate, and Doctoral theses.
Dr. Nurmi has been the general chairman of the annual International Symposium on System-on-Chip (SoC) and of its predecessor SoC Seminar in Tampere since 1999 and is a board member of the SoC, FPL, and NORCHIP conference
series. He was the head of the national TELESOC graduate school 2001–2005.
He is a senior member in the IEEE Signal Processing Society, the Circuits and
Systems Society, the Computer Society, the Solid-State Circuits Society, and
the Communications Society. In 2004, he was a corecipient of the Nokia Educational Award, the recipient of the Tampere Congress Award in 2005, and
the Academy of Finland Senior Scientist research grant for the academic year
2007–2008.