ARM-CEVA LTE-Advanced White Paper Final
November 2012
Introduction
LTE (Long Term Evolution) is already gaining momentum as the world's most rapidly deployed cellular
technology, delivering mobile wireless broadband services to millions of users worldwide. Consumers increasingly
expect an always-on, always-connected mobile experience that delivers high data rate services on small form factor
mobile devices, whilst at the same time expecting long battery life to minimize recharge cycles. To meet this
ever-growing demand for mobile data, the LTE standard has been extended to offer higher throughput and greater
efficiency to the mobile operators providing these services. LTE-Advanced represents the next generation of mobile
broadband, and in turn challenges designers to create highly power-efficient mobile devices capable of delivering
these services. ARM, the leading supplier of embedded processors, physical IP and interconnect fabric, together with
CEVA, Inc., the leading supplier of embedded DSP cores, present a joint analysis of the design considerations
required to realize the next generation of mobile wireless broadband devices.
This paper begins by examining the 3GPP Release 10 standard (referred to hereafter as LTE-A), which was ratified in
March 2011 and is driving the latest generation of user equipment designs. After looking at the standard, we
develop an understanding of the particular design challenges in balancing the constraints of high throughput, low
latency and low power consumption, and present an industry-leading solution that combines the high-performance
yet extremely power-efficient technologies delivered today by ARM and CEVA.
Finally, by way of a conclusion, we look at wider system-level design considerations such as power saving modes,
debug and trace, along with support for multi-mode operation. Multi-mode support has become an essential feature
given the wide and diverse adoption of wireless standards worldwide, addressing not only LTE-A and LTE but also
HSPA+, TD-SCDMA and other wireless technologies.
What is LTE-Advanced?
The LTE (Long Term Evolution) standard was first ratified by 3GPP in Release 8 in December 2008. It was
conceived to provide wireless broadband access using an entirely packet-based protocol and was the basis for the
first wave of LTE equipment. LTE has now been adopted by over 347 carriers in 104 countries (Ref GSA),
including territories such as the USA, Japan, Korea and China to name but a few, making it the fastest adopted
wireless technology in history.
The wide adoption of LTE is thanks in part to the flexibility of the standard in accommodating the diverse
requirements of network operators worldwide. LTE is the first standard with the potential to become a unified global
standard for mobile by converging different 3G and 4G networks onto a common 4G platform. With licensed
spectrum becoming an increasingly valuable commodity, LTE brings the ability to deploy mobile wireless
broadband across a wide range of spectrum allocations. Coupled with this spectrum flexibility, LTE was also
specified to include advanced signal processing techniques designed to increase the spectral efficiency of the
transmission channel, i.e. the bits/second/Hz that the channel can carry with a reasonable error rate. Techniques such
as OFDMA and SC-FDMA modulation, advanced Forward Error Correction (FEC), various MIMO (multi-antenna)
techniques and re-transmission schemes such as ARQ and H-ARQ are combined to give the system robust and
efficient use of the limited available spectrum. These advanced technologies all demand high levels of signal
processing and therefore require careful design in order to minimize power consumption (extending battery life) and
maximize performance, both in terms of high throughput and reliable signal reception.
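As a rough illustration of the bits/second/Hz metric, the sketch below divides a peak data rate by the bandwidth it occupies. The two inputs are widely quoted peak figures, used here only as assumptions: 150 Mbps for LTE Release 8 Category 4 in a 20 MHz channel with 2x2 MIMO, and 3 Gbps for LTE-Advanced over 100 MHz. Peak figures describe ideal conditions, so the average efficiency of a loaded cell is considerably lower.

```python
def spectral_efficiency(throughput_bps: float, bandwidth_hz: float) -> float:
    """Return spectral efficiency in bit/s/Hz (throughput divided by bandwidth)."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return throughput_bps / bandwidth_hz


# Assumed peak figures (see lead-in), not measurements.
lte_cat4 = spectral_efficiency(150e6, 20e6)    # Release 8, 20 MHz, 2x2 MIMO
lte_a_peak = spectral_efficiency(3e9, 100e6)   # LTE-Advanced, 100 MHz

print(f"LTE Cat-4 peak: {lte_cat4:.1f} bit/s/Hz")
print(f"LTE-A peak:     {lte_a_peak:.1f} bit/s/Hz")
```

The fourfold jump between the two results comes from wider bandwidth being combined with more spatial layers, which is exactly where the extra baseband processing load originates.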
The continued evolution of LTE has been driven by consumer demand for higher-bandwidth broadband connections
(e.g. watching streaming video) and lower-latency connections (e.g. gaming applications), along with the need to use
spectrum in a more optimized and efficient manner so that network operators can maximize their return on
investment. This trend is expected to continue over the next five years, and Cisco projects an 18-fold growth in
mobile internet data traffic from 2011 to 2016 [1].
Copyright 2012 CEVA Inc. All rights reserved.
The ARM logo is a registered trademark of ARM Limited.
The CEVA logo is a registered trademark of CEVA, Inc.
All other trademarks are the property of their respective owners and are acknowledged.
LTE-Advanced refers to the latest versions of the 3GPP standard, Release 10 and beyond. This standard builds upon
the existing LTE Release 8 standard and maintains backward compatibility with it. A number of new features have
been added to LTE-Advanced that allow the requirements outlined above to be met, and crucially it also conforms to
the formal definition of a 4G wireless technology as mandated by the ITU. The new features of particular interest for
the purposes of this paper are carrier aggregation, multi-layer MIMO, and system considerations for high throughput
such as HARQ buffer access and system interconnect. Carrier aggregation and multi-layer MIMO both allow a
dramatic increase in throughput, and both bring new signal processing demands to the digital baseband.
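Carrier aggregation can be illustrated with a small validity check. In Release 10 a UE can aggregate up to five component carriers, each using one of the Release 8 channel bandwidths (1.4, 3, 5, 10, 15 or 20 MHz), for at most 100 MHz in total. The function name is hypothetical, and the sketch deliberately ignores the band-combination, contiguity and UE-capability rules that constrain which carriers can actually be aggregated.

```python
# Release 8 channel bandwidths in MHz; each Release 10 component carrier uses one.
LTE_CHANNEL_BW_MHZ = (1.4, 3.0, 5.0, 10.0, 15.0, 20.0)
MAX_COMPONENT_CARRIERS = 5  # Release 10 limit


def aggregated_bandwidth_mhz(carriers_mhz):
    """Check a list of component-carrier bandwidths and return their total.

    Band combinations, contiguity and UE capability are not modelled.
    """
    if not 1 <= len(carriers_mhz) <= MAX_COMPONENT_CARRIERS:
        raise ValueError("Release 10 allows 1 to 5 component carriers")
    for bw in carriers_mhz:
        if bw not in LTE_CHANNEL_BW_MHZ:
            raise ValueError(f"{bw} MHz is not an LTE channel bandwidth")
    return sum(carriers_mhz)


print(aggregated_bandwidth_mhz([20.0, 20.0]))   # two 20 MHz carriers
print(aggregated_bandwidth_mhz([20.0] * 5))     # maximum Release 10 case
```

The five-carrier case is the 100 MHz scenario associated with UE Category 8.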
There have been several public announcements in recent months from network operators stating their intent to
support LTE-Advanced features in the 2013 timeframe. These include AT&T Mobility and Sprint in the USA, while
KT Telecom in Korea and DoCoMo in Japan are also considering adopting the technology as an upgrade to their
commercial LTE networks.
Table 1 below shows the 3GPP UE categories as defined in Release 10 of the standard. As can be seen, there is a
broad range of categories, allowing equipment manufacturers to tailor products to their end applications and markets.
Although the Cat-8 (UE Category 8) profile has the high-throughput headline figures that capture market attention, it
is generally regarded as very difficult to deploy in practice because it requires up to 100 MHz of bandwidth (LTE
networks are currently deployed in 10 MHz to 20 MHz), something no individual operator has access to today.
Taking a more pragmatic standpoint, for the purposes of this paper we will instead look at the UE Cat-7
requirements, a use case which is expected to be widely adopted.
Table 1: UE categories defined in 3GPP Release 10

Downlink:
UE category | Data rate DL/UL [Mbps] | Max. DL-SCH TB bits per TTI | Max. DL-SCH bits per TB per TTI | Total soft channel bits | Max. spatial layers
Cat-1 | 10/5 | 10,296 | 10,296 | 250,368 | 1
Cat-2 | 50/25 | 51,024 | 51,024 | 1,237,248 | 2
Cat-3 | 100/50 | 102,048 | 75,376 | 1,237,248 | 2
Cat-4 | 150/50 | 150,752 | 75,376 | 1,827,072 | 2
Cat-5 | 300/75 | 299,552 | 149,776 | 3,667,200 | 4
Cat-6 | 300/50 | 301,504 | 149,776 (4 layers), 75,376 (2 layers) | 3,654,144 | 2 or 4
Cat-7 | 300/100 | 301,504 | 149,776 (4 layers), 75,376 (2 layers) | 3,654,144 | 2 or 4
Cat-8 | 3000/1500 | 2,998,560 | 299,856 | 35,982,720 | 8

Uplink:
UE category | Max. UL-SCH TB bits per TTI | Max. UL-SCH bits per TB per TTI | Support for 64QAM
Cat-1 | 5,160 | 5,160 | No
Cat-2 | 25,456 | 25,456 | No
Cat-3 | 51,024 | 51,024 | No
Cat-4 | 51,024 | 51,024 | No
Cat-5 | 75,376 | 75,376 | Yes
Cat-6 | 51,024 | 51,024 | No
Cat-7 | 102,048 | 51,024 | No
Cat-8 | 1,497,760 | 149,776 | Yes
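The data-rate column of Table 1 follows directly from the transport-block columns: with a 1 ms TTI, the maximum number of transport block bits a UE must handle per TTI sets its peak rate. The sketch below is only a cross-check of that arithmetic; the dictionary simply copies three rows of the table, and retransmissions and higher-layer overhead are ignored.

```python
# LTE transmission time interval (one subframe) in seconds.
TTI_S = 1e-3

# Max DL-SCH and UL-SCH transport block bits per TTI, copied from Table 1.
UE_CATEGORY_TB_BITS = {
    4: (150_752, 51_024),
    7: (301_504, 102_048),
    8: (2_998_560, 1_497_760),
}


def peak_rate_mbps(bits_per_tti: int, tti_s: float = TTI_S) -> float:
    """Peak rate = bits deliverable in one TTI divided by the TTI length."""
    return bits_per_tti / tti_s / 1e6


for cat, (dl_bits, ul_bits) in UE_CATEGORY_TB_BITS.items():
    print(f"Cat-{cat}: DL {peak_rate_mbps(dl_bits):.1f} Mbps, "
          f"UL {peak_rate_mbps(ul_bits):.1f} Mbps")
```

For Cat-7 this gives about 301.5 Mbps downlink and 102.0 Mbps uplink, which the category definition rounds to 300/100 Mbps.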
The above block diagram shows a simplified representation of how an LTE-Advanced modem connects within
a smartphone design, and sets the context for the modem design discussed in this white paper.
The LTE-Advanced modem consists of receive and transmit signal processing chains which serve the radio interface
via a wideband RF transceiver IC. The signal processing is divided into layers as defined in the 3GPP specification,
with Layer 1 providing all of the low-level signal conditioning concerned with the successful transmission and
reception of the signal. Typical Layer 1 functions include forward error correction, interleaving and bit stream
manipulation, constellation mapping (modulation), MIMO encoding, OFDM signal modulation, and RFIC signal
conditioning. All of the Layer 1 functions described fall within the domain of the CEVA processor, while the
associated control and management functions are implemented on an ARM CPU.
Upper layer processing is performed in the ARM Cortex-R7 processor and is represented by Layers 2 and 3 in the
above diagram. The Cortex-R7 processor typically performs functions such as Medium Access Control (MAC),
Packet Data Convergence Protocol (PDCP), Radio Link Control (RLC) and Radio Resource Management (RRM).
It also interfaces to the applications processor, which runs a rich OS such as Android.
High Performance: Cortex-R7 processor provides 2.53 DMIPS/MHz, which meets the most demanding
baseband processing requirements.
Coherency: The Cortex-R7 processor contains a Snoop Control Unit (SCU) which automatically maintains
coherency between modem data fed into memory and the processor's data cache. This saves considerable
software overhead and also provides coherency between the two processor cores.
Low-Latency Peripheral Port (LLPP): An additional AXI bus port specifically purposed for fast control of
modem hardware without being blocked by large data transactions on the main AXI bus.
Low-Latency RAM (LLRAM): An area of memory used to hold critical software and data such as Interrupt
Service Routines (ISR) that can be executed almost immediately without waiting for main AXI bus
transactions to finish and/or for the ISR to be fetched into level-1 cache.
Tightly-Coupled Memory (TCM): A limited (128 KB) memory resource for the most critical code and data
that can be accessed without the latency incurred by an AXI bus port. This provides for the highest level of
deterministic response to real-time hardware such as an LTE L1 physical layer.
Integrated Generic Interrupt Controller (GIC): Allows flexible interrupt distribution and rapid interrupts
between the processors e.g. routing from air interface/CEVA domain to ARM.
Low-latency interrupt mode: An interrupt mode particular to the Cortex-R processor family which takes
interrupts in as few as 20 cycles e.g. for time critical air frame processing.
Asymmetric Multi-Processing (AMP): Whilst the Cortex-R7 processor supports Symmetric Multi-Processing (SMP),
it also provides for configuring the Quality of Service (QoS) within the SCU block so that each processor can have
priority of access to a selected range of memory and I/O addresses and is not blocked by the other processor.
and reducing overall power consumption by minimizing costly access to off-chip memories and allowing the cores
to spend longer durations in power saving modes.
When designing an SoC, it is imperative that the designer pays particular attention to the memory and bus
architecture, in order to avoid costly performance penalties from bottlenecks in the design or, conversely, a costly
solution with inappropriately sized on-chip memories that both add to die area and increase power consumption.
The Cortex-R7 processor Low-Latency Peripheral Port (LLPP) can be used to provide an optimized interface to
Layer 2 and Layer 3 compute offload functions such as ciphering and Robust Header Compression (RoHC), both
of which need careful architectural consideration to deliver optimized performance without impacting overall
aggregate system throughput.
Through careful design, it is possible to achieve an efficient balance of performance, cost and power with the ARM
and CEVA architecture by making use of a variety of on-chip and off-chip memory types. The local AXI bus gives
dedicated access to a low-latency tightly coupled memory, which can be used for time-critical, deterministic tasks
that cannot tolerate cache misses or variable latency. The main AXI bus gives access to the system flash and
SDRAM blocks, which are typically off-chip resources but are often integrated into the baseband package as
stacked die to save PCB area. The flash memory is used to boot the entire system; during start-up the Cortex-R7
configures the CEVA sub-system and initializes all memories.
Table 2 below summarizes the typical memory types you would expect to see in an LTE-Advanced modem
design. As the table shows, the H-ARQ cache and IQ receive buffers account for an increasing amount of die area
on the baseband. H-ARQ buffers are used for recombining received data, and since the data is stored as soft bits
(log-likelihood ratios of a 1 or a 0) rather than as binary bits, the memory requirements scale quickly. As well as
using compression techniques to reduce the size of the H-ARQ buffer, designs also consider locating the buffer in
off-chip SDRAM in order to reduce the size and cost of the digital baseband die. The combination of CEVA and
ARM IP helps to minimize processing latency through the system and also provides optimized bus interconnects
which can help realize such memory optimizations.
Table 2: Typical memory types in an LTE-Advanced modem

Memory | Location | Size | Comment
IQ Receive Buffer | On chip | 450 KB for Cat-7 | Buffers RX IQ samples
HARQ Cache | On chip | 344 KB | LLR soft bits for combining
Layer 2 Cache | On chip | 128 KB (ARM) | Typical size, but can be optimized through code profiling
TCM | On chip | |
DDR Memory | Off chip | 1 Gb LP-DDR2 |
Flash Memory | Off chip | 1 Gb NAND |
HARQ Buffer | Off chip | | Cat-7 requirements located in off-chip DDR
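To see why the HARQ buffer is pushed off chip, the sketch below multiplies the Cat-7 "total number of soft channel bits" from Table 1 by the width used to store each LLR. The 8-, 6- and 4-bit widths are assumptions chosen for illustration; the width a real design uses depends on decoder performance targets and on any compression applied. Even at 4 bits per LLR the result is several times larger than the 344 KB on-chip HARQ cache listed in Table 2.

```python
# "Total num. of soft channel bits" for UE Category 7, from Table 1.
CAT7_SOFT_CHANNEL_BITS = 3_654_144


def harq_buffer_kib(soft_channel_bits: int, llr_width_bits: int) -> float:
    """Storage needed when each soft channel bit is kept as one LLR of the given width."""
    return soft_channel_bits * llr_width_bits / 8 / 1024


# Assumed LLR widths; narrower LLRs trade decoding margin for memory.
for width in (8, 6, 4):
    kib = harq_buffer_kib(CAT7_SOFT_CHANNEL_BITS, width)
    print(f"{width}-bit LLRs: {kib:,.1f} KiB")
```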
LTE-Advanced SW Architecture
Diagram 5 shows a typical software mapping of the LTE-A modem. As can be seen from the diagram, the Layer 1
processing is split into transmit and receive, with a single CEVA-XC4100 managing the transmit path and two
CEVA-XC4200 cores in the receive path. Layer 1 encodes and decodes the data for over-the-air transmission. It
maximizes throughput through adaptive modulation and coding, and maximizes robustness through a number of
schemes including forward error correction, interleaving and Hybrid ARQ (HARQ). HARQ is a scheme that
manages the selective re-transmission of data that has not been correctly received, and in order to manage this
process the UE must hold an H-ARQ buffer. Due to the high data rates and low latency requirements of LTE-A, the
buffers need to be quite large (see Table 2 for a system memory summary) and need careful management to
minimize the cost of the final device.
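The recombination step can be sketched as Chase combining: the LLRs of a failed transmission are kept in the HARQ buffer, and the LLRs of the retransmission are added to them before decoding is retried. This is a simplified model; LTE also supports incremental redundancy, in which retransmissions carry different redundancy versions of the coded data, and that case is not shown. The sign convention (a positive LLR favours bit 0) and the function names are assumptions for illustration, not a description of the CEVA implementation.

```python
def chase_combine(stored_llrs, new_llrs):
    """Add the LLRs of a retransmission to those held in the HARQ buffer."""
    if len(stored_llrs) != len(new_llrs):
        raise ValueError("retransmission must cover the same coded bits")
    return [old + new for old, new in zip(stored_llrs, new_llrs)]


def hard_decision(llrs):
    """Assumed convention: LLR >= 0 decides 0, LLR < 0 decides 1."""
    return [0 if llr >= 0 else 1 for llr in llrs]


# If the transmitted bits were [0, 1, 1, 1], the third first-attempt LLR is wrong.
first_attempt = [2.5, -0.5, 0.25, -3.0]
retransmission = [1.0, -1.5, -0.75, -2.0]

harq_buffer = chase_combine(first_attempt, retransmission)
print(harq_buffer)                  # [3.5, -2.0, -0.5, -5.0]
print(hard_decision(harq_buffer))   # [0, 1, 1, 1]
```

Because the buffer holds soft values rather than hard bits, every stored LLR must survive until the retransmission arrives, which is what drives the memory sizes discussed above.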
Moving on from Layer 1, we arrive in the ARM Cortex processor domain. A low-level Layer 1 controller services the
Layer 1 scheduling. This function is extremely time critical and typically runs at LTE slot granularity of 0.5 ms (half
of a 1 ms subframe). Events are driven by the Generic Interrupt Controller (GIC), which is sourced from Layer 1/air
frame events and in turn interrupts the Cortex-R7 processor for the associated Tx and Rx processing. The number of
interrupt sources feeding the Layer 1 controller depends heavily on the Layer 1 implementation, but can range from
tens to hundreds. The purpose of the controller is to manage the flow of data in and out of Layer 1, as well as
providing all necessary control information that has flowed down from the upper stack. The real-time
characteristics of the Cortex-R7 processor make it particularly suited to this task, providing guaranteed run times for
time-critical tasks through the use of tightly coupled memories and a low-latency pipeline architecture. The
Cortex-R7 processor pipeline architecture and branch predictor help to optimize interrupt response times and give
deterministic behavior, which is critical under hard real-time constraints such as those in wireless systems. Since
there is no memory management unit (MMU), there is also no need for the complex page table walk operations on
interrupts that would further delay responsiveness.
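The interrupt figures above can be turned into a rough budget. The sketch assumes a 600 MHz core clock and 100 interrupts per 0.5 ms slot (both hypothetical values, the latter taken from within the "tens to hundreds" range quoted above) and uses the 20-cycle low-latency interrupt entry quoted for the Cortex-R family. It counts only interrupt entry; the handler bodies are not modelled.

```python
CLOCK_HZ = 600_000_000      # assumed Cortex-R7 clock frequency
SLOT_US = 500               # LTE slot length: 0.5 ms
IRQ_ENTRY_CYCLES = 20       # low-latency interrupt entry (best case)
IRQS_PER_SLOT = 100         # assumed interrupt load on the Layer 1 controller


def cycles_per_slot(clock_hz: int, slot_us: int) -> int:
    """Processor cycles available in one slot (integer arithmetic)."""
    return clock_hz * slot_us // 1_000_000


def entry_overhead_fraction(clock_hz, slot_us, entry_cycles, irqs_per_slot):
    """Fraction of a slot spent entering interrupts, excluding handler bodies."""
    return irqs_per_slot * entry_cycles / cycles_per_slot(clock_hz, slot_us)


budget = cycles_per_slot(CLOCK_HZ, SLOT_US)
overhead = entry_overhead_fraction(CLOCK_HZ, SLOT_US, IRQ_ENTRY_CYCLES, IRQS_PER_SLOT)
print(f"{budget} cycles per slot, {overhead:.2%} spent on interrupt entry")
```

Under these assumptions entry cost stays below one percent of the slot, so the time spent inside the handlers, and how predictable it is, is the larger concern; this is where TCM, LLRAM and the absence of page table walks help.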
Asymmetric Multi-Processing (AMP) is a provision in the Cortex-R7 processor for configuring the Quality of
Service so that each processor can have priority of access to a selected range of memory and I/O addresses without
being blocked by the other, allowing certain functions and cores to take priority over others. This is particularly
important when executing time-critical routines such as the low-level Layer 1 controller, which must process
payload data in step with the air interface frame rate.
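The effect of the AMP QoS provision can be modelled behaviourally: when both cores contend for the same resource, a request to an address range reserved for the requesting core is granted first. This is a hypothetical model of the policy only; it does not show how the SCU is actually programmed, and the address ranges, core numbering and lowest-core-id fallback are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityRegion:
    """Address range [start, end) in which owner_core is given priority."""
    start: int
    end: int
    owner_core: int


def arbitrate(requests, regions):
    """Return the (core_id, address) request granted first.

    A request into a region owned by the requesting core wins; if none or
    several qualify, the lowest core id wins (assumed fallback policy).
    """
    def has_priority(request):
        core, addr = request
        return any(r.start <= addr < r.end and r.owner_core == core
                   for r in regions)

    candidates = [req for req in requests if has_priority(req)] or requests
    return min(candidates, key=lambda req: req[0])


# Hypothetical map: core 1 runs the Layer 1 controller and owns modem I/O.
regions = [PriorityRegion(0x4000_0000, 0x4010_0000, owner_core=1)]

print(arbitrate([(0, 0x4000_0100), (1, 0x4000_0200)], regions))  # core 1 wins
print(arbitrate([(0, 0x8000_0000), (1, 0x8000_0040)], regions))  # fallback: core 0
```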
Above the Layer 1 controller, we move through the respective protocol layers of the 3GPP specification. The
mapping of the layers is shown as an example of how the ARM architecture, on which the dual-core Cortex-R7
processor is based, can be fully utilized to load balance tasks across the two processors and thereby help to
guarantee the low-level real-time requirements of the software. The cache-coherent interconnect of the Cortex-R7
processor integrates the multi-processing architecture so that it presents a coherent programming model, removing
the traditional complexities of a multi-core environment. The cache-coherent interconnect manages the level-1 and
level-2 caches to maintain coherency across them, independent of the respective memory accesses of each core
within the Cortex-R7 processor. The net result is a safe and robust memory system in which the programmer
doesn't need to manage cache coherency, which in turn enables seamless task migration across the two cores to
maintain optimal load balancing and power efficiency.
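The load balancing described above can be sketched with a simple greedy heuristic, longest-processing-time first: sort tasks by estimated load and give each to the currently least-loaded core. The task names follow the layers named in this paper, but the load figures are invented for illustration, and a real design might instead pin the time-critical Layer 1 controller to one core under AMP rather than let it migrate.

```python
import heapq


def balance(task_loads, n_cores=2):
    """Greedy longest-processing-time-first assignment of tasks to cores.

    task_loads maps task name -> estimated load (e.g. percent of one core).
    Returns (assignment, loads): task lists and total load per core.
    """
    heap = [(0, core) for core in range(n_cores)]  # (current load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(n_cores)}
    for name, load in sorted(task_loads.items(), key=lambda kv: kv[1], reverse=True):
        total, core = heapq.heappop(heap)           # least-loaded core
        assignment[core].append(name)
        heapq.heappush(heap, (total + load, core))
    loads = {core: total for total, core in heap}
    return assignment, loads


# Hypothetical per-task loads, in percent of one core.
tasks = {"L1 controller": 40, "MAC": 25, "RLC": 15, "PDCP": 20,
         "RRM": 10, "Application": 10}
assignment, loads = balance(tasks)
print(assignment)
print(loads)
```

Because the interconnect keeps the caches coherent, rebalancing amounts to moving a task between these lists at run time, with no explicit cache maintenance in the migrating code.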
The software runs under an embedded Real-Time Operating System (RTOS) such as ThreadX from Express
Logic [2] or Nucleus from Mentor Graphics [3], both of which offer Cortex-R7 processor support. At the top of the
stack is an application layer which provides an interface to the rest of the system. In the case of a USB dongle, we
would expect this layer to interface to a USB stack, but it could also implement IP routing or applications such as
Voice over LTE (VoLTE).
VoLTE is a new technology that offers voice services over the packet-based LTE network. Traditionally, voice
services are delivered in a circuit-switched manner over 2G and 3G networks, but as operators look to re-farm 2G
and 3G spectrum to LTE, they also need a unified mechanism to deliver voice. VoLTE is now in early-phase
deployment with several operators, including SKT in Korea, which claims to be the world's first to offer this
service. The advantage of VoLTE is that it allows voice and data to be served from a single LTE network
(removing the need for multi-mode support of legacy standards), and its higher bandwidth allows operators to offer
higher quality audio, often marketed as HD Voice. The inclusion of VoLTE in turn adds software requirements to
the LTE modem, as it must run the voice protocol software as well as the LTE modem software.
1) Active mode: The UE is fully active with all or most blocks powered up. A typical use case scenario would
be video call, video streaming or TCP/IP data transfer. In this mode both the ARM and CEVA sub-systems
are powered on supporting uplink and downlink data transfers as well as the associated signaling.
2) VoLTE mode: VoLTE (voice over LTE) is an emerging standard that supports voice services over a packet
based radio bearer. VoLTE consists of a standardized voice codec/signaling layered onto the LTE air
interface. The support of voice results in small packet transmission and reception (small, infrequent data
transmission) which in turn allows the UE to perform power saving operations during the idle times. The
ARM control processor manages the overall power saving scheme, as it has knowledge of the scheduling of the
voice packets, and accordingly moves the CEVA DSPs in and out of power save mode.
Additionally, due to the multi-processing capabilities of the Cortex-R7 processor, the VoIP stack SW as
well as the LTE protocol SW can be implemented on the same device, hence allowing for wider system
power saving by powering down other processors such as an applications processor running a rich OS.
3) Idle Mode: In this scenario, the UE does not have any active data sessions, but is camped onto the network
and performing regular synchronization/location-update operations. Since the LTE standard is architected
to incorporate power saving, the ARM control processor is able to cycle the UE in and out of power saving
modes, waking it either to listen to broadcast channels or to transmit location update information. During
the power save mode the UE can be almost entirely shut down except for a small low power timer block
which is configured to wake the system at the appropriate times.
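The benefit of the VoLTE and idle modes comes from duty cycling, and a first-order estimate is a time-weighted average of active and sleep power. In the sketch, the 20 ms interval matches the speech frame length of common VoLTE codecs and 1.28 s is one of the LTE paging cycles; the wake durations and the 300 mW / 5 mW power levels are hypothetical values, not measurements of ARM or CEVA silicon. Transition energy on wake-up and shut-down is ignored.

```python
def average_power_mw(active_mw, sleep_mw, awake_ms, period_ms):
    """Time-weighted average power for a periodic wake/sleep schedule."""
    if period_ms <= 0 or not 0 <= awake_ms <= period_ms:
        raise ValueError("awake time must lie within a positive period")
    duty = awake_ms / period_ms
    return duty * active_mw + (1 - duty) * sleep_mw


ACTIVE_MW, SLEEP_MW = 300.0, 5.0   # hypothetical modem power levels

# VoLTE: one speech frame every 20 ms, modem assumed awake 4 ms per frame.
volte = average_power_mw(ACTIVE_MW, SLEEP_MW, awake_ms=4, period_ms=20)
# Idle: assumed 10 ms wake-up per 1.28 s paging cycle.
idle = average_power_mw(ACTIVE_MW, SLEEP_MW, awake_ms=10, period_ms=1280)

print(f"always on: {ACTIVE_MW:.0f} mW, VoLTE: {volte:.1f} mW, idle: {idle:.2f} mW")
```

The estimate shows why the sleep-state power and the low-power wake-up timer dominate idle-mode battery life: with a duty cycle below one percent, the sleep term accounts for most of the average.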
Both the ARM Cortex-R7 processor and the CEVA-XC4000 series of cores are architected to achieve industry-leading
low power consumption through efficient pipeline architectures and low gate count implementations, and by
incorporating advanced power saving mechanisms such as the CEVA-XC Power Scaling Unit (PSU), along with the
Cortex-R7 processor's high-performance, low-power features such as the snoop control unit (SCU), low-latency
RAM (LLRAM), tightly coupled memories (TCM) and asymmetric multi-processing (AMP).
References
For more information on CEVA please visit http://www.ceva-dsp.com/CEVA-XC-Family.html
For more information on ARM please visit: http://www.arm.com
Global mobile Suppliers Association www.gsacom.com
3GPP Release 10 www.3gpp.org/Release-10
[1] http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11520862.html
[2] http://www.rtos.com/
[3] http://www.mentor.com/embedded-software/nucleus/
Glossary
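3GPP: 3rd Generation Partnership Project
ARM: ARM Limited
ARQ: Automatic Repeat reQuest
AXI: Advanced eXtensible Interface
CEVA: CEVA, Inc.
FEC: Forward Error Correction
GSA: Global mobile Suppliers Association
H-ARQ: Hybrid Automatic Repeat reQuest
HSPA+: Evolved High Speed Packet Access
ITU: International Telecommunication Union
LTE: Long Term Evolution
MIMO: Multiple-Input Multiple-Output
MNO: Mobile Network Operator
OFDMA: Orthogonal Frequency Division Multiple Access
PCB: Printed Circuit Board
SDRAM: Synchronous Dynamic Random Access Memory
SOC: System on Chip
TD-SCDMA: Time Division Synchronous Code Division Multiple Access
UE: User Equipment
VoLTE: Voice over LTE
WCDMA: Wideband Code Division Multiple Access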