The Definitive Guide to Cloud Acceleration
Dan Sullivan

sponsored by

Contents

Chapter 1: Cloud Computing and Challenges to Delivering Services
    Common Definition of the Cloud
        Scalability
        Self-Service
        Pay-for-Service Model
        Differences with Pre-Cloud Architectures
    Categorizing Clouds
        Cloud Access Models
            Public Clouds
            Private Clouds
            Community Clouds: Part Public, Part Private
            Hybrid Clouds
        Cloud Service Models
            Infrastructure as a Service
            Platform as a Service
            Software as a Service
    Application Response Time and Benefits of Cloud Acceleration
        Adverse Effects of Slow Application Response Time
        Improving Application Response Time
            Software-based Options
            Hardware Options
            Network Issues and Cloud Acceleration
    Challenges to Cloud Acceleration
        Scalability and Geographic Reach
        Redundancy
        Consolidation of Services and Costs
    Key Considerations for Deploying Cloud Applications
    Summary

Chapter 2: How Websites and Web Applications Work
    Optimizing the Delivery of Static Content
        Multiple Object Static Web Pages
        Determining Web Page Load Times
        Assessing the Scope of the Problem
            Amount of Static Content
            Distance Between Source and Destination
            Popularity of Content
        Distributed Caching and Persistent Storage
            Browser Caching
            Web Server Caching
            Proxy Caching
            Caching vs. Replication
    Optimizing the Delivery of Dynamic Content
        Assessing the Scope of Dynamic Content
            Number of Requests for Dynamic Content
            Importance of Dynamic Content
            Distance to Origin
        Reducing the Burden on the System of Origin
        TCP Optimizations
        Terminating SSL at Network Edge
        Reduce Packet Loss
    Targeted Content Delivery
    Summary

Chapter 3: Why the Internet Can Be the Root Cause of Bottlenecks
    Two Fundamental Characteristics of the Internet that Influence Performance
        Technical Characteristic: Protocols
        Organizational Characteristic: Peering
    Measuring Performance
        Throughput
        Latency
        Packet Loss
    Protocol Issues
        Web Application Performance and HTTP
        Multiple Objects Per Web Page
        TCP and Implications for Web Application Performance
            TCP Handshake
            Reliable and Ordered Transfer of Data
        Improving Protocol Performance
            Application and Data Replication
            Maintaining Pools of Connections
            Optimizing TCP Traffic
    Peering: Linking Networks
        Internet Exchange Points
        Performance Issues Related to Peering
            Potential for Sub-Optimal Routing
            Congestion
    Summary

Chapter 4: Multiple Data Centers and Content Delivery
    Appeal of Deploying Multiple Data Centers
        Application Maintenance
        Data Loss
            Data Loss Due to Hardware Failure
            Data Loss Due to Software Failure and Human Error
        Network Disruption
        Disruption of Environmental Controls in Data Center
        Reduced Latency
    Challenges to Maintaining Multiple Data Centers
        Costs of Data Centers/Rising Costs of Colocation
        Need for Specialized Expertise
        Software Errors
        Synchronization Issues
        Unaddressed Content Delivery Challenges
    Combining Data Centers, Content Delivery Network, and Application Acceleration
        Optimizing Network Traffic in the Middle Mile
        Caching
            Load Balancing
            Monitoring Server Status
            Fault Tolerant Clusters of Servers
            Virtual IP Address and Network Failover
        Benefits of Multiple Data Centers, Content Delivery Networks, and Application Delivery Acceleration
    China: Country-Specific Issues in Content Distribution
        Technical Challenges to Delivering Content in China
        The Great Firewall of China
        Content Delivery Network Considerations in China
    Summary

Chapter 5: Architecture of Clouds and Content Delivery
    Public Cloud Providers and Virtualized IT Infrastructure
        Essential Characteristics of Cloud Computing
            On-Demand Service
            Broad Network Access
            Resource Pooling
            Rapid Elasticity
            Measured Service
        Cloud Computing Deployment Models
        Cloud Service Models
            Infrastructure as a Service
            Platform as a Service
            Software as a Service
    Application Design and Application Architecture
        Designing for Server Failover
        Application Server Replication
        Content Caching
        Network Optimization
    Content Delivery Networks Complement Cloud Providers

Chapter 6: How to Choose a Cloud Application Acceleration Vendor
    Global Reach
        Technical Dimension of Global Reach
        Business Dimension of Global Reach
    Dynamic Content Acceleration
        High Availability
        Faster Application Performance
        Better End User Experience
    Security Considerations
        SSL Encryption and Cloud Acceleration
            Need for Encryption vs. Cost
            Accelerating Encryption
        Distributed Denial of Service (DDoS) Protection
            The Structure and Function of a DDoS Attack
            Targets of DDoS Attacks
            Responding to DDoS Attacks
        Data Security
        Authentication
    Architecture Considerations
    Key Business Considerations
        Impact of Slow Applications
            Adverse Customer Experiences
    Summary

Copyright Statement
© 2013 Realtime Publishers. All rights reserved. This site contains materials that have
been created, developed, or commissioned by, and published with the permission of,
Realtime Publishers (the "Materials") and this site and any such Materials are protected
by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtime Publishers or its web site
sponsors. In no event shall Realtime Publishers or its web site sponsors be held liable for
technical or editorial errors or omissions contained in the Materials, including without
limitation, for any direct, indirect, incidental, special, exemplary or consequential
damages whatsoever resulting from the use of any information contained in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal, non-
commercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, service marks and logos that are the property of
third parties. You are not permitted to use these trademarks, service marks or logos
without prior written consent of such third parties.
Realtime Publishers and the Realtime Publishers logo are registered in the US Patent &
Trademark Office. All other product or service names are the property of their respective
owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtime Publishers, please contact us via e-mail at
[email protected].

Chapter 1: Cloud Computing and Challenges to Delivering Services

Cloud computing is an increasingly popular way to use computing and storage
technologies, and it is changing the way businesses deliver services. As with any
innovation, you have to adapt your methods and procedures to take full advantage of the
new technology. This guide examines how cloud computing and the architecture of the
Internet shape service delivery, the challenges presented to reaching a global customer
base, and techniques for accelerating content delivery. This chapter begins with an
overview of cloud computing as well as key considerations for delivering services through
the cloud.

Common Definition of the Cloud


Cloud computing is a model for delivering computing, storage, and network
infrastructure in a shared manner that allows for on-demand scalability, self-service, and,
typically, a pay-for-service pricing model.

Scalability
Scalability implies the ability to adjust the amount of computing and storage to
meet current needs. For example, if a business experiences a spike in demand for one of its
Web applications, the business might need to bring additional servers online to respond to
all requests in an acceptable time.
In a cloud, these additional servers are already physically present in a data center. A cloud
operating system (OS) is typically in place to deploy virtual images to additional servers
and reconfigure load balancers, if required, to include the additional servers in an
application cluster (see Figure 1.1).



Figure 1.1: Clouds provide for rapid scalability.

Scalability implies the ability to rapidly downsize resources as well. In the given example,
when the spike in traffic subsides, some of the servers would be released from the cluster
and returned to the pool of cloud resources for other applications or customers to use as
needed.
Storage services are treated in an analogous way in cloud computing. As more storage is
required, it is allocated from a shared pool of storage resources. When it is no longer
needed, storage is returned to the pool for others to use.
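
To make the elasticity decision concrete, the following minimal sketch shows the kind of control loop a cloud OS runs when deciding to grow or shrink a cluster. The cluster methods (average_load, add_server, remove_server) are hypothetical placeholders for illustration, not any particular provider's API.

    TARGET_LOAD = 0.70  # desired average CPU utilization across the cluster

    def rebalance(cluster):
        """Grow or shrink a cluster to track demand.

        `cluster` is assumed to expose average_load(), add_server(), and
        remove_server() -- hypothetical methods standing in for a real cloud API.
        """
        load = cluster.average_load()
        if load > TARGET_LOAD * 1.2:
            cluster.add_server()      # spike in demand: bring a server online
        elif load < TARGET_LOAD * 0.5:
            cluster.remove_server()   # demand subsided: return a server to the pool

    # A cloud OS would evaluate a rule like this periodically (for example,
    # once a minute) and reconfigure the load balancer after each change.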

Self-Service
Prior to the advent of cloud computing, when an application administrator needed to add
servers to an application cluster or upgrade a server, it meant submitting requests to
systems administrators and possibly provisioning additional hardware. Cloud computing
platforms provide end users with the ability to provision servers and storage as needed
through a cloud administration interface (see Figure 1.2).
Typically, these interfaces allow users to specify the following (a provisioning sketch appears after the list):

The size of virtual machines to deploy


The number of virtual machines
The location of the data center to deploy the virtual machines
The virtual image to deploy to each server
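
As a concrete example, the sketch below makes such a self-service request in code. It assumes Amazon EC2 and the boto3 Python library purely for illustration; the image ID is a placeholder, and other providers expose comparable APIs.

    import boto3

    # The region selects the data center location for the virtual machines.
    ec2 = boto3.client("ec2", region_name="eu-west-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # the virtual image to deploy (placeholder)
        InstanceType="t3.small",          # the size of each virtual machine
        MinCount=2,                       # the number of virtual machines
        MaxCount=2,
    )

    for instance in response["Instances"]:
        print(instance["InstanceId"], instance["State"]["Name"])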


As clouds are virtualized computing resources, cloud providers can offer a wide range of
machine configurations. For example, a small server might include 1 core, 2GB of memory,
and 200GB of local storage, while a higher-end server might include 8 cores, 32GB of
memory, and 1TB of local storage. Cloud users can choose the optimal configuration based
on costs and requirements. CPU and memory-intensive applications might require a large
and more costly server, while another application could run more cost effectively on a
number of low CPU/low memory virtual machines.
Cloud providers also maintain a catalog of virtual images. These can include a variety of
OSs and preconfigured applications. If business analysts frequently work with a set of ad
hoc reporting, statistical analysis, and visualization tools, the cloud provider can deploy a
virtual image with these applications installed and configured so that they are readily
available when needed.


Figure 1.2: Self-service allows non-IT users to configure their own computing and
storage resources.


Pay-for-Service Model
Another distinguishing feature of cloud computing is the pay-for-service model. Instead of
buying dedicated hardware for an application, application managers now have the option
of essentially renting resources when those resources are needed, and paying for only what
is used.
Servers are typically billed in hourly or per-minute increments. The per-unit-of-time charge
will vary with the virtual machine configuration and can range from pennies to dollars per
hour per machine. Storage is usually charged based on the amount of storage used and the
length of time data is stored.
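
A back-of-the-envelope cost model makes the billing scheme concrete. The rates below are invented for the example and do not reflect any provider's actual prices.

    # Hypothetical rates for illustration only.
    HOURLY_RATE = 0.10   # dollars per server-hour
    STORAGE_RATE = 0.05  # dollars per GB-month

    def monthly_cost(servers: int, hours: float, storage_gb: float) -> float:
        """Estimate a month's bill: server time plus stored data."""
        return servers * hours * HOURLY_RATE + storage_gb * STORAGE_RATE

    # Four servers for a 72-hour traffic spike plus 500 GB of storage.
    print(f"${monthly_cost(servers=4, hours=72, storage_gb=500):.2f}")  # $53.80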

Differences with Pre-Cloud Architectures


In many ways, cloud computing is not a new technology but rather a new way of using
existing technologies. The building blocks of clouds (commodity hardware, virtualization
platforms, widely used OSs and applications, and networking infrastructure) were all in
use prior to the development of cloud computing. In spite of the similar components, there
are significant differences between cloud computing architectures and pre-cloud
architectures.
Pre-cloud architectures often suffered from underutilization. Systems designers would
understandably configure servers for peak capacity so that applications would remain
responsive under heavy but expected loads. In other cases, applications would be deployed
to dedicated servers to keep them isolated from other applications and allow for OS
configuration specifically tuned for that one application. A disadvantage of these
approaches was that the business was paying for computing capacity it often did not use.
Server virtualization helped to reduce underutilization while maintaining OS isolation (see
Figure 1.3); however, virtualization was managed by systems administrators, unlike the
self-service approach of cloud computing.
Prior to the cloud, there was less sharing of computing resources. Hardware was often
purchased for a specific project or department, so it tended to be dedicated to that need,
and there were few incentives to share the resource or the cost of maintaining it. Cloud
computing platforms track utilization and allow businesses to charge back to users for the
resources they use. Having a charge-back system is less a technical advance than an
organizational one. Now businesses can easily account for and bill for shared resources.



Figure 1.3: Prior to virtualization, it was common practice to dedicate a physical
server to a single application or task. Virtualization allows for multiple applications
to run on a single server while still maintaining OS isolation.

As previously mentioned, common characteristics of cloud computing include scalability,
self-service administration, and pay-for-service charges. This combination of features has
enabled more efficient use of computing and storage services and underlies more
innovative use of computing resources. Starting with these three essential characteristics
of cloud computing, three distinct deployment models have emerged.


Categorizing Clouds
Cloud computing services can be categorized according to who is granted access to the
cloud and by the types of services offered by the cloud.

Cloud Access Models


Clouds can be categorized according to who is granted access. Three typical access models
are:

Public cloud
Private cloud
Hybrid cloud
Each of these deployment models has its benefits and drawbacks.

Public Clouds
Public clouds are essentially open to any user. Many cloud providers are well known in the
IT industry and include Amazon, Microsoft, Google, IBM, HP, and Rackspace. One of the
advantages of a public cloud is the low barrier to entry: virtually anyone with a credit card
can set up an account and provision resources.
Also, public cloud providers have the advantage of specializing in cloud services offerings.
They realize economies of scale, can invest in specialists to design and maintain their
infrastructure, and can raise the capital required to deploy substantial cloud services.
Common characteristics of public cloud providers include:

Maintain multiple data centers


Have redundant networks
Have sufficient compute and storage resources to meet demand
Provide standard service level agreements (SLAs)
Public cloud providers distinguish themselves more on specialized services than on price.
For example, a cloud provider might offer a high-performance computing cluster designed
with high-speed network interconnects for low latency and flash drives for improved I/O
performance. In other cases, a provider might offer a low-cost storage service for archiving,
private networks for added security, or accounting and billing services tailored to
enterprise customers.
Although public clouds may offer a combination of commodity and specialized services,
they do not always meet the needs of enterprise customers. For example, some public cloud
offerings might not meet the requirements of industry regulations such as the Payment
Card Industry Data Security Standard (PCI DSS). Retailer businesses and others using
payment cards would not be able to run applications or store data subject to PCI DSS in
those clouds and still remain in compliance.


Some businesses may not allow confidential or sensitive data to reside on servers or
storage systems outside of corporate control due to concerns about data leaks and loss of
confidentiality. However, data can be readily encrypted before it leaves corporate control.
Depending on jurisdiction, businesses may be required to keep confidential and private
information within the jurisdiction or within a partner jurisdiction with equivalent privacy
protections.
Although the benefits of public cloud computing are well understood, for some business
cases, a private cloud may be a more appealing option.

Private Clouds
Private clouds are controlled by organizations behind their firewalls and limit access to the
cloud to organization members or partners. Large businesses and governments can have
the need for and resources to build and maintain private clouds. Fortunately, businesses do
not need to start from scratch to build a private cloud; IT vendors offer cloud computing
packages that include the hardware and software required for a private cloud.
The single most significant benefit of a private cloud is that the organization deploying it
maintains full control:

Determining who has access to cloud resources


Defining policies and procedures for allocating cloud resources
Specifying charge-backs for services
Implementing specialized software services, for example, a message queue, or
hardware, such as flash storage devices
Implementing monitoring and auditing procedures according to the organization's
particular needs
The obvious drawbacks of private clouds are the capital expenditure to acquire the
infrastructure and the ongoing costs of maintaining a private cloud. If resiliency is required
for your business cloud applications, you will probably need to maintain multiple data
centers.

One option for private clouds is to locate your infrastructure in a third-party data center.
This option affords some economies of scale and specialization of labor with regards to
managing the physical infrastructure and redundant network services. The business still
retains control over the computing and storage infrastructure, so many of the benefits of an
on-premises private cloud remain in place.


Community Clouds: Part Public, Part Private


The commonly used public-private dichotomy does not cover all options with regard to
cloud access models. The community cloud, sometimes referred to as a "gated community"
model, has characteristics of both private and public clouds. Community cloud providers
screen potential customers before granting them access to clouds.
This setup is designed to ensure that only legitimate organizations that meet the vendor's
criteria can make use of the community cloud. For example, a community cloud provider
specializing in healthcare might accept only healthcare providers and insurers as customers.
This model allows the vendor to tailor services to its target market, such as
providing more in-depth auditing information to meet Health Insurance Portability and
Accountability Act (HIPAA) compliance regulations.

Hybrid Clouds
A hybrid cloud, as the name implies, is a combination of private and public clouds. The
model was motivated by the desire to combine the benefits of both. In a
hybrid cloud, jobs and data that need to stay within the corporate network can run on the
private cloud while other jobs and data can be shifted to a public cloud provider, as Figure
1.4 shows. This approach can reduce the demand for private cloud resources and therefore
reduce the capital expenditure needed to establish a private cloud.
Maintaining a hybrid cloud introduces challenges not encountered with the other models. If
the cloud OSs running in the private and public clouds are not compatible, you might find
yourself maintaining two catalogs of virtual images as well as two access control systems.
Accounting and billing might also require different systems and create additional work to
integrate. Using the same cloud OS (for example, OpenStack) in both the public and
private clouds can reduce integration challenges. Compatible cloud OSs, such as the
Amazon AWS platform and Eucalyptus, are not the same but use common APIs that can
reduce the challenges to implementing a hybrid cloud.



Figure 1.4: Hybrid clouds combine private and public clouds and allow for workloads
to move between the two.

Public, private, and hybrid clouds can all be used to deploy services for the benefit of
customers, partners, and employees. The choice of the most appropriate access model will
vary according to security, compliance, performance, and cost constraints.

In addition to categorizing clouds by access model, it is common to distinguish public
clouds by the types of services offered.


Cloud Service Models


Clouds are often grouped into one of three service categories:

Infrastructure as a Service (IaaS)


Platform as a Service (PaaS)
Software as a Service (SaaS)
These categories offer increasing levels of specialization and reduced levels of management
overhead.

Infrastructure as a Service
IaaS clouds offer access to virtual servers, storage, and related services. Cloud users
provision virtual servers and storage as needed, and manage all aspects of the
infrastructure at the OS level and above (see Figure 1.5). This option gives users substantial
control over the size of virtual servers used, the software installed, and the way storage
systems are utilized.
This model also imposes the most responsibility on the cloud users. For example, software
engineers using a public cloud for development would need to select an appropriate-size
machine, load a virtual image with an appropriate OS, install additional tools if needed, and
configure persistent storage.
IaaS solutions are good choices when you need to maximize control over the OS,
applications, and storage options. Alternatively, if you need less control over the
infrastructure, a PaaS cloud may be a suitable option.


Figure 1.5: Infrastructure as a Service provides primarily computing, storage, and
networking services.


Platform as a Service
PaaS clouds provide access to application services while alleviating the need for device
management (see Figure 1.6). For example, a developer might use a PaaS cloud to run a
large number of tests on new software. The developer can choose the appropriate
number of preconfigured servers and submit the job without needing to set up the servers
themselves.
PaaS can also reduce the time required to set up and manage application stacks. Instead of
setting up application and database servers, PaaS users can use the application and data
management platforms provided by the PaaS cloud. Google App Engine, for example, allows
software developers to run their Java or Python applications on Google infrastructure
without the need to manage virtual machines. Microsoft Windows Azure cloud includes a
relational database service, Azure SQL, which a business can use instead of managing its
own Microsoft SQL Server instance. The lines between IaaS and PaaS are sometimes
blurred, as IaaS providers offer services, such as databases and messaging services, as part
of their IaaS services.


Figure 1.6: Platform as a Service extends the IaaS level of services to include
application stack services.


Software as a Service
The third category of cloud service type, SaaS, provides fully functional applications to end
users. Applications as different as word processing and customer relationship management
(CRM) are available from SaaS providers. A key advantage of the SaaS model is that users
do not have to manage any part of the infrastructure. Some applications will require end
users to configure access controls and program options and other application settings, but
the SaaS provider manages all aspects of the computing, storage, and network
infrastructure, as Figure 1.7 illustrates.


Figure 1.7: Software as a Service provides turnkey applications that minimize the
demands on end users to set up and configure the application.


SaaS has created opportunities for both SaaS consumers and SaaS providers. Users of SaaS
services can reduce or eliminate the need to maintain specialized applications in-house or
in a cloud. For example, an architecture firm using a SaaS for managing its financials can
avoid having to run a financials package in-house and may be able to reduce the number of
staff dedicated to supporting the financial package. SaaS providers have opportunities to
create services that might not be efficiently implemented within a single organization. For
example, a SaaS that provides HIPAA-compliant records management services could find a
large market of small and midsize healthcare providers interested in their services. SaaS
providers may implement their applications in public, private, or hybrid clouds.

Application Response Time and Benefits of Cloud Acceleration


Cloud computing and the global reach of the Internet have created opportunities for
businesses to expand their markets and customer base. The scalability and elasticity of
cloud computing allows businesses to grow their computing systems according to their
business demand. This flexibility lessens the need to make capital expenditures for
hardware that might be needed in the future. It also allows operators to make decisions
about provisioning compute and storage services at a much more fine-grained level. If
there is a peak demand for a day or two, then additional servers can be provisioned in the
cloud. When demand then subsides, those servers can be released. Compute and storage
elasticity are essential parts of maintaining quality of service. They are not, however, the
only factors.

Adverse Effects of Slow Application Response Time


From a customer's perspective, the quality of an application is determined in part by its
responsiveness. Applications that appear to run slowly are problematic from a user's
perspective and can lead to user dissatisfaction and lost revenue. A number of studies have
demonstrated a correlation between application response time and discontinued use of a
Web-based application. According to a study by the Aberdeen Group, a 1-second delay in
page load times can result in:

11% fewer page views


16% decrease in customer satisfaction
7% loss in conversions
Another set of findings published by KissMetrics reveals that:

73% of mobile device users report encountering Web sites that were slow to load
47% of consumers expect Web pages to load in 2 seconds or less
40% abandon sites that take more than 3 seconds to load
79% of shoppers who are dissatisfied with the site's performance are less likely to
buy from that site again
Clearly, the responsiveness of an application can have a direct impact on customer
satisfaction, loyalty, and ultimately revenue.


Improving Application Response Time


Many factors contribute to application responsiveness, such as the way the application
code is written, the way the database has been designed, and network throughput and
latency.

Software-based Options
One way to improve performance is to tune application code (a profiling sketch follows below). This task can include:

Selecting more efficient algorithms


Analyzing code to identify time-consuming functions
Re-writing database queries to reduce the amount of data returned
Tuning database design by implementing additional indexes and other measures to
reduce I/O operations performed by the database
Improving software can yield significant improvements in some cases, but these
improvements can be costly and may require more time than other options to implement.
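
As a sketch of the second item in the list above, Python's built-in cProfile module ranks functions by the time they consume; slow_report here is a made-up stand-in for real application code.

    import cProfile
    import pstats

    def slow_report():
        """Hypothetical request handler: the kind of code worth profiling."""
        return sum(i * i for i in range(200_000))

    # Profile one call, then print the ten most expensive functions.
    cProfile.run("slow_report()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)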

Hardware Options
The cloud also allows businesses to implement a well-known but sometimes questionable
practice of "throwing more hardware at the problem." Rather than review and revise code,
it might be faster to simply scale up the servers that are running the code. One could scale
vertically by deploying the application to a server with more cores and memory and faster
storage devices. Alternatively, applications that lend themselves to distributed workloads
can scale horizontally. This action entails adding additional servers to a load-balanced
cluster and allowing the load balancer to distribute the work among more servers.
Both of these scenarios can help improve performance, assuming there are no bottlenecks
outside the servers (for example, the time required to perform I/O operations on a storage
array). If I/O performance is a problem, you might be able to improve performance by
switching to faster storage technology.


Network Issues and Cloud Acceleration


Although tuning application code and database design can often improve the throughput of
servers, they do not always improve application response time. Network latency, or the
time delay in sending data between two networked devices, cannot be improved by
tweaking algorithms on the server or optimizing database queries. Within a data center,
cloud providers may offer higher performance networking infrastructure for specialized
tasks, such as high-performance computing. These specialized jobs may run on clusters
with 10Gb Ethernet while most common jobs run on servers interconnected with slower
interfaces. For data that is sent outside the data center and over the Internet, additional
measures are required to reduce latency.
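
Latency itself is straightforward to observe from the client side. This sketch, using only the Python standard library, times a TCP connection handshake as a rough round-trip measure; example.com is a placeholder host. Running the same measurement from clients in different regions shows how distance affects response time regardless of server capacity.

    import socket
    import time

    def connect_time(host: str, port: int = 443) -> float:
        """Return seconds to complete a TCP handshake -- a rough round-trip proxy."""
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass
        return time.perf_counter() - start

    print(f"{connect_time('example.com') * 1000:.1f} ms")
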
Cloud Acceleration
In this guide, the term "cloud acceleration" refers to techniques for
improving the overall responsiveness of an application by reducing the time
it takes to deliver content to an end user. Without going too deeply into
technical details in this chapter, it is worth noting that cloud acceleration can
be implemented with a combination of content delivery networks for
distributing content around the globe and reduced network traffic using
specialized optimization.

Challenges to Cloud Acceleration


The remainder of this guide will delve into the technical details of cloud acceleration
techniques; for now, this chapter will briefly examine four challenges to implementing
cloud acceleration:

Scalability and geographic reach


Redundancy
Consolidation of services
Cost
Each of these challenges must be addressed to successfully implement a cloud acceleration
solution.

Scalability and Geographic Reach


Networking is constrained by physics as well as engineering. We will never tweak the laws
of physics to improve the speed with which we can transmit signals. Although an
organization can improve the engineering of its networking hardware, the business is still
dependent on the infrastructure used by Internet service providers (ISPs) around the
globe.


Content delivery networks (CDNs) compensate for network limitations by maintaining
copies of data around the globe and responding to user requests for content by using the
closest facility to the end user and providing the best path between endpoints. A customer
in Amsterdam, for example, might be served from content stored in a data center in Paris,
while a customer in Shanghai receives the same content from a data center in Singapore
(see Figure 1.8).


Figure 1.8: Global data centers are essential for geographically distributing
replicated content.
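
The routing decision in Figure 1.8 can be approximated with straight-line distance. The sketch below picks the nearest data center by great-circle (haversine) distance; the coordinates are rough, and real CDNs also weigh network topology, load, and peering rather than geography alone.

    from math import asin, cos, radians, sin, sqrt

    # Approximate data center locations as (latitude, longitude).
    DATA_CENTERS = {"Paris": (48.9, 2.4), "Singapore": (1.3, 103.8)}

    def haversine_km(a, b):
        """Great-circle distance in kilometers between two (lat, lon) points."""
        lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
        h = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * asin(sqrt(h))

    def nearest_data_center(user):
        """Choose the facility closest to the user's coordinates."""
        return min(DATA_CENTERS, key=lambda dc: haversine_km(user, DATA_CENTERS[dc]))

    print(nearest_data_center((52.4, 4.9)))    # Amsterdam -> Paris
    print(nearest_data_center((31.2, 121.5)))  # Shanghai -> Singapore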

Businesses can deploy and maintain their own data centers or infrastructure within
colocation facilities around the globe. Such a deployment would need sufficient global
reach to respond to customers, employees, and business partners wherever they may be.
These deployments would also have to include sufficient hardware to scale to meet the
peak demands each data center would encounter.

Redundancy
Redundancy is another consideration. Hardware fails. Software crashes. Networks lose
connectivity. If a data center were to fail, other data centers around the globe should be
configured to respond to traffic normally handled by the failed site.
Redundancy also entails maintaining up-to-date copies of content. Replication procedures
should be in place to ensure that content is distributed to all data sites in a timely manner.


Consolidation of Services and Costs


If a business is going to the effort and expense of deploying cloud acceleration systems, it is best
to capitalize on that investment by consolidating services and applications that can benefit.
As with private clouds, there is the potential for significant capital investment to establish
and maintain cloud acceleration infrastructure. Ongoing maintenance costs will add to the
overall operational expenses of the organization as well.
Reference
Later chapters will examine options for addressing these challenges.

Key Considerations for Deploying Cloud Applications


Along with the technical challenges to implementing cloud acceleration technologies, it is
important to consider other characteristics that influence how a business can improve
application responsiveness. One factor that determines the optimal cloud acceleration
technique is the use of generated versus reusable content. Reusable content, sometimes
referred to as static content, can be replicated and sent from Web servers without
additional processing by an application. Reusable content includes material such as
information from product catalogs, documents, and general information Web site pages.
Generated content is the result of some application process, such as querying a database to
retrieve a customer's order history. Reusable content can be replicated to data centers
around the globe; dynamically generated content cannot. Instead, dynamically generated
content can benefit from optimization techniques that improve throughput and latency
between data centers.
Other factors one must contend with when providing services on a large geographic scale
are a function of the design of the Internet. For example, the Internet is composed of
multiple ISPs working together to route data as needed across different ISPs' networks.
Congestion at the physical interconnection of networks can adversely impact application
performance (see Figure 1.9). This and other issues that derive from the large-scale
architecture of the Internet drive the need for multiple data centers in geographically
dispersed arrangements.



Figure 1.9: The rate of data exchange between ISPs will depend on multiple factors,
including the topology of the network. Congestion at the links between ISPs can
contribute to high latency in global Web applications.

In addition to differences in infrastructure, ISPs may have different business perspectives
on linking with other ISPs. In the most basic scenario, ISPs view their relationships as
reciprocal and pass traffic between ISPs without compensation. In other cases, one ISP may
believe another ISP gains more from a peering relationship and therefore require payment
to accept traffic from and send traffic to the other ISP. Competition between ISPs can limit
data exchange as well. Both technical and business considerations can affect the flow of
your application traffic around the globe. Although most businesses cannot directly
influence their ISPs' business models and relationships with other ISPs, businesses can work
around the limitations imposed by peering arrangements by using cloud acceleration
techniques.
Cloud providers can also be potential network bottlenecks. If their networking services
are insufficient for an organization's needs and the provider's distribution of data centers is
not enough to compensate for network congestion and latency issues, alternative cloud
acceleration options may be required.


Summary
Cloud computing is creating opportunities for businesses to expand their reach to a global
scale. The cost and complexity of deploying computing and storage services is lowered with
cloud computing. There is also greater flexibility to adapt to new business opportunities by
leveraging IaaS and PaaS platforms to create new applications and services. The increasing
adoption of SaaS platforms also presents an opportunity for businesses to offer their
services in a SaaS model. Businesses must pay particular attention to Web application
performance for all customers regardless of those customers' locations. Adding servers and
storage will improve some but not all aspects of application responsiveness. Cloud
acceleration techniques may be required to ensure consistent and acceptable levels of
performance for all application users.


Chapter 2: How Websites and Web Applications Work

To understand the challenges to optimizing your Web sites and Web applications, it helps
to start with basics on how Web sites and Web applications work. This chapter will
consider differences between static and dynamic content and techniques for optimizing
each type of content. It will also discuss the limitations of these techniques.
Consider a simple example: If you were to type a URL into your browser, such as
www.example.com, you would start a series of events that ultimately leads to having a Web
page rendered in your browser. These events include the following (a timing sketch appears after the list):

Sending the host name to a Domain Name System (DNS) server to map it to an IP
address
Routing the Web page request, possibly over a series of different Internet providers,
to the Web server at the IP address provided by the DNS server
Retrieving or generating the content of the Web page from the Web server
Packaging the content of the Web page into a series of TCP packets that deliver the
content to the client device that is making the Web page request
Reconstructing the Web page from packets on the client device
Rendering the Web page for the user
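
As promised above, several of these steps can be timed separately from any client. This standard-library sketch measures the DNS lookup and the full page fetch; the URL is a placeholder.

    import socket
    import time
    import urllib.request

    host, url = "www.example.com", "http://www.example.com/"

    # Step 1: resolve the host name to an IP address via DNS.
    t0 = time.perf_counter()
    ip = socket.gethostbyname(host)
    dns_ms = (time.perf_counter() - t0) * 1000

    # Steps 2-5: route the request, retrieve the content, and reassemble it.
    t0 = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()
    fetch_ms = (time.perf_counter() - t0) * 1000

    print(f"DNS lookup: {dns_ms:.1f} ms -> {ip}")
    print(f"Fetch of {len(body)} bytes: {fetch_ms:.1f} ms")
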
Retrieving a Web page might sound like a simple operation, but there are clearly multiple
steps involved and each of them can introduce delays into the process.
If a DNS server is unreachable or slow to respond, the client device making the request
might need to query a different DNS server. This requirement delays the start of the
process to retrieve the Web page.
Traffic that moves between Internet service providers (ISPs) might be subject to
congestion when routing between services or to limits on the speed with which traffic from
competing ISPs is handled. This potential bottleneck can delay the transmission of data
between the client device and the Web server.


The Web server itself can be the cause of delays. If the Web server is subject to heavy loads,
there might be a long queue of requests waiting to be processed. If Web page requests
require a substantial number of I/O operations on the Web server, the response time will
be longer than if those operations could be avoided.
Even the way the TCP protocol functions is a potential cause of delays. TCP guarantees
delivery of packets. To meet this guarantee, TCP requires more communication steps to
ensure packets are delivered. If some packets are lost or delayed, they must be
retransmitted. The need for retransmitting and the delays it can introduce might be
minimal on local area networks (LANs), but moving data across global-scale networks will
more likely entail some lost packets. You must also consider the latency of long distance
networks. The physical limitations of network speed combined with network traffic and
ISP polices all influence the time required to transmit data between a client device and a
Web server. The speed of the last-mile connection to the client device and the configuration
of the client device can also influence the speed with which Web content is rendered.
The remainder of this chapter will delve into potential problems in more detail, in
particular:

Static content types
Dynamic content types
Targeted content delivery
In each case, the goal will be to understand aspects of Web application design and
architecture that adversely affect performance, then examine methods for addressing those
adverse effects.

Optimizing the Delivery of Static Content


When the Web was created in the early 1990s, the primary purpose was to link static
content across a network of servers. Tim Berners-Lee wrote the first Web browser and
Web server and effectively brought hypertext to the Internet. The key components were
URLs to describe Web site addresses, Hypertext Markup Language (HTML) to create Web
pages with static content, and Hypertext Transfer Protocol (HTTP) for requesting and
delivering content to devices.

Multiple Object Static Web Pages


It was only a matter of time before linking text documents led to more complex document
configurations. It is commonplace now for Web pages to contain text, images, and
multimedia content (see Figure 2.1).



Figure 2.1: Web pages routinely combine text, images, video, and sound.

Web pages constructed of multiple types of content are typically built from components
stored in multiple files. For example, the Web page depicted in Figure 2.1 includes 24
images. Those images are each stored in separate files on a Web server.
When the page is rendered on a client device, the client issues an HTTP GET command to
download each image. Each GET command requires a connection to the Web server and I/O
operations on the server to retrieve the image from persistent storage.
Note
Actually, the Web server might not have to retrieve the image from disk or
other persistent storage if it is cached. I will discuss that scenario shortly.

Once the server has retrieved the content for each of the 24 images, the Web server
transmits each image to the client device. Each transmission is subject to some of the
potential problems described earlier in the chapter, including congestion and lost packets.
These problems are especially acute for Web pages that require all or most of the
content on the page to be rendered before the page is useful.
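
You can get a rough sense of how many separate downloads a page triggers with a short
script. This sketch (using only Python's standard library, with www.example.com standing
in for a real site) parses a page's HTML and counts the image references, each of which the
client must fetch with its own HTTP GET:

from html.parser import HTMLParser
import urllib.request

class ImageCounter(HTMLParser):
    """Counts <img> tags; each one implies a separate HTTP GET."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

url = "http://www.example.com/"  # stand-in URL for illustration
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8", errors="replace")

counter = ImageCounter()
counter.feed(html)
print(f"{url} references {len(counter.images)} images")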


Consider a Web site offering weather information. A page that allows a user to enter a
location to check today's weather would be functional as soon as the search box and related
text are rendered. Other images, such as ad displays, could load while the user enters text
into the search box; there is little or no adverse effect on the user. If, however, a user loads
a page with multiple maps and satellite images, the user might have to wait for multiple
images to load to find the specific information they are seeking.
When a Web page has multiple objects that require separate connections, the number of
connections needed determines the time required to fully load the page. The speed of a
download is determined by three factors:

Bandwidth
Latency
Packet loss
Each of these factors should be taken into account when analyzing Web page load times
(see Figure 2.2).

Figure 2.2: Loading a single Web page can require multiple files and therefore
multiple connections between a client and a server (Screenshot of Wireshark
Version 1.8.5).


Determining Web Page Load Times


Bandwidth is a measure of the amount of data that can be transmitted over a network in a
given amount of time. Network bandwidths vary widely: 10-gigabit Ethernet can be found
in enterprises, while bandwidths of tens of megabits are common in residential services.
Bandwidth is only one of three factors influencing the time required to download a Web
page, so increasing bandwidth may not improve overall performance.
Latency is a measure of the amount of time required to send a packet of data from a source
to a destination and then to send a packet from the destination back to the source. Latency
is influenced by a number of factors, including: network traffic, the distance packets of data
must travel, and network device configurations.
Note
Latency is also known as round trip time (RTT) to distinguish it from one-way latency,
which is a measure of the time required to send a packet between two devices on a
network.

Network traffic can adversely affect latency if the volume of incoming traffic is more than
network devices can process at the time. When its buffers are full, a device may send a
packet to the source system signaling it to stop transmitting data (in TCP, for example, by
advertising a zero-size receive window). This situation can increase the RTT on that
connection.
The greater the physical distance a data packet must travel, the longer it will take. In
addition, longer distances usually entail additional network devices, some of which could
be significantly slower than others. This setup can introduce bottlenecks in the network
that limit the overall speed to the speed of the slowest segment, as Figure 2.3 illustrates.



Figure 2.3: Bottlenecks in the network can slow data transmission to the speed of the
slowest segment of the network.

Changes in TCP protocol configurations can affect the performance of a network. For
example, high-bandwidth networks might not take full advantage of the bandwidth
available if devices on the network have TCP configured to send small amounts of data
and then wait for an acknowledgment before sending more.
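
The cost of this send-and-wait behavior is easy to quantify: if a sender can have at most one
buffer's worth of unacknowledged data in flight per round trip, achievable throughput is
capped at roughly that buffer size divided by the RTT, no matter how fast the link is. A quick
back-of-the-envelope calculation in Python (the numbers are illustrative only):

# Throughput ceiling imposed by a fixed send window:
# at most one window of data can be in flight per round trip.
window_bytes = 64 * 1024   # classic 64 KB maximum TCP window
rtt_seconds = 0.080        # 80 ms round trip, e.g. a cross-country link

max_throughput_bps = (window_bytes * 8) / rtt_seconds
print(f"Max throughput: {max_throughput_bps / 1e6:.1f} Mbit/s")
# Prints about 6.6 Mbit/s -- far below a 1 Gbit/s link's capacity,
# which is why window tuning matters on long, fast paths.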
Packet loss is another problem sometimes related to device configuration (see Figure 2.4).
Devices that receive data from a network connection have to buffer data so that the data
can be processed by an application. If the buffers become full while incoming data
continues to arrive, then packets can be lost. The TCP protocol is designed to guarantee
delivery of packets. Devices that send packets to a receiving device expect an
acknowledgement that packets have been received. Sending devices are configured to wait
a certain period of time, and if no acknowledgment is received in that time, the sender
retransmits the unacknowledged packets.



Figure 2.4: When applications cannot read from the connection's data buffer fast
enough to keep up with incoming traffic, packets can be lost. Data loss can occur
for similar reasons when there is congestion on network devices between the source
and destination devices.

Assessing the Scope of the Problem


The problem of slow page loading times is a product of multiple factors, including the
amount of static content, network bandwidth, latency, and packet loss. Consider each of
these factors as you develop a solution.


Amount of Static Content


The amount of static content is driven by the business requirements that lead to the
development of the Web site. A product catalog needs to provide sufficient information for
customers to understand the features and value of a product and distinguish it from
competitors' products. This need might lead Web site designers to include multiple images
of each product in a variety of configurations. Asking a Web designer to limit the amount of
content on a page to improve page load speeds is an option of last resort. You should
assume the content is there for a business reason and should remain.

Distance Between Source and Destination


Bandwidth and packet loss are two network factors that are only partially in your control.
Your business will have control of internal networks and enterprise WANs, but once your
content is transmitted over the Internet, you are depending on the network infrastructure
provided by and managed by others.
If your content is routed over an ISP network with significant congestion and packet loss,
network protocols may be able to re-route traffic automatically, but you are still depending
on infrastructure outside of your control. As a result, your control over the speed with
which your static Web content loads is limited. The more you depend on ISPs'
infrastructure, the greater the chance of encountering network problems that you have
limited ability to control.

Popularity of Content
In addition to network performance problems, the speed at which you can deliver content
is also influenced by the speed of Web servers. When large numbers of users are requesting
content from a Web server, there might be contention and congestion for resources. For
example, if every request for content results in multiple I/O operations to retrieve content
from disk, the performance level of the disk will be a limiting factor in the responsiveness
of your Web site. Retrieving content from disk is slower than retrieving content from RAM.
One way to improve response time is to store a copy of content in RAM. This method is one
type of caching technique that can improve Web site performance.
Before delving into the different types of caching, it is important to note that the benefits
from all forms of caching are a function of the popularity of content. The reason is that
when a piece of content is retrieved from persistent storage, a copy is saved in the cache.
The next time that content is requested, it can be retrieved from the cache more quickly
than it could be retrieved from disk.


Very low-popularity content is far less likely to be cached than popular content. As a
result, popular content can be delivered faster than unpopular content. This reality is
certainly better than consistently slow page load times, but it can lead to inconsistent
performance in a user's experience with your Web site.

Distributed Caching and Persistent Storage


There are different ways to cache content. At least three types of caching are used to
improve Web application and Web site responsiveness:

Browser caching
Web server caching
Proxy caching
Each of these can improve Web site performance with static content, but they work in
different ways and have different advantages and limitations.

Browser Caching
Web browsers can store local copies of content on a client device. The first time content is
downloaded, it can be stored in the cache. The next time it is needed, the content is loaded
from cache instead of downloading from the Web site (see Figure 2.5). This setup saves the
RTT to retrieve the content.
Browser caching helps with content that is used repeatedly during a session. Logo images
and CSS files that may be used throughout a site need to be downloaded only once during a
session.
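
Browser caching is controlled by HTTP response headers that the server attaches to its
content. The sketch below (a minimal example built on Python's standard-library
http.server; the one-day lifetime is an arbitrary choice for illustration) serves files with a
Cache-Control header so that a returning browser can reuse logos and CSS files from its
local cache instead of downloading them again:

from http.server import HTTPServer, SimpleHTTPRequestHandler

class CachingHandler(SimpleHTTPRequestHandler):
    """Serves files with a header that lets browsers cache them."""
    def end_headers(self):
        # Tell the browser it may reuse this response for 24 hours
        # (86400 seconds) before asking the server again.
        self.send_header("Cache-Control", "public, max-age=86400")
        super().end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CachingHandler).serve_forever()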



Figure 2.5: Browser cache can improve page loading times on individual devices,
especially for content reused within a Web site.

Web Server Caching


Web server caching, as the name implies, operates at the server level (see Figure 2.6).
Content requested by any user can be stored in memory and made available for other
users. This functionality can reduce the disk I/O load on the Web server because popular
content can be retrieved from memory without requiring disk access operations.

Unlike browser caching, this form of caching does not eliminate the need for round-trip
communication between the client device and the Web server. It does, however, have the
additional benefit of caching content accessed by other users, which is not the case with
browser-based caching.
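
At its simplest, server-side caching keeps file content in the server process's memory so
that repeated requests avoid disk I/O. The following sketch (illustrative only; production
Web servers use far more sophisticated cache managers, and it assumes an index.html file
exists in the working directory) uses a memoizing decorator as an in-process cache shared
by all requests:

import functools

@functools.lru_cache(maxsize=256)
def load_content(path):
    """Read a file from disk; repeated calls for the same path hit RAM."""
    with open(path, "rb") as f:
        return f.read()

# The first call performs disk I/O; later calls for the same path are
# served from the in-process cache, regardless of which user asked.
page = load_content("index.html")
page_again = load_content("index.html")  # no disk access this time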



Figure 2.6: Web server caching pools the benefits of caching across multiple users.
Note: Green ovals represent pages retrieved; white ovals are pages stored but not
retrieved.

Proxy Caching
Proxy caching is a service offered by ISPs to reduce the time needed to load content on
client devices and to reduce traffic on their own networks. As Figure 2.7 shows, proxy
caching works by keeping copies of popular content on the ISP's servers. When a client
device makes a request for content, the proxy cache is queried and, if the content is found,
it is delivered to the client device from the ISP. This setup avoids the overhead and delay
encountered when the request has to be sent to the Web site's server.


Proxy caching reduces RTT because the client needs only to wait for a response from the
ISP. This type of caching has advantages for the ISP's customers: it cuts down on the load
on the customer's Web site and reduces the traffic on the customer's networks. Customers
have limited control over the proxy cache, however, because the ISP is likely using the
cache for multiple customers.


Figure 2.7: Proxy caching reduces RTT by shortening the distance network traffic
must traverse to respond to a request.

Caching vs. Replication


A common characteristic of Web server caching and proxy caching is that they depend on
temporary storage. When servers restart, the content of the cache is lost unless it is copied
to persistent storage first. This situation is not necessarily a problem from an ISP's
perspective because the demand for particular content is constantly changing, at least
when one considers demand across a wide range of customers. For individual businesses,
however, the most popular static content may be frequently requested over extended
periods of time.


Another limitation on caching is the amount of cache available. RAM is a limited resource
on any server, and only so much can be dedicated to caching Web content. When the cache
is full, either no new content can be stored or some existing content must be deleted to
make room. This requirement can be handled in several ways (see Figure 2.8). A simple
strategy is to remove the oldest content whenever the cache is full. Old content, however,
might be popular content, so a better approach in some cases is to delete the least recently
used content regardless of its age.


Figure 2.8: Cache content replacement policies optimize for different measures, such
as frequency of use or object size.


Another strategy considers content size rather than just age or recent usage. The idea is
that if more small objects are kept in the cache, more objects can be stored; in turn, this
should increase the rate at which objects are found in the cache. Size, age, and frequency-of-
access policies can be combined to optimize different objectives with caching.
The age of objects in the cache is another way to determine when an object should be
deleted from the cache. A parameter known as time to live (TTL) determines how long an
object is stored in the cache before it is deleted. A cache with a 1200-second TTL, for
example, would keep objects in the cache for at most 20 minutes before deleting them.
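
These replacement policies are straightforward to express in code. The sketch below (a
simplified illustration, not production cache code) combines two of the policies just
described: least-recently-used eviction when the cache is full, and a TTL check that expires
entries older than a fixed lifetime:

import time
from collections import OrderedDict

class LRUTTLCache:
    """Evicts the least recently used entry when full and
    expires entries older than ttl_seconds."""
    def __init__(self, max_entries=100, ttl_seconds=1200):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._data = OrderedDict()  # key -> (stored_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.time() - stored_at > self.ttl:
            del self._data[key]      # entry outlived its TTL
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.max_entries:
            self._data.popitem(last=False)  # evict least recently used
        self._data[key] = (time.time(), value)

cache = LRUTTLCache(max_entries=2, ttl_seconds=1200)
cache.put("/logo.png", b"...image bytes...")
print(cache.get("/logo.png") is not None)  # True until evicted or expired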
An alternative to caching is replication, in which copies of content are stored persistently
on servers distributed across the Internet. The goal is to keep copies of content closer to
users to reduce the RTT needed to satisfy requests for content. Replication also has the
added benefit of reducing load on the Web server hosting the original source of the content.
Key considerations with replication are the number and location of replicated servers and
the frequency with which content is updated.

If a large portion of your Web traffic originates in Europe, it makes sense to replicate
content to servers located there. Other factors should be considered as well. If you located a
replicated server in a data center in Amsterdam, for example, would customers and
business partners in Eastern Europe realize the same performance improvements as those
in Western Europe? If latency and congestion are problems with some ISPs in Eastern
Europe, you might want to deploy a replicated server in such a way as to minimize the
traffic over that ISP's network.

Replicated content will need to be refreshed to keep all content consistent across servers.
Frequent, incremental updates are warranted when content changes often. The type of
content you are replicating can also influence your decisions about update frequency. If
content contains legal or regulated information, such as forward-looking statements from a
publicly traded company, it is especially important to minimize the chance that users in
different parts of the world would see different versions of the content (see Figure 2.9).



Figure 2.9: Origin of traffic and network conditions influence the placement and
number of replicated servers.

Caching and replication can help with static content but many Web pages are dynamically
generated. For dynamic content, you need to consider other methods to optimize delivery.

Optimizing the Delivery of Dynamic Content


Dynamic content is generated by a variety of applications. Customers purchasing products
will receive content specific to their purchase choices. Analysts querying a business
intelligence system will receive pages with data or visualizations based on their queries.
Employees working with a human resources portal will have content generated according
to their roles and benefits within the organization. In all of these cases, caching the content
would serve no purpose.

Before considering ways to optimize dynamic content, let's explore the steps involved in
generating and delivering that content, taking a purchase transaction as an example.


When a customer views a catalog of products, the content may be served from a cache or
replicated site. Once the customer selects a set of products for purchase, they will begin
viewing dynamically generated content. Each request must be served from the origin. As
Figure 2.10 illustrates, a purchase transaction requires:

Requests to generate a page listing the products selected for purchase
Customized options based on the customer's profile; for example, options for free or
discounted shipping
Customized options based on the content of the shopping cart, such as related
products
Time-sensitive information, such as details about delivery time estimates
This content is generated from multiple sources including product and inventory
databases, application servers running business logic services, and Web servers with
customized style sheets. Pieces of data from all of these sources are collected and combined
on an as-needed basis.


Figure 2.10: Multiple applications can contribute to the construction of a single
dynamically generated Web page.


Assessing the Scope of Dynamic Content


When assessing the scope of problems related to delivering dynamic content, consider at
least three factors:

Number of requests for dynamic content
Importance of dynamic content
Distance to origin
The combination of these three factors can influence your strategy for addressing problems
with maintaining responsive Web applications.

Number of Requests for Dynamic Content


One of the first factors to consider is how much dynamic content is being generated and
retrieved by users. Monitoring network traffic is one way to detect the number of URLs
associated with dynamic content and the size of the objects downloaded in response to
requests for those URLs. If there is a wide variation in traffic over time, the monitoring
period should be sufficiently long to capture the scope of variation. Sampling network
traffic at various times can also provide sufficient information to assess the volume of
dynamic traffic on your network.
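
If your Web servers keep standard access logs, a few lines of scripting can approximate
this measurement. The sketch below makes two simplifying assumptions: the log follows
the common quoted-request format, and a URL with a query string is treated as a rough
proxy for dynamic content (the file name access.log is a placeholder):

from urllib.parse import urlparse

dynamic = 0
total = 0
with open("access.log") as log:       # assumed common-format access log
    for line in log:
        parts = line.split('"')
        if len(parts) < 2:
            continue
        request = parts[1].split()    # e.g. 'GET /cart?id=7 HTTP/1.1'
        if len(request) < 2:
            continue
        total += 1
        # Crude heuristic: a query string suggests dynamic content.
        if urlparse(request[1]).query:
            dynamic += 1

print(f"{dynamic} of {total} requests appear dynamic")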
A drawback of the monitoring method is that it does not capture information about
expected changes in network traffic. For example, if a new customer-facing application will
be brought online in two months, the volume of dynamic content from that application will
not be reflected in monitoring data. In this case, it helps to have an inventory of
applications generating dynamic content as well as information about planned changes and
additions to that set of applications.

Importance of Dynamic Content


The importance of dynamic content is a function of the business driver behind that content.
Large volumes of dynamic content may be important because of the large number of
customers, clients, and collaborators who work with that content. The volume of dynamic
content is not the sole measure of its importance. For example, a small group of analysts
that work with dynamically generated content from a business intelligence platform may
depend on that content for high-value decision making. In addition to the size of dynamic
content, be sure to consider the importance of content with respect to business drivers.


Distance to Origin
The distance to origin is another factor that can significantly impact the speed with which a
Web application can respond to a user. The greater the distance between a client and a
server, the longer the time required to transmit data. Such is the case regardless of other
network conditions, such as the rate of packet loss or congestion on the network.
Transmitting data on global scales on the Internet requires data to move over networks
managed by multiple ISPs. Those ISPs may have varying policies on handling traffic
traversing their networks and service level agreements (SLAs) with their own customers
that result in lower priority for non-customer traffic.
When there is congestion on a network, there is a risk of losing packets. The TCP protocol
expects acknowledgements for packets sent and waits a specified period of time for that
acknowledgement. If an acknowledgement is not received, the packet is resent. This setup
introduces a two-fold delay in the Web application: The time the TCP client waits for the
acknowledgement, and the time required to send the packet again.
Understanding the distance between client devices and the system of origin is a key
element in assessing the need for dynamic content optimization. When the combination of
the number of requests for dynamic content, the importance of the dynamic content, and
the distance to origin warrants optimized dynamic content, then it is time to consider how
to reduce the request burden on the system of origin.

Reducing the Burden on the System of Origin


The system of origin is the application, or set of applications, that is responsible for
generating dynamic content. These systems can be complicated combinations of multiple
services each responsible for a part of dynamic content creation and related activities.
Consider, for example, an application for processing sales transactions. It can include:

A Web server for generating HTML and related content
An application server running business logic, such as verifying customers' credit
An inventory database providing information on the quantity and availability of
products
A sales transaction database for recording details of sales
A cross-selling platform that offers suggestions for additional purchases related to
items in the customer's shopping cart
These are all core functions related to the sales transaction. In addition to these, the system
of origin may have to contend with other operations, such as encrypting and decrypting
data as it is sent and received and retransmitting lost packets.


There are multiple ways to improve the performance of the system of origin:

Web server caching can be used to reduce disk I/O operations related to the static
content portion of the application
Databases can be tuned to improve query response time and minimize the number
of disk read and write operations
Application code can be analyzed to identify time-consuming operations
An application cluster can be expanded to distribute the workload over a larger
number of servers
These, however, will not directly address problems with delivering dynamic content to
distant client devices. Some additional performance tuning options include:

TCP optimizations to reduce connection/transmission times
Terminating SSL at the network edge to offload the system of origin
Reducing packet loss for fewer transmission delays
These options can address networking issues not handled by the other tuning methods
described earlier.

TCP Optimizations
TCP optimizations are techniques used to improve the overall performance of the TCP
protocol. The protocol was developed early in the history of internetworking and was
designed to provide reliable transmission over potentially unreliable networks. Today's
networks can achieve much faster transmission speeds than those available when TCP was
first designed.
A number of different kinds of optimizations are available:

Options for changing the window size
Selective acknowledgement
Detection of spurious retransmission timeouts
The TCP receive window size determines how much data can be sent to a device before the
sender must receive an acknowledgement. The original TCP specification defines a
maximum window size of 64 KB. This limitation can prevent full use of a high-bandwidth
network. Extensions to the TCP protocol now allow for larger receive window sizes and
therefore more efficient use of high-bandwidth networks.
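
From an application, the most direct lever on the advertised window is the socket buffer
size. The sketch below (illustrative; the operating system has the final word and may
auto-tune, double, or cap the values you request) asks for larger send and receive buffers
on a TCP socket:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the OS for 1 MB buffers; a larger receive buffer lets TCP
# advertise a larger window on high-bandwidth, high-latency paths.
one_mb = 1024 * 1024
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, one_mb)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, one_mb)

# Read back what the OS actually granted (it may adjust the request).
print("Receive buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
print("Send buffer:   ", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))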


The selective acknowledgment optimization can help reduce the amount of data
retransmitted when packets are lost. TCP was originally designed to use something called a
cumulative acknowledgement, which provided limited information about lost packets. This
functionality can leave the sender waiting additional RTT periods to find out about
additional lost packets. Some senders might operate under the assumption that additional
packets were lost and retransmit other packets in the segment. Doing so can lead to
unnecessary retransmission of packets. Selective acknowledgement provides more
information about lost packets and helps reduce unnecessary retransmission of data.

Detecting a spurious retransmission begins by retransmitting the first packet in an
unacknowledged segment. The sender then monitors the following acknowledgements to
detect patterns that would indicate a spurious retransmission. If a spurious retransmission
is detected, the additional packets in the segment are not retransmitted. These and other
TCP optimizations can help to better utilize the high-bandwidth networks available today.

Terminating SSL at Network Edge


Encrypting large amounts of data can be computationally demanding. If data is encrypted
and decrypted on the same device that is generating dynamic content and responding to
Web requests, the system's overall responsiveness can decrease.
An alternative to encrypting and decrypting on the system of origin is to perform those
operations at the network edge. This setup takes advantage of the fact that once data
reaches your, or your provider's, trusted network, it can be decrypted. Similarly, instead of
depending on the system of origin to encrypt data, that process can be offloaded to a
network device.

Reduce Packet Loss


Reducing packet loss can improve Web application responsiveness. This reduction can be
done with a combination of reducing congestion on the network and routing optimizations.
Improving Web application performance when large amounts of dynamic content are
involved is challenging. A combination of techniques that reduce the load on the system of
origin, optimize TCP, and reduce the distance that traffic must travel over high-congestion
networks can all help address the problem.


Targeted Content Delivery


Although the focus of this chapter has been on static and dynamic content in general, it is
worth noting that the problems described earlier can be even more difficult when you
consider the need for targeted content delivery. This delivery requirement can come in a
few different forms:

Device-specific content
Browser-specific content
Geography-specific content
Each of these factors can lead to larger volumes of static content and require more complex
applications for generating dynamic content. Businesses should consider the need for
device-, browser-, and geography-specific content as they assess their current Web
applications and need for acceleration.

Summary
Maintaining consistent Web application performance for all users is a challenge. Various
forms of caching can improve performance in some cases, but others are better served
by replicating content to servers closer to end users. Dynamic content requires other
optimization techniques to improve overall TCP performance, reduce packet loss, and cut
total RTT.


Chapter 3: Why the Internet Can Be the Root Cause of Bottlenecks
Developers, software designers, and architects can spend a significant amount of time and
effort tuning their applications and infrastructure, and their networks can still suffer from
performance problems. This reality does not necessarily mean these IT professionals are
bad at what they do; it can mean that the problem lies outside their area of control. One of
the biggest potential bottlenecks software developers have to contend with is the Internet
itself.
The term Internet is an accurate description of the resource we have come to rely on. It is
literally an interconnected network of networks. There is no single, monolithic network
encompassing the globe. There is no single organizational entity that monitors, maintains,
and expands the Internet. (Although there are bodies that set technical standards and
administrative organizations that manage functions such as domain naming, these bodies
do not control the infrastructure that constitutes the Internet).

Two Fundamental Characteristics of the Internet that Influence Performance
Two fundamental characteristics of the Internet are central to understanding how the
Internet can be a bottleneck to Web application performance. The networks that constitute
the Internet are interoperable. That is, network traffic on one network can move across
another network without changing or translating data from one representation to another.
The reason is that networked devices on the Internet run a shared set of protocols.

Technical Characteristic: Protocols


Consider how a customer working from her office in Singapore can work with an
application running on a cluster of servers in a data center in the eastern United States. The
customer uses a browser to navigate to a Web application and creates transactions using
that application. The data that is entered, the clicks used to navigate the application, and
the Web pages that are generated in response are all organized, packaged, and transmitted
halfway around the globe using the same standard protocols.


The data that needs to be sent from the customer in Singapore to the data center in the
United States is packaged into Transmission Control Protocol (TCP) packets. TCP is one of
many Internet protocols, but it is one of the most important from an application
perspective. TCP guarantees delivery of data packets in the same order in which they were
sent. The importance of this feature is obvious.

Imagine if the data entered for a transaction sent halfway across the globe was broken into
pieces, transmitted, and arrived at the destination in the wrong order. When the packets
were reassembled at the destination, you might find that the order quantity was swapped
with the customer's shipping address. TCP prevents this kind of problem by implementing
controls that ensure packets are properly ordered when they are reassembled at their
destination.
Other protocols are required as well to get data from one network device to another. Data
that is transmitted over long distances can move across multiple paths. For example, data
arriving from the west coast of the United States could be sent to an east coast data center
over paths across the northern or southern United States, or could be sent across a less
straightforward route (at least from a geographical perspective). Routing protocols are
used to determine how to send packets of data from one point on the Internet to the next.

Organizational Characteristic: Peering


Theoretically, routing protocols will transmit packets along the most efficient path between
two points. Here is one of those situations in which theory and practice diverge. The most
efficient route, in terms of the time required to transmit a packet or the distance between
points on the network, may not be the route that is actually taken. The reason is that the
most efficient route may follow paths that use networks from several network providers.
Some network providers have agreements to carry each other's traffic. This
arrangement, known as peering, is the second important factor that can influence Web
application performance. Assume for a moment that you need to route a packet from a
device on Network A to a device on Network D. The most efficient path from the source to
the destination device is over network B; however, Network A and Network B do not have a
peering agreement. Network A does have a peering agreement with Network C, which also
links to Network D, so the packet from Network A is routed to Network D over Network C
(see Figure 3.1).



Figure 3.1: The most efficient route, from a technical perspective, may not be taken.
Organizational and business relationships between network providers influence the
path taken between source and destination devices.

Peering is a business arrangement. Network providers obviously want to provide their
customers with access to any Internet-addressable device. They would not be much of an
Internet service provider (ISP) if they could not do that. At the same time, they cannot
deploy their own network infrastructure across the globe in order to access every Internet-
addressable device. The only practical method to ensure complete access to all devices on
the Internet is to make arrangements with other network providers.
In some cases, a peering relationship can be mutually beneficial. For example, if a country
had two major network providers and each provided services to 50% of Internet users in
the country, both would benefit from the ability to route traffic to the other provider's
network (assuming, of course, that the characteristics of traffic on both networks are
roughly equal). Such is not always the case.

Some network providers have created large, advanced networks with substantial
geographic reach. These network providers can make arrangements with each other to
allow the transit of each other's traffic free of charge. Sometimes the cross-network routing
arrangement is not equally beneficial. For example, a small network provider may have a
large portion of its traffic sent outside the network while a large provider has only a small
amount of data routed to smaller providers. This kind of asymmetry undermines the logic
for free exchange of traffic.


When there is significant asymmetry between providers, the smaller provider will likely
have to pay some kind of fee, known as a transit charge, to the larger provider. Obviously,
one of the ways these smaller providers control cost is by limiting as much as possible the
amount they have to pay in transit fees. Let's go back to the example with network
providers A, B, C, and D.

The most efficient route from a device on Network A to a device on Network D is through
Network B. Network B, however, charges 50% more in transit charges than Network C.
Network A is a midsize network provider and cannot secure peering agreements without
paying fees to some of the larger network providers. From a technical perspective, Network
A could route its traffic to Network D over Network B; from a business perspective, that
option is cost prohibitive. As a result, a customer on Network A sending and receiving
traffic from Network D depends on suboptimal routing.

Network Provider Terminology


Network providers come in a wide range of sizes and implement a variety of
business relationships with each other. One way to coarsely group network
providers is according to a commonly used three-tier scheme:
Tier 1 providers: Network providers that can reach all other networks
on the Internet without paying transit charges are known as Tier 1
providers.
Tier 2 providers: Tier 2 providers have some fee-free peering
agreements but also pay transit fees to other network providers.
Tier 3 providers: Tier 3 providers pay transit fees to other network
providers in order to route traffic to all Internet-addressable devices.
These definitions are not hard and fast. Users do not always know the
business arrangements between network providers. Fee-free agreements
may include other conditions that an accountant might consider a form of
transfer of revenue, but this categorization scheme ignores those
distinctions. This scheme is simply a rough way to divide Internet providers
into large, medium, and small.
Protocols and peering agreements are two fundamental characteristics of the Internet that
influence Web application performance. The rest of this chapter will delve into more detail
about both protocols and peering agreements to understand how they affect application
performance and what you can do to improve it. Before moving on to the details of
protocols and peering, let's establish a few measurements that are important for
quantifying network performance.


Measuring Performance
If your customers are complaining about poor performance or abandoning shopping carts
at high rates, application performance might be a problem. Many factors influence the
overall speed at which your application processes transactions:

Algorithms used to process data
Implementation of algorithms in code
Database design and query tuning
Application and server configuration
Network throughput
Latency
Packet loss
Software developers and database administrators can address the first four potential
issues in an application test environment. It is a fairly straightforward process to test parts
of large systems when you have access to source code and documentation. You can, for
example, time various modules to help identify which modules are taking the most time to
execute. Solving the problem once it is identified is not always trivial but at least you have a
sense of where the problem lies.
The last three issues in the list are related to network performance. Diagnosing problems
on the network is somewhat different than debugging and analyzing code. Your goal is not
to use your favorite integrated development environment to analyze code implementing
TCP on your server but instead to understand why network operations take as long as they
do.

Like software development, analyzing network problems is partly a matter of dividing and
conquering. The goal is to find the bottleneck (or bottlenecks) on the network between
your servers and client devices. Is the problem at a router in your data center? Is your ISP
meeting agreed-upon service levels with regard to your network traffic? Does the problem
lie with networking infrastructure closer to the client devices? To find out where network
problems occur, it helps to have standard measures that allow you to quantify network
performance.

Customers or other end users might describe problems with application performance in
qualitative terms such as "slow" and "takes too long." Sometimes they might even
experience errors and can provide qualitative error messages, such as "the connection
timed out." These kinds of descriptions are symptoms of an underlying problem, but they
lack specificity and precision.


This discussion will focus on three quantitative measures when considering network
performance:

Throughput
Latency
Packet loss
These measures describe characteristics of networks that can influence the overall
performance of your applications.

Throughput
Throughput is a measure of the data packets that are transmitted over a network, usually
described in bits per second. We often describe network throughput in terms of a single
quantity, such as 100 Mbits per second. In practice, network throughput can vary over time
with changes in network conditions.

A number of factors can influence throughput. Physical characteristics such as noise on
the line can interfere with signals and reduce throughput. The speed at which network
equipment can process data packets can also impact throughput. Internet Protocol (IP)
packets, for example, are composed of header data and payloads. Network devices use
data in the header to determine how to process a packet, so the throughput of the network
will depend, to some degree, on how fast network devices can process this header data.

We can draw an analogy between network throughput and traffic on a highway. A single-
lane highway may allow only one car at a time at any point on the highway, a two-lane
highway can have two cars at a time, and so on. By adding more lanes, you can increase
the number of cars that can travel over the highway.

When cars travel at top speed and there are cars in every lane, the highway reaches its
peak throughput. Similarly, with networks, you reach peak throughput when you fully
utilize the network's carrying capacity. Peak throughput cannot always be maintained,
though. In the highway analogy, a car accident or slow-moving car can reduce the speed of
other drivers. In the case of the network, signal degradation or delays processing data
packets can reduce throughput below its peak. As this discussion is primarily concerned
with Web application performance, it is helpful to work with average or sustained
throughput over a period of time rather than peak throughput.
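
Average throughput over a transfer is simple to estimate by timing a download of known
size. A minimal sketch using Python's standard library (the URL is a placeholder you would
replace with a suitably large test file):

import time
import urllib.request

url = "http://www.example.com/testfile"  # placeholder test URL
start = time.perf_counter()
with urllib.request.urlopen(url) as response:
    data = response.read()
elapsed = time.perf_counter() - start

bits = len(data) * 8
print(f"Downloaded {len(data)} bytes in {elapsed:.2f} s "
      f"({bits / elapsed / 1e6:.2f} Mbit/s average throughput)")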


Latency
In addition to throughput, which measures the amount of data you can transmit over a
particular period of time, you should consider the time required to send a packet from a
source system to a destination system. Latency is a measure of the amount of time required
to send a packet to another device and back to the sending device. Latency can also be
measured in terms of a one-way trip from a source to the destination, but, for this
discussion, latency will refer to round-trip latency unless otherwise noted.
Returning to the highway driving analogy, you can think of latency as the time required to
make a round-trip from one point on the highway to another point and then return to the
starting point. Obviously, the speed at which the car moves will determine the time
required for the round-trip, but other factors can influence latency. If there is congestion on
the network, as with highways, latency can increase.
If a car needs to leave fast-moving traffic on a highway to transit a road with slower
maximum speeds, the latency will increase over what would have been the latency had the
car stayed on the highway. This situation is analogous to a data packet that is transmitted
over a high-speed network but then routed over a slower network en route to its
destination device.


Figure 3.2: Ping can be used to measure latency between two devices. This example
highlights the latency of packets sent from the west coast of the United States
(Oregon) to a server on the east coast (Delaware), averaging 87.1ms and ranging
from 79.68ms to 129.24ms.


Just to clarify, latency is the time needed to send a packet of data on a round-trip between
two networked devices. This idea should not be confused with the time required to
complete a transaction on a Web application. Network latency is one factor in the time
required to complete a Web transaction; the total also includes the time needed by
database servers, application servers, and other components in the application stack. In
addition to throughput and latency, you should also consider the rate of packet loss on a
network.

Packet Loss
Packet loss occurs when a device sends a packet that is never received by the target device.
This loss is easily detected by TCP. Once a TCP connection is established between two
devices, the devices allocate memory as a buffer to receive and hold data packets. The
buffer is needed because packets may arrive out of order. For example, packet 1 may be
routed over a network that suddenly experiences a spike in congestion while packet 2 is
routed over another network. When packet 2 arrives at the destination, it is held in the
buffer until packet 1 arrives. (There are other conditions that determine how the buffer is
used and when it is cleared, but those conditions can be safely ignored in this illustration.)
If packet 1 does not arrive within a reasonable period of time, the packet is considered lost.
Lost packets have to be retransmitted, which consumes additional network bandwidth. In
addition to transmitting the packet twice, there is additional traffic between the two
devices to coordinate the retransmission. The receiving device also waits the period of time
specified in the TCP configuration before determining a packet has been lost. All of these
factors combine to increase the time needed to send a message from the source to the
destination.

Packet loss is one of the factors that decreases throughput. Packet loss can be caused by
signal degradation if the signal must travel long distances or is subject to physical
interference. Congestion on a network device can also cause packet loss. Routers, for
example, have only so much memory to buffer data arriving at the router. When the traffic
arriving at the device exceeds its capacity to buffer and process the data, packets may be
lost.

In some cases, packet loss does not cause retransmission of packets. Some applications can
make use of the User Datagram Protocol (UDP), which, unlike TCP, does not guarantee
delivery or in-order delivery. This obviously will not meet the needs of transaction
processing systems, but there are use cases where UDP is appropriate.


When transmitted data has a limited useful life and is replaced with newly generated data,
then UDP is appropriate. For example, if you were to synchronize clocks on multiple
servers by sending a timestamp from a time server, you could lose a single timestamp
without adversely affecting the other systems. A new datagram with a new timestamp
would be generated soon after the first datagram was lost. Transmitting video or audio
over UDP makes sense as well. If many packets are lost, the quality of the image or sound
may degrade but the benefits of lower overhead compared with TCP make UDP a viable
alternative.
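
The clock-synchronization example maps naturally onto a few lines of code. The sketch
below (a simplified illustration; real clock synchronization uses a protocol such as NTP,
and the destination address is a placeholder) sends timestamps in UDP datagrams with no
handshake and no retransmission; a lost datagram is simply superseded by the next one:

import socket
import time

# UDP: fire-and-forget, no handshake, no retransmission.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
destination = ("192.0.2.10", 9999)  # placeholder address for illustration

for _ in range(3):
    timestamp = str(time.time()).encode()
    sock.sendto(timestamp, destination)  # may be lost; acceptable here
    time.sleep(1)  # a fresh timestamp supersedes any lost one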

From this discussion, you can see that there are at least three measures to consider when
analyzing the impact of the network on Web applications. Throughput is a measure of the
number of packets, and therefore the number of successfully delivered bits, sent between
devices. Latency is a measure of the time required for a round-trip transmission between
devices, while packet loss is a measure of the amount of data that is lost on a line.
Applications that use TCP (most transactional business applications) not only have to
retransmit lost packets but also incur additional overhead to coordinate the
retransmission. Next, let's consider how protocols, especially Hypertext Transfer Protocol
(HTTP) and TCP, can contribute to network bottleneck issues.

Protocol Issues
The Internet utilizes a substantial number of protocols to implement the many services we
all use. Some of these operate largely behind the scenes without demanding much attention
from software developers who create Web applications. The Border Gateway Protocol
(BGP), for instance, is needed to route traffic between independent domains and to
exchange information about network routes. The Domain Name Service (DNS) is probably
more widely recognized as the service that maps human-readable domain names (for
example, www.example.com) to IP addresses. As you consider Web application
performance issues, you should consider how these services can adversely impact
performance; DNS, for example, has been a significant contributor to poor performance. It
also helps to understand some of the implementation details of HTTP and TCP: the way
you use HTTP and configure TCP can have a noticeable impact on Web application
performance.

Web Application Performance and HTTP


HTTP was created about 20 years ago as a means of sharing documents at the physics
research center CERN. From its earliest uses, HTTP has been used to create feature-rich
documents, and later, applications. The first browser supported both text and images (see
Figure 3.3).



Figure 3.3: The original Web browser running on the NeXT computer at CERN, the
birthplace of the Web (Source: http://info.cern.ch/).

In many ways, the fundamentals of transmitting Web content have not changed much from
the early days of HTTP adoption. The way we use HTTP and generate content has certainly
changed with new development technologies, advances in HTML, and the ability to deploy
functional applications in a browser. The fact that we can use the same protocol (with some
modifications over the years) to implement today's applications attests to the utility of
HTTP. There are, unfortunately, limitations of HTTP that can still create performance
problems for developers and Web designers.

Multiple Objects Per Web Page


Navigating the Web is a simple matter that masks much of the complexity behind rendering
a typical Web page. A quick survey of several major retailers' home pages found pages
with 40 to more than 100 images rendered on each page. Each of these images must be
downloaded over a TCP connection between the Web server and the client device. In
addition, Web pages often provide search services or other application-like features. Some
of these are services provided by third parties that require connections to those third
parties' servers.



Figure 3.4: Web pages and applications include multiple components, ranging from
text and images to application services. Each requires a connection to download the
component from its server to the client device.

The fact that multiple connections are needed to download all the components of a Web
page helps to draw attention to the process of establishing a connection and the overhead
that entails.

TCP and Implications for Web Application Performance


As noted earlier, TCP has many features that support the development of Web applications
and documents. Guaranteed delivery of packets in the order they were sent is a basic but
essential building block for the Web. Ensuring this level of delivery comes with a cost,
though. There is some amount of overhead associated with TCP. The overhead starts when
a new connection is established.


TCP Handshake
Before devices can exchange packets using TCP, the two devices must first establish a
connection. This arrangement is essentially an initial agreement to exchange data and an
acknowledgement that both devices are reachable. This exchange is known as the TCP
handshake.


Figure 3.5: The TCP handshake protocol requires an initial exchange of
synchronizing information before data is transmitted.

The TCP handshake is a three-step process that requires the exchange of packets to
acknowledge different phases of the process known as:

SYN
SYN-ACK
ACK
Each of these phases must occur before the next phase in the list can occur; if all phases
complete successfully, a transfer of data can begin.


The TCP handshake begins with one device initiating a connection to another device. For
example, when a customer uses a browser to download a Web page from a business Web
site, a connection is created to download the initial HTML for the page. (There will
probably be other connections as well but this discussion will describe this process for a
single connection.)

The first step of the process entails the initiating device sending a SYN packet to the
listening device; for example, a laptop sending a SYN message to a Web server to download
a Web page. Part of the SYN packet contains a random number representing a sequence
number, which is needed in later steps of the process. After the SYN message is sent, the
initiator is said to be in SYN-SENT state and it is waiting for a reply. Prior to receiving the
SYN packet, the listening device is said to be in the LISTEN state.
Once the listening device (for example, the server) receives the SYN packet, it responds
by replying with a SYN-ACK packet. This packet includes the sequence number sent in the
SYN packet incremented by 1. The listening device generates another random number that
serves as the second sequence number, which is included in the SYN-ACK packet. The
listening device is then in the SYN-RECEIVED state.
The third and final step occurs when the initiating device responds to the SYN-ACK packet
with a final ACK packet. This packet includes the first sequence number incremented by 1
as well as the second sequence number incremented by 1. After this point, both the
initiator and the listening device are in the ESTABLISHED state and data transfer can
begin.

The connection can persist as long as needed. When the connection is no longer needed, a
similar three-step exchange of FIN, FIN-ACK, and ACK packets closes it down.
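
Because the handshake requires a full round trip before any application data flows,
connection setup time is easy to observe from an application. This sketch (using Python's
standard library; the host is a placeholder) times socket.create_connection, which returns
once the three-step handshake has completed:

import socket
import time

host, port = "www.example.com", 80  # placeholder endpoint

start = time.perf_counter()
# create_connection returns once SYN, SYN-ACK, and ACK have completed,
# so the elapsed time approximates one network round trip.
conn = socket.create_connection((host, port), timeout=5)
handshake_ms = (time.perf_counter() - start) * 1000
conn.close()

print(f"TCP handshake to {host}:{port} took {handshake_ms:.1f} ms")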

Reliable and Ordered Transfer of Data


Once a connection is established with the TCP handshake, data exchange begins. As with
the SYN, SYN-ACK, and ACK messages, a sequence number is included with data packets.
Sequence numbers are used to ensure that data is reconstituted in the same order it was
sent.
Sequence numbers are particularly important when packets may be routed over different
paths and could arrive at the destination in a different order from the one in which they
were sent. For example, if someone is uploading a large file to a cloud storage service
thousands of miles away, the data may take paths over two or more network service
providers. Sequence numbers are used to determine the order in which packets were sent
and, in some cases, to determine whether packets are lost.



Figure 3.6: TCP packets transmitted in the same connection may use different paths
to reach the destination device. Sequence numbers in each packet are used to
reconstruct the data stream in the same order it was sent.

Consider a situation in which a device receives a series of packets with numbers 100, 101,
102, and 104. Packet 103 has not been received. This might simply be the situation
depicted in Figure 3.6: some of the packets used a path that allowed for faster delivery
than the path taken by packet 103. In that case, it might be just a matter of a small amount
of time before packet 103 arrives.

When all the packets in a range of packets are received, the receiving device transmits an
acknowledgment to the sender. Senders can use these acknowledgments to determine
whether packets are lost.
TCP requires that you decide how long you are willing to wait for a packet to arrive. When
packets are traveling over high-latency networks, such as satellite networks, you might
want to have long waiting periods. There would be no advantage to asking the sending
device to retransmit a packet that has not had sufficient time to reach the destination
device. In fact, that could create additional, unnecessary traffic that could consume
bandwidth and increase congestion on the network.


At the same time, you do not want to wait too long for packets to arrive. If packets are
dropped, they will never arrive. For example, a packet may have been dropped at a router
that was experiencing congestion and unable to process all the traffic arriving at the router.
In other cases, the transmission signal may have degraded to the point that the packet was
corrupted.

When packets are actually dropped, it is better to infer the need to retransmit sooner
rather than later. The destination device will buffer incoming data waiting for missing
packets to arrive. Retransmitting requires sending messages from the destination to the
sender and then from the sender to the destination device, so the wait for the
retransmitted packet will be at least the time of a round trip between the two devices.
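The pattern is easy to see in a small Python sketch. TCP performs retransmission internally, so this example uses a UDP socket to make the logic visible; the timeout and retry values are illustrative assumptions:

import socket

def send_with_retransmit(sock, data, addr, timeout=0.5, max_tries=5):
    sock.settimeout(timeout)                 # how long to wait for an acknowledgment
    for attempt in range(max_tries):
        sock.sendto(data, addr)
        try:
            ack, _ = sock.recvfrom(1024)     # acknowledgment from the receiver
            return ack
        except socket.timeout:
            continue                         # no ACK in time; assume loss and resend
    raise TimeoutError("no acknowledgment after %d attempts" % max_tries)

Setting the timeout too low wastes bandwidth on unnecessary retransmissions; setting it too high leaves the destination buffering data while it waits for missing packets.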

This discussion has delved into design and configuration details of TCP, but it is important
to remember this discussion is motivated by Web application performance. TCP design
issues are important and interesting, but unless you are a network designer or computer
scientist studying protocols, the primary reason for reviewing TCP is to help improve the
performance of your Web applications.
Let's summarize how Web application performance is related to TCP. Here is a set of events
triggered by a Web application user requesting a Web page:

The Web application generates Web pages that are downloaded to client devices.
Web pages are composed of multiple elements, including layout code, scripts,
images, audio files, etc. Each component that is downloaded separately requires a
TCP connection.
TCP connections are initiated with a three-step TCP handshake before data
transmission can begin.
Once data transmission begins, the connection remains open until a termination
process is complete.
While the connection between two devices is open, TCP packets are transmitted
between the devices.
TCP buffers incoming packets as needed to ensure a data stream is reconstructed in
the order in which it was sent.
Listening devices wait for packets to arrive and then send acknowledgments to the
sending device.
If the sending device does not receive an acknowledgement for the delivery of
packets within a predefined time period, the missing packets are retransmitted.


You can see from this list of events that a single Web page can lead to the need for multiple
connections, that multiple connections have to manage multiple packets, and that packets
may be routed over slow, noisy, or congested networks. Packets routed over those poor
quality networks can be lost, requiring retransmission. Therefore, to improve the
performance of your Web application, you might need to improve the performance of TCP
transmissions.

Improving Protocol Performance


There are a number of ways to improve or avoid common network performance problems.
This discussion will focus on three:

Application and data replication
Maintaining a pool of connections
Optimizing TCP traffic
These are complementary strategies that can all be applied to the same Web application to
help improve performance.

Application and Data Replication


Physics has the final say in how fast data can travel over a network. No matter what you do
to improve signal quality, correct for errors, and avoid network congestion, the distance
between devices will limit data transmission, which in turn can limit application
performance. You cannot tune or engineer your way out of this problem, but you can avoid
it (or at least minimize it).
One way to reduce the distance between your Web applications and your users is to
replicate your data and your applications over geographically distributed data centers.
Application users would be routed to the closest replication site when they accessed the
application or data. This approach not only reduces latency inherent in long distance
transmission but also can help avoid congestion on distant networks and limit exposure to
the disadvantages of peering relationships between ISPs (more on that in a minute).

Consider the latency of an application hosted in North America and serving a global
customer base. Within North America, customers might experience around a 40ms round-
trip time. The same type of user in Europe might experience a 75ms round-trip time, while
customers in East Asia might experience a 120ms round-trip time. If that same application
were replicated in Europe and Asia, customers in those regions could expect round-trip
latency to drop significantly.
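A minimal sketch of that routing decision, using the illustrative round-trip times above (the site names and the idea of a per-user RTT table are assumptions for the example):

# Measured round-trip times, in milliseconds, from one user to each replica.
measured_rtt_ms = {"us-east": 40, "eu-west": 75, "ap-east": 120}

def closest_site(rtts):
    # Route the user to the replica with the lowest measured round-trip time.
    return min(rtts, key=rtts.get)

print(closest_site(measured_rtt_ms))  # "us-east" for this hypothetical user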
Just hosting an application in different regions will not necessarily improve application
performance. Geographically close locations can still contend with high latencies and
packet loss rates if the quality of the network between the locations is poor.


Maintaining Pools of Connections


In addition to working around the limits of the network, you can tune devices to improve
performance. You have seen that a single Web page can require multiple connections, for
example, to download multiple image files. The process of creating a connection takes
time, and when a large number of connections are used between a client and a
server, that time can add up to a substantial amount. One way to reduce that time is to
maintain a pool of connections. Rather than destroy a connection when it is no longer
needed, you can maintain the data structures associated with the connection and reuse
them when another component needs to be transferred. This kind of recycling of
connections reduces the overhead of creating and destroying connections and therefore
reduces the time that passes between the initiation of a request for a connection and the
time data is transmitted between the devices.
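A minimal connection pool sketch in Python follows; a production pool would also handle broken connections and concurrent borrowers, but the core idea is simply to borrow and return open sockets rather than create and destroy them:

import queue
import socket

class ConnectionPool:
    def __init__(self, host, port, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            # Pay the TCP handshake cost once per pooled connection.
            self._pool.put(socket.create_connection((host, port)))

    def acquire(self):
        return self._pool.get()   # borrow an already-established connection

    def release(self, conn):
        self._pool.put(conn)      # return it for reuse instead of closing it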

Optimizing TCP Traffic


TCP implements a well-defined set of operations and specifies particular sequences of
events for those operations, but there is still room for tuning the protocol. TCP depends on
a number of parameters and optional features that can be used to improve throughput and
overall performance of Web applications.

There are several characteristics of TCP connections that can be configured to improve
performance including:

Adjusting the size of buffers receiving data: A 64K buffer may be sufficient for some
applications, but long round-trip times may warrant larger buffers.
Adjusting the window that determines when packets are acknowledged: TCP has a
parameter that determines how much data a destination device will receive before it
must acknowledge those packets. If this parameter is set too low, you might find
that you are not using all the bandwidth available to you because the senders are
waiting for an acknowledgement before transmitting more data.
Using selective acknowledgements: TCP was originally designed to use a
cumulative acknowledgement scheme, which is not efficient on
networks with high packet loss rates. It can limit a sender to receiving information
about a single packet in the span of one round-trip time; in some cases, the sender
may retransmit more than necessary rather than wait for a series of round-trip time
periods to determine whether all packets were received. An alternative method,
known as selective acknowledgments, allows a receiving device to acknowledge all
received packets so that the sender can determine which packets need to be resent.
The TCP stack provided with some operating systems (OSs) might implement some of
these optimizations. Specialized tools are available for transmitting large files over
specially configured connections to improve performance. Network service providers may
also implement TCP optimizations between their data centers to improve performance.
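As a sketch of per-socket tuning in Python (the 256KB buffer figure is an illustrative assumption, not a recommendation):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Request larger send and receive buffers for high-latency paths; the OS
# may round or cap the values it actually grants.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)

# Selective acknowledgments are negotiated by the OS TCP stack rather than
# per socket; on Linux, for example, they are typically controlled by the
# sysctl setting net.ipv4.tcp_sack.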


Web application performance is influenced by the HTTP and TCP protocols. The way
components of a Web page are downloaded, the overhead of establishing connections, and
the way packets are reliably delivered can all impact Web application performance.
Developers, designers, and system architects do have options though. Application and data
replication can overcome the inherent limitations of long distance transmissions.
Connection pooling and other device-specific optimizations can help reduce overhead at
the endpoints. Even TCP can be optimized to improve network performance.
Now that we have covered some of the lower-level details about how the Internet can
impact Web application performance, it is time to turn our attention to higher-level
constructs: network providers.

Peering: Linking Networks


We noted earlier that the Internet is a collection of networks created and maintained by
network providers around the globe. Some network providers have access to all other
networks without incurring charges (Tier 1 providers); some have access without charges
to some but not other networks (Tier 2 providers); and others have to pay to access other
networks (Tier 3 providers). Relationships between providers that you might never
have heard of can have an impact on your Web application performance.


Figure 3.7: Peering creates a set of relationships between ISPs. These relationships
determine at a high level how data is transmitted across the Internet (Source: By
User:Ludovic.ferre (Internet Connectivity Distribution&Core.svg) [CC-BY-SA-3.0
(http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons).


Internet Exchange Points


As the number of network providers increases, direct connections between providers
become problematic. If there were only two Internet providers, a single connection would
link both. If there were four network providers directly connected to each other, then you
would need six connections between them. The number of direct connections grows
rapidly: 10 networks would require 45 direct connections to be fully interconnected, and
100 networks would require 4,950. (The formula to calculate this is n*(n-1)/2, where n is
the number of network providers.)
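The growth is easy to verify with a one-line function:

def full_mesh_links(n):
    return n * (n - 1) // 2   # direct links to fully interconnect n networks

print(full_mesh_links(10), full_mesh_links(100))  # 45 4950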
Instead of implementing point-to-point connections through third-party networks, the
Internet was developed with exchange points where multiple ISPs can connect to each
other at a single location. These points are known as Internet exchange points (IXPs). The
IXP implements the physical infrastructure required to connect networks, and the Border Gateway Protocol (BGP) is used
to logically manage the exchange of data across networks. Network providers with peering
agreements will accept traffic from their peering partners but not from other providers,
and they enforce that policy with BGP.

Performance Issues Related to Peering


Peering is perhaps the most pragmatic, efficient way to implement the Internet, but even
with that recognition, there are limitations to this scheme. These limitations can adversely
affect your Web application performance.

Potential for Sub-Optimal Routing


Routing between network providers is governed by agreements between those providers.
The optimal route between two points may be across a path that includes ISPs that do not
have peering agreements. (This discussion includes agreements to pay transit fees for
access to a network; this may not be part of the strict definition of peering, but we'll
continue with this slightly relaxed definition.)
If your application is hosted on servers in one continent and you are servicing customers
around the globe, you will almost certainly be transmitting your data across multiple
providers. Is the route that data takes optimal? In some cases, it probably is,
and in other cases, it might not be. If your data or applications are replicated at multiple
sites with a hosting provider that has appropriate peering agreements, you might avoid some
of the performance issues stemming from peering arrangements.
For application designers and network engineers, peering agreements are only slightly
more forgiving than the laws of physics. They dictate the limits of what is possible with
regard to routing on a global scale.


Congestion
Traffic congestion is a risk in any network. When a large number of networks come
together in a single location, that risk grows with the amount of traffic relative to the
infrastructure in place to receive that traffic. Congestion would ease only if the total
amount of traffic on the Internet declined, and that is unlikely. In fact, you are more likely
to face contention for available bandwidth as more and more devices become network-
enabled. (See, for example, the McKinsey & Company report entitled The Internet of
Things.)
Once again, you encounter a situation that you cannot eliminate but you might be able to
avoid. By replicating data and applications, you can avoid having to transmit data over
congested paths on the Internet. When replication is not an option, and even in some cases
where it is, you can still improve performance using various TCP optimizations and
endpoint optimizations such as connection pooling.

Summary
In spite of all the time and effort you might put into tuning your Web application, the
Internet can become a bottleneck for your Web application performance. Both the HTTP
and TCP protocols have an impact on throughput. Network infrastructure and
configuration dictate latency. Aspects such as noise and congestion factor into packet loss
rates. These are difficult issues to contend with, especially when you deploy an application
used on a global scale. Fortunately, there are strategies for dealing with these performance
issues, which the next chapter will address.


Chapter 4: Multiple Data Centers and Content Delivery
Deploying applications to a global user base involves several technical as well as many
potential legal issues. When you support a user base on multiple continents, you have all
the requirements of a more localized user basereliable access to the applicationas well
additional considerations, such as concerns about latency and the impact of network
performance on application response time. Although content moves easily to any point on
the Internet, local regulations and laws introduce a layer of complexity to the deployment
of applications and services internationally. Content that is readily available in Western
Europe or North America may be restricted in China, for example. As you consider how to
optimize your application performance and content delivery for a global user base, keep in
mind legal requirements as well. Ideally, you will be able to deploy an application and
network solution that addresses both technical and legal requirements within an
integrated service.
This chapter examines several topics related to data centers and content delivery:

Appeal of deploying multiple data centers
Challenges to maintaining multiple data centers
Data centers combined with content delivery networks
Country-specific issues with multiple data centers and content delivery, particularly in
China
Replicating data across multiple data centers is part of a solution for cloud application
acceleration, but more than simply the addition of data centers is required to meet the
needs of a global user base.

Appeal of Deploying Multiple Data Centers


There are many benefits to deploying multiple data centers to support enterprise
applications, particularly redundancy and failover advantages as well as the potential for
reduced latency. Resiliency is a crucial feature of enterprise applications. Customers,
employees, contractors, and business collaborators expect to have access to the
applications and data they need. At the same time, applications are subject to a wide range
of potential disruptive events:

Application maintenance
Data loss
Network disruption
Disruption in environmental controls in data centers


Hosting your data and application in multiple data centers can help mitigate the impact of
each of these events (see Figure 4.1).


Figure 4.1: The implementation of multiple data centers offers a number of
advantages, including resiliency to several potential problems (Image source:
CDNetworks).

Application Maintenance
Applications have to be shut down for maintenance, sometimes due to needed upgrades or
patches to an operating system (OS) or application code. In other cases, equipment in the
data center needs to be replaced or reconfigured, making it unavailable to support your
enterprise application. If the application and data are replicated to another data center,
user traffic can be routed to the alternative data center, allowing users to continue working
with the system.
This type of replication can also be done locally, within a data center. Failover servers and
redundant storage arrays can allow for resiliency within the application. The most obvious
risk with this approach, however, is that a data center-wide disruptive event would render
the failover servers and storage inaccessible.

Data Loss
Data loss can occur as a result of hardware failures, software errors, and user mistakes.
Depending on the type of failure, having data and applications replicated in multiple data
centers can aid in recovery.


Data Loss Due to Hardware Failure


Consider the case of hardware failing within one data center and corrupting a set of data.
The data might have been altered or may be unrecoverable because of a hardware failure.
In such cases, it is highly unlikely that the hardware in other data centers experienced the
exact same type of failure and lost the same data.
Note
This scenario ignores the highly improbable but theoretically possible case of
a design flaw in the hardware that fails in exactly the same way, at the same
time, and with the same data.

If the data had been replicated in at least one other data center, the data could be recovered
and restored in the data center that is experiencing a failure. This scenario would be
comparable to restoring data from a backup. There would, of course, be a time delay
between the time of the failure and the time that data is restored. In cases where the time
between failure and restoration must be as short as possible, application designers can
replicate both data and applications between data centers. In this circumstance, in the
event of a hardware failure, users would be routed to an application running in the
alternative data center. Users might experience degraded performance due to an increased
number of users on the application servers; they might also face longer network latency if
the alternative data center is significantly farther away than the data center with the
failed hardware.

Data Loss Due to Software Failure and Human Error


Data loss due to software failure and human error can be more difficult to address than
data loss due to hardware failure. The additional challenge stems from the fact that similar,
if not identical, software runs in multiple data centers when applications are replicated.
The primary reason to run multiple instances of applications in multiple data centers is to
have identical, redundant systems that can take over the workload of the other in the event
of a failure. This setup requires identical or near-identical configurations, which creates the
root of a potential problem: a significant software error can appear in every instance of an application.

Software errors can corrupt data in many ways. A database operation designed to update a
particular set of records might unintentionally update more than the target data set. An
error in writing data to disk can cause a pointer to a data structure to be lost, leading to
unrecoverable data. A miscalculation could write the wrong data to a database. In each of
these cases, a data loss occurs and, unlike hardware-related data loss, the same type of
error is likely to occur in replicated instances as well.


Application designers can plan for many types of potential human errors, from invalid data
entry to changes that violate business rules. It is difficult to distinguish an intentional
change from an unintentional change when data validation rules, business rules, or other
criteria for assessing users' actions are not violated. Changes made by users that do not
violate application rules are considered valid; by default, they will be accepted and
eventually replicated to other instances of the data stores.
There are advantages to having multiple data centers to mitigate the risk of data loss.
However, it is important to understand the limitations of this strategy's ability to recover
from different causes of data loss.

Network Disruption
Network disruption at a data center level can adversely impact a large number of users.
The importance of reliable access to the Internet for a data center cannot be overstated.
Data centers typically contract with multiple Internet providers for access to the Internet. If
one of the providers experiences a network disruption, traffic can be routed over the other
providers' connections. The assumption, of course, is that the redundant services of
multiple providers will not fail at the same time.
This assumption is reasonable for the most part, except when you consider major
disruptions due to natural disasters. Severe storms that disrupt power for days or
earthquakes that damage cables can leave entire data centers disconnected from the
Internet for extended periods.

Avoiding areas prone to natural disasters can be a challenge. As Figure 4.2 shows, areas of
high risk for seismic activity exist in the United States on the West Coast, in the Midwest,
and in small areas of the Southeast. The Midwest and Gulf Coast are, in general, at low risk
of seismic activity but are prone to tornadoes and hurricanes, respectively. For this reason,
using multiple data centers located in areas with different risk profiles is a reasonable
approach to mitigating the risk of data center-level network disruptions.



Figure 4.2: Seismic hazard map of the United States indicates locations of highest risk
on the West Coast and parts of the Midwest and the Southeast (Source:
Earthquake.usgs.gov).

Disruption of Environmental Controls in Data Center


There is an old proverb dating back at least to the 14th century that starts "For want of a
nail, the shoe was lost. For want of a shoe, the horse was lost" and ends with a line about
the loss of a kingdom. The gist of the proverb is that small events can have large
consequences. This same kind of potential problem exists in data centers with regards to
environmental controls.

Managing environmental controls, especially cooling, is an important part of data center
operation management. When you consider risks to data center operations, it is easy to
focus on the major components, such as server hardware and network connections.
However, it is important to not forget the less visible but still important elements such as
environmental controls, the proverbial nails that hold data centers together. Failures in
environmental controls can lead to a partial shutdown of hardware until environmental
controls systems are functioning again.

Redundant data centers can help mitigate several risks of service disruption, ranging from
application maintenance and software failure to natural disasters that destroy data centers
and loss of environment controls that diminish operating capacity. The advantages of
multiple data centers are not limited to minimizing the impact of disruptions.


Reduced Latency
Latency, or the round-trip time from a server to a client device and back to the server, can
vary significantly for different users. As Figure 4.3 shows, a customer in the Pacific
Northwest accessing an application hosted in New York will experience latencies almost
three times that of a customer in Florida.
One way to address these wide variations in latency is to deploy applications to multiple
data centers and architect applications to serve customers from the closest data center.


Figure 4.3: Latencies within a country can be substantially different for customers in
different locations when those customers are served from a single data center
(Image source: CDNetworks).

Clearly there are advantages to having multiple data centers host your applications.
Multiple data centers provide for redundancy and improve the resiliency of applications.
Hardware failures, network disruptions, hardware-based data loss, and risks of natural
disasters can all be addressed to some degree with multiple data centers. It would seem
obvious that we should all deploy large-scale, mission-critical applications to multiple data
centers, but there are drawbacks. It is often said that there are no free lunches in
economics. Similarly, there are no free solutions in IT.


Challenges to Maintaining Multiple Data Centers


Some of the challenges to maintaining data centers are well known. Others, especially with
respect to addressing content delivery challenges, are not as obvious. In the interest of a
broad assessment of the challenges facing businesses and organizations considering
multiple data center deployments, the following list highlights factors to consider:

Costs of data centers/rising costs of colocation
Need for specialized expertise
Software errors
Synchronization issues
Unaddressed content delivery challenges
Outlining each of these challenges will reveal how deploying multiple data centers might
not be sufficient to meet global content delivery needs.

Costs of Data Centers/Rising Costs of Colocation


The costs of data centers can be divided into the initial costs to construct and equip them and
the ongoing costs to operate them. Construction costs will vary by location for any type of building
construction, but data centers have specialized requirements. For example, compared with
typical buildings, buildings that house data centers require higher levels of security,
substantial cooling systems, redundant power supply lines, and depending on the area,
remediation for regional natural disaster risks. For instance, data centers built in areas
with substantial risk of seismic activity may require earthquake-resistant structures.


Figure 4.4: Data centers, such as this one housed in Oregon in the US, are major
investments to both build and operate (Source: By Tom Raftery (Flickr) [CC-BY-SA-2.0
(http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons).


Operating costs in a typical data center are dominated by the costs of servers and power.
Networking equipment, power distribution systems, cooling systems, and other
infrastructure are also major cost categories.

These costs have to be weighed against the benefits of deploying multiple data centers,
which, as previously described, are substantial. There are alternative ways to achieve
some of the same benefits, especially with regards to content delivery, without incurring
the substantial step-wise costs of adding data centers.

Need for Specialized Expertise


A fully staffed data center requires an unusual combination of skills. The physical plant
aspects of a data center require individuals knowledgeable in heating, ventilation, and
air conditioning (HVAC), power systems, and physical security. Network professionals working in a
data center need the skills to configure complex networks, analyze performance data, and
tune configurations. A data center might support the use of multiple hypervisors and host
operating systems (OSs), which requires knowledgeable systems administrators.
IT managers will be in demand as well. Teams of professionals may be needed to diagnose
and resolve problems in a data center. For example, power fluctuations can adversely affect
cooling systems, which in turn can cause problems for servers that first come to the notice
of systems administrators. Coordinating ad hoc teams to address one-time problems is just
part of a manager's responsibility. Ongoing operations require services such as a Help desk
as well as the ability to communicate with business process owners who might have
limited knowledge of data center operations.
Even when businesses have the resources to absorb the costs of data center construction,
maintenance, and management, the deployment of multiple data centers does not always
address the network issues that come with global reach.

Software Errors
Complex systems, such as data centers and cloud infrastructures, are subject to failure.
Even when systems are designed for resiliency, they can experience problems due to
software errors or unanticipated cascading effects that propagate through a complex
system.
Consider some of the major cloud service outages in the past several years as examples of
what can go wrong:

In December 2012, Amazon Web Services experienced an outage in its US eastern
data center due to a problem with the Elastic Load Balancing service.
A software bug forced Google to resort to backups to restore mail for 150,000 users
whose data had been lost in one of a series of outages between 2008 and 2009.
In February 2013, Microsoft's Azure storage service was unavailable due to problems
with expired digital certificates.


Amazon, Google, and Microsoft are all major cloud providers with sufficient resources to
deploy and manage multiple data centers. Yet even these major providers experience
substantial disruptions due to software errors.

Synchronization Issues
When data has to be available in multiple data centers, you have to contend with
synchronization issues, including both technical and cost considerations. Synchronizing
large volumes of data can consume substantial amounts of bandwidth. Like the data sent
back and forth to end users of applications, data sent to other data centers is subject to
network congestion, long latencies, and lost packets. Sub-optimal network conditions can
lead to extended synchronization times.
In a worst-case scenario, delays in synchronizing data can adversely affect application
performance. For example, a server with stale data could report inaccurate information to a
user, while another user issuing the same query but receiving data from a server in a
different data center might get the correct, up-to-date information.

Unaddressed Content Delivery Challenges


In addition to the challenges of deploying multiple data centers, businesses still have to
contend with the fact that not all their content delivery needs are met by this setup. Even
with the best technology deployed in state-of-the-art data centers that are managed by
highly skilled professionals, these businesses still have to address issues like non-
synchronized content, lost packets, and inefficient TCP/IP traffic.
Some content requires a single source. For example, airline tickets must have a single
source record so that a seat on a flight is not sold more than once. In these types of
applications, multiple data centers can only be used if specialized techniques, such as two-
phase commits, are employed.
Lost packets are often a product of congested networks. Congestion can be episodic (for
example, during periods of peak demand for bandwidth) or chronic (for example, due to
peering agreements that lead to less than optimal routing of traffic across
networks). Deploying applications to multiple data centers can help avoid or minimize the
number of hops a packet must travel, but when data packets travel over congested
networks, the packets are at risk of being lost.

Perhaps one of the most significant unaddressed challenges of deploying multiple data
centers is the fact that traffic might still be using non-optimized TCP configurations.
These unaddressed needs and disadvantages are not presented as deterrents to
employing multiple data centers. Instead, the point is to take a more balanced
approach, one that combines the benefits of multiple data centers with the benefits of
a content delivery network, maximizing the strengths of the two while limiting the costs and
disadvantages of each.


Combining Data Centers, Content Delivery Networks, and Application Acceleration
Constructing and running multiple data centers addresses some of the significant
challenges businesses face, especially the need for redundancy. Data centers can help
improve latency in some cases, but the overall benefit may not be sufficient for that alone
to justify the capital and operational costs of data centers. Instead of relying on a single
method to address multiple problems (for example, latency and redundancy), an optimal
solution is based on combining a small number of data centers with the use of content
delivery networks and application acceleration techniques.

Multiple data centers provide redundancy but at a substantial cost and increased
complexity. Nonetheless, having at least two data centers is difficult to avoid if you are
looking to maintain application availability in the event of catastrophic failure at a single
data center. One can reasonably ask whether one backup data center is enough. There
could conceivably be catastrophic failures at two data centers or a catastrophic failure at
one and a less significant but still performance-degrading event at the other data center.
The right solution depends on your requirements and tolerance for risk.

If your risk profile allows for two data centers rather than more, you can reduce the overall
capital and operational expenses for data centers. Two data centers employing a geo-load-
balancing method can share the application traffic between them. This
setup will likely reduce the latency for users in close proximity to a data center, but,
in general, data centers do not solve the latency problem. As this scenario considers only
two data centers, the number of users benefiting from this arrangement is limited.



Figure 4.5: Multiple data centers increase redundancy but at additional cost and
management complexity.

If you combine two data centers with caching of static content along with dynamic
application traffic optimization techniques focused on TCP and HTTP, you can improve end
user experience by enabling lower latency along with improved availability.


Optimizing Network Traffic in the Middle Mile


Network traffic moves from servers in the data center and across the Internet until it
reaches the target device. The segment of the network from the Internet Service Provider's
(ISP's) facility to the end user's device is known as the last mile. The speed of the network
in the last mile is dependent on the type of medium in place locally to transmit signals. The
segment of the network between the source data center and the edge of the network
prior to the last mile is known as the middle mile.


Figure 4.6: The middle mile between data centers and the edge of the last mile can be
optimized to improve TCP performance. The first mile is the segment originating at
the data center while the last mile is the segment ending at the end user.

The distance that packets must travel in the middle mile can be significant. Anything that
can reduce the number of packets sent can help improve performance. These techniques
include:

Data compression to reduce the amount of payload data
TCP optimizations to reduce the number of round trips made
HTTP optimizations to reduce connection management overhead
Content delivery networks and application delivery network services can use these and
other techniques to optimize traffic between data centers.



Figure 4.7: Static content is cached at data centers. When a user requests content, it
is served from the closest CDN data center. If the content is not currently in the
cache, it is requested from the origin data center.

Data compression can help reduce the number of packets that must be sent between the
data center and client devices by reducing the size of the payload. TCP breaks a stream of
data into individual units whose length depends on the medium; for example, Ethernet
packets can be as long as 1,500 bytes, including header information. The medium
defines the maximum transmission unit on a network, so a valuable technique is to
compress the payload data before it is transmitted.
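The effect can be demonstrated with Python's standard zlib module; the sample payload and the 1,460-byte usable-payload figure are illustrative assumptions:

import zlib

payload = b"<html>" + b"<div>repetitive markup</div>" * 200 + b"</html>"
compressed = zlib.compress(payload, 6)

mtu_payload = 1460  # rough usable bytes per 1,500-byte Ethernet frame
print(len(payload) // mtu_payload + 1, "packets uncompressed")
print(len(compressed) // mtu_payload + 1, "packets compressed")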

TCP has a number of configuration parameters that allow for tuning. Characteristics such
as buffer size and the settings controlling the retransmission of lost packets can be adjusted
to maximize throughput. For example, one tuning technique employs a more efficient way
of detecting a lost packet and reduces the number of packets that might be retransmitted
unnecessarily.
These types of techniques are especially important when optimizing non-static, application
traffic. Static pages can be cached by content delivery networks and served to users within
their region. Application-generated content, such as responses to database queries, search
results, or reporting and analysis tool output, will vary from one user to another.
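A cache-aside sketch of the edge behavior described above; fetch_from_origin is a hypothetical helper standing in for a request back to the origin data center:

edge_cache = {}

def serve(path, fetch_from_origin):
    if path not in edge_cache:                      # cache miss
        edge_cache[path] = fetch_from_origin(path)  # fall back to the origin
    return edge_cache[path]                         # served locally thereafter

Dynamic responses, which vary from user to user, would bypass this cache and rely instead on the acceleration techniques described above.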


Caching
Acceleration techniques reduce latency and packet loss between data centers, but there are
additional benefits of caching.


Figure 4.8: Dynamic content cannot be cached and must be sourced from the data
center hosting the application generating the content. Acceleration improves
throughput and allows for a more responsive application experience. Here, the user
has accelerated access to dynamic content from a distant data center and fast access
to static content from a closer data center.

Load Balancing
By using a content delivery network, the workload for serving content is distributed to
multiple points of presence around the globe. Users in different parts of the world viewing
the same Web page will see the same content even though that content is served from
different locations. This load balancing is known as geo-load balancing and is one type of
load balancing supported by content delivery network providers.


In addition to geo-load balancing, content delivery network providers can load balance
within a point of presence. Clusters of servers can be deployed to serve static content so
that a single Web server does not become a bottleneck. It would be unfortunate if, after
globally distributing your content and deploying TCP optimizations and other acceleration
techniques, a single Web server slowed the overall throughput of your application.
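A round-robin sketch of load balancing within a single point of presence (the server names are illustrative):

import itertools

servers = itertools.cycle(["edge-1", "edge-2", "edge-3"])

def pick_server():
    return next(servers)   # spread successive requests across the cluster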

Monitoring Server Status


Content delivery network providers can monitor server status and load and adjust the
server resources dedicated to your application as needed. For example, during periods of
peak demand, additional servers can be added to a load-balanced cluster and when
demand subsides, those extra servers can be returned to a resource pool of cloud
resources.

Fault Tolerant Clusters of Servers


Clusters of servers used for content delivery can be configured as a fault tolerant cluster. In
the event of a failure in one of the servers, the other servers will automatically process
content requests that under normal operating conditions would be processed by the failed
server.

Virtual IP Address and Network Failover


Servers are not the only potential point of failure. Network failures that leave some servers
inaccessible could disrupt content distribution. Using virtual IP addresses allows content
delivery network providers to create a more fault tolerant network and route traffic to
accessible servers.

Benefits of Multiple Data Centers, Content Delivery Networks, and Application Delivery Acceleration
The combination of multiple data centers, content delivery networks, and application
delivery acceleration addresses significant challenges to delivering static and dynamic
content in a reliable, low latency way, including:

Redundant data centers provide for reliable access to content in the event of a
failure at another data center
Acceleration techniques reduce latency and packet loss resulting in more responsive
applications from the end users perspective
Acceleration techniques reduce the time required to keep static content
synchronized across data centers
Content delivery networks maintain copies of static content in multiple data centers,
reducing the distance between end users and content
Content delivery providers can offer load balanced and fault tolerant clusters of
servers
Content delivery providers can monitor server status and the load on systems and
adjust resources as needed to maintain acceptable performance levels


These benefits apply to most content delivery use cases, but they do not capture all the
challenges an organization faces when distributing content globally. China, in particular,
presents additional requirements that are worth considering as you evaluate content
delivery and application delivery acceleration providers.

China: Country-Specific Issues in Content Distribution


It is difficult to describe China without using superlatives. China has the world's largest
population and occupies the second largest landmass of any country. It has the largest
number of Internet users in Asia. Since 2000, its gross domestic product has grown
between 8% and 11.4% per year (Data source: http://www.chinability.com/GDP.htm). The
business opportunities presented by China are well documented and are a key driver
motivating businesses to provide their goods and services to Chinese markets. An
important part of accessing markets and delivering goods and services is providing access
to information and applications.


Figure 4.9: China presents many business opportunities but regulation and cultural
differences need to be considered (Source: By Cacahuate, amendments by Peter
Fitzgerald and ClausHansen (Own work based on the map of China by PhiLiP) [CC-
BY-SA-3.0-2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/3.0)], via
Wikimedia Commons).


Technical Challenges to Delivering Content in China


Content delivery considerations are only a part of a strategy for doing business in China,
but they should not be dismissed. China is geographically large, which can mean long
latencies between distant cities.

As with Internet service in any part of the globe, peering relationships can substantially
affect latency and packet loss. ISPs in China may have peering agreements with other ISPs
that lead to less than optimal routing of network traffic. This can lead to longer latencies
and increased packet loss due to congestion. Due to issues with peering, businesses may
experience less than 100% availability of network devices.

[Figure: bar chart of Internet users in Asia by country, June 2012, measured in millions of users.]
Figure 4.10: The number of Internet users in China far exceeds those in any other
country in Asia (Data source: Internet World Stats,
http://www.internetworldstats.com/stats3.htm).

One way to address these technical issues is to have multiple points of presence in China.
This setup allows static content to be cached at more sites and therefore closer to more
users. Having multiple points of presence can help reduce the number of networks that
must be traversed to deliver content, and thus avoid some of the negative consequences of
poor peering. In addition to these technical challenges, there are regulatory and legal issues
that must be considered when delivering content in China.


The Great Firewall of China


China regulates and restricts Internet content using a set of controls commonly known as
the Great Firewall of China. The GreatFirewallofChina.org estimates that as many as 30,000
civil servants review Internet content and block material deemed undesirable. In some
cases, entire sites are blocked if the material is categorized as undesirable.


Figure 4.11: Web content and sites are regulated in China and sites readily available
to users outside of China are inaccessible within that country (Source:
GreatFirewallofChina.org).

The Great Firewall of China has two implications for delivering content in China: technical
and legal. Downloading content within the Great Firewall of China can be significantly
slower than doing so outside the firewall. In addition to long distances and poor peering,
businesses need to consider the impact of censoring technologies on network performance.


From a legal perspective, businesses might find themselves self-censoring content
delivered in China. This activity might lead some to maintain two sets of content: one for
general use and one for use in China. Also, businesses will need to have procedures in place
to respond to take-down orders from the government should some of the business content
be considered undesirable. Sites or content about politics and gambling are considered
high risk, but even news and user-generated content may be subject to scrutiny. Chinese
regulations require that content providers comply with Chinese law, including not posting
prohibited content, having all necessary licenses, and monitoring the site to ensure banned
content is not posted.

Content Delivery Network Considerations in China


Given the technical and legal issues with delivering content in China, it is clear that
specialized services are needed to comply with local regulations. Content delivery network
providers may be in a position to assist their customers with compliance if the providers
have local staff who are familiar with regulations and understand procedures for
responding to take-down notices and other government orders.

Summary
Multiple data centers, content delivery networks, and application delivery networks with
network acceleration can improve application performance and responsiveness from an
end user's perspective. Data centers provide essential redundancy needed for reliable
access to applications and content. They are, however, costly and complex to operate. By
combining a small number of data centers with application and network acceleration
technologies, businesses can realize improved application performance without the cost of
additional data centers.
Delivering content to global markets requires attention to a variety of national regulations
and cultural expectations. China presents opportunities for business but requires
compliance with laws governing the types of content that should be available to Internet
users within China.


Chapter 5: Architecture of Clouds and Content Delivery
Although cloud computing has been widely adopted across a range of organizations, this
technology might not meet all the key requirements of enterprise applications. The
previous chapter of this guide examined how the architecture of the Internet affects
application performance, especially on a global scale. This chapter turns the focus to the
architecture of public cloud services and considers how public cloud providers meet some,
but typically not all, enterprise requirements. The chapter considers this issue from three
perspectives:

Public cloud providers and virtualized IT infrastructure
Application designers and their responsibility for application architecture
Content delivery networks as complementary to public cloud provider services
Designing and deploying enterprise applications on a global scale requires attention to
infrastructure, architecture, and network optimization. As this chapter highlights, public
cloud providers are well positioned to address infrastructure and related service issues.
Content delivery networks address the network optimization problem. Combining public
cloud services with content delivery services in a well-designed application architecture
can provide a solid foundation for building enterprise applications.
Although the IT industry sometimes refers to public cloud providers in general, it is
important to understand that there are different types of public cloud services and
deployment models. Let's begin a discussion of public cloud providers with a review of
distinguishing characteristics.

Public Cloud Providers and Virtualized IT Infrastructure


The common characteristic of all cloud providers is a service model based around on-
demand access to shared, typically virtualized, computing and storage resources. Cloud
providers distinguish themselves in the way they virtualize IT infrastructure. You can see
these differences in varying service and deployment models. Service models vary according
to the types of infrastructure and application services offered, while the deployment
models vary according to the types of users with access to the providers resources.

Before diving into a detailed description of service and deployment models, let's first define
essential characteristics of all clouds.


Essential Characteristics of Cloud Computing


Public cloud providers offer relatively easy access to computing and storage services. The
US National Institute of Standards and Technology (NIST) has identified five essential
characteristics of cloud computing:

On-demand service
Broad network access
Resource pooling
Rapid elasticity
Measured service
(For full details on the definition, see The NIST Definition of Cloud Computing:
Recommendations of the National Institute of Standards and Technology. NIST Special
Publication 800-145.)
Network performance is outside the scope of the cloud computing definition, but it is still
an important element of overall application performance. Already in this discussion, it is
becoming apparent that cloud computing alone does not address all the requirements of
globally deployed, enterprise applications. Latency is a dominant problem in Web
application performance. Cloud applications run in data centers that might be a long
distance from end users, which introduces distance-induced latencies. There are other
potential performance-degrading problems as well. For example, peering agreements
between ISPs may be structured in ways that lead to degraded performance when data is
routed over networks controlled by other ISPs.

On-Demand Service
Cloud users have the ability to provision and use virtualized resources when they are
needed, for as long as they are needed. Specialized systems administration skills are not
required. Users work with a cloud computing dashboard to select the types and number of
virtualized servers needed. For example, if a group of analysts needs a server to analyze a
large data set, they would log into a dashboard or control panel, select an appropriate type
of virtual server, identify the machine image with the necessary operating system (OS) and
analysis software, and launch the virtual machine.
It is important to emphasize the self-service nature of this process. Virtualization has long
been used in data centers to improve the efficiency of server utilization, but prior to cloud
computing, setting up a virtualized server required significant knowledge about OSs,
hypervisors, and system configurations. Cloud computing platforms automate many of the
steps required to instantiate a virtual machine.
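The same self-service step can also be performed programmatically. As a hedged sketch, this example uses AWS's boto3 SDK; the image ID and instance type are placeholder assumptions:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # machine image with the OS and software
    InstanceType="m5.large",          # the selected virtual server type
    MinCount=1,
    MaxCount=1,
)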


Broad Network Access


Cloud computing allows for access to services over standard network and application
protocols. This approach decouples virtual servers and storage resources from the client
devices that access them. This decoupling is especially important when applications have to
support multiple client devices, such as mobile devices, laptops, servers, and mainframes.
In addition to the ability to reach applications, broad network access enables access to data
stored persistently in the cloud. Data access may be mediated through an application or
through a standard Web services method such as REST that allows users to store,
manipulate, or delete data using URL-based commands.
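A sketch of that URL-based access pattern against a generic REST object store, using the third-party requests library; the endpoint and object names are hypothetical:

import requests

base = "https://storage.example.com/mybucket"

with open("report.csv", "rb") as f:
    requests.put(base + "/report.csv", data=f)   # store an object
resp = requests.get(base + "/report.csv")        # retrieve it
requests.delete(base + "/report.csv")            # delete it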

Resource Pooling
Cloud computing customers share infrastructure. When a user starts a virtual machine, it is
instantiated on a physical server in one of the cloud providers data centers. The customer
may be able to choose the region or data center in which the virtual machine runs but does
not choose the physical server itself. In all likelihood, the virtual machines running on a
single server belong to several customers. Similarly, one customer's blocks of storage in a
storage array may be intermingled with other customers' data.

The cloud computing platform and hypervisors are responsible for isolating resources so
that they are accessible only to resource owners and others explicitly granted access to the
resources. With secure resource pooling, cloud providers can optimize the efficiency of
server utilization by pooling the resource requirements of large numbers of customers.

Rapid Elasticity
In a cloud, it is relatively easy to instantiate a large number of virtual servers. For example,
the group of analysts working on a large data set may determine that the best way to
analyze the data is to use a cluster of servers working in parallel. Launching 20 servers is
not much more difficult than launching one. (Coordinating the work of the 20 servers is a
more complex problem, but there are applications for managing distributed workloads that
can be readily deployed in a cloud.)

Cloud providers also offer services to monitor loads on servers and bring additional
servers online as needed. For example, if there is a spike in demand for a Web application,
the cloud management platform can detect the increased demand, bring additional servers
online, and add them to a load-balanced group of servers. Those servers can be shut down
automatically when demand drops.
The ability to rapidly add servers or storage can help address some aspects of peak
demand, but this type of rapid elasticity does not address network-related performance
issues. Consider the following simple example.


A Web application running in a cloud data center in the eastern United States is
experiencing higher than usual demand from users in Europe and Asia. The load-
monitoring system detects an increase in workload and brings an additional virtual server
online. The set of application servers can now process a larger number of transactions in a
given amount of time. This setup does not, however, affect the time required to transmit
data between the servers and the client devices. The latency of the network between the
data center in the US and the client devices in Europe and Asia is not altered by changes in
the application server cluster running in the cloud.
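A sketch of the monitoring loop's decision logic; get_load, add_server, and remove_server are hypothetical hooks into a cloud management platform, and the thresholds are assumptions:

def autoscale(get_load, add_server, remove_server, high=0.8, low=0.3):
    load = get_load()    # e.g., average CPU utilization across the cluster
    if load > high:
        add_server()     # scale out: more transaction-processing capacity
    elif load < low:
        remove_server()  # scale in: release resources back to the pool
    # Note: this changes throughput, not the network latency to distant users.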

Rapid elasticity is clearly an important part of optimizing application performance, but it is
not sufficient to address all performance issues.

Measured Service
Cloud providers use a "pay as you go" or "pay for what you use" cost model. This setup fits
well with the rapid elasticity and pooled resource aspects of cloud computing. Customers
do not have sole use of dedicated hardware, so it is important to charge according to the
share of resources used. Cloud providers typically use tiered pricing based on the size or
quality of a service. For example, a high-memory virtual machine will cost more than a low-
memory virtual machine. Similarly, charges for higher-performance solid state drives will
be higher than those for commodity disk storage.

As you can see from this description, the essential characteristics of cloud computing are
insufficient to address the full range of cloud optimization requirements. There are
different deployment and service models, and it is worth considering how these may
impact cloud optimization issues.


Figure 5.1: The essential characteristics of cloud computing address many areas
relevant to application performance, but key considerations, such as latency and
packet loss, are outside the scope of cloud computing fundamentals.

Cloud Computing Deployment Models

You can distinguish cloud services by the types of users who are granted access to the services. The four deployment models are:

Public
Private
Hybrid
Community
Public clouds are open for use to the general public. Generally, anyone with a credit card
and access to the Internet can make use of public cloud resources. Public clouds are the
least restrictive of the four deployment models with regards to who is granted access.

Private clouds are at the other end of the access spectrum. This type is one of the most
restrictive cloud deployment models. Access to a private cloud is restricted to members of
a single organization, such as a business or government entity.

Hybrid clouds combine two or more clouds that are linked in such a way as to allow for portability of data and applications between them. This definition allows for different combinations of cloud types (for example, two private clouds, or a private and a community cloud), but the combination of a private cloud and a public cloud is most typical.

Community clouds are designed to serve a group of users from multiple organizations who
share common requirements. For example, a community cloud could provide specialized
HIPAA-compliant services to healthcare providers. Only members of the specialized
community are granted access to these clouds.
Deployment models are important considerations for those concerned with security and
compliance issues. These models do not directly affect application performance because
the same cloud infrastructure and management platforms can be deployed in any of these
models.
Deployment models may indirectly affect performance in cases in which you want to
leverage the benefits of content delivery services or network optimization services.
Consider an example of a hybrid cloud based on one private cloud and one public cloud.
The public cloud may offer a proprietary content delivery network that works with content
stored in the public cloud. Content maintained in the private cloud may have to be
replicated to the public cloud before it can be served through the content delivery network.
In addition to considering deployment models, it is important to consider how different
service models may impact overall application performance and our ability to optimize
application and network services.

Cloud Service Models


Service models of cloud computing differ according to the amount of control customers
have over virtualized resources and the level of service provided. The three types of cloud
service models are:

Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Software as a Service (SaaS)
Some vendors employ even finer-grained classification schemes with terms such as Database as a Service (DaaS) and Analytics as a Service (AaaS); for this chapter's purposes, we will not delve into those specialized areas. The goal here is to understand how various cloud service models can impact application performance and optimization. The IaaS, PaaS, and SaaS models are sufficient for the needs of this discussion.

Infrastructure as a Service
IaaS models allow users the greatest level of control over the provisioned infrastructure.
IaaS customers, for example, can choose the size of the virtual server they run, the OS,
software libraries and applications, as well as various types of storage. Users control the
OS, so they have substantial control over the application platform and system
configuration. The owner of a provisioned server could:

Grant administrator access to others if needed
Install software applications
Create user accounts for others to log into the server
Change access control privileges on files stored locally
Configure local firewalls and other security measures
IaaS clouds are a good fit when users have specialized requirements and the need for
control over OS and application stack software. If you need to run a legacy, custom
application in the cloud, the IaaS model is probably the best option.
IaaS clouds provide a service catalog consisting of machine images that can be run in the
cloud. These can include images with a variety of minimally configured OSs (for example, a
variety of Linux and Windows Server versions) or machine images that include OSs as well
as parts of an application stack, such as relational databases, application servers, and
specialized applications, such as search and indexing systems.
Although IaaS provides the greatest degree of control, it also requires the most systems
management expertise. For example, you might need a combination of OSs, patch levels,
and software libraries that is not available in the service catalog. In this case, you would
have to start with a base image and install the additional patches and components you
need. If you plan to use this image for extended periods of time, you can save the image but
you will have to maintain the image going forward. This maintenance includes patching
and updating the OS as needed.


Figure 5.2: IaaS allows for high levels of control but entails high levels of
responsibility on the part of cloud users.

In addition to performing systems administration operations, you might need to perform database administration and other types of application management when using IaaS.
Again, there is a tradeoff between flexibility and responsibility. In an IaaS setup, you have a
choice of relational database management systems, search and indexing applications, and
application servers. The disadvantage is that you are then responsible for properly
installing, configuring, and maintaining these components.
In some situations, developers and application designers do not need the level of control
offered by IaaS. In those cases, a PaaS cloud may be a better option.

Platform as a Service
In a PaaS cloud, customers do not have substantial control over virtual servers, OSs, or
application stacks. Instead, the PaaS provider supports programming languages, libraries
and application services used by developers to create programs that run on the PaaS
platform.

PaaS providers offer different types of services. Some specialize in a single language or
language family, such as Java and languages that run on the Java virtual machine (JVM).
Others tend to be language agnostic but offer a variety of frameworks and data stores that
can be combined to meet the particular needs of each customer.

An advantage of PaaS over IaaS is that the PaaS cloud provider is responsible for managing
more components in the application stack and infrastructure. As with IaaS clouds, the PaaS
cloud provider manages underlying hardware and network infrastructure on behalf of
customers. PaaS providers manage additional software components as well.


Figure 5.3: PaaS models build on the same kind of infrastructure provided by an IaaS
setup but alleviate some of the management responsibility of an IaaS.

The tradeoff for developers is that they are more constrained in their choices in a PaaS. For
example, a PaaS cloud provider may implement a message queue service for its cloud. If
developers need a message queue, they will have access to the PaaS provider's chosen tool. Developers working in an IaaS cloud could choose from a number of message queue systems and manage them themselves.

Extending this model of increasing levels of service and decreasing levels of control brings
us to the SaaS model.

Software as a Service
SaaS providers are the most specialized of cloud providers. Rather than focus on providing
access to virtualized servers and storage or offering developers a managed application
stack, SaaS providers offer full applications. Common SaaS use cases include:

Human resources (HR) management
Customer relationship management (CRM)
Financial management
Project management
Time tracking
Lead generation
As the list implies, a wide variety of back-office functions are available through SaaS
providers. SaaS providers offer turnkey solutions. Developers do not have to design data
models or build user interfaces as they would in a PaaS. They also do not assume the
systems management tasks associated with using IaaS clouds.

The disadvantage of SaaS is that customers have the least control over the service. For
example, a SaaS customer cannot generally dictate the type of data store used to
persistently store application data. The SaaS provider makes such design decisions and all
customers use a common application platform.

Since SaaS providers have control over the underlying architecture of the system, it is
important for customers to understand how the SaaS provider's design choices affect data
integration, data protection, and other aspects of application management. For example,
data from a SaaS financial management system may be needed in an independent
management reporting system. Users may be able to perform bulk exports or query smaller
subsets of data through an application programming interface (API). Some customers may
need to maintain their own backups of data from the SaaS application. In this case, the
customer will need to define and implement an export process in order to maintain up-to-
date copies of data on-premise. Customers would also need to understand the data model used by the SaaS application in order to extract data for use in other applications.
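
As a rough illustration of such an export process, the following Python sketch pages through a REST export endpoint and appends each record to a local JSON Lines file. The URL, pagination parameters, and token are illustrative assumptions, not any specific vendor's API; HTTP calls use the widely available requests library.

    import json

    import requests  # third-party HTTP library: pip install requests

    # Hypothetical endpoint and credentials; each SaaS vendor's actual
    # API, pagination scheme, and authentication mechanism will differ.
    BASE_URL = "https://saas.example.com/api/v1/records"
    TOKEN = "REPLACE_WITH_API_TOKEN"

    def export_all(path="backup.jsonl", page_size=500):
        """Page through the API, writing one JSON record per line."""
        headers = {"Authorization": f"Bearer {TOKEN}"}
        page = 1
        with open(path, "w", encoding="utf-8") as out:
            while True:
                resp = requests.get(BASE_URL, headers=headers, timeout=30,
                                    params={"page": page, "per_page": page_size})
                resp.raise_for_status()
                records = resp.json()
                if not records:
                    break  # no more pages to fetch
                for record in records:
                    out.write(json.dumps(record) + "\n")
                page += 1
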


Figure 5.4: SaaS provides turn-key services with the lowest levels of systems
management responsibility but limited ability to customize the implementation of
the service.

The essential characteristics of cloud computing along with the deployment models and
service models give a broad, structured view of cloud computing services. From this
perspective, you can see that cloud providers virtualize some, but not all, of the key
components of enterprise-scale applications. In particular, cloud service providers
virtualize:

Servers
Storage
Platform and software services
The servers may be substantially under the control of cloud users, as in the case of an IaaS
cloud; abstracted to logical computing units, as in some PaaS providers; or essentially
hidden from users, as is the case with SaaS.

Storage is also virtualized. IaaS cloud providers offer several types of persistent storage,
including object storage, file system storage, and database storage systems.
Services such as message passing, search over unstructured data, and specialized data
services are also virtualized. As one moves into PaaS and SaaS service models, higher-level
services are provided.
This movement from lower-level infrastructure management to turn-key system services
may give the impression that these various cloud models provide a comprehensive
platform for deploying enterprise-scale applications. Such is not always the case.
Application designers still have multiple areas of life cycle management and performance
management to consider when deploying such applications.

Application Design and Application Architecture


Application designers and architects use cloud infrastructure and services to deliver
business services to customers, employees, collaborators, and other stakeholders. IaaS and
PaaS clouds provide building blocks upon which to design and deploy custom applications.
SaaS providers offer turnkey solutions that can either solve a standalone problem or fit into
a larger workflow that incorporates SaaS applications with other systems. This situation
leaves many choices for software designers and architects. Rather than try to examine all
design issues, this section will consider four representative issues that can help offer a
sense of challenges that designers and architects face. This discussion will examine issues
relevant to IaaS and PaaS users; software designers and architects developing SaaS
applications have to contend with these issues, but SaaS customers do not.

The four issues considered here are:

Server failover
Application server replication
Content caching
Network optimization
Cloud providers offer solutions, or building blocks for solutions, for some of these, but as we will see, there is no single approach that will work for all plausible use cases.

Designing for Server Failover


Highly reliable systems are designed to continue functioning in the event of a failure in one
or more of the components. Failover techniques are applied at virtually all levels of design.
Physical servers may have multiple power supplies to ensure the server continues to
function even if one of the power supplies fails. Storage devices use multiple disks in
various RAID configurations to mitigate the risk of data loss due to a hardware failure in
the disk drive. Moving up in systems complexity from individual servers to clusters of
servers, you can see additional mechanisms for promoting resiliency.

A server-level failure could take an application offline unless there is a mechanism in place
to shift the workload from the failed server to a functioning server running the same
application. One simple way to do this is to deploy a cluster of servers running the same
software and route traffic to those servers through a load balancer. The load balancer
distributes the load across servers in the cluster. If the load balancer detects a failure in one
of the servers (for example, the server does not respond to a ping request), the load
balancer can route traffic to other servers in the cluster until the failed server is back
online.
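
The following minimal Python sketch illustrates this health-check logic. It substitutes a TCP connection test for the ping described above (opening a TCP connection requires no special privileges), and the backend addresses are illustrative.

    import socket

    # Illustrative backend pool; a real load balancer reads this from its
    # configuration and updates it as servers are added or removed.
    BACKENDS = [("10.0.0.11", 8080), ("10.0.0.12", 8080), ("10.0.0.13", 8080)]

    def is_healthy(host, port, timeout=2.0):
        """Return True if a TCP connection to the backend succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def healthy_backends():
        """Filter the pool down to servers that pass the health check;
        traffic is routed only to these until failed servers recover."""
        return [backend for backend in BACKENDS if is_healthy(*backend)]
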


Figure 5.5: A load balancer provides a simple but often effective form of failover in a
cluster of servers.

Application Server Replication


Enterprise application architects might have to consider failures at higher levels of
organization than even clusters. For example, if a natural disaster, systematic software
error, or other catastrophic failure prevents access to a data center or slows network traffic
to unacceptable levels, server-level redundancy and cluster-level failover strategies will be
insufficient to address this risk.

Enterprise applications may require redundancy across data centers. If application servers
within a single data center are unavailable, traffic to those servers can be routed to an
alternative data center. This situation is not ideal, of course. Depending on the distance,
peering arrangements between ISPs, and other network configuration issues, there may be
longer latency for users working with a distant data center. The additional workload on the
application servers in the redundant data center might also slow application response time.
Adding more servers to the application server cluster may help alleviate some of this
problem, but there may be a limited number of servers available to add to the cluster. Such
might be especially true if a large data center is experiencing a failure and multiple
enterprise applications are shifting workloads to the redundant data center.


Figure 5.6: Multiple data centers can provide failover recovery in the event of a
catastrophic failure at the data center level (Image source: CDNetworks).

The first thoughts about failover recovery may be focused on application servers and
ensuring that backup servers are available. These items are a necessary part of failover
recovery but are not the only crucial components.
In addition to application servers, data must be accessible to the failover servers. When a
single server fails in a cluster, the other servers will still have access to data on storage
arrays in the data center. To ensure the ability to failover between data centers, you must
make sure data is replicated between the data centers.

One of the considerations in designing a failover strategy is determining how frequently to synchronize data between the data centers. More frequent updates reduce the risk of losing
data, but the cost is additional, perhaps significant, traffic between data centers. You can
extend the window of time between synchronization operations to reduce network traffic.
The maximum amount of potential data loss is the amount of data that is created or
modified within that window of time. In the worst-case scenario, a data center failure
would occur just prior to a synchronization operation. Network optimization techniques,
such as those described in earlier chapters, can help reduce the time and traffic required to
replicate data between data centers.
Replication is an important element of failover recovery but another type of replication,
content caching, is an important part of many application designs.

Content Caching
Application designers can take advantage of the properties of static content to improve
application performance. Static content is any type of data that can be generated and stored
for use at some time in the future. This type encompasses content ranging from Web pages
to data files that rarely change. Static content changes infrequently, so there is minimal risk
to maintaining multiple copies.
Caching is an efficient strategy for a number of reasons. By keeping a copy of static content
in multiple locations, users can receive data from the closest location and therefore
typically reduce latency. Static data is stored in a local cache after the first time it is
accessed.


Figure 5.7: Static content rarely changes, so once it is retrieved from a distant server,
it can be cached for future use by other users of the closer data center.

Application designers do not have to devise a metric to determine the content most likely
to be requested. Instead, only the content that has been requested is cached. Consider a
multi-language Web site. The number of users requesting content in Finnish is likely to be
low outside of northern Europe. The data center serving that region will likely have cached
copies of that content while data centers in other parts of the globe probably will not.

The amount of memory dedicated to caching will determine the upper bound on the amount of static data that can be cached at one time. An effective way to work within this limit is to track the last request time for each content object. Objects that have not been requested recently are good candidates for removal from the cache without adversely
impacting performance. News stories or major announcements, for example, may be
popular for a time but eventually are surpassed in popularity by other newer stories or
announcements.
Removing the least-used content is just one strategy for managing caches. One could
consider the size of an object and develop a weighting scheme that favors retaining smaller
objects. The idea here is that removing a single large object would allow multiple smaller
objects to be stored. Other factors, such as the number of times an object is requested over
time, can also be factored into the cache management algorithm.
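
A minimal Python sketch of least-recently-used eviction follows. The cache is bounded by total bytes rather than object count, so evicting one large object can make room for several smaller ones; a production cache would layer in the weighting factors described above.

    from collections import OrderedDict

    class StaticContentCache:
        """LRU cache bounded by total bytes of cached content."""

        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.items = OrderedDict()  # key -> content, oldest first

        def get(self, key):
            if key not in self.items:
                return None              # miss: caller fetches from origin
            self.items.move_to_end(key)  # mark as most recently requested
            return self.items[key]

        def put(self, key, content):
            if key in self.items:
                self.used -= len(self.items.pop(key))
            self.items[key] = content
            self.used += len(content)
            while self.used > self.capacity:  # evict least recently used
                _, evicted = self.items.popitem(last=False)
                self.used -= len(evicted)
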

As useful as caching is, it does not address the need to improve network performance when
transmitting dynamic content.

Network Optimization
Application architects have to consider many aspects of application performance and
reliability. Network optimization is especially important for enterprise and globally used
applications.
Consider a hypothetical scenario: An organization is deploying a new financial market
monitoring application to the cloud. The application will have users in North America,
Europe, and Asia. The application collects near-real-time data from institutions across
three continents. Up-to-date information is vital to the users of this application, so cached
data will not meet requirements. A commodity trader in Chicago, for example, might want
the latest information on commodity prices in Hong Kong. A cached data set that is 2 hours
old is essentially useless to the trader in Chicago. Or, it could be worse than useless. Making
a decision based on out-of-date information could lead to costly transactions that could
have been avoided with timely data.

As data must move between global data centers on an as-needed basis, network
optimizations are key considerations. Network optimizations can include:

TCP parameter optimization
Reducing overhead associated with retransmitting dropped packets
Increasing data window sizes to reduce communication overhead
These optimizations operate at the implementation level of TCP. Other strategies can help
as well. Compression can be used to reduce the total payload size, although there is the
additional cost of compressing the data at its source and decompressing it at the
destination.
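
The tradeoff is easy to demonstrate with Python's standard zlib module. The sketch below reports both the size reduction and the CPU time spent compressing a synthetic payload; note that real payloads are rarely as repetitive as this one, so actual ratios will vary.

    import time
    import zlib

    # Synthetic, highly repetitive payload; real data compresses less well.
    payload = b"SYMBOL,PRICE,VOLUME\n" + b"XYZ,101.25,5000\n" * 50_000

    start = time.perf_counter()
    compressed = zlib.compress(payload, level=6)
    elapsed = time.perf_counter() - start

    print(f"original:   {len(payload):>9,} bytes")
    print(f"compressed: {len(compressed):>9,} bytes "
          f"({len(compressed) / len(payload):.1%} of original)")
    print(f"CPU time to compress: {elapsed * 1000:.1f} ms")
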

In addition, organizations or their network providers may negotiate better quality of service (QoS) guarantees. When working with the Internet on a global scale, you need to
consider how peering agreements between ISPs will impact performance. Providers with
high-quality peering agreements can offer customers better performance in some areas
than those who depend on lower-capacity, slower ISPs.

Application architects consider many factors ranging from server reliability and failover to
content caching and network optimizations. As more organizations adopt cloud computing, it helps to consider how content delivery networks can complement cloud service providers' infrastructures.

Content Delivery Networks Complement Cloud Providers


Cloud service providers offer an array of infrastructure, software, and services for use in
enterprise applications. As this chapter has discussed, these address many but not all the
needs of high-demand Web applications. Fortunately for systems architects and application
designers, content delivery networks can complement the services provided by cloud
vendors in several key areas:

Distributed caching
Network protocol optimization
Application delivery network services
Secure content delivery
Support for content life cycle management
Distributed caching, as noted earlier, helps to improve the performance of applications
serving static content while network protocol optimizations help with dynamic content
applications. Managing distributed applications is a complex and demanding process. Cloud
providers have some basic controls and platforms to support distributed application
management, but content delivery networks that also include application delivery services
can offer additional services.

Discussions about enterprise computing and public clouds often include concerns about
security and compliance. Here again, content delivery network providers can complement
and supplement the services of public cloud providers with enterprise support for secure
content delivery and support for the full life cycle of content management.

The next chapter will examine issues related to choosing a content delivery network service. As with other IT projects, making use of public clouds, content delivery services,
and application delivery services requires planning and careful attention to integration
issues.

Chapter 6: How to Choose a Cloud Application Acceleration Vendor
Throughout, this guide has examined the challenges to delivering services from the cloud,
with particular attention to the design of Web applications and the architecture of the
Internet. Providing high-performance applications to a geographically distributed user
base is a challenge for several reasons, including the physics and engineering constraints of
networking as well as the logical design of Internet protocols. Previous chapters describe
techniques to increase throughput and reduce latency using a combination of distributed
content servers and higher-performance protocols. This chapter turns the focus to
evaluation criteria for selecting a cloud application acceleration provider.
Cloud application acceleration can improve the performance of your applications but only
if essential components and functionality are in place. In addition to strictly performance-
related criteria, it is important to consider factors such as content life cycle management
and security. This chapter is organized into several sections, each of which addresses a key
evaluation area:

Global reach
Dynamic content acceleration
Security
Architecture considerations
Key performance metrics
Technical support
Key business considerations
These evaluation areas are generally applicable to enterprise applications, but some topics
might be more important than others. Application and organization requirements should
determine the weight applied to each of these areas. For example, if your primary concern
is increasing the performance of analytic applications used by customers across the globe,
dynamic content acceleration is more important than is static content caching. Consider the
suite of applications your organization is supporting as you determine the relative
importance of each of these areas. Also keep in mind strategic plans and their implications
for system design. You might find that you have or will likely have a combination of
applications that could benefit from cloud application acceleration services.

Global Reach
Global reach in the context of cloud application acceleration has both a technical and a
business dimension. Consider both during your evaluation.

Technical Dimension of Global Reach


Companies turn to cloud application acceleration and content distribution network
providers to improve the user experience for application users. Ideally, all users would
have a high-quality experience using a responsive system with high availability, low
latency, and accurate, up-to-date content. Users in North America accessing content that
originates in Europe should have approximately the same experience as Europeans accessing
the same content. Content delivery networks and network acceleration on a global scale
are both required to meet this objective.
Content delivery networks are geographically distributed servers that maintain local copies
of content that originates elsewhere. For example, a server in the eastern United States
might host a content management system (CMS) with static content used in an
organization's Web site. Customers from the eastern US and Midwest will likely receive this
content with low latency. Users in the western parts of the country might experience
somewhat higher latency than their counterparts in the East. Users in Asia, however, would
have to wait significantly longer for a static Web page because transferring content across
North America and the Pacific Ocean will take significant time.
As you evaluate your need for global reach, consider your existing and potential user base.
Will your organization be implementing new strategies and services in Europe and Asia? If
so, how can a content delivery network support those services? What is the average latency
from the edge servers that cache content from originating servers to different parts of your
market (see Figure 6.1)?


Figure 6.1: Content delivery networks require globally distributed edge servers to
support delivery from caches closer to end users.

Business Dimension of Global Reach


In addition to the technical aspects of caching content in edge servers around the globe,
there are legal and cultural considerations to keep in mind. Governing bodies around the
world have varying restrictions on Internet content and their own requirements with
regards to registering and licensing Internet service providers (ISPs). These regulations
might be minimal and easily addressed in consultation with local legal counsel. At the other
end of the spectrum, you might find frequent and ongoing government regulation and
monitoring. China, for example, has established relatively strict controls on the Internet
within the country (see Figure 6.2).


Figure 6.2: Popular sites outside of China and those posting content deemed
inappropriate might be blocked by the Chinese government. The censoring
infrastructure is commonly known as the Great Firewall of China.

Consider how content delivery network and network acceleration providers can assist you
with navigating local regulations, complying with operational restrictions and responding
to orders from the government. As with many other business services, organizations might
consider developing in-house expertise to manage these issues. This choice is reasonable in
some cases, for example, when you have a long history of business in the country, have in-
depth knowledge of legal and cultural issues, and understand legal procedures in the
country. When the cost of developing expertise in local matters outweighs the benefits, it is
appropriate to consider how your content delivery network provider and network
acceleration provider might be able to assist you with local matters.
Global reach encompasses both technical aspects, such as the distribution of edge servers
and the ability to accelerate network traffic over global distances, and business aspects,
such as support for complying with local regulations around the globe. Let's next turn to a
more in-depth look at several technical areas.

Dynamic Content Acceleration


There are many types of applications that lend themselves to protocol optimizations
instead of caching. These include order-processing applications, such as airline ticket
reservation systems, and ad hoc query applications, such as business intelligence systems.


Figure 6.3: The middle mile is the intermediate network infrastructure between
edge networks.

When evaluating dynamic content acceleration techniques, consider how your provider
implements acceleration. For example, traffic in the middle mile of the network between
two edge servers can be optimized because the acceleration network provider controls
both endpoints. The servers can negotiate protocol settings that reduce the number of
packets that must be resent when packets are dropped and set other TCP configuration
parameters to reduce overhead on the network.
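
Acceleration providers apply these optimizations inside networks they control, but the flavor of the tuning can be illustrated with standard TCP socket options that any application can set on its own connections. This Python sketch uses real, portable options; the buffer sizes are illustrative, and the operating system may cap the requested values.

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Disable Nagle's algorithm so small writes are sent immediately,
    # trading some bandwidth efficiency for lower latency.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # Request larger send/receive buffers (1 MB here, illustratively);
    # on high-latency links, larger windows keep more data in flight.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1 << 20)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)

    # Keep idle connections alive so they can be pooled and reused
    # without paying the TCP handshake cost on every request.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
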
The distribution of edge servers is also a key consideration in evaluating dynamic content
acceleration. Edge servers should be distributed in ways that reach large portions of the
user base, mitigate the impact of poorly performing peering agreements between ISPs, and
maintain reasonable loads on the servers.

The implementation choices made by cloud acceleration providers are important because
they can impact key application requirements, including:

High availability
Faster application performance
Improved end user experience

High Availability
Customers expect business Web sites and applications to be available 24x7 in spite of
hardware failures, network problems, and malicious cyber-attacks. Cloud acceleration
providers can help mitigate the risk of unwanted downtime by providing high-availability
hardware and networks.
Hardware fails. Today's large-scale data centers house large numbers of servers and storage devices, and therefore it is reasonable to assume that at least one component in a
large data center will fail in production. Cloud acceleration vendors can provide for high-
availability Web sites and applications with a combination of failover clusters, redundant
storage, and multiple data centers.
When high availability is a requirement for an application, that application may be run in a
cluster of servers. In some cases, all servers in a cluster share the application workload and
if one fails, the other servers will continue to process the workload. In other cases, a single
server may process the full workload while a stand-by server is constantly updated with
the state of the primary server. If the primary server fails, the stand-by server takes over
processing the workload. Cloud providers should support the type of high-availability configuration appropriate for your applications.
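
The standby side of such a configuration can be sketched in a few lines of Python. The primary's address, the thresholds, and the promote callback (which would begin serving the replicated state) are all illustrative assumptions.

    import socket
    import time

    PRIMARY = ("10.0.0.10", 8080)        # illustrative primary server address
    MISSED_BEATS_BEFORE_FAILOVER = 3
    HEARTBEAT_INTERVAL = 5               # seconds between checks

    def primary_alive():
        """Treat a successful TCP connection as a heartbeat."""
        try:
            with socket.create_connection(PRIMARY, timeout=2):
                return True
        except OSError:
            return False

    def standby_loop(promote):
        """Call `promote` (hypothetical) after consecutive missed beats."""
        missed = 0
        while True:
            missed = 0 if primary_alive() else missed + 1
            if missed >= MISSED_BEATS_BEFORE_FAILOVER:
                promote()  # take over processing the workload
                return
            time.sleep(HEARTBEAT_INTERVAL)
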
Storage systems are also subject to occasional failures. Redundant storage systems improve
high availability by reducing the chances that a storage failure will lead to application
downtime.

Large-scale network problems and natural disasters can result in large-scale disruptions to
a data center. Cloud acceleration providers with multiple data centers can continue to
provide access to applications by routing traffic away from the disrupted data centers to
other data centers hosting the same content or able to run the applications that had been
available from the disrupted data center.

Faster Application Performance


The responsiveness of a Web application is determined, in part, by network latency. If large
numbers of packets must be exchanged between a server and a client, application
performance can suffer. TCP and HTTP protocols can both require multiple round-trip
packet exchanges between clients and servers. Dynamic acceleration technologies can
reduce network overhead by optimizing the number of packets exchanged between client
devices and servers.
Network acceleration is particularly important in the middle mile, which can constitute
the longest segment between a server and a client device. Reducing the amount of time
required to exchange data on the longest segment can substantially reduce the time users
are waiting for a response from a server. Also, network acceleration systems can maintain
pools of TCP connections that can be reused without incurring the overhead of creating a
new TCP connection.

Better End User Experience


The most important beneficiaries of Web application acceleration are end users. Users of
ecommerce applications will find that dynamic acceleration improves responsiveness and
allows users to complete transactions more efficiently. Analysts working with data
warehouses and business intelligence applications can perform more analytic operations in
less time when network traffic is accelerated.

Dynamic content acceleration allows businesses to provide highly available, responsive Web applications that deliver an improved end user experience.

Security Considerations
Security is a broad topic that encompasses confidentiality, integrity, and availability of data
and applications and applies to virtually all aspects of information technology. Cloud acceleration is no exception. Several topics are of particular importance with regards to content distribution networks and dynamic content acceleration:

Secure Sockets Layer (SSL) encryption and cloud acceleration
Distributed Denial of Service (DDoS) protection
Data security
Authentication

SSL Encryption and Cloud Acceleration


SSL, the predecessor of the newer Transport Layer Security (TLS), is the protocol commonly used for encrypting data transferred over the Internet. It is of particular importance in
protecting the confidentiality of content as it is transmitted.

As you plan to deploy content delivery networks and dynamic content acceleration,
consider which content should be encrypted during transmission. If your organization has a
data classification system in place, that system can inform you about the types of content
that might need to be encrypted. For example, data subject to government or industry
regulations may require strong encryption anytime it is transmitted over the Internet. In
other cases, data classified as public (that is, data that if released to the public would not cause any harm to the organization) can be transmitted without encryption.

Need for Encryption vs. Cost


One way to address the issue of deciding which data to encrypt is to simply encrypt all data.
This approach sounds prudent at first glance. After all, "better safe than sorry" is one way
to address security questions. The problem is that this approach does not take into account
the cost of encrypting data.
SSL encryption uses a combination of two encryption techniques: asymmetric
cryptography and symmetric cryptography. Both use keys to encrypt and decrypt data.
Symmetric key cryptography uses one key while asymmetric cryptography uses two.
Symmetric key cryptography is the less computationally demanding of the two methods
but because only one key is used, partners exchanging encrypted data have to share a
common key. Transmitting an encryption key in unencrypted form is risky and can lead to
a compromised key. Asymmetric cryptography is computationally expensive but has the
advantage of not requiring a shared key.

Accelerating Encryption
SSL takes a best-of-both-worlds approach and uses both asymmetric and symmetric key
cryptography. Asymmetric cryptography is used during the SSL handshake,
when two devices are establishing an encrypted session. During the handshake, the two
devices exchange information about the algorithms and other parameters that each
supports. Asymmetric techniques are used, so this communication can occur over a secured
channel. During the handshake, the devices exchange a symmetric key that is used to
encrypt data for the rest of the session.
Encrypting data, especially during the handshake using asymmetric encryption, is
computationally demanding. Encrypting large volumes of data over many different sessions
can place heavy demands on CPUs. Edge servers providing access to encrypted data can
benefit from SSL acceleration.
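
Python's standard ssl module makes this two-phase design visible: the handshake runs when the socket is wrapped, and afterward the session is protected by the negotiated symmetric cipher. This sketch simply connects to a public host (www.example.com, used here illustratively) and reports what was negotiated.

    import socket
    import ssl

    context = ssl.create_default_context()

    # The handshake (asymmetric cryptography) happens inside wrap_socket;
    # all data after that is protected by the symmetric session cipher.
    with socket.create_connection(("www.example.com", 443)) as sock:
        with context.wrap_socket(sock, server_hostname="www.example.com") as tls:
            cipher, version, bits = tls.cipher()
            print(f"protocol version: {version}")        # e.g., TLSv1.3
            print(f"symmetric cipher: {cipher} ({bits}-bit key)")
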

SSL acceleration is implemented by specialized hardware designed to offload computation from server CPUs. Content delivery network providers may or may not provide SSL
encryption, so be sure to evaluate support for SSL acceleration if you plan to use SSL to
protect the confidentiality of your data.

Distributed Denial of Service (DDoS) Protection


Businesses and other organizations face an array of security threats; one of the most
challenging is the Distributed Denial of Service (DDoS) attack. As the name implies, the
object of the attack is to disrupt services provided by legitimate Web sites and applications
so that they are not available to customers and users. There are a few common types of
DDoS attacks, but they all involve overwhelming servers with malicious network traffic.

The Structure and Function of a DDoS Attack


There are multiple ways to implement DoS attacks, including flooding servers with
network traffic and disrupting DNS functions. A simple type of DDoS attack sends large
volumes of SYN or other types of TCP packets that prompt the receiving server to open
connections. When a legitimate connection is open, the client device that initiated the
connection will respond after it receives an acknowledgement packet from the server (see
Figure 6.4).


Figure 6.4: In a typical TCP handshake, a client initiates a connection with a SYN
packet, the server responds with a SYN-ACK and the client then responds with an
ACK.

In the case of a malicious attack, the client does not respond and the server is left holding
the connection open while it waits for a response. Eventually, the connection will time out,
but during that time, the attacker will have issued other connection requests, ultimately
consuming all connection resources. As a result, legitimate traffic is unable to establish
connections to the server (see Figure 6.5).


Figure 6.5: In a malicious DoS attack, the clients flood the server with SYN packets, leading to corresponding SYN-ACKs, which go unacknowledged by the attacker. As a
result, the server waits for acknowledgement while connection resources are
consumed for non-functional connections.

DDoS attacks use a collection of compromised devices, known as a botnet, to flood a target
server (see Figure 6.6). The compromised devices have been infected with malware that
allows the attacker to issue commands to the compromised computers. These commands
specify the type of attack to launch and the target server. In addition to the compromised
computers that are flooding servers with malicious traffic, botnets often include multiple
command and control servers. The person controlling the botnet, known as the bot herder,
communicates with command and control servers, which in turn communicate with
compromised devices.

One way to disrupt a botnet is to shut down or isolate the command and control server so
that it can no longer issue commands. Botnet designers have recognized this potential
single point of failure and have developed techniques to support multiple command and
control servers. If one is identified and taken offline, another can assume the
responsibilities of communicating with compromised devices. As a result, botnets are
resilient to attacks on their infrastructure.


Figure 6.6: Botnets are distributed systems controlled by multiple command and
control systems, making them difficult to disrupt by taking down bots or command
and control servers.

Targets of DDoS Attacks


DDoS attacks are simple but effective. Government agencies, financial institutions, and
retailers have all been victims of DDoS attacks. The impact on businesses is multifaceted.
Retailers suffer immediate adverse effects, including lost sales. All organizations may suffer
damage to their brand as users are unable to conduct normal business operations with the
victims of DDoS attacks.

One of the reasons for the growing threat of DDoS attacks is that they are relatively easy to
launch. Information on how to launch a DDoS attack is readily available online. DDoS
application code is available as well. Even those without the technical skill to implement
their own attack can find DDoS service providers on the cybercrime black market who
have their own DDoS infrastructure and launch attacks for others.

Responding to DDoS Attacks


One method to respond to the threat of DDoS attacks is to add infrastructure to absorb an
attack. This approach is not practical. Attackers can launch attacks consuming 40 to 60Gbps of bandwidth. They have access to multiple compromised devices, many of which have
high-speed Internet connections, so it is a fairly simple matter to scale up the size of
botnets to consume all available network and server resources at the target site.

A better method of responding to DDoS attacks is to use DDoS absorption techniques, as Figure 6.7 illustrates. DDoS absorption systems are network devices that analyze traffic
and detect patterns indicative of a DDoS attack. Malicious packets are filtered before they
reach production servers on a network. Depending on the type of attack, network
engineers might use data from the DDoS absorption systems to determine whether some
types of requests should be blocked or blacklisted.
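
Production absorption systems inspect far richer signals (packet flags, geographic spread, protocol anomalies), but the basic rate-counting idea behind blacklisting can be sketched briefly in Python. The window and threshold values are illustrative only.

    import time
    from collections import defaultdict

    WINDOW_SECONDS = 10
    MAX_ATTEMPTS = 100            # illustrative per-source threshold

    attempts = defaultdict(list)  # source IP -> recent attempt timestamps
    blocked = set()

    def allow(source_ip):
        """Return False for sources exceeding the rate threshold."""
        if source_ip in blocked:
            return False
        now = time.monotonic()
        recent = [t for t in attempts[source_ip] if now - t < WINDOW_SECONDS]
        recent.append(now)
        attempts[source_ip] = recent
        if len(recent) > MAX_ATTEMPTS:
            blocked.add(source_ip)  # candidate for the blacklist
            return False
        return True
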


Figure 6.7: DDoS absorption blocks malicious DoS traffic before it reaches
application servers.

A cloud acceleration provider should support DDoS attack mitigation. In addition to DDoS
absorption devices, procedures should be in place to notify network engineers of an attack,
provide detailed data about the attack, and support additional mitigation measures, such as
using alternative data centers to maintain access to Web applications.

Data Security
Maintaining confidentiality of data with SSL and ensuring availability of applications with
DDoS attack mitigation technologies are two key security considerations when evaluating
cloud acceleration providers. A third key consideration is ensuring data security with
regards to government and industry regulations.
Businesses and government agencies may be subject to multiple data protection
regulations such as the Sarbanes-Oxley (SOX) Act, the Health Insurance Portability and
Accountability Act (HIPAA), the Payment Card Industry Data Security Standard (PCI DSS),
and others. These regulations have specific requirements for protecting the confidentiality
and integrity of data. Cloud acceleration providers should offer sufficient controls to allow
customers to meet regulatory requirements.

Although regulations vary in their requirements, there are common characteristics such as
the need for access controls, confidentiality measures, and the ability to demonstrate
compliance. As part of your evaluation, verify cloud acceleration vendors are in compliance
with relevant regulations for your business.

Authentication
Authentication is one area of security that sounds fairly straightforward but can be riddled
with organizational challenges. Authentication is the process of verifying the identity of a
user prior to granting that user access to data, applications, and systems. This process
requires an authentication infrastructure that supports:

A database of users and identifying information
Authentication mechanisms, such as passwords or digital certificates
Two-factor authentication, in some cases
Life cycle management services
Content served from a content delivery network may require authentication controls. If
you have such a requirement, consider how the content delivery network authenticates
users and whether it meets your requirements and allows you to manage access controls
with reasonable administrative overhead.
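
One widely used pattern for this is the time-limited signed URL: edge servers that hold a shared secret can verify a request's signature and expiry without consulting a user database on every object. The Python sketch below is generic and illustrative, not any particular provider's signing scheme.

    import hashlib
    import hmac
    import time

    SECRET = b"shared-secret-between-origin-and-edge"  # illustrative value

    def sign_url(path, ttl_seconds=300):
        """Append an expiry timestamp and HMAC signature to a content path."""
        expires = int(time.time()) + ttl_seconds
        message = f"{path}:{expires}".encode()
        signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
        return f"{path}?expires={expires}&sig={signature}"

    def verify(path, expires, sig):
        """Reject expired or tampered requests at the edge."""
        message = f"{path}:{int(expires)}".encode()
        expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, sig) and time.time() < int(expires)
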

Architecture Considerations
The architecture of the content delivery network and dynamic content acceleration
network is another area to assess when evaluating potential providers. Consider edge
server locations, protocol optimizations, and key performance metrics.
The location of edge servers helps to shape the overall performance of your applications.
Edge servers that are in close physical proximity to users will help reduce latency because
packets have shorter distances to travel. Edge servers on networks with high-performance
peering agreements are less likely to be subject to degraded performance when using other
ISPs' networks.

Protocol optimizations are especially important for dynamic content acceleration. TCP has
changed over time to support several types of optimizations that can improve throughput.
These optimizations can benefit both static and dynamic content because they are typically
applied to network traffic between edge servers.
Together, the location of edge servers and protocol optimizations can improve the overall
performance of your applications. To quantify those improvements, look to multiple key
performance metrics:

Reduced latency or load time
Reduced HTTP request failures
Improved throughput
Reduced origin load
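
Your provider's reporting tools should surface these metrics, but a simple probe, such as the Python sketch below, offers an independent check of load time from a given vantage point. The target URL is illustrative.

    import time
    import urllib.request

    def measure(url, attempts=5):
        """Report the average time to fetch a URL (a crude stand-in for
        the latency and load-time metrics a provider's tools expose)."""
        timings = []
        for _ in range(attempts):
            start = time.perf_counter()
            with urllib.request.urlopen(url, timeout=10) as resp:
                resp.read()
            timings.append(time.perf_counter() - start)
        average_ms = sum(timings) / len(timings) * 1000
        print(f"{url}: average {average_ms:.0f} ms over {attempts} attempts")

    measure("https://www.example.com/")
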
The provider should offer analysis and reporting tools that make these and other key
performance indicators readily available. In the event that performance degrades or you experience other problems (for example, disruption in encryption services because of a problem with an SSL certificate), the provider should be in a position to offer support 24x7.

Key Business Considerations


A cloud application acceleration provider will become a partner in delivering access to your
applications and data. Business considerations should be evaluated along with technical
considerations. Key business areas include:

Cost: Consider startup costs, such as design, consulting, and new equipment, as well as ongoing operational costs. Also consider any costs associated with changing services or configuration.
Reliability: Failures in content delivery network or dynamic content delivery services could lead to failures in delivering your services. Consider providers' past performance, service level agreements (SLAs), and compensation for downtime. Also carefully review how downtime is calculated and the process for submitting claims.
Scalability: One of the advantages of cloud computing is the ability to rapidly scale
up the number of servers used to deliver application services. If, however, the
network cannot scale as well, the benefits of cloud scalability can be undermined.
Security: The availability of applications and the integrity and confidentiality of
your data are key security considerations.
User experience: How is the user experience altered by deploying your application
through a content delivery network and accelerated network? Besides reduced
latency, are there other changes to the way applications perform that could
substantially alter the user experience?

Deployment time: How quickly can you reach new geographic regions or increase
scalability in a region?
Management support: How does the provider support your management of the
network? Is technical support available 24x7?
Analysis and reporting: What types of reports are available to help you manage
ongoing operations? Are reports sufficient to support compliance with regulations?
Ideally, your provider will be able to assume the role of expert for managing
implementation details of the content delivery network. In some cases, they can also
become intermediaries dealing with government regulations. Of course, these technical
considerations are all designed to support core business requirements related to meeting
customer needs and expectations.

Impact of Slow Applications


Businesses deploying Web applications are facing a combination of two factors that could
adversely affect their application performance: increasingly complex Web sites and a
geographically distributed customer base.

Adverse Customer Experiences


Web designers are making use of development tools and techniques designed to improve a
user's experience. Features that were once restricted to desktop applications are now
available in Web applications. This improvement is welcome but it often comes at the cost
of increased application complexity. Complexity, in turn, can lead to longer page load times.
Research indicates that users expect a response within about 3 seconds. Sites that
experience longer load times can anticipate users will abandon the site at rates significantly
higher than those sites that maintain a sub-3 second average response time.
In addition, expanding your customer base into new geographic areas should be a positive
experience, but high latency in Web applications can undermine your ability to deliver a
suitable customer experience. Poor page loading performance can also impose a drag on
innovation. Developers and designers may be hesitant to add features that enhance a user
experience but would further prolong page load times. Lack of innovation over time can
lead to sites that appear dated and lacking functionality. At the same time you are
constrained in your options because of poor application performance, competitors may be
adding features such as real-time inventory lookup, videos, optimized content, and
interactive features. Baseline expectations of Web applications are constantly evolving, and
high latency and other performance issues can inhibit your ability to continue to meet
those changing expectations.

Ultimately, poor application performance translates into lower revenues. For example, a
study by the Aberdeen Group found that a 1-second delay in page loading led to an 11%
drop in page views and a 7% loss in sales. Even when customers finish a transaction on a
poorly performing site, their chances of returning drop significantly. Fully 79% of customers are less likely to purchase from a vendor in the future if they are dissatisfied with the Web site's performance.
Accelerating application performance allows application designers to continue to innovate
and deliver quality user experiences. Just as important, it provides the means to maintain
performance required to reduce the risk that customers will abandon shopping carts,
switch to competitor sites, or otherwise abandon an application or site.

Summary
Delivering applications to a global user base is challenging. You will face technical
difficulties as well as cultural issues. Business needs are driving the adoption of cloud
acceleration to improve the overall performance of applications. Technical considerations
are best addressed with a combination of data centers, content delivery networks, and
dynamic content acceleration techniques. As this chapter has outlined, there are multiple
considerations to evaluate when assessing content delivery network providers. The
importance of particular considerations will vary according to your specific business
requirements. Considering the full range of technical, business, and cultural issues you face
in delivering content to a global user base will help you evaluate your content delivery
network and cloud application acceleration options.
