Proximity-Aware Cloud Selection and Virtual Machine Allocation in Iaas Cloud Platforms

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

2013 IEEE Seventh International Symposium on Service-Oriented System Engineering

Proximity-aware Cloud Selection and Virtual


Machine Allocation in IaaS Cloud Platforms
Hangwei Qian Qian Lv
VMware, Inc. Western Digital, Inc.
Palo Alto, CA, USA Irvine, CA, USA
[email protected] [email protected]

Abstract—Cloud platforms deploy data centers geographically ing to both the various billing models and the characteristics
distributed around the world and application providers select of the applications. For example, storage-intensive and CPU-
cloud infrastructures and allocate virtual machines based on the intensive applications would have different preferences under
requirements of their applications. While existing works focused
on the comparison on cloud services, we argue that proximity different billing models.
also plays a very important role when deciding in which cloud Also, we need to make sure that the performance of the
infrastructures to deploy the applications. In this position paper, applications is able to meet the quality of service (QoS)
we take into account the location of cloud infrastructures, requirement. Multiple elements would affect the performance
application clients and related applications, and the interaction of the applications, such as the capacity of the servers, storage
among application components as well as the deployment cost
and propose to build a system to automatically select the cloud systems and networks of the cloud infrastructures. One im-
infrastructures for customers. Meanwhile, we also discuss some portant aspect that did not receive much attention but actually
issues and give suggestions when allocating the virtual machines would impact the performance significantly is the proximity.
within cloud infrastructures. When Internet clients access the applications deployed in the
cloud infrastructures, the network latency and bandwidth play
I. I NTRODUCTION
very an important role in the client-perceived QoS. According
Cloud computing has obtained tremendous attention in to [30], there is strong inverse correlation between distance
recent years and is pushing the IT industry into a new era. and bandwidth. Thus, selecting the cloud infrastructures that
According to Gartner, the worldwide market will be worth are close to clients can potentially reduce the network latency
$176.8 billion by 2015 [34]. One promise of cloud computing and increase the available bandwidth during the interaction
is that cloud providers (CPs) deploy multiple cloud infras- between clients and servers significantly. Moreover, modern
tructures (data centers, or DCs) distributed geographically applications often consist of multiple components geograph-
around the world so that applications are able to reach the ically distributed around the world, making the selection of
global footprints according to their business demands, known cloud infrastructures more challenging.
as infrastructure as a service (IaaS). For example, Amazon Since the scale of the problem can be very large, we also
deploys data centers at North America, Europe and Asia [24]. need to address the scalability problem. As mentioned above,
Nowadays, many IaaS cloud providers exist in the industry, we need to consider multiple aspects in the selection of the
such as Amazon, Google, Microsoft, RackSpace, to name a cloud infrastructures, making it a multi-objective optimization
few, and more would expect to emerge in the future. Given problem. Given so many data centers and multiple application
the large number of data centers, how can customers (to components, we need to find efficient algorithms to make the
avoid the ambiguity, we refer the application providers as the best decisions. More discussion is available at Section III-C.
customers of cloud providers, and the ones that send requests Finally, our system also considers the virtual machine (VM)
to applications as the clients of the applications) decide which allocation within a data center. After deciding which data
ones to choose to deploy their applications? Existing works, center to place a component, application providers still need to
such as Cloudcmp [22, 23], focused on the comparison of consider other problems, such as what kind of virtual machines
the cost and performance of the cloud services to guide the to allocate and how many of them are needed so that they can
selection of the cloud infrastructures. Complementary to these meet the performance requirements and minimize the cost,
works, in this position paper, we outline a system which etc. In this paper, we also discuss these issues and provide
automatically select the cloud infrastructures for customers so suggestions for application providers.
that they can get rid of the complex selection task completely. In short, in the position paper, we introduces a system to
In order to be able to select the best cloud infrastructures automatically select the cloud infrastructures and help allocate
automatically, many challenges exist. First and foremost, we virtual machines for customers, and make the the following
should try to reduce the cost of application deployment as contributions:
much as possible. Different cloud providers have different • We discuss the challenges in the automatic selection of
billing models. We should make the selection decision accord- cloud infrastructures and investigate preliminary solutions

978-0-7695-4944-6/12 $26.00 © 2012 IEEE 403


DOI 10.1109/SOSE.2013.54
to these challenges. client information can also be obtained from DNS logs. For
• We take the location of cloud infrastructures, application applications that are newly developed, it might be hard to
clients and related applications, as well as the interaction know the client distribution precisely. In this case, applica-
of application components into account when selecting tion providers can make the estimation according to their
the cloud infrastructures, which has not been done before business relationships and characteristics of the applications
to our knowledge. at their best effort. For example, the clients of California
• We discuss issues and provide suggestions when allocat- state government web sites usually come from California.
ing virtual machines within the cloud infrastructures. Also, it is now relatively easier to migrate the applications
from one location to another with virtualized technology after
II. P ROXIMITY the precise client distribution information is available later
Qian et al. [30] found that generally the closer the two on. For example, applications can be wrapped up in Open
communicating entities are, the less the network latency and Virtualization Format (OVF) and transferred and deployed at
the higher the bandwidth are between them. This is because other cloud infrastructures
requests travel less hops in the communication. Also, lower Another question is, how to organize the client information
round trip time between close entities allows more data to effectively? Due to the huge number of Internet users, this is a
be sent out during same period of time for TCP. In addition, non-trivial task at all. We use clustering technique to solve the
given the smaller number of hops between close entities, it is problem and divide the Internet clients into client clusters, or
less likely that congested links exist at the data routing path CCs. Within each client cluster, clients are close to each other,
assuming equal chance for each link to become overloaded. and thus can be represented as a single entity. We can group
When selecting the cloud infrastructures for an application, the clients into clusters based on the network-aware prefixes
we need to consider the location of cloud data centers, the [21], referred to as network-aware clustering. After knowing
location of Internet clients of the applications, the relationship the client IP addresses, we can extract their network-aware
of the components of the application as well as the location prefix information from the Border Gateway Protocol (BGP)
of the related applications. In the following subsections, we tables and put the clients with the same prefix into the same
discuss them in details. cluster. Note that, since this method needs to know the client IP
addresses, it can only be applied for existing applications that
A. Data Center Distribution have been deployed already and have the client information in
As mentioned in section I, cloud providers often deploy the logs.
many data centers around the world, which can greatly meet Alternatively, we can cluster the Internet clients according
the requirements of distributed applications, such as stream to the geographic locations, which we call regional clustering.
services (like Netflix, which has already utilized Amazon’s The benefit of this method is that it can be employed by
cloud platform), content delivery networks (like Akamai) and all the applications. For applications that have the client
distributed database systems, etc. Knowing the data center information, we can extract their IP addresses and get their
distribution of the cloud providers is a necessity for the success location information from the GeoIP databases, such as [27].
of the cloud infrastructure selection. For applications that do not have the client information yet,
We made a preliminary investigation of the data center application providers can provide geographic location informa-
distribution of the main cloud providers through the search tion based on estimation, such as from which cities, or states
on the Internet, which is shown in table I. Due to the limited the clients come from, etc.
access of information, it can be considered as the low bound of In addition, it is necessary to know the distances between
the data center distribution and is subject to change. From the the cloud infrastructures and the client clusters after obtaining
table, we can see that different cloud providers have different the location information of them. If we use the network-aware
policies to deploy their cloud infrastructures, and all the major clustering technique, then the distance would be the network
cloud platforms span multiple continents. Note that, for each of latency. We can simply allocate a dummy virtual machine
cloud provider in the table, we have the references to the links at each data center and get the network distances from each
by which readers can get more information of data centers. data center to all the client clusters as well as the distances
among the data centers by ”ping” command. Since the distance
B. Client Distribution information is relatively very stable, this kind of measurement
Knowing where client requests are from enables application can be conducted very infrequent, say, every month. Also,
providers to select cloud infrastructures that are close to their the dummy virtual machines are only used for the simple
clients. One question is, how can the application providers measurement, and do not have much resource consumption.
know where the Internet clients come from? For applications Thus this method is very cheap. There are also many existing
that exist already and are being transferred to cloud platforms, techniques for measuring or estimating the distance between
application providers can mine the client information from hosts and can also be applied in our system, like [15] and [17].
the application server logs, which contain not only the client When using the regional clustering, the distance would be
IP addresses, but also other important information, like how the Euclidean distance between the entities. Note that, here
frequently the clients access the applications. Alternatively, the Euclidean distance is used to approximate the network

404
Cloud Provider Data Center Number Spanned State Number Spanned Country Number Spanned Continent Number
Amazon [24] 15 15 8 3
Google [28] 36 27 12 4
Microsoft [3] 6 6 3 3
Rackspace [10] 8 8 4 3
AT&T [7] 37 24 11 4
Terremark [9, 14, 6] 43 26 17 4
Joyent [8] 4 4 2 2
TABLE I
DATA C ENTER D ISTRIBUTION

distance, which is applied in many real systems, such as at the private cloud and only computationally expensive and
”Google Local” and ”Microsoft Live” [33]. less security-sensitive components are placed the at the public
cloud infrastructures.
C. Component Relationship In order to capture the relationship among the applica-
Modern applications usually consist of multiple components tion components, we can construct the interaction pattern
interacting with each other. For example, it is common practice graph, G=(V,E), of the application. Each component has a
that web applications have multi-tier architectures, including corresponding node in the graph G. When one application
web servers, application servers and back-end databases. At component C1 interacts with another component C2, an edge
each tier, there might be multiple replicas for high service e(C1, C2) would be added between the C1 and C2. For each
reliability and scalability. Furthermore, there are some highly edge, there would be a weight associated with it, which is the
distributed applications that have many more components and volume of interactions between the two components in order
provide the services to Internet clients across the world. For to serve a single client request. The bigger the weight is, the
instances, content delivery networks (CDNs), such as Akamai more we need to pay attention to the proximity between the
and Limelight, help deliver the content (such as videos and im- corresponding components.
ages) from the content providers to Internet clients. To ensure D. Related Applications
the high bandwidth and low network latency of the service,
We also need to take into account of the location of the
it is critical to bring the content close to the Internet clients,
related applications when selecting the cloud infrastructures
necessitating the high distribution of the service components.
for an application. It is common that applications interact
Other examples include multiplayer distributed online games
with each other. For example, the financial systems need to
[4, 20], distributed database systems [5, 35], etc.
communicate with each other for many critical tasks, like
When deploying the applications in the cloud platforms,
money transfer and credit card payments. Third-party geoIP
it is very important to consider the proximity among the
database services are often accessed by other applications
application components. The interaction among the compo-
for IP and location information. Also, hospitals often share
nents of applications plays an important role in the quality of
patient information and need to communicate with each other.
service of these applications. For example, the web servers
It would be preferable to place the applications close to the
that receive client requests need to contact the application
location of related applications to reduce the network latency
servers which further need to access the back-end databases
and increase the network bandwidth. The location information
before replying to the clients. In order to minimize the network
of the related applications are relatively easy to obtain by
latency, it is better to place all the components in the same
checking the behavior of each application components. Simi-
location. However, the centralized deployment suffers from
larly, we also need to consider the volume of communication
the single point of failure. Cloud computing platforms are not
between the applications.
perfectly reliable. For example, many services are interrupted
by the failure of Amazon cloud platforms [12]. Furthermore, III. P ROXIMITY- AWARE C LOUD I NFRASTRUCTURE
some highly distributed applications, like the CDNs mentioned S ELECTION
above, requires the geographic distribution of the application In this section, we discuss the selection of cloud infrastruc-
components across the world such that high bandwidth and tures taking into account of both the price and proximity.
low network latency can be ensured. In addition, security
concern in the cloud platform may require that application A. Environment
components are placed at different locations. When deploying Each component of the application has its resource
the applications in the cloud platforms, application providers requirement, which is represented by the vector <
may prefer to host some components, such as the databases, at CP U, M em, Disk, N etwork >. Assume there are M com-
the on-premises infrastructures so that they are still under their ponents of the application, N cloud providers with overall W
own control for security reasons, and only put components data centers, L client clusters and R related applications. We
that are less security-sensitive at the cloud platforms. Also, need to find a placement policy to deploy the M components
in the hybrid cloud, security-sensitive components usually run at the W data centers. Let P be the placement matrix of

405
    requirements for CPU, memory, network and storage. The
billing of different resources is also different for different
 cloud providers. For applications that are CPU-sensitive, like
data processing services, cloud infrastructures that charges less
on CPU (even though it may charge more on storage) are
    
  preferred. For network-sensitive applications, such as stream

services, it is better to choose the data centers that charge less

on network usages.

One thing that makes the problem more complex is that

some cloud providers offer infrastructures with better per-



formance than others. For example, according to [22], the
response time and throughput for an disk I/O operation is
different for different cloud providers. Thus, we can not simply
consider the price disregarding the quality of the cloud infras-

 
tructures. In order to handle this problem, we normalize the
price over the performance quality of the cloud infrastructures.
Fig. 1. System Graph
Minimize the overall distances. In order to minimize the
overall network latency to enhance the quality of service
the components. If component COM Pi is deployed at the of applications, we should minimize the overall network
data center DCj , then Pi,j = 1; otherwise, Pi,j = 0, for distances. It is more complicated than the classical K-Median
i = 1, 2, ..., M, j = 1, 2, ..., W . Note that, some application problem [11], as it not only needs to minimize the distances
providers may prefer to deploy all the application components from client clusters to the data centers that are selected to
at the data centers belonging to the same cloud provider, deploy the application components, but also needs to minimize
which may provide extra benefit of better interconnection the distances among the application components as well as the
among the data centers (some cloud providers, such as Google distances between the components and the related applications.
and Microsoft, often have their own backbone networks to C. Problem Hardness
interconnect their data centers [39].) In that case, an additional
constraint on the selected data centers can be applied. In this It is a multi-objective optimization problem: deploying all
paper, we assume that application components can be place at the components at the cheapest cloud infrastructure would
any data center by default. incur smallest cost, but might lead to larger user-perceived
The interaction pattern graph of the application components response time, which would hurt the business. Noticing the
mentioned above is extended to also include the client clusters binary choice of each element in the placement matrix P , it
and related applications, as shown in the system graph G=(V,E) is a variation of the 0-1 integer linear programming problem,
of figure 1 as an example. Each client cluster (only three, CC1, which is NP-hard [2].
CC2 and CC3 are shown in the figure) and related application The complexity of this problem is O(W M ), which is
(like node F and G) also have a node in the graph, dubbed prohibitively expensive in computation. For example, suppose
client cluster node and application node respectively. An edge there are 10 components of the application and 100 data
is added from each client cluster to the component accessed centers of all the cloud providers, there could be W M = 10010
by the clients (like node A). Also, if a component visits a placement policies. In order to deal with this problem, we give
related application, an edge is added between them, like the each objective a weight indicating how important the objective
edge e(B, F ). A value is associated with each edge, which is, and then combine them into a single optimization problem
is the distance between the nodes. Note that, there is another with one cost function. Meanwhile, in stead of checking each
value on the edges between the components indicating the possible placement policy, we propose a step-wise heuristic
volume of the interaction between the components, which is algorithm in which we only decide the placement of one
not shown in the figure. component at a step. In this way, the complexity is reduced
to O(W.M ). We are still investigating this step-wise selection
B. Objectives algorithm, which is one of the main tasks in our ongoing work.
When generating the placement policy matrix P , the system
tries to fulfill the following objectives: IV. L OCAL VM A LLOCATION
Minimize the deployment cost. Different cloud providers After knowing which data center to deploy each appli-
have different billing models. Even different cloud infrastruc- cation component, we need to allocate VMs at the se-
tures belonging to the same cloud provider charge differently. lected data centers for it. While this might seem straight-
For example, according to [1], the Amazon data centers at forward in some sense, more issues should be considered
northern California charges $0.049 per hour for standard actually. For example, given the resource requirement <
reserved instances, while data centers at north Virginia charges CP U, M em, Disk, N etwork > of a component, we need
$0.039 per hour. Meanwhile, each application has the specific to decide how many VMs and what the size is of each

406
VM for the component. There can be multiple options. For components and propose to build a system to automatically
instance, one can allocate just a few or even a single VM select the cloud infrastructures for customers.
that can satisfy all the resource requirements, called ”Giant In our ongoing work, we are developing heuristic algo-
VMs”, or allocate more VMs, each with smaller amount of rithms and implementing the system for extensive evaluation.
resources, called ”Tiny VMs”. The VMs would host replicas We would also consider to introduce tiered service model
of the component, which usually sit behind a load balancer [26, 25, 36] for the cloud provider, which classifies the service
and form an application level cluster [29, 38]. In terms of into tiers and charge the service tiers at different prices.
price, the ”Giant VMs” option is usually cheaper than the
”Tiny VMs” option, since ”Tiny VMs” have the minimum R EFERENCES
memory requirement in order to work properly, incurring more [1] Amazon ec2 pricing. http://aws.amazon.com/ec2/pricing/.
overall memory cost. Furthermore, more communications are [2] Integer programming. http://en.wikipedia.org/wiki/i
needed among the cluster replicas, which introduces more nteger programming.
network bandwidth consumption. However, ”Tiny VMs” is [3] Microsft adds new data centers to Azure infrastruct
more preferable in terms of reliability, since the failure of an ure. http://www.datacenterdynamics.com/focus/arch
VM would not impact the availability of the component too ive/2012/04/microsft-adds-new-data-centers-azure-
much compared with ”Giant VMs” option. infrastructure.
Another issue is the network configuration of the VMs. [4] Daniel Bauer, Sean Rooney, and Paolo Scotton. Net-
Some data centers are connected with multiple ISPs to increase work infrastructure for massively distributed games. In
the reliability of the network [39]. In these multi-homed data NetGames. ACM, 2002.
centers, we need to decide from which ISP to allocate the [5] Philip A. Bernstein and Nathan Goodman. Concurrency
IP addresses for the VMs. Allocating IPs from the same ISP control in distributed database systems. In ACM Comput.
for all the application components as much as possible can Surv. ACM, 1981.
potentially improve the network performance. [6] Asia Pacific Data Centers.
http://www.terremark.com/data-centers/asia-pacific.aspx.
V. R ELATED W ORK [7] AT&T Data Centers. http://www.datacentermapping.com/
Several works on the comparing and selecting the cloud data-centers/provider/at&t.
services exist in the past years. The work in [22, 23] compared [8] Joyent Data Centers. http://joyent.com/products/joyent-
the service performance of four major cloud providers to help cloud/data-centers.
the customers choose cloud services. In [18], a conceptional [9] North America Data Centers.
framework is proposed to compare the cloud services based http://www.terremark.com/data-centers/americas.aspx.
on the performance of virtual machines. Unlike their work, we [10] Rackspace Global Data Centers.
take the proximity into account and aim to build a system to http://www.rackspace.com/whyrackspace/network/
select the cloud infrastructures automatically for customers. datacenters/.
Also, authors in [31] introduced a mathematical formulation [11] Moses Charikar, Sudipto Guha, Éva Tardos, and David B.
and method of the cloud service selection based on multiple Shmoys. A constant-factor approximation algorithm for
abstract criteria. Different from it, our work intends to develop the k-median problem. In STOC. ACM, 1999.
a real system to automatically selection the infrastructures [12] Amazon EC2 data center fail-
in IaaS cloud platforms. In [16], an approach utilizing the ure affects Netflix Instagram.
analytic hierarchy process is proposed to select the services http://www.electronista.com/articles/12/06/30/storms.in.n
for software as a service cloud. Our work differentiate from orth.virginia.takes.out.amazon.data.center/.
them in that we focus on the infrastructure as a service cloud, [13] Fred Douglis, Deepti Bhardwaj, Hangwei Qian, and
in which the proximity should be considered explicitly, while Philip Shilane. Content-aware load balancing for dis-
in software as a service cloud, only the overall service quality tributed backup. In LISA. USENIX, 2011.
needs to be considered. Finally, many work focus on the stor- [14] Middle East Europe and Africa Data Centers.
age services, like storage selection, security, storage backup http://www.terremark.com/data-centers/europe-middle-
and deduplication etc., such as [32, 19, 13, 37]. Different from east-africa.aspx.
them, this work considers the general applications. [15] Paul Francis, Sugih Jamin, Cheng Jin, Yixin Jin, Danny
In addition, to our knowledge, this work is the first one to Raz, Yuval Shavitt, and Lixia Zhang. Idmaps: a global
consider the interaction among application components and the internet host distance estimation service. In IEEE/ACM
relationship with other applications when making the selection Trans. Netw. IEEE Press, 2001.
of cloud services. [16] Manish Godse and Shrikant Mulik. An approach for se-
lecting software-as-a-service (saas) product. In CLOUD.
VI. C ONCLUSION AND F UTURE W ORK IEEE, 2009.
In this position paper, we consider the deployment cost and [17] Krishna P. Gummadi, Stefan Saroiu, and Steven D. Grib-
the location of cloud infrastructures, application clients, related ble. King: estimating latency between arbitrary internet
applications as well as the interaction among the application end hosts. In IMW. ACM, 2002.

407
[18] Seung-Min Han, Mohammad Mehedi Hassan, Chang- Hsu. Characteristics of backup workloads in production
Woo Yoon, and Eui-Nam Huh. Efficient service recom- systems. In FAST. USENIX, 2012.
mendation system for cloud computing market. In ICIS. [38] W. Zhang, H. Qian, C.E. Wills, and M. Rabinovich. Agile
ACM, 2009. resource management in a virtualized data center. In
[19] Seny Kamara and Kristin Lauter. Cryptographic cloud ACM WOSP/SIPEW, 2010.
storage. In FC. Springer, 2010. [39] Zheng Zhang, Ming Zhang, Albert Greenberg, Y. Charlie
[20] Rajesh Krishna Balan, Maria Ebling, Paul Castro, and Hu, Ratul Mahajan, and Blaine Christian. Optimizing
Archan Misra. Matrix: adaptive middleware for dis- cost and performance in online service provider net-
tributed multiplayer games. In Middleware, 2005. works. In NSDI. USENIX, 2010.
[21] B. Krishnamurthy and J. Wang. On network-aware
clustering of web clients. In ACM SIGCOMM, 2000.
[22] Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming
Zhang. Cloudcmp: comparing public cloud providers.
In IMC. ACM, 2010.
[23] Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming
Zhang. Cloudcmp: shopping for a cloud made easy. In
HotCloud. USENIX, 2010.
[24] Where Amazons Data Centers Are Located.
http://www.datacenterknowledge.com/archives/2008/11/1
8/where-amazons-data-centers-are-located/.
[25] Qian Lv and George N. Rouskas. On Optimal Sizing of
Tiered Network Services. In INFOCOM. IEEE, 2008.
[26] Qian Lv and George N. Rouskas. An economic model
for pricing tiered network services. In ICC. IEEE, 2009.
[27] Maxmind. http://www.maxmind.com.
[28] Map of all Google data center locations.
http://royal.pingdom.com/2008/04/11/map-of-all-google-
data-center-locations/.
[29] Hangwei Qian, Elliot Miller, Wei Zhang, Michael Rabi-
novich, and Craig E. Wills. Agility in virtualized utility
computing. In VTDC, 2007.
[30] Hangwei Qian, Michael Rabinovich, and Zakaria Al-
Qudah. Bringing local dns servers close to their clients.
In GLOBECOM, 2011.
[31] Zia ur Rehman, Farookh K. Hussain, and Omar K.
Hussain. Towards multi-criteria cloud service selection.
In IMIS. IEEE, 2011.
[32] Arkaitz Ruiz-Alvarez and Marty Humphrey. An auto-
mated approach to cloud storage service selection. In
ScienceCloud. ACM, 2011.
[33] Hanan Samet, Jagan Sankaranarayanan, and Houman
Alborzi. Scalable network distance browsing in spatial
databases. In SIGMOD. ACM, 2008.
[34] Sizing the Public Cloud Services Market.
http://softwarestrategiesblog.com/2011/07/02/sizing-
the-public-cloud-services-market/.
[35] Alexander Thomson, Thaddeus Diamond, Shu-Chun
Weng, Kun Ren, Philip Shao, and Daniel J. Abadi.
Calvin: fast distributed transactions for partitioned
database systems. In SIGMOD. ACM, 2012.
[36] Vytautas Valancius, Cristian Lumezanu, Nick Feamster,
Ramesh Johari, and Vijay V. Vazirani. How many tiers?:
pricing in the internet transit market. In SIGCOMM.
ACM, 2011.
[37] Grant Wallace, Fred Douglis, Hangwei Qian, Philip Shi-
lane, Stephen Smaldone, Mark Chamness, and Windsor

408

You might also like