Global Software Development with Cloud
Platforms
Pavan Yara, Ramaseshan Ramachandran, Gayathri Balasubramanian,
Karthik Muthuswamy, and Divya Chandrasekar
Cognizant Technology Solutions,
# 5/639 Old Mahabalipuram Road, Kandanchavadi, Chennai - 600096, India
{Pavankumar.Yara,Ramaseshan.Ramachandran,Gayathri.Balasubramanian,
Karthik.Muthuswamy,Divya.Chandrasekar}@cognizant.com
http://www.cognizant.com/
Abstract. Offshore and outsourced distributed software development
models and processes are facing challenges, previously unknown, with
respect to computing capacity, bandwidth, storage, security, complexity,
reliability, and business uncertainty. Clouds promise to address these
challenges by adopting recent advances in virtualization, parallel and
distributed systems, utility computing, and software services. In this
paper, we envision a cloud-based platform that addresses some of these
core problems. We outline a generic cloud architecture, its design and our
first implementation results for three cloud forms - a compute cloud, a
storage cloud and a cloud-based software service - in the context of global
distributed software development (GSD). Our "compute cloud" provides
computational services such as continuous code integration and a compile
server farm, our "storage cloud" offers storage (block or file-based) services
with an on-line virtual storage service, and our on-line virtual labs
represent a useful cloud service. We note some of the use cases for clouds
in GSD, the lessons learned with our prototypes and identify challenges
that must be conquered before realizing the full business benefits. We
believe that in the future, software practitioners will focus more on these
cloud computing platforms and see clouds as a means of supporting an
ecosystem of clients, developers and other key stakeholders.
Keywords: Globally Distributed Software Development, Cloud computing, Software-as-a-Service, compute cloud, storage cloud.
1 Introduction
The last decade has witnessed the Globally Distributed Software Development (GSD)
model becoming a business necessity to capitalize on global resource pools,
attractive cost structures, and round-the-clock development for achieving faster
cycle times [3,26,31]. At the same time, GSD has also brought
unique nuances, complexities, and challenges from technical, temporal,
spatial, and process standpoints [4,25,34]. Some of these issues are long standing,
such as effective capacity planning, resource provisioning, software lifecycle
O. Gotel, M. Joseph, and B. Meyer (Eds.): SEAFOOD 2009, LNBIP 35, pp. 129–143, 2009.
© Springer-Verlag Berlin Heidelberg 2009
management, communication, coordination, and collaboration mechanisms.
In addition, we are also seeing relatively new challenges with the rise of
multi-core processors, virtualization, recent programming frameworks and abstractions, and
other complex advances. Now, with the intensification of global economic
activity and the resulting demand for cost/benefit analysis, the need for
better software engineering and management approaches for outsourcing has only
become more pronounced. Also, over the years, the various structured and
other disciplined software engineering approaches, advocated as key remedies
for addressing these GSD challenges, have undergone refinement. A range of
new, effective platforms and practices have emerged and have been adopted to
address these unique challenges of GSD. These mechanisms - such as better
communication and coordination practices [7], management of global software
teams [28], effective resource leveraging with virtual teams [33], collaboration
and knowledge management tools & techniques [16], programming methodologies
and processes [13], software lifecycle models, service-oriented architecture [27]
concepts (specifically web services and Web 2.0 technologies), and grid infrastructures
to provide IT services [17] - have tried, with some success, to address the GSD challenges.
In this paper, we discuss one such emerging paradigm with several possible
positive implications for GSD. "Cloud computing", as it is popularly known,
is a paradigm that represents a disruptive business and technology concept
with different meanings for different GSD ecosystem partners. For example,
for IT users, it is a way to deliver computing, storage and applications over
the network, often the Internet, from centralized data centers. For application
developers, it is an internet-scale software development platform and run-time
environment with several interesting use case scenarios such as always-on and
always-available development environments, content collaboration spaces to
share code, documents, presentations, and discussions in a Facebook-like social
media style, and services like online IDEs, continuous code builds and testing.
For infrastructure providers and administrators, it is a massive, distributed data
center infrastructure connected by IP networks to achieve economies of scale
and grant "on-demand" access to computing capacity. Thus, cloud computing
delivers "IT" as a service (ITaaS), which can be adapted to address some of the
core challenges in GSD.
Cloud computing tries to replace the traditional desktop-as-a-platform model with
a network-as-a-platform model. As such, it builds on decades of research in
virtualization, parallel and distributed systems, utility computing, and more
recent advances in the fields of networking, Web, and software services. The
key idea of this paradigm is to provide a utility service, similar to a power
grid, into which a user may plug-in regardless of location to access always-on,
always-available, and device-independent IT services. In this way, it represents
the next natural step in the evolution of computing and IT services. It promises
to maximize the productivity of all IT-related activities. For instance, with
the cloud paradigm, time and resources now spent building or customizing application
frameworks or building software/hardware infrastructure could be better spent on
improving the business logic.
We explore the nature and potential of clouds in the paper. The main
contribution of this paper is in laying a cloud-based vision for GSD and
formulating a generic architecture to address some of the GSD challenges. In
doing so, we also showcase how we realize key business benefits with our cloud-based
platforms. Our cloud platforms are easily adaptable to provide a common,
managed, and powerful infrastructure to support GSD activities. Accordingly,
the paper is structured as follows: we explain the basic concepts underlying the cloud
computing paradigm in Section 2 and discuss how clouds can be used within
the GSD ecosystem in Section 3. We then present a preliminary architecture
of the framework and discuss candidate technologies to realize it in Section 4.
Section 5 describes three GSD-adaptable cloud service prototypes built at our
lab - a Compute cloud providing computational services such as continuous code
builds and integration with on-demand infrastructure; a Storage cloud providing
massive online storage for software project-related artifacts, and a cloud-based
online virtual lab solution for providing always-on and always-available training,
testing and debugging facilities - before concluding.
2 Cloud Computing Paradigm
Cloud Computing fulfills the long-held dream of computing as a utility [45]
and thus has the potential to transform how the IT ecosystem makes and uses
hardware and software as a utility and a service.
2.1 Concepts
The term Cloud Computing usually refers to an online delivery and consumption
model for business and customer services. These include IT services like
Software-as-a-Service (SaaS) and storage or server capacity as a service, as well as many
non-IT business and consumer services which are not computing tasks. All these
are commonly referred to as X-as-a-Service (XaaS). However, for our paper, we
adopt the following definitions:
Cloud is a pool of highly scalable, abstracted infrastructure, capable of hosting
end-customer applications, that is billed by consumption [20].
Cloud services refer to the consumer and business products, services and
solutions that are delivered and consumed in real time over a network.
Cloud Computing is an emerging IT development, deployment and delivery
model which enables real-time delivery of products, services, and solutions
over the Web (i.e., enabling cloud services).
Technically, this paradigm refers to providing services on virtual machines
allocated on top of a large machine pool, whereas in business terms, the term
means a method to address scalability and availability concerns for applications.
2.2 Characteristics
From a higher level of abstraction, three aspects are primary to clouds - Elasticity,
Pay-per-use model, and High-Availability. The key characteristics of cloud
services, along the lines of Forrester [20], are as follows:
1. Standardized IT-based capability: Delivers compute, storage, network or
software-based capabilities, solely or in combination through standard
offerings.
2. Accessible via Internet protocols from any computer: Standards-based,
universal network access through a regular web browser via HTTP, XMPP,
OpenID, OAuth or Atom protocols.
3. Always available, and scales automatically to adjust to demand: Resilient and
highly available; elastic enough to cope with scale and demand.
4. Pay-per-use or advertising-based: Service is paid for in one of three ways - advertising, subscription or transaction based.
5. Web or programmatic-based control interfaces: Uses service-based interfaces
built on XML, JSON and REST-style software connection standards.
6. Offers full customer self-service: Customers can provision, manage, and
terminate services themselves and the control is via a Web interface or
programmatic call to service APIs.
2.3 Examples
Although the paradigm has emerged only recently, the implications of IT
services provided through it are wide-reaching [5,8,40,45]. Cloud infrastructure
companies such as Amazon and GoGrid offer data storage priced by the
gigabyte per month and computing capacity by the CPU-hour [1]. Office and
productivity applications in the cloud, such as Google Apps [21], Zoho office suite, MS
SharePoint Online, and Cisco WebEx, make collaboration more accessible
and highly available. SaaS companies offer CRM services through their multi-tenant
shared facilities so that clients can manage their customers without
buying software [39]. These use cases represent only the beginning of options for
delivering all kinds of complex capabilities like online businesses, collaboration
tools, R&D projects, quick project promotions, partner integration, new business
ventures [11,40], etc.
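The usage-based pricing mentioned above (storage by the gigabyte per month, compute by the CPU-hour) can be illustrated with a minimal sketch. The function name and the rates below are hypothetical and chosen only for illustration; they are not taken from any provider's price list.

```python
def monthly_cost(storage_gb, cpu_hours, rate_per_gb_month, rate_per_cpu_hour):
    """Return the usage-based monthly charge for storage plus compute."""
    if min(storage_gb, cpu_hours, rate_per_gb_month, rate_per_cpu_hour) < 0:
        raise ValueError("usage and rates must be non-negative")
    return storage_gb * rate_per_gb_month + cpu_hours * rate_per_cpu_hour


# Hypothetical rates, for illustration only.
bill = monthly_cost(storage_gb=200, cpu_hours=500,
                    rate_per_gb_month=0.25, rate_per_cpu_hour=0.5)
print(bill)  # 300.0
```

The point of the sketch is that the bill depends only on what was consumed in the period, so a project that uses nothing in a given month is charged nothing.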
2.4 Benefits
The economic appeal of clouds is often summed up by the statement "converting
capital expenses to operating expenses" [2]. There are other clear business
benefits such as almost zero upfront infrastructure investment, just-in-time
infrastructure, more efficient resource utilization, usage-based costing and a real
potential for shrinking the processing time. We refer the reader to
[2,5,8,11,36,40] for more details.
2.5 Public, Private and Hybrid Clouds
As discussed in the introduction, clouds are the result of the natural transformation of the IT infrastructure of enterprises over the last decade and can take
many forms and can be of many types. In this paper, we look at three major
types of clouds.
Public clouds are cloud services offered by third-party providers (vendors) such
as Amazon, Google, Salesforce.com for public consumption. The vendors fully
host and manage the infrastructure and charge customers for the resources they
use, usually on an hourly or per-transaction basis. Private clouds are cloud
services provided within the enterprise firewall and managed by enterprises
such as Boeing or GM. They offer the same benefits as public clouds but with
fine-grained control, security and compliance norms. The major difficulty with
private clouds is the complexity and cost involved in setting up “internal” clouds.
Hybrid clouds are a combination of public and private cloud properties. They
leverage services that are in both the public and private work spaces and are
typically used in scenarios where an enterprise needs to receive customer payments or
do employee payroll processing. The major drawback with hybrid clouds is the
difficulty in effectively creating and governing such a solution [18].
3 Cloud Platforms for GSD
While we are yet to see fundamentally new types of applications enabled by cloud
computing, we believe that it offers compelling benefits with several important
classes of existing applications for the GSD model. The cloud paradigm, by its design,
tries to optimize IT-related productivity by taking care of scaling and availability
concerns and redirecting resources to long term strategic business development.
Emphasizing communication, collaborative work and community interaction, we
perceive clouds to offer huge leverage in many of the GSD related activities.
For example, when companies outsource tasks, those tasks often require close
working relationships between the companies involved. These collaborations
grow organically to form communities around the particular task they aim
to solve. This presents multiple issues. In previous generations of GSD, the
environments and tools had to be made available to teams involved; organizations
had to acquire the tools at their own cost, pool resources to provision the
requested job; workers couldn’t locate and use their best tools for the job as
determined by them; and they had to span time and space to share, discuss,
collaborate or even publish content where necessary, as time and circumstance
required. In addition, the exchange of content, which happens in multiple forms
such as emails, discussion forums, bug tracking systems, version control systems
and logging, makes collaboration a very complex activity. Cloud-based platforms can be used
for such cases, in the form of content collaboration spaces and always-on and
always-accessible IT services. In this section, we discuss three such useful areas
for GSD - development, quality assurance & testing, and IT operations.
3.1 Development
Clouds offer instant resource provisioning, flexibility, on-the-fly scaling, and high
availability for continuously evolving GSD-related activities. Some of the use
cases include:
– Development Environments: With clouds, the ability to acquire, deploy,
configure, and host development environments becomes "on-demand". The
development environments are, then, always-on and always-available to the
concerned teams with fine-grained access control mechanisms. In addition,
the development environments can be purpose-built with support for
application-level tools, source code repositories, and programming tools.
After the project is done, these can also be archived or destroyed. The other
key element of these "on-demand" hosting environments is the flexibility
of their quick "prototyping" support: new code and ideas can be quickly
turned into workable Proof-of-Concepts and tested.
– Developer Tools: Hosting developer tools such as IDEs and simple code
editors in the cloud eliminates the need for developers to have local IDEs and
other associated development tools. This also allows the concerned project
members to access the development environment and tools across time zones
and places.
– Content Collaboration Spaces: Clouds make collaboration and coordination practical,
intuitive, and flexible through easy enabling of content
collaboration spaces, modeled after social software tools like
Facebook or Flickr, but centering on project-related information like invoices,
statements, RFPs, requirement docs, documentation, images and data sets.
These content spaces can automate many project-related tasks, from simple ones such as
automatically creating MS Word versions of all imported text documents
to complex ones such as running workflows to collate information from several
different organizations working in collaboration. Each content space can
be unique, created by composing a set of project requirements. Users can
invite internal and external collaborators into this customized environment,
assigning appropriate roles and responsibilities. After the group’s work is
"complete", their content space can be archived or destroyed. These spaces
can be designed to support distributed version control systems like bzr,
mercurial, and git, enabling social platform conversations and other content
management features.
– Continuous Code Integration: Compute clouds let the ’compile-test-change’
software cycle run on the fly, with continuous builds and integration
checks to meet strict quality checks and development guidelines. They can
also enforce policies for customized builds.
– APIs & Programming Frameworks: Clouds also compel developers
to embrace standard programming model APIs wherever possible and
adhere to style guides, conventions, and coding standards in meeting the
specific project requirements. They also force developers to embrace new
programming models and abstractions such as .NET, GWT, Django, Rails,
and Spring for increasing overall productivity. One more key feature of
using clouds is that they enforce constraints, which pushes developers to
address the critical next-gen programming challenges of multi-cores, parallel
programming and virtualization [22].
The software engineering community is also fast evaluating approaches like Agile,
Automatic, Extreme, Pair and Re-factoring to suit clouds [2,22]. The above
mentioned use cases can be applied within both “public” and “private” clouds. If
the client's requirements make “security” and “control” the main concerns,
we recommend that offshore development companies adopt “private” clouds as they
allow enterprises to retain control, but at the same time offer them flexibility,
availability and economies of scale.
3.2 Testing and Quality Assurance
There are two components to cloud computing in software testing. Clouds
provide the computing infrastructure for doing software testing across platforms
and in various combinations. The other component uses clouds to run fully
functional test cases with industry standard frameworks and regression support.
The use of virtual appliances for providing the requested computing requirements
is becoming a practice in the software testing domain. Virtual appliances are
virtual machine images containing pre-built, pre-configured, ready-to-run applications packaged
along with optimized operating systems. These enable flexible and quick software
testing. On the other hand, they are also used to automate execution of
some industry standard tests, support debugging and code coverage tools to
identify gaps in test procedures. In addition, the ability of clouds to simulate
thousands of users hitting Web applications is particularly attractive. Thus,
cloudifying testing services opens up interesting possibilities. One immediate use
case is where cloud testing is used to verify the real scalability of sites, servers,
applications, and networks in advance of a genuine surge in traffic.
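As a rough illustration of simulating many users hitting a Web application, the sketch below fires concurrent requests at a throwaway local server and counts the successful responses. It is a single-machine toy under stated assumptions, not a cloud test harness: the handler, function names, user count and worker count are all hypothetical, and a real cloud-based test would spread the request generators across many virtual machines.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Stand-in for the Web application under test (hypothetical)."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet during the run


# Bypass any configured HTTP proxy so requests reach the local server directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url):
    """Issue one GET request and return the HTTP status code."""
    with _opener.open(url, timeout=10) as response:
        return response.status


def simulate_users(url, users, workers=8):
    """Send `users` requests with up to `workers` in flight; count the 200s."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch_status, [url] * users))
    return sum(1 for status in statuses if status == 200)


def run_demo(users=50):
    """Start the local server on a free port, run the load, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        return simulate_users(url, users)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo(), "successful responses")
```

Raising `users` and `workers` increases the offered load; comparing the success count and elapsed time across runs gives a first impression of how the application behaves before a genuine surge.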
3.3 IT Operations
Now, clouds are increasingly being used to simplify the management part of
operations in offshore development centers. Recent studies show that cloud
deployment times can be reduced to less than 6 hours from the traditional IT
deployment times of 14-24 days for eight typical IT management tasks [29].
These tasks include operating system tasks like back-up, recovery, installation of
patches, network tasks including server assignments, configuration of network
and security parameters, installation of software, etc. The first significant
advantage of such clouds is the “cost savings” factor. The traditional IT model
requires business users to make a front-loaded investment in software and
hardware as well as a lifecycle investment in professional staff to maintain
servers and upgrade software. Clouds shift much of this expense to a “pay-as-you-go” model and so offer significant cost advantages in terms of power,
space, cooling, hardware and operations personnel [2,11,24,44]. Other key
operations benefits include the ease and effectiveness of backup and restore
activities to provide business continuity; the ability to handle the security and archiving
required for accountability and compliance with regulations such as SOX and HIPAA;
and the powerful software configuration management [37] clouds provide, so that
infrastructure gets provisioned, deployed and relinquished according to business
needs.
4 Our Architecture and Service Offerings
Having listed the cloud advantages for the GSD model, we present an applicable
and generic high-level architecture of clouds, and a hierarchy of cloud service
offerings possible with our architecture to benefit all the key stakeholders in the
ecosystem.
[Figure 1. (a) Cloud Architecture: Application, Platform, Infrastructure, and Virtualization layers over Server, Storage, and Network resources. (b) Cloud Service offerings possible: Web-based Services, Software-as-a-Service, App-components-as-a-Service, Software-platform-as-a-Service, Virtual-infrastructure-as-a-Service, Physical-infrastructure-as-a-Service.]
Fig. 1. Generic Cloud Architecture and various service offerings possible with the
architecture
4.1 Architecture Overview
We present our architecture as a layered stack to suitably represent the growing
list of technologies and IT offerings in this space. There are several elements to
the entire GSD ecosystem, and the architecture is envisioned as being spread out over,
and catering to, most of these elements (generic and abstract).
Figure 1(a) shows a generic cloud architecture for GSD. The Application layer
covers the Web-based UIs, web service APIs, multi-tenant architecture and a
rich variety of configuration options. The Platform layer adds a software stack to
the underlying infrastructure layer, manages virtual machines and supports the
development, integration and run-time execution of cloud application software.
The Infrastructure layer makes use of the underlying virtual infrastructure
so as to economically scale to very high volumes, and preferably do so in a
granular fashion. The Virtualization layer abstracts the physical resources like
servers, storage or network devices and presents equivalent logical resources for
consumption to other layers. The architecture is designed to facilitate service
offerings that serve to improve processes in GSD. Thus, we attempt to address
issues pertaining to cost constraints, hardware/software resource provisioning
and collaboration through our generic and high-level architecture.
4.2 Cloud Service Offerings
Building on familiar GSD product categories like developer tools, middleware
and IT infrastructure tasks, we segment cloud services based on the proposed
cloud architecture, as shown in Figure 1(b).
Software-as-a-Service (SaaS) delivers a single application through the browser
to thousands of customers using a multi-tenant architecture. For the customer,
it means no upfront investment in servers or software licensing; for the provider,
with just one app to maintain, costs are low compared to conventional hosting
(e.g., Salesforce.com [39]). App-components-as-a-Service spans a spectrum from
mash ups to third-party APIs. These app components are aimed at offering
developers higher-level software modules for combining existing code to create
applications (e.g., Live Mesh API [32]). This should improve efficiency and
encourage code reuse in the development process, which is one of the pain areas in
GSD. Software-platform-as-a-service (PaaS) is an entirely virtualized platform
that includes one or more servers, operating systems, and specific applications
(e.g., Google App Engine [22]). Virtual-Infrastructure-as-a-service (IaaS) or
Hardware-as-a-service (HaaS) is the delivery of computer infrastructure as a
service. This layer differs from PaaS in that the virtual hardware is provided
without a software stack (e.g., Amazon EC2 [1]). Other offerings are also
possible, such as communication-as-a-service, desktop-as-a-service, database-as-a-service,
data-storage-as-a-service, data-as-a-service, data-mining-as-a-service,
finance-as-a-service, framework-as-a-service, IDE-as-a-service, integration-as-a-service,
and monitoring-as-a-service [11,14,20,36,44].
4.3 Key Enabling Technologies
A number of enabling technologies contribute to the outlined cloud architecture. Here,
we identify some state-of-the-art technologies that make clouds practical and
possible:
Virtualization enables clouds to deliver on-demand IT infrastructures through
virtual machines (VMs). VMs are created and managed by a Virtual Machine
Monitor (VMM), which is the software layer between the operating system
and the physical machine. VM-based platforms offer several advantages, including
better isolation, availability and portability, apart from the flexibility
and scalability they bring. There is renewed interest in virtualized
platforms these days, evident in their presence in various forms such
as Server, Desktop, Application, Storage and Network virtualization from industry
players like VMware, Citrix, Microsoft, Red Hat, Cisco, and Sun. For more
details, the reader is referred to [6,38,43].
MapReduce [15] is the dominant programming model used in clouds that
provide on-demand computing capacity. MapReduce assumes that many
common programming applications can be coded as processes that manipulate large
data sets of <key,value> pairs. The Map process maps each <key,value>
pair in the data set into a new pair of <key’,value’>. The Reduce
process, then, merges values with the same key. Although this is a seemingly
simple model, it has been used to support a large number of applications that
manipulate data. Hadoop is an open source implementation of this model
[9]. Stream-based parallel programming models, in which a User Defined
Function (UDF) is applied to all the data, are also commonly used.
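The Map and Reduce steps described above can be sketched in a few lines of single-process Python. This is an illustrative toy, not Hadoop's API: the function names and the word-count job are assumptions made for the example, and a real framework would run the phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every <key, value> pair, yielding <key', value'> pairs."""
    for key, value in records:
        yield from mapper(key, value)


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Merge all values that share a key into one result per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(filename, text):
    """Map one <filename, text> record to a <word, 1> pair per word."""
    for word in text.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Reduce the list of 1s emitted for a word to its total count."""
    return sum(counts)


records = [("a.txt", "build test build"), ("b.txt", "test deploy")]
counts = reduce_phase(shuffle(map_phase(records, word_mapper)), count_reducer)
print(counts)  # {'build': 2, 'test': 2, 'deploy': 1}
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both phases can be distributed without the programmer writing any explicit communication code.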
Other models patterned after the Google File System (GFS) and
Bigtable are also common in many cloud forms. GFS refers to a scalable
distributed file system for large data-intensive applications [19]. It not only
provides fault tolerance while running on inexpensive commodity hardware,
but also delivers high aggregate performance to a large number of clients.
Data automatically get distributed to nodes at load time, and are processed
locally, in parallel with output data written to local disks, forming a single
user-accessible volume. Bigtable [12] represents a database layer with
the key idea of separating organized storage from query storage. It is a
distributed storage system for managing structured data that is designed
to scale to a very large size. HDFS and HBase are the open source
implementations of these models [9].
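The Bigtable paper [12] describes its data model as a sparse, sorted map indexed by row key, column key, and timestamp. The sketch below mimics that model in memory to make the idea concrete. The class and method names are hypothetical and are not the Bigtable or HBase API; distribution, persistence and column families are deliberately left out.

```python
class TinyTable:
    """In-memory toy of a (row, column, timestamp) -> value map."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        """Store one versioned cell value."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row, end_row):
        """Yield (row, column, latest value) for start_row <= row < end_row, in sorted order."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                yield row, column, self.get(row, column)


table = TinyTable()
table.put("proj/alpha", "build:status", 1, "failed")
table.put("proj/alpha", "build:status", 2, "passed")
table.put("proj/beta", "build:status", 1, "passed")
print(table.get("proj/alpha", "build:status"))     # passed
print(table.get("proj/alpha", "build:status", 1))  # failed
print(list(table.scan("proj/a", "proj/b")))
```

Keeping rows in sorted order is what makes range scans over related keys cheap, which is why row keys are usually chosen so that related data sorts together.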
Service-Oriented Architecture (SOA) allows for delivery of an integrated
and orchestrated suite of functions to an end-user through composition of
both loosely and tightly coupled functions, or services - often network-based,
following industry standards like WSDL, SOAP and UDDI [10,27].
Some cloud forms also make use of frameworks such as Pig [35], Zookeeper,
Hive, Sawzall [23], LINQ [30], and Condor [42] to cope with the complexity and
failure-prone nature of commodity hardware, such as hard drive crashes and network
outages.
5 Our Experiments
We have used these technologies to implement three GSD-adaptable internal
cloud prototypes. This section presents our experiments and initial results.
5.1 Compile Server Farm
Traditionally, compiling large software projects across clusters has
been done using parallel programming interfaces such as MPI or OpenMP.
These interfaces are difficult to program, error-prone, and often time-consuming to use.
Moreover, in GSD environments, there has always been a need for faster compile
and build cycles to test, debug and maintain complex software projects across
verticals and domains. In such scenarios, many projects are created, compiled,
and maintained on a regular basis to suit the business and client requirements.
Each of these projects typically comprises thousands of files, which
might take hours or even days to compile, build and deploy. This change-compile-test
cycle is one of the most time-consuming activities of a project, and it
requires substantial resources like server-class machines or clusters.
Our first prototype tries to address this common issue using compute cloud
concepts. We represent a compute cloud as a “nexus of hardware, software and
data which provides compute services over network”. Our compute cloud is
actually a “compute server farm”. By using Hadoop and Condor, we managed to
speed up the change-compile-test cycle of large software projects. Another
key motivation for our compute clouds is to do continuous software code
integration. The following algorithm is used for our “compute server farm”.
Algorithm 1. Compute Server Farm with Hadoop and Condor
1. Client workstations launch jobs
2. Condor dynamically allocates clusters
3. Hadoop-on-Demand starts the MapReduce program on the clusters
4. MapReduce program reads/writes into HDFS
5. When done, the results are either stored in the HDFS and/or returned to the client
6. Condor reclaims nodes
We tested this approach by compiling and building the Eclipse IDE from its
Java source code. Condor [42] is used as a batch scheduler to schedule jobs
on idle workstations. We have written a MapReduce program [15] targeting
the Hadoop run-time environment to automatically split, distribute, store and
parallelize the computation using HDFS [9] as temporary file storage. We used
JDepend for analyzing the source dependencies. The results are encouraging:
the total compilation took about 80 minutes on 5 standard desktop
workstations, against 150 minutes for a standalone job, a speedup of roughly 1.9x. Our experience with
this approach encourages us to believe that this is a viable and simpler way of
mimicking the compute cloud for larger GSD projects.
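One way dependency information can drive a parallel build is to group compilation units into batches, where every unit in a batch depends only on units in earlier batches and the units within a batch can be compiled concurrently. The sketch below shows this idea with Python's standard `graphlib` module. It is an assumption about how such splitting could work, not a description of the prototype's actual job-splitting logic, and the package names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compile_batches(dependencies):
    """Group units into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # units whose dependencies are all built
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical package names; each key maps to the packages it depends on.
deps = {
    "core": set(),
    "util": set(),
    "ui": {"core", "util"},
    "plugins": {"ui"},
    "tests": {"core"},
}
print(compile_batches(deps))
# [['core', 'util'], ['tests', 'ui'], ['plugins']]
```

Each inner list could be handed to the workers as one round of independent compile tasks; the number of rounds, not the number of packages, then bounds how much of the build must run sequentially.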
5.2 Online Storage Cloud
The key motivation for an online storage cloud is the need for a scalable
data management platform to support a variety of typical use cases in GSD
environments. It should support an ecosystem of users and developments
growing around project content and fast-changing content-related tasks and
ideas across domains and requirements. Additionally, GSD projects have to deal
with massive numbers of structured, unstructured and queued files. This sheer number of
files implies cumbersome storage and organization on existing storage systems;
for example, while a SAN can provide enough storage, the simple file system
interface layered on top of a SAN is not expressive enough to manage these files.
Moreover, in such projects, teams and project members tend to move from
one location to another based on various business and administrative needs. When
such team or project transfers happen, there is strong demand for personal
data backup and restoration from all over the project. This presents
multiple problems. First, personal computer disks are limited in capacity and
too unreliable to host project-specific or personal data for prolonged periods. Second, there is a
need for policy management and capacity management to deal with the growing
security concerns and unprecedented data growth. Third, the ability to access
data remotely, driven by policy-based management, is not present.
To address these problems, we designed a storage cloud with the following
characteristics: a storage service delivered over a network (Internet or
Intranet); economically scaling capacity and performance; easy to manage at
scale (e.g., terabytes and beyond); private and driven by enforced policies. At
its core, our storage cloud is a middleware layer with virtualized mass storage,
allowing the underlying physical storage to be NFS or NAS, shared-nothing
cluster file systems, or some combination of these. Each file created or hosted
in our cloud is uniquely identified by a URL so that it can be directly
addressed, or collectively accessed through a FUSE-based virtual directory.
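The URL-based addressing described above can be pictured with a small in-memory catalog. This is a minimal sketch, not the middleware itself: the base URL `http://storage.example`, the `StorageCatalog` class and the back-end labels are all hypothetical, chosen only to show how one URL per file can coexist with a directory-style view.

```python
from dataclasses import dataclass, field
from posixpath import dirname, normpath
from urllib.parse import quote


def _norm(path: str) -> str:
    """Normalize a virtual path to a single leading slash."""
    return normpath("/" + path.lstrip("/"))


@dataclass
class StorageCatalog:
    """Hypothetical catalog mapping virtual paths to URLs and back ends."""
    base_url: str = "http://storage.example"     # assumed, not a real endpoint
    entries: dict = field(default_factory=dict)  # virtual path -> back-end label

    def put(self, path: str, backend: str) -> str:
        """Register a file and return the URL that uniquely identifies it."""
        self.entries[_norm(path)] = backend
        return self.url_for(path)

    def url_for(self, path: str) -> str:
        """Direct address of a single file."""
        return self.base_url + quote(_norm(path))

    def listdir(self, directory: str) -> list:
        """Collective access: files directly under a virtual directory."""
        target = _norm(directory)
        return sorted(p for p in self.entries if dirname(p) == target)


catalog = StorageCatalog()
url = catalog.put("proj/src/Main.java", backend="hdfs")
catalog.put("/proj/src/Util.java", backend="nfs")
print(url)
print(catalog.listdir("/proj/src"))
```

In the prototype the directory view is provided by a FUSE mount rather than by a method call; the `listdir` method here only stands in for that view.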
Our storage cloud is implemented with three nodes that make use of Xen VMs
[6], HDFS [9] and NFS, and is powered by the OpenQRM management solution. This
remote storage is mounted as a local virtual directory through Filesystem
in Userspace (FUSE) [41] modules on Linux and Windows. Apart from
GSD project artifacts, it can also serve other needs such as content
collaboration spaces hosting code repositories, digital content, file archives
and streaming media, as outlined in Section 3.
5.3
Lab Any Where (LAW): Online Virtual Labs
Many GSD partners, such as IT and ITeS organizations, face major challenges
in aligning their training delivery mechanisms to business objectives. These
challenges, related to cost, time, reach, and effectiveness, are prompting
organizations to revisit their traditional training delivery modes. Dedicated
physical classes do not reach a large audience offshore and are certainly
difficult for on-site teams due to logistics-related issues. Online e-learning
systems provide scalability and rapid delivery, but offshore and on-site people
still miss "hands-on" experience with real software systems. Other
methodologies, including Web conference products, terminal emulators, web-based
work spaces and Learning Management systems, also do not provide "real"
interaction with software systems.
Our Lab-Any-Where (LAW) prototype tries to address this training problem
by making use of cloud principles and components. LAW is a cloud-based
application designed to provide fully immersive technical training and software
testing labs over a network (Intranet or Internet). This Web application also
provides richly featured platforms for centrally managing hands-on training and
testing scenarios via scheduled and on-demand delivery mechanisms. We make
use of virtual appliances to achieve rapid deployment through a Web browser
interface [38].
Our design prototype is implemented in two modes: one with Microsoft
Virtual Server (MVS) for delivering and testing Windows OS-based training
environments, and the other with User Mode Linux (UML) for Linux
OS-based scenarios. As such, the LAW prototype has two major components:
(i) LAW Management, which can manage, control and ultimately orchestrate
lab resources through configurable workflows, scheduling, customization and
reporting; and (ii) a Delivery component that performs automatic deployment
through virtual appliances with secure access.
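The division of labour between the two components can be sketched as follows. This is an illustration under stated assumptions, not the LAW implementation: the names `LabRequest`, `choose_backend` and `schedule` are hypothetical, the trainee names and dates are placeholders, and the only detail taken from the prototype is the pairing of Windows scenarios with Microsoft Virtual Server and Linux scenarios with User Mode Linux.

```python
from dataclasses import dataclass
from datetime import datetime

# Pairing taken from the text; the dictionary keys are illustrative labels.
BACKENDS = {"windows": "Microsoft Virtual Server", "linux": "User Mode Linux"}


@dataclass(frozen=True)
class LabRequest:
    trainee: str       # placeholder identifier
    guest_os: str      # "windows" or "linux"
    start: datetime    # requested start time


def choose_backend(guest_os: str) -> str:
    """Delivery side: pick the virtualization back end for a guest OS."""
    try:
        return BACKENDS[guest_os.lower()]
    except KeyError:
        raise ValueError(f"no lab back end for guest OS {guest_os!r}") from None


def schedule(requests):
    """Management side: order requests by start time and attach a back end."""
    ordered = sorted(requests, key=lambda r: r.start)
    return [(r.start, r.trainee, choose_backend(r.guest_os)) for r in ordered]


# Placeholder requests, submitted out of order on purpose.
plan = schedule([
    LabRequest("trainee-b", "linux", datetime(2009, 1, 5, 14, 0)),
    LabRequest("trainee-a", "Windows", datetime(2009, 1, 5, 9, 0)),
])
for start, trainee, backend in plan:
    print(start.isoformat(), trainee, backend)
```

Keeping the back-end choice in a separate function mirrors the split described above: scheduling decisions stay in the management side, while the delivery side only needs to know which virtual appliance type to deploy.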
6
Conclusions and Future Work
Clouds represent an inflection point in globally distributed software development.
The concept draws on many existing technologies and architectures, as we have
seen in this paper. Although there is some FUD (Fear, Uncertainty, and Doubt)
amid all the hype about clouds, we see clouds as a viable and effective platform
for offshore and outsourced development in the longer run. In this paper, we
outlined some positive indications and the resulting implications of deploying
clouds in a globally distributed software model. However, there are still
concerns with respect to vendor lock-in, SLA control, privacy, reliability, data
migration and access, auditing and regulatory compliance norms. We hope that as
the IT industry works to solve these problems, cloud adoption will occur in
phases, from the nascent clouds in place today to mature cloud-based platforms
with enhanced security and better SLA norms. Moreover, it is our belief that the
cloud paradigm can provide significant benefits to all key stakeholders in the
GSD ecosystem, as indicated by the prototypes we showcased here. We also
continue to monitor and experiment with the different architectural,
programming, and operational models of clouds and will share our results with
the GSD community. One active area is exploring the ability of clouds to play a
game-changing role in software testing. We intend to investigate further the
cloudification of testing services, as it provides ample scope to probe the full
potential of the cloud paradigm.
References
1. Amazon: Amazon web services for simple db, s3, ec2, http://aws.amazon.com
2. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee,
G., Patterson, D., Rabkin, A., Stoica, I., et al.: Above the Clouds: A Berkeley View
of Cloud Computing. University of California, Berkeley, Tech. Rep. (2009)
3. Aspray, W., Mayadas, F., Vardi, M.Y.: Globalization and offshoring of software. A
Report of the ACM Job Migration Task Force, Executive Summary and Findings.
ACM, New York (2006)
4. Atkinson, R.D.: Understanding the offshoring challenge. Progressive Policy
Institute, Washington, DC (2004)
5. Baker, S.: Google and the wisdom of clouds. Business Week (2007)
6. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R.,
Pratt, I., Warfield, A.: Xen and the art of virtualization. ACM SIGOPS Operating
Systems Review 37, 164–177 (2003)
7. Battin, R.D., Crocker, R., Kreidler, J., Subramanian, K.: Leveraging resources in
global software development. IEEE Softw. 18(2), 70–77 (2001)
8. Bechtolsheim, A.: Cloud Computing and Cloud Networking. talk at UC Berkeley
(2008)
9. Bialecki, A., Cafarella, M., Cutting, D., O'Malley, O.: Hadoop: a framework for
running applications on large clusters built of commodity hardware,
http://lucene.apache.org/hadoop
10. Buschmann, F.: Pattern-oriented software architecture: a system of patterns. Wiley,
Chichester (2002)
11. Buyya, R., Yeo, C.S., Venugopal, S.: Market-oriented
cloud computing: Vision, hype, and reality for delivering it services as computing
utilities. In: Proceedings of the 10th IEEE International Conference on High
Performance Computing and Communications (HPCC 2008). IEEE CS Press, Los
Alamitos (2008)
12. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M.,
Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: A distributed storage system for
structured data. In: Proceedings of the 7th USENIX Symposium on Operating
Systems Design and Implementation (OSDI 2006) (2006)
13. Cheng, L.T., de Souza, C.R., Hupfer, S., Patterson, J., Ross, S.: Building
collaboration into ides. Queue 1(9), 40–50 (2004)
14. Church, K., Hamilton, J., Greenberg, A.: On delivering embarrassingly distributed
cloud services. Hotnets VII (2008)
15. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters.
In: OSDI 2004: Proceedings of the 6th conference on Symposium on Operating
Systems Design & Implementation, Berkeley, CA, USA, p. 10. USENIX Association
(2004)
16. Desouza, K., Awazu, Y., Baloh, P.: Managing knowledge in global software
development efforts: Issues and practices. IEEE software 23(5), 30–37 (2006)
17. Foster, I., Kesselman, C.: The grid: blueprint for a new computing infrastructure.
Morgan Kaufmann, San Francisco (2004)
18. Fryer, K., Gothe, M.: Global software development and delivery: Trends and
challenges. IBM Developer Works 1 (January 2008)
19. Ghemawat, S., Gobioff, H., Leung, S.T.: The google file system. In: SOSP 2003:
Proceedings of the nineteenth ACM symposium on Operating systems principles,
pp. 29–43. ACM, New York (2003)
20. Gillett, E.F., Brown, G.E., Staten, J., Lee, C.: The new tech ecosystems of cloud,
cloud services, and cloud computing. Forrester Research Report (August 2008)
21. Google: Google docs and spreadsheets, http://docs.google.com
22. Google: Google’s cloud implementation as app engine,
http://code.google.com/appengine/
23. Griesemer, R.: Parallelism by design: data analysis with sawzall. In: CGO 2008:
Proceedings of the sixth annual IEEE/ACM international symposium on Code
generation and optimization, p. 3. ACM, New York (2008)
24. Hamilton, J.: Perspectives blog, http://perspectives.mvdirona.com
25. Herbsleb, J.D., Mockus, A.: An empirical study of speed and communication
in globally distributed software development. IEEE Transactions on Software
Engineering 29(6), 481–494 (2003)
26. Herbsleb, J., Moitra, D.: Global software development. IEEE software 18(2), 16–20
(2001)
27. Huhns, M.N., Singh, M.P.: Service-oriented computing: Key concepts and
principles. IEEE Internet Computing 9(1), 75–81 (2005)
28. Krishna, S., Sahay, S., Walsham, G.: Managing cross-cultural issues in global
software outsourcing. Communications of the ACM 47(4), 62–66 (2004)
29. Lin, G., Fu, D., Zhu, J., Dasmalchi, G.: Cloud computing: It as a service. IT
Professional 11(2), 10–13 (2009)
30. Meijer, E., Beckman, B., Bierman, G.: LINQ: reconciling object, relations and XML
in the .NET framework. In: Proceedings of the 2006 ACM SIGMOD international
conference on Management of data, p. 706. ACM, New York (2006)
31. Meyer, B.: The unspoken revolution in software
engineering. IEEE Computer 39(1), 124 (2006)
32. Microsoft-Live: Microsoft live mesh api, http://www.mesh.com
33. Montoya-Weiss, M., Massey, A., Song, M.: Getting it together: Temporal
coordination and conflict management in global virtual teams. Academy of
Management Journal, 1251–1262 (2001)
34. Olson, J.S., Olson, G.M.: Culture surprises in remote software development teams.
Queue 1(9), 52–59 (2004)
35. Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig Latin: A
not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD
international conference on Management of data, pp. 1099–1110. ACM, New York
(2008)
36. Rangan, K.: The Cloud Wars: $100+ billion at stake. Tech. rep.,
Merrill Lynch (2008)
37. ReductiveLabs: Puppet configuration management,
http://reductivelabs.com/trac/puppet
38. Rosenblum, M., Garfinkel, T.: Virtual machine monitors: Current technology and
future trends. IEEE Computer 38(5), 39–47 (2005)
39. Salesforce: Salesforce customer relationships management (crm) system,
http://www.salesforce.com/
40. Siegele, L.: Let It Rise: A Special Report on Corporate IT. The Economist (October
2008)
41. Szeredi, M.: Filesystem in userspace, http://fuse.sourceforge.net
42. Thain, D., Tannenbaum, T., Livny, M.: Distributed computing in practice: The
Condor experience. Concurrency and Computation: Practice and Experience 17(2–4), 323–356 (2005)
43. Uhlig, R., Neiger, G., Rodgers, D., Santoni, A.L., Martins, F.C.M., Anderson, A.V.,
Bennett, S.M., Kagi, A., Leung, F.H., Smith, L.: Intel virtualization technology.
IEEE Computer 38, 48–56 (2005)
44. Vogels, W.: A Head in the Clouds - The Power of Infrastructure as a Service. In:
First workshop on Cloud Computing and its Applications (CCA 2008) (October
2008)
45. Weiss, A.: Computing in the clouds. ACM netWorker 11(4), 16–25 (2007)