
Global Software Development with Cloud Platforms

2009

Offshore and outsourced distributed software development models and processes are facing previously unknown challenges with respect to computing capacity, bandwidth, storage, security, complexity, reliability, and business uncertainty. Clouds promise to address these challenges by adopting recent advances in virtualization, parallel and distributed systems, utility computing, and software services. In this paper, we envision a cloud-based platform that addresses some of these core problems. We outline a generic cloud architecture, its design, and our first implementation results for three cloud forms - a compute cloud, a storage cloud, and a cloud-based software service - in the context of global distributed software development (GSD). Our "compute cloud" provides computational services such as continuous code integration and a compile server farm; our "storage cloud" offers storage (block or file-based) services through an online virtual storage service; and our online virtual labs represent a useful cloud service. We note some of the use cases for clouds in GSD and the lessons learned with our prototypes, and identify challenges that must be overcome before the full business benefits can be realized. We believe that in the future, software practitioners will focus more on these cloud computing platforms and see clouds as a means of supporting an ecosystem of clients, developers, and other key stakeholders.

Pavan Yara, Ramaseshan Ramachandran, Gayathri Balasubramanian, Karthik Muthuswamy, and Divya Chandrasekar

Cognizant Technology Solutions, # 5/639 Old Mahabalipuram Road, Kandanchavadi, Chennai - 600096, India
{Pavankumar.Yara,Ramaseshan.Ramachandran,Gayathri.Balasubramanian,Karthik.Muthuswamy,Divya.Chandrasekar}@cognizant.com
http://www.cognizant.com/

Keywords: Globally Distributed Software Development, Cloud computing, Software-as-a-Service, compute cloud, storage cloud.
O. Gotel, M. Joseph, and B. Meyer (Eds.): SEAFOOD 2009, LNBIP 35, pp. 129–143, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

Over the last decade, the Globally Distributed Software Development (GSD) model has become a business necessity for capitalizing on global resource pools, attractive cost structures, and round-the-clock development to shorten cycle times [3,26,31]. At the same time, GSD has brought its own nuances, complexities, and challenges, spanning technical, temporal, spatial, and process concerns [4,25,34]. Some of these issues are long-standing, such as effective capacity planning, resource provisioning, software lifecycle management, communication, coordination, and collaboration mechanisms. In addition, relatively new challenges are emerging with the rise of multi-core processors, virtualization, recent programming frameworks and abstractions, and other complex advances. With the intensification of global economic activity and the resulting demand for cost/benefit analysis, the need for better approaches to outsourced software engineering and management has only become more pronounced. Over the years, the structured and disciplined software engineering approaches advocated as key remedies for these GSD challenges have also been refined. A range of new, effective platforms and practices has emerged and been adopted to address these challenges.
These mechanisms - such as better communication and coordination practices [7], management of global software teams [28], effective resource leveraging with virtual teams [33], collaboration and knowledge management tools and techniques [16], programming methodologies and processes [13], software lifecycle models, service-oriented architecture [27] concepts (specifically web services and Web 2.0 technologies), and grid infrastructures to provide IT services [17] - have addressed the GSD challenges with some success. In this paper, we discuss one such emerging paradigm with several possible positive implications for GSD. "Cloud computing", as it is popularly known, is a disruptive business and technology concept that means different things to different GSD ecosystem partners. For IT users, it is a way to deliver computing, storage, and applications over the network, often the Internet, from centralized data centers. For application developers, it is an Internet-scale software development platform and run-time environment with several interesting use case scenarios, such as always-on and always-available development environments; content collaboration spaces for sharing code, documents, presentations, and discussions in the style of social media sites like Facebook; and services like online IDEs and continuous code builds and testing. For infrastructure providers and administrators, it is a massive, distributed data center infrastructure connected by IP networks to achieve economies of scale and grant "on-demand" access to computing capacity. Thus, cloud computing delivers IT as a service (ITaaS), which can be adapted to address some of the core challenges in GSD. Cloud computing tries to replace the traditional desktop-as-a-platform model with a network-as-a-platform model.
As such, it builds on decades of research in virtualization, parallel and distributed systems, and utility computing, and on more recent advances in networking, the Web, and software services. The key idea of this paradigm is to provide a utility service, similar to a power grid, into which a user may plug in regardless of location to access always-on, always-available, and device-independent IT services. In this way, it represents the next natural step in the evolution of computing and IT services. It promises to maximize the productivity of all IT-related activities. For instance, with the cloud paradigm, time and resources otherwise spent building or customizing application frameworks or software/hardware infrastructure can be spent on improving the business logic instead.

We explore the nature and potential of clouds in this paper. Its main contribution is in laying out a cloud-based vision for GSD and formulating a generic architecture to address some of the GSD challenges. In doing so, we also showcase how we realize key business benefits with our cloud-based platforms. Our cloud platforms are easily adaptable to provide a common, managed, and powerful infrastructure to support GSD activities. The paper is structured as follows: we explain the basic concepts underlying the cloud computing paradigm in Section 2 and discuss how clouds can be used within the GSD ecosystem in Section 3. We then present a preliminary architecture of the framework and discuss candidate technologies to realize it in Section 4.
Section 5 describes three GSD-adaptable cloud service prototypes built at our lab - a compute cloud providing computational services such as continuous code builds and integration with on-demand infrastructure; a storage cloud providing massive online storage for software project-related artifacts; and a cloud-based online virtual lab solution providing always-on and always-available training, testing, and debugging facilities - before concluding.

2 Cloud Computing Paradigm

Cloud computing fulfills the long-held dream of computing as a utility [45] and thus has the potential to transform how the IT ecosystem makes and uses hardware and software as a utility and a service.

2.1 Concepts

The term cloud computing usually refers to an online delivery and consumption model for business and customer services. These include IT services like Software-as-a-Service (SaaS) and storage or server capacity as a service, as well as many non-IT business and consumer services that are not computing tasks. All these are commonly referred to as X-as-a-Service (XaaS). For this paper, however, we adopt the following definitions:

Cloud is a pool of highly scalable, abstracted infrastructure, capable of hosting end-customer applications, that is billed by consumption [20].

Cloud services refer to the consumer and business products, services, and solutions that are delivered and consumed in real time over a network.

Cloud computing is an emerging IT development, deployment, and delivery model that enables real-time delivery of products, services, and solutions over the Web (i.e., enabling cloud services).

Technically, this paradigm refers to providing services on virtual machines allocated on top of a large machine pool, whereas in business terms it means a method of addressing scalability and availability concerns for applications.

2.2 Characteristics

At a higher level of abstraction, three aspects are primary to clouds: elasticity, a pay-per-use model, and high availability.
The key characteristics of cloud services, along the lines of Forrester [20], are as follows:

1. Standardized IT-based capability: Delivers compute, storage, network, or software-based capabilities, solely or in combination, through standard offerings.
2. Accessible via Internet protocols from any computer: Standards-based, universal network access through a regular web browser via HTTP, XMPP, OpenID, OAuth, or Atom protocols.
3. Always available, and scales automatically to adjust to demand: Resilient and highly available; elastic enough to cope with scale and demand.
4. Pay-per-use or advertising-based: The service is paid for in one of three ways - advertising, subscription, or per transaction.
5. Web or programmatic-based control interfaces: Uses service-based interfaces like XML, JSON, and REST-style software connection standards.
6. Offers full customer self-service: Customers can provision, manage, and terminate services themselves, with control via a Web interface or programmatic calls to service APIs.

2.3 Examples

Although the paradigm has emerged only recently, the implications of IT services provided through it are wide-reaching [5,8,40,45]. Cloud infrastructure companies such as Amazon and GoGrid offer data storage priced by the gigabyte per month and computing capacity priced by the CPU-hour [1]. Office and productivity applications in the cloud, such as Google Apps [21], the Zoho office suite, MS SharePoint Online, and Cisco WebEx, make collaboration more accessible and highly available. SaaS companies offer CRM services through their multi-tenant shared facilities so that clients can manage their customers without buying software [39]. These use cases represent only the beginning of the options for delivering all kinds of complex capabilities, such as online businesses, collaboration tools, R&D projects, quick project promotions, partner integration, and new business ventures [11,40].
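The consumption-based pricing mentioned above (storage by the gigabyte per month, compute by the CPU-hour) can be illustrated with a small calculation. The following is a minimal sketch only: the `Rates` class, the `monthly_bill` function, and the unit prices are hypothetical and are not taken from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; not taken from any real provider."""
    per_gb_month: float  # storage price per gigabyte per month
    per_cpu_hour: float  # compute price per CPU-hour


def monthly_bill(rates: Rates, gb_stored: float, cpu_hours: float) -> float:
    """Usage-based charge: the customer pays only for what was consumed."""
    if gb_stored < 0 or cpu_hours < 0:
        raise ValueError("usage must be non-negative")
    return rates.per_gb_month * gb_stored + rates.per_cpu_hour * cpu_hours


if __name__ == "__main__":
    rates = Rates(per_gb_month=0.25, per_cpu_hour=0.5)
    # 200 GB stored and 40 CPU-hours used: 0.25*200 + 0.5*40 = 70.0
    print(monthly_bill(rates, gb_stored=200, cpu_hours=40))
```

Under such a model, a month with no usage costs nothing, which is the sense in which upfront infrastructure investment is avoided.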
2.4 Benefits

The economic appeal of clouds is often summed up as "converting capital expenses to operating expenses" [2]. There are other clear business benefits, such as almost zero upfront infrastructure investment, just-in-time infrastructure, more efficient resource utilization, usage-based costing, and a real potential for shrinking processing time. We refer the reader to [2,5,8,11,36,40] for more details.

2.5 Public, Private and Hybrid Clouds

As discussed in the introduction, clouds are the result of the natural transformation of enterprise IT infrastructure over the last decade, and they can take many forms. In this paper, we look at three major types of clouds.

Public clouds are cloud services offered by third-party providers (vendors) such as Amazon, Google, and Salesforce.com for public consumption. The vendors fully host and manage the infrastructure and charge customers for the resources they use, usually on an hourly or per-transaction basis.

Private clouds are cloud services provided within the enterprise firewall and managed by the enterprise itself, such as Boeing or GM. They offer the same benefits as public clouds but with fine-grained control, security, and compliance norms. The major difficulty with private clouds is the complexity and cost involved in setting up "internal" clouds.

Hybrid clouds combine public and private cloud properties. They leverage services in both the public and private spaces and are typically used in scenarios such as receiving customer payments or processing employee payroll. The major drawback of hybrid clouds is the difficulty of effectively creating and governing such a solution [18].
3 Cloud Platforms for GSD

While we are yet to see fundamentally new types of applications enabled by cloud computing, we believe that it offers compelling benefits for several important classes of existing applications in the GSD model. The cloud paradigm, by design, tries to optimize IT-related productivity by taking care of scaling and availability concerns and redirecting resources to long-term strategic business development. Because clouds emphasize communication, collaborative work, and community interaction, we expect them to offer considerable leverage in many GSD-related activities. For example, when companies outsource tasks, those tasks often require close working relationships between the companies involved. These collaborations grow organically to form communities around the particular task they aim to solve. This presents multiple issues. In previous generations of GSD, the environments and tools had to be made available to the teams involved; organizations had to acquire the tools at their own cost and pool resources to provision the requested job; workers could not locate and use the tools they judged best for the job; and they had to span time and space to share, discuss, collaborate on, or even publish content as time and circumstance required. In addition, content is exchanged in multiple forms, such as emails, discussion forums, bug tracking systems, version control systems, and logs, which makes this a very complex activity. Cloud-based platforms can be used in such cases, in the form of content collaboration spaces and always-on, always-accessible IT services. In this section, we discuss three such useful areas for GSD: development, quality assurance and testing, and IT operations.

3.1 Development

Clouds offer instant resource provisioning, flexibility, on-the-fly scaling, and high availability for continuously evolving GSD-related activities.
Some of the use cases include:

– Development Environments: With clouds, the ability to acquire, deploy, configure, and host development environments becomes "on-demand". The development environments are then always-on and always-available to the concerned teams, with fine-grained access control mechanisms. In addition, the development environments can be purpose-built with support for application-level tools, source code repositories, and programming tools. After the project is done, they can be archived or destroyed. The other key element of these "on-demand" hosting environments is the flexibility of their quick "prototyping" support: new code and ideas can be quickly turned into workable proofs of concept and tested.
– Developer Tools: Hosting developer tools such as IDEs and simple code editors in the cloud eliminates the need for developers to have local IDEs and other associated development tools. It also lets the concerned project members access the development environment and tools across time zones and places.
– Content Collaboration Spaces: Clouds make collaboration and coordination practical, intuitive, and flexible by easily enabling content collaboration spaces, modeled after social software tools like Facebook or Flickr but centered on project-related information such as invoices, statements, RFPs, requirement documents, documentation, images, and data sets. These content spaces can automate many project-related tasks, from simple ones such as automatically creating MS Word versions of all imported text documents to complex ones such as running workflows that collate information from several different organizations working in collaboration. Each content space can be unique, created by composing a set of project requirements. Users can invite internal and external collaborators into this customized environment, assigning appropriate roles and responsibilities. After the group's work is "complete", their content space can be archived or destroyed. These spaces can be designed to support distributed version control systems like bzr, mercurial, and git, enabling social platform conversations and other content management features.
– Continuous Code Integration: Compute clouds let the "compile-test-change" software cycle run on the fly, performing continuous builds and integration checks to meet strict quality checks and development guidelines. They can also enforce policies for customized builds.
– APIs & Programming Frameworks: Clouds also compel developers to embrace standard programming model APIs wherever possible and to adhere to style guides, conventions, and coding standards in meeting the specific project requirements. They also force developers to embrace new programming models and abstractions such as .NET, GWT, Django, Rails, and Spring to increase overall productivity. One more key feature of clouds is that they enforce constraints, which pushes developers to address the critical next-generation programming challenges of multi-cores, parallel programming, and virtualization [22]. The software engineering community is also rapidly evaluating approaches like Agile, Automatic, Extreme, Pair, and Re-factoring to suit clouds [2,22].

The above use cases can be applied within both "public" and "private" clouds. If the client's main concerns are "security" and "control", we recommend that offshore development companies adopt "private" clouds, as they allow enterprises to retain control while still offering flexibility, availability, and economies of scale.

3.2 Testing and Quality Assurance

There are two components to cloud computing in software testing. First, clouds provide the computing infrastructure for doing software testing across platforms and in various combinations.
The other component uses clouds to run fully functional test cases with industry-standard frameworks and regression support. The use of virtual appliances to provide the requested computing resources is becoming common practice in the software testing domain. Virtual appliances are pre-built, pre-configured, ready-to-run applications packaged as virtual machines along with optimized operating systems. They enable flexible and quick software testing. They are also used to automate the execution of some industry-standard tests and to support debugging and code coverage tools that identify gaps in test procedures. In addition, the ability of clouds to simulate thousands of users hitting Web applications is particularly attractive. Thus, cloudifying testing services opens up interesting possibilities. One immediate use case is cloud testing to verify the real scalability of sites, servers, applications, and networks in advance of a genuine surge in traffic.

3.3 IT Operations

Clouds are increasingly being used to simplify the management side of operations in offshore development centers. Recent studies show that cloud deployment times can be reduced to less than 6 hours, from the traditional IT deployment times of 14-24 days, for eight typical IT management tasks [29]. These tasks include operating system tasks like backup, recovery, and installation of patches; network tasks including server assignments and configuration of network and security parameters; installation of software; etc. The first significant advantage of such clouds is cost savings. The traditional IT model requires business users to make a front-loaded investment in software and hardware as well as a lifecycle investment in professional staff to maintain servers and upgrade software.
Clouds shift much of this expense to a "pay-as-you-go" model and so offer significant cost advantages in terms of power, space, cooling, hardware, and operations personnel [2,11,24,44]. Other key operations benefits include easy and effective backup and restore activities to provide business continuity; the ability to handle the security and archiving required for accountability and for compliance with regulations such as SOX and HIPAA; and powerful software configuration management [37], so that infrastructure is provisioned, deployed, and relinquished according to business needs.

4 Our Architecture and Service Offerings

Having listed the advantages of clouds for the GSD model, we present a generic, high-level cloud architecture and a hierarchy of cloud service offerings possible with it, intended to benefit all the key stakeholders in the ecosystem.

[Figure 1. Panel (a), Cloud Architecture, labels: Application, Platform, Infrastructure, Virtualization, Server, Storage, Network. Panel (b), Cloud Service offerings possible, labels: Web-based Services, Software-as-a-Service, App-components-as-a-Service, Software-platform-as-a-Service, Virtual-infrastructure-as-a-Service, Physical-infrastructure-as-a-Service.]

Fig. 1. Generic Cloud Architecture and various service offerings possible with the architecture

4.1 Architecture Overview

We present our architecture as a layered stack to represent the growing list of technologies and IT offerings in this space. The GSD ecosystem has several elements, and the architecture, being generic and abstract, is envisioned as spanning and catering to most of them. Figure 1(a) shows a generic cloud architecture for GSD. The Application layer covers the Web-based UIs, web service APIs, multi-tenant architecture, and a rich variety of configuration options.
The Platform layer adds a software stack to the underlying infrastructure layer, manages virtual machines, and supports the development, integration, and run-time execution of cloud application software. The Infrastructure layer makes use of the underlying virtual infrastructure so as to scale economically to very high volumes, preferably in a granular fashion. The Virtualization layer abstracts physical resources such as servers, storage, and network devices and presents equivalent logical resources for consumption by the other layers. The architecture is designed to facilitate service offerings that improve processes in GSD. Thus, through our generic, high-level architecture, we attempt to address issues pertaining to cost constraints, hardware/software resource provisioning, and collaboration.

4.2 Cloud Service Offerings

Building on familiar GSD product categories like developer tools, middleware, and IT infrastructure tasks, we segment cloud services based on the proposed cloud architecture, as shown in Figure 1(b). Software-as-a-Service (SaaS) delivers a single application through the browser to thousands of customers using a multi-tenant architecture. For the customer, it means no upfront investment in servers or software licensing; for the provider, with just one application to maintain, costs are low compared to conventional hosting (e.g., Salesforce.com [39]). App-components-as-a-Service spans a spectrum from mashups to third-party APIs. These app components offer developers higher-level software modules for combining existing code to create applications (e.g., Live Mesh API [32]). This should improve efficiency and encourage code reuse in the development process, which is one of the pain points in GSD. Software-platform-as-a-Service (PaaS) is an entirely virtualized platform that includes one or more servers, operating systems, and specific applications (e.g., Google App Engine [22]).
Virtual-infrastructure-as-a-Service (IaaS) or Hardware-as-a-Service (HaaS) is the delivery of computer infrastructure as a service. This layer differs from PaaS in that the virtual hardware is provided without a software stack (e.g., Amazon EC2 [1]). Other offerings are also possible, such as communication-as-a-service, desktop-as-a-service, database-as-a-service, data-storage-as-a-service, data-as-a-service, data-mining-as-a-service, finance-as-a-service, framework-as-a-service, IDE-as-a-service, integration-as-a-service, and monitoring-as-a-service [11,14,20,36,44].

4.3 Key Enabling Technologies

Many enabling technologies contribute to the outlined cloud architecture. Here, we identify some state-of-the-art technologies that make clouds practical:

Virtualization enables clouds to deliver on-demand IT infrastructures through virtual machines (VMs). VMs are created and managed by a Virtual Machine Monitor (VMM), which is the software layer between the operating system and the physical machine. VM-based platforms offer several advantages, including better isolation, availability, and portability, in addition to the flexibility and scalability they bring. There is renewed interest in virtualized platforms, evident in their many forms, such as server, desktop, application, storage, and network virtualization, offered by industry players like VMware, Citrix, Microsoft, Red Hat, Cisco, and Sun. For more details, we refer the reader to [6,38,43].

MapReduce [15] is the dominant programming model used in clouds that provide on-demand computing capacity. MapReduce assumes that many common programming applications can be coded as processes that manipulate large data sets of <key, value> pairs. The Map process maps each <key, value> pair in the data set into a new <key', value'> pair. The Reduce process then merges values with the same key. Although this is a seemingly simple model, it has been used to support a large number of applications that manipulate data. Hadoop is an open source implementation of this model [9]. Stream-based parallel programming models, in which a User Defined Function (UDF) is applied to all the data, are also commonly used.

Other programming models, modeled after the Google File System (GFS) and Bigtable, are also common in many cloud forms. GFS is a scalable distributed file system for large data-intensive applications [19]. It not only provides fault tolerance while running on inexpensive commodity hardware, but also delivers high aggregate performance to a large number of clients. Data is automatically distributed to nodes at load time and processed locally, in parallel, with output data written to local disks, forming a single user-accessible volume. Bigtable [12] represents a database layer with the key idea of separating organized storage from query storage. It is a distributed storage system for managing structured data that is designed to scale to a very large size. HDFS and HBase are the open source implementations of these models [9].

Service-Oriented Architecture (SOA) allows for delivery of an integrated and orchestrated suite of functions to an end user through composition of both loosely and tightly coupled functions, or services - often network-based and following industry standards like WSDL, SOAP, and UDDI [10,27].

Some cloud forms also make use of frameworks such as Pig [35], ZooKeeper, Hive, Sawzall [23], LINQ [30], and Condor [42] to cope with the complexity and failure-prone nature of commodity hardware, such as hard drive crashes and network outages.

5 Our Experiments

We have used these technologies to implement three GSD-adaptable internal cloud prototypes. This section presents our experiments and initial results.
5.1 Compile Server Farm

Compiling large software projects across clusters has traditionally been done using parallel programming interfaces such as MPI or OpenMP. These interfaces are difficult to code against, prone to errors, and often time-consuming. Moreover, in GSD environments, there has always been a need for faster compile and build cycles to test, debug, and maintain complex software projects across verticals and domains. In such scenarios, many projects are created, compiled, and maintained on a regular basis to suit business and client requirements. Each of these projects typically consists of thousands of files, which may take hours or even days to build and deploy. This change-compile-test cycle is one of the most time-consuming activities of a project and requires substantial resources such as server-class machines or clusters.

Our first prototype tries to address this common issue using compute cloud concepts. We represent a compute cloud as a "nexus of hardware, software and data which provides compute services over network". Our compute cloud is actually a "compute server farm". By using Hadoop and Condor, we managed to speed up the change-compile-test cycle of large software projects. Another key motivation for our compute cloud is continuous software code integration. The following algorithm is used for our "compute server farm".

Algorithm 1. Compute Server Farm with Hadoop and Condor
1. Client workstations launch jobs
2. Condor dynamically allocates clusters
3. Hadoop-on-Demand starts the MapReduce program on the clusters
4. MapReduce program reads/writes into HDFS
5. When done, the results are either stored in the HDFS and/or returned to the client
6. Condor reclaims nodes

We tested this approach by compiling and building the Eclipse IDE from its Java source code. Condor [42] is used as a batch scheduler to schedule jobs on idle workstations.
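One way to picture how a build could be split across workers is to group compilation units into dependency "waves": every unit in a wave depends only on units from earlier waves, so a wave can be compiled in parallel. The sketch below is an illustration under stated assumptions, not the program used in the prototype: a local thread pool stands in for Condor-allocated nodes, `compile_unit` is a hypothetical callback, and the dependency map is assumed to be given as input.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compile_in_waves(deps, compile_unit, workers=4):
    """deps maps each unit to the set of units it depends on.
    Returns the list of waves in the order they were compiled."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    waves = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            # Compile the whole wave in parallel and wait for it to finish.
            list(pool.map(compile_unit, ready))
            sorter.done(*ready)
            waves.append(ready)
    return waves


if __name__ == "__main__":
    # Hypothetical module names, for illustration only.
    deps = {"core": set(), "util": set(), "ui": {"core", "util"}, "app": {"ui"}}
    for wave in compile_in_waves(deps, lambda unit: None):
        print(wave)
```

The number of waves bounds how much parallelism is available: a long dependency chain limits the speedup regardless of how many machines are allocated.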
We have written a MapReduce program [15] targeting the Hadoop runtime environment to automatically split, distribute, store, and parallelize the computation, using HDFS [9] as temporary file storage. We used JDepend to analyze the source dependencies. The results are encouraging: the total compilation took about 80 minutes on 5 standard desktop workstations, against 150 minutes for a standalone job (a speedup of roughly 1.9x). Our experience with this approach encourages us to believe that it is a viable and simpler way of mimicking a compute cloud for larger GSD projects.

5.2 Online Storage Cloud

The key motivation for an online storage cloud is the need for a scalable data management platform to support a variety of typical use cases in GSD environments. It should support an ecosystem of users and developments growing around project content and fast-changing, content-related tasks and ideas across domains and requirements. Additionally, GSD projects have to deal with massive numbers of structured, unstructured, and queued files. This sheer number of files makes storage and organization cumbersome on existing storage systems; for example, while a SAN can provide enough storage, the simple file system interface layered on top of a SAN is not expressive enough to manage these files. Moreover, in such projects, teams and project members tend to move from one location to another based on various business and administrative needs. When such team or project transfers happen, there is strong demand for personal data backup and restoration from all over the project. This presents multiple problems. First, personal computer disks are limited in capacity and too unreliable to host project-specific or personal data for prolonged periods. Second, policy management and capacity management are needed to deal with growing security concerns and unprecedented data growth. Third, there is no ability to access data remotely under policy-based management.
To address this, we designed a storage cloud with these characteristics: a storage service delivered over a network (Internet or intranet); capacity and performance that scale economically; easy to manage even at large scale (e.g. terabytes and beyond); private and driven by enforced policies. At its core, our storage cloud is a middleware layer with virtualized mass storage, allowing the underlying physical storage to be NFS or NAS, shared-nothing cluster file systems, or some combination of these. Files created or hosted in our cloud are uniquely identified by a URL so that they can be directly addressed or collectively accessed through a FUSE-based virtual directory. Our storage cloud is implemented with three nodes making use of Xen VMs [6], HDFS [9] and NFS, and is managed with the OpenQRM management solution. This remote storage is mounted as a local virtual directory through Filesystem in Userspace (FUSE) [41] modules in Linux and Windows. Apart from the GSD project artifacts, it can also serve other needs such as content collaboration spaces hosting code repositories, digital content, file archives and streaming media, as outlined in Section 3.

5.3 Lab-Any-Where (LAW): Online Virtual Labs

Many GSD partners, such as IT and ITeS organizations, face major challenges in aligning their training delivery mechanisms to business objectives. These challenges, related to cost, time, reach, and effectiveness, are prompting organizations to revisit their traditional training delivery modes. Dedicated physical classes do not reach a large offshore audience and are certainly difficult to arrange for on-site teams because of logistical issues. Online e-learning systems provide scalability and rapid delivery, but offshore and on-site staff still miss "hands-on" experience with real software systems. Other methodologies, including Web conference products, terminal emulators, web-based work spaces and learning management systems, also do not provide "real" interaction with software systems.
Our Lab-Any-Where (LAW) prototype tries to address this training problem by making use of cloud principles and components. LAW is a cloud-based application designed to provide fully immersive technical training and software testing labs over a network (intranet or Internet). This Web application also provides a richly featured platform for centrally managing hands-on training and testing scenarios via scheduled and on-demand delivery mechanisms. We make use of virtual appliances to achieve rapid deployment through a Web browser interface [38]. Our design prototype is implemented in two modes: one with Microsoft Virtual Server (MVS) for delivering and testing Windows-based training environments, and the other with User Mode Linux (UML) for Linux-based scenarios. As such, the LAW prototype has two major components: (i) a LAW Management component that manages, controls and ultimately orchestrates lab resources through configurable workflows, scheduling, customization and reporting; and (ii) a Delivery component that performs automatic deployment through virtual appliances with secure access.

6 Conclusions and Future Work

Clouds represent an inflection point in global distributed software development. The concept draws on many existing technologies and architectures, as we have seen in this paper. Although there is some FUD (Fear, Uncertainty, and Doubt) amid all the hype about clouds, we see clouds as a viable and effective platform for offshore and outsourced development in the longer run. In this paper, we outlined some positive indications and the resulting implications of deploying clouds in a globally distributed software development model. However, there are still concerns with respect to vendor lock-in, SLA control, privacy, reliability, data migration and access, auditing and regulatory compliance norms.
We hope that as the IT industry works to solve these problems, cloud adoption will occur in phases, from the nascent clouds in place today to mature cloud-based platforms with enhanced security and better SLA norms. Moreover, it is our belief that the cloud paradigm can provide significant benefits to all key stakeholders in the GSD ecosystem, as evident from the prototypes we showcased here. We will also continue to monitor and experiment with the different architectural, programming, and operational models of clouds and share our results with the GSD community. One active area is exploring the ability of clouds to play a game-changing role in software testing. We intend to investigate further the cloudification of testing services, as it provides ample scope to explore the full potential of the cloud paradigm.

References

1. Amazon: Amazon web services for simple db, s3, ec2, http://aws.amazon.com
2. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., et al.: Above the Clouds: A Berkeley View of Cloud Computing. University of California, Berkeley, Tech. Rep. (2009)
3. Aspray, W., Mayadas, F., Vardi, M.Y.: Globalization and offshoring of software. A Report of the ACM Job Migration Task Force, Executive Summary and Findings. ACM, New York (2006)
4. Atkinson, R.D.: Understanding the offshoring challenge. Progressive Policy Institute, Washington, DC (2004)
5. Baker, S.: Google and the wisdom of clouds. Business Week (2007)
6. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. ACM SIGOPS Operating Systems Review 37, 164–177 (2003)
7. Battin, R.D., Crocker, R., Kreidler, J., Subramanian, K.: Leveraging resources in global software development. IEEE Softw. 18(2), 70–77 (2001)
8. Bechtolsheim, A.: Cloud Computing and Cloud Networking. Talk at UC Berkeley (2008)
9. Bialecki, A., Cafarella, M., Cutting, D., Malley, O.: Hadoop: a framework for running applications on large clusters built of commodity hardware, http://lucene.apache.org/hadoop
10. Buschmann, F.: Pattern-oriented software architecture: a system of patterns. Wiley, Chichester (2002)
11. Buyya, R., Yeo, C.S., Venugopal, S.: Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities. In: Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications (HPCC 2008). IEEE CS Press, Los Alamitos (2008)
12. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: A distributed storage system for structured data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2006) (2006)
13. Cheng, L.T., de Souza, C.R., Hupfer, S., Patterson, J., Ross, S.: Building collaboration into IDEs. Queue 1(9), 40–50 (2004)
14. Church, K., Hamilton, J., Greenberg, A.: On delivering embarrassingly distributed cloud services. Hotnets VII (2008)
15. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI 2004: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, Berkeley, CA, USA, p. 10. USENIX Association (2004)
16. Desouza, K., Awazu, Y., Baloh, P.: Managing knowledge in global software development efforts: Issues and practices. IEEE Software 23(5), 30–37 (2006)
17. Foster, I., Kesselman, C.: The grid: blueprint for a new computing infrastructure. Morgan Kaufmann, San Francisco (2004)
18. Fryer, K., Gothe, M.: Global software development and delivery: Trends and challenges. IBM Developer Works 1 (January 2008)
19. Ghemawat, S., Gobioff, H., Leung, S.T.: The Google file system. In: SOSP 2003: Proceedings of the nineteenth ACM symposium on Operating systems principles, pp. 29–43. ACM, New York (2003)
20. Gillett, E.F., Brown, G.E., Staten, J., Lee, C.: The new tech ecosystems of cloud, cloud services, and cloud computing. Forrester Research Report (August 2008)
21. Google: Google docs and spreadsheets, http://docs.google.com
22. Google: Google's cloud implementation as app engine, http://code.google.com/appengine/
23. Griesemer, R.: Parallelism by design: data analysis with Sawzall. In: CGO 2008: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, p. 3. ACM, New York (2008)
24. Hamilton, J.: Perspectives blog, http://perspectives.mvdirona.com
25. Herbsleb, J.D., Mockus, A.: An empirical study of speed and communication in globally distributed software development. IEEE Transactions on Software Engineering 29(6), 481–494 (2003)
26. Herbsleb, J., Moitra, D.: Global software development. IEEE Software 18(2), 16–20 (2001)
27. Huhns, M.N., Singh, M.P.: Service-oriented computing: Key concepts and principles. IEEE Internet Computing 9(1), 75–81 (2005)
28. Krishna, S., Sahay, S., Walsham, G.: Managing cross-cultural issues in global software outsourcing. Communications of the ACM 47(4), 62–66 (2004)
29. Lin, G., Fu, D., Zhu, J., Dasmalchi, G.: Cloud computing: IT as a service. IT Professional 11(2), 10–13 (2009)
30. Meijer, E., Beckman, B., Bierman, G.: LINQ: reconciling object, relations and XML in the .NET framework. In: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, p. 706. ACM, New York (2006)
31. Meyer, B.: The unspoken revolution in software engineering. IEEE Computer 39(1), 124 (2006)
32. Microsoft-Live: Microsoft live mesh api, http://www.mesh.com
33. Montoya-Weiss, M., Massey, A., Song, M.: Getting it together: Temporal coordination and conflict management in global virtual teams. Academy of Management Journal, 1251–1262 (2001)
34. Olson, J.S., Olson, G.M.: Culture surprises in remote software development teams. Queue 1(9), 52–59 (2004)
35. Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig Latin: A not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 1099–1110. ACM, New York (2008)
36. Rangan, K.: The Cloud Wars: $100+ billion at stake. Technical report, Merrill Lynch (2008)
37. ReductiveLabs: Puppet configuration management, http://reductivelabs.com/trac/puppet
38. Rosenblum, M., Garfinkel, T.: Virtual machine monitors: Current technology and future trends. IEEE Computer 38(5), 39–47 (2005)
39. Salesforce: Salesforce customer relationships management (crm) system, http://www.salesforce.com/
40. Siegele, L.: Let It Rise: A Special Report on Corporate IT. The Economist (October 2008)
41. Szeredi, M.: Filesystem in userspace, http://fuse.sourceforge.net
42. Thain, D., Tannenbaum, T., Livny, M.: Distributed computing in practice: The Condor experience. Concurrency and Computation: Practice and Experience 17(2–4), 323–356 (2005)
43. Uhlig, R., Neiger, G., Rodgers, D., Santoni, A.L., Martins, F.C.M., Anderson, A.V., Bennett, S.M., Kagi, A., Leung, F.H., Smith, L.: Intel virtualization technology. IEEE Computer 38, 48–56 (2005)
44. Vogels, W.: A Head in the Clouds - The Power of Infrastructure as a Service. In: First workshop on Cloud Computing and its Applications (CCA 2008) (October 2008)
45. Weiss, A.: Computing in the clouds. ACM netWorker 11(4), 16–25 (2007)