

Build and Release Management:

Understanding The Costs of Doing it Yourself


John Graham-Cumming, Co-Founder, Electric Cloud, Inc.

Table of Contents
Introduction
Building it Yourself
  1. How it All Begins
  2. Feedback Becomes Essential
  3. Outgrowing a Single Machine
  4. Supporting Multiple, Distributed Teams
  5. Monitoring and Management
  6. Extensibility
How Did we Get Here?
The Cost
A Commercial Alternative: ElectricCommander
What Makes ElectricCommander Unique?
Conclusion
About Electric Cloud

2. whitepaper

v2007.07 Electric Cloud, Inc. All rights reserved.

Introduction
If you were starting a new software development team today, could you imagine one of the 'to do' list items being 'build our own source code management system'? It's unlikely that anyone would dream of building their own SCM system today: there are multiple open source and commercial SCM systems available to meet all possible needs. But the same can't be said for build and release management, where most organizations handle these processes with manual, script-driven systems.

It's ironic because the most widely used build tool, Make, was created in the same decade (the 1970s) as the first SCM tool, SCCS, yet build and release management has lagged behind. In the intervening 30 years SCM has progressed from simple tools to complex, commercial products as software projects have grown in size and complexity. Build and release management has only recently caught up with the release of suites of tools that simplify and improve the entire process. These tools mean that build managers no longer have to create build systems from scratch, nor are they saddled with ongoing maintenance costs for home grown systems.

This paper looks at the needs of build and release management, how those needs are met by open source and commercial tools that go way beyond the likes of Make or Ant, and the real costs of creating and maintaining a homegrown system.

Building it Yourself

1. How it All Begins


Today most software development organizations do create their own build systems and they gradually end up with unmanageable, poorly documented or impenetrable code. This happens because, initially, making your own build and release system looks easy. At first a small team can get away with a simple build script. The script often grows out of the software's Makefile or Ant script by wrapping them with a little bit of Perl or some other scripting language.


While the Makefile will show how to build the software, the wrapper script probably incorporates steps like checking out the source code, running smoke or unit tests and 'deploying' the software by copying it to an appropriate directory. Ambitious build managers will have included a mechanism to send an email when the build ends, perhaps even including details of the success or failure of individual steps in the process (sometimes including snippets of log files relevant to debugging a broken build). The scope of the build script is really only limited by the build manager's imagination and, more importantly, time. Some build scripts might even try to interrogate an SCM to discover who checked in since the last build, and hence might be to blame for a build failure.

Once the relevant steps have been written into the build script it's possible to automate the builds by calling the script from a 'cron' job or similar job scheduler. It's typical to see a team install a dedicated 'build machine' which runs the build script and produces regularly scheduled builds (at least one nightly build).

Teams that decide to do continuous integration [1] (CI) usually start to see problems with home grown build scripts. To make CI work they build another script that monitors the SCM system for changes to a particular branch, determines when the changes have finished (by waiting for a 'quiet period' of say 15 minutes to give developers enough time to commit) and then uses the existing build script to run the build. This means integration with the SCM system and modifying the build script to build the correct branch of code. Typically, teams doing CI want feedback on these 'stimuli builds' (builds that were the result of some change in SCM) via email, web pages or RSS; this just adds more work to the CI build script wrapper. And doing CI can mean many builds per day, tying up the one 'build machine' that was originally designated.
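The 'quiet period' check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, its parameters and the use of plain timestamps are hypothetical, and a real wrapper would obtain the commit times from whatever log command its SCM provides.

```python
from typing import Iterable

QUIET_PERIOD_SECONDS = 15 * 60  # the 'say 15 minutes' from the text


def should_start_build(commit_times: Iterable[float],
                       last_build_time: float,
                       now: float,
                       quiet_period: float = QUIET_PERIOD_SECONDS) -> bool:
    """Return True when unbuilt commits exist and the branch has gone quiet.

    All times are seconds since an arbitrary epoch (hypothetical input that a
    real script would read from its SCM system).
    """
    # Only commits made after the last build can trigger a new one.
    pending = [t for t in commit_times if t > last_build_time]
    if not pending:
        return False
    # Wait until the newest pending commit is at least quiet_period old.
    return now - max(pending) >= quiet_period
```

A polling wrapper started from cron could call this once a minute and invoke the existing build script whenever it returns True. Even this small piece shows how CI adds SCM integration and state (the time of the last build) that the original nightly script never needed.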


[1] Defined as: 'Continuous Integration is a software development practice where members of a team integrate their work frequently; usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible.' Continuous Integration, Martin Fowler, http://www.martinfowler.com/articles/continuousIntegration.html.


Stepping back it's easy to imagine that all the Makefiles, Ant scripts, Perl wrappers and other assorted pieces of code can easily constitute 5,000 to 10,000 lines of code to be managed by the build team. At this size the code is manageable by a single person requiring a few hours of work per week to maintain it.

2. Feedback Becomes Essential


As teams move to Continuous Integration two things happen: the number of builds being run increases dramatically and the need for feedback also increases. When switching from a single nightly build to CI it will initially be possible to use a single build machine; although this limits the number of builds per day, CI is still possible without going to the expense and complexity of distributing the build across many machines. On the other hand, build feedback becomes very important. The very idea of CI is that multiple builds occur per day and that developers get feedback on the status of those builds to ensure that the software is building and testing correctly.

Email notifications are the most basic form of build feedback, but web-based interfaces for viewing build status, viewing the build queue (to determine when a CI build will run) and debugging broken builds are essential to successful CI. Build feedback gives the vital signs of the software being constructed; with CI, the build feedback becomes a heartbeat of sorts indicating the health of the project. If developers want to know build status, they also need to be able to drill down to discover quickly why a build failed to get the CI build train back on track. Martin Fowler's seminal paper 'Continuous Integration' describes build feedback under the heading 'Everyone can see what's happening'. The only way to achieve that is through a web-based (or even RSS-based) system.

To build such a system requires a much larger investment than the initial wrapping of the build process with scripts. A web server must be put in place and maintained, and significant amounts of software must be written to dynamically produce web pages showing build status, enable drill down to discover build problems, analyze build logs and isolate relevant sections for display, and store build history in a database (with many builds per day, even the problem of managing just the build logs becomes significant).

One open source project that can help in some situations is CruiseControl. Mike Clark's Pragmatic Project Automation book [2] describes CruiseControl as cron for Ant, but with many bells and whistles. If you are working in a Java/Ant environment and want something more powerful than cron for running builds and getting status, then CruiseControl is a reasonable starting point. It lacks many features that become essential as projects and builds grow, but it's better than writing everything yourself.

Whatever approach has been taken, the build system now starts to look like a real software development project and probably requires its own section in the company's SCM system. The total size of the scripts associated with the build system will have grown, meaning that the system requires maintenance and upgrading (as software projects change and grow) at least one day a week.
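One of the tasks named in this section, analyzing build logs and isolating the relevant sections for display, might look something like the following sketch. The error markers in ERROR_PATTERN are an assumption made for illustration; every compiler and test tool prints failures differently, which is part of why this work grows.

```python
import re
from typing import List

# Assumed failure markers; real tools will need their own patterns.
ERROR_PATTERN = re.compile(r"\b(error|fatal|failed)\b", re.IGNORECASE)


def failure_snippets(log_lines: List[str], context: int = 2) -> List[List[str]]:
    """Group the log lines surrounding each line that looks like a failure."""
    keep = set()
    for i, line in enumerate(log_lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            end = min(len(log_lines), i + context + 1)
            keep.update(range(start, end))

    # Split the kept line numbers into runs of consecutive lines.
    snippets: List[List[str]] = []
    current: List[str] = []
    previous = None
    for i in sorted(keep):
        if previous is not None and i != previous + 1:
            snippets.append(current)
            current = []
        current.append(log_lines[i])
        previous = i
    if current:
        snippets.append(current)
    return snippets
```

Each returned snippet could be pasted into a notification email or shown on a drill-down web page. Storing, indexing and expiring the full logs behind those snippets is a separate problem that this sketch does not touch.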

3. Outgrowing a Single Machine


The next big step in build system complexity occurs when the build outgrows a single machine. This can happen for a number of reasons: rapid changes in source may place a heavy load on a CI system necessitating multiple simultaneous builds, the number of parallel branches may mean that many simultaneous builds are necessary for a 'complete build', there may be many target processors (in the embedded world) or target platforms meaning that a single machine is either not sufficient for all the variants or simply not the right platform. Whatever the cause, builds typically outgrow a single build machine quite rapidly as the 'build matrix' of builds and resources needed by each build expands.

[2] Pragmatic Project Automation, Mike Clark, Pragmatic Bookshelf, July 2004.


Another way multiple machines become important is when builds are too slow. Build managers often try to speed up builds by buying faster hardware, but reach a limit and have to resort to parallelism. They can parallelize the build either by manually breaking the build up into discrete blocks that can be safely built in parallel, or by using an open source or commercial tool that parallelizes the build across a cluster of machines (such as Electric Cloud's ElectricAccelerator). Moving from one machine to many machines introduces a number of problems that have to be integrated into the build scripts:

a) Matching Build Resources: since it's unlikely that all builds can run on all machines in the build cluster (because of different operating systems or processor types) the build system has to be smart enough to know where it can run any given build. That means creating some repository of configuration information that can match a build to specific build resources. It's unlikely that this is a one-to-one mapping: a single build may need to be run on a number of target platforms to constitute a full build and results from all platforms need to be merged for reporting purposes.

b) Resource Allocation: the single build machine's resources were fairly simple to manage. Either the machine was running a build or it was not (and discovering this was as easy as looking at the build monitoring web site, or even running 'top'). With multiple builds, getting an overview of which machines are available is much more difficult and assigning appropriate machines to the queue of builds to be run becomes a resource allocation task. If you couple the available hardware, operating systems and processors with the different builds and their resource demands, it's easy to see that resource allocation is a complex optimization problem that must be solved if build machines are to be used efficiently.

c) Remote Invocation: multiple machines mean either building remotely (invoking an entire build on a remote machine) or running individual build steps remotely. Either way, the build system needs to support remote command invocation across differing platforms. This could be as simple as 'rsh', or more complex if a mix of Unix and non-Unix (e.g. Windows) platforms is involved. In addition, the remote invocation system must be able to return generated files, command status, log output and console messages to the invoking machine so that the build can be assembled from parts built remotely, and reports can be generated from failing build steps.

d) Access Control: that's easy with a single machine, but not with many. Controlling access to the cluster of machines becomes important because disturbances in the build cluster caused by unauthorized access can significantly affect running builds. Secondly, keeping the build machines at a specific patch level or operating system revision may be vital to ensuring that the software builds correctly. At the same time the large build infrastructure may require multiple people to have access (for example, there may be different maintainers for different platforms, or developers may need access to debug build problems). These requirements mean that the build cluster itself needs appropriate access control mechanisms. Also, some build organizations will put in place self-service builds where any member of the organization can request a build. Once again, such a system requires access control to ensure that access is only granted to appropriate parts of the source tree.

e) Prioritization: a large build cluster inevitably means that multiple builds are running and that, in turn, means that build prioritization becomes important. With a single build machine, build prioritization is pretty simple: when there's something more important to do than the current build you just stop it and start the important build. With multiple build machines tied up with multiple builds spread across the cluster, prioritization is a top-level feature. Not only must different levels of CI builds be scheduled (for example, to prevent a specific team from starving out the build resources because of rapid changes), but sudden changes in priority must be available to deal with special situations (such as a patch release that has to go out 'today' for an important customer, or a rebuild of a release that is close to its release date because of a late-breaking change).

f) System Failure: Inevitably systems or software will fail fairly frequently when a large number of machines of different types are joined together in a build cluster. Dealing with system failure needs to be a feature of a multi-machine build system. Failures can come in a number of forms (outright hardware failure causing a machine to crash, software failure where a compiler or other tool crashes, or a runaway process on a remote machine). The build scripts need to be able to detect and report such failures; it's vital that a runaway process, for example, is not able to hold up the entire build cluster: build scripts need to be either multithreaded or event-driven. Ideally, a build system would be able to work around a failure in real time by reassigning the build or build tasks from a failing machine to another machine in the cluster. Such live reassignment would have to operate in conjunction with prioritization, access control, and resource allocation processes.
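To give a feel for how items a), b) and e) interact, here is a deliberately small sketch that matches queued builds to idle machines in priority order. Everything in it is hypothetical: the Machine and BuildRequest types, the tag names and the greedy first-fit strategy are assumptions chosen for illustration, not a description of any particular product.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Machine:
    name: str
    tags: Set[str]              # capabilities, e.g. {"linux", "x86"} (made-up tags)
    busy: bool = False


@dataclass(order=True)
class BuildRequest:
    priority: int               # lower number means more urgent
    sequence: int               # tie-breaker: first come, first served
    name: str = field(compare=False)
    needs: Set[str] = field(compare=False, default_factory=set)


def assign_builds(queue: List[BuildRequest],
                  machines: List[Machine]) -> Dict[str, str]:
    """Give each queued build, most urgent first, an idle machine that fits it.

    Builds that cannot be placed stay in the queue for the next pass.
    """
    heapq.heapify(queue)
    waiting: List[BuildRequest] = []
    assignments: Dict[str, str] = {}
    while queue:
        request = heapq.heappop(queue)
        # First idle machine whose tags cover everything the build needs.
        match = next((m for m in machines
                      if not m.busy and request.needs <= m.tags), None)
        if match is None:
            waiting.append(request)
            continue
        match.busy = True
        assignments[request.name] = match.name
    queue.extend(waiting)
    return assignments
```

A greedy first-fit pass like this ignores build length, multi-platform builds whose results must be merged, pre-empting a running build for an urgent patch release, and requeueing work from a machine that has failed. Each of those is a further layer on top of this starting point.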

4. Supporting Multiple, Distributed Teams


Build clusters are inherently shared resources, meaning that the ability to support multiple teams using the same build system is essential. Build resources will be used to create different types of software for different teams, but different teams frequently share build steps that can be packaged, maintained and distributed. For example, each team is likely to need a step that extracts sources from SCM; this can be stored as a single shared procedure saving development time for all teams by sharing common build steps. Another problem related to build clusters is that they are frequently accessed remotely and shared across geographically dispersed teams. This only intensifies the need for each of the points raised above. This will undoubtedly require integration with existing directory management systems (such as LDAP) for company-wide access control and the ability to share configuration information across teams and sites so that the build system is reusable without each team reinventing the wheel.


Implementing a build system that has these features goes well beyond the typical build scripts and well beyond the capabilities of open source solutions like CruiseControl. Such a build system requires a significant software development project with the inherent need for release management and maintenance.

5. Monitoring and Management


Once the basic issues of keeping a multiple machine cluster up and running are worked out, day-to-day management comes to the forefront. With so many builds running it will be necessary to be able to intelligently schedule the builds (through the prioritization mechanism and taking into account the different resources needed for each type of build and even the length of the builds to ensure optimal use of the build cluster), cancel running builds (because there'll be times when there's no point letting a build run all the way through), check the live progress of a build (especially when trying to debug a build problem, or just when an engineer is impatient to know when their build will complete) and monitor build machine health (such as which disks are filling up, or whether one machine is running unusually slowly). These system monitoring tasks are exacerbated by the fact that the cluster will consist of a large number of heterogeneous systems.

Understanding the utilization of the build resources for planning purposes and generating customizable reports from the build process are the next step. The large investment in build infrastructure will drive a desire for management level reporting, and it'll become essential to be able to mine build logs to tune the overall build system and maximize resource utilization. Equally important is being able to predict the extra build resources needed as the software product grows and the development organization changes. Being able to predict up-front the extra build resources needed to support a new product will be an important part of the product planning process.
Failure reporting is also needed so that the build manager can provide feedback to development about problem product areas (e.g. a particular module that experiences a high rate of build failures); this means that the build system is truly part of the overall software quality process.
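As a small illustration of the failure reporting described above, the following sketch computes a failure rate per module from build history. The input format (a module name paired with a success flag) is an assumption; a real system would first have to extract those records from its build logs or database.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def failure_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each module to the fraction of its recorded builds that failed."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for module, succeeded in results:
        totals[module] += 1
        if not succeeded:
            failures[module] += 1
    return {module: failures[module] / totals[module] for module in totals}


# Hypothetical history: (module, build succeeded?)
history = [("ui", True), ("ui", False), ("net", True),
           ("net", True), ("ui", False), ("ui", True)]
rates = failure_rates(history)  # {"ui": 0.5, "net": 0.0}
```

The calculation itself is trivial. The cost lies in collecting consistent per-module results from every build on every machine so that numbers like these can be trusted.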


6. Extensibility
All build systems need the ability to integrate with a very diverse range of tools. A typical build system might use a Make or Ant tool from one provider, an SCM system from another, a separate defect database for tracking and a special set of tools for running tests, and probably has to integrate with home grown tools written in the scripting language most popular in the organization. Since many integrations will be common (such as integrations with SCM or testing tools) a build management system should have out-of-the-box integrations available for common tools. But ready-made integrations are not enough. A build management system needs the flexibility to integrate with almost any tool. That makes well-defined, and open, APIs essential; simple interfaces for command-line or web-service based tools make a build management system easy to extend.
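A 'simple interface for command-line tools' can be as small as the sketch below, which runs any external command as a build step and captures what a report would need. The StepResult type and run_step function are hypothetical names used only for this example; the calls into Python's standard subprocess module are real.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class StepResult:
    command: List[str]
    exit_code: int
    output: str                 # stdout and stderr, interleaved


def run_step(command: List[str], timeout_seconds: float = 3600) -> StepResult:
    """Run one external tool and capture its exit status and console output.

    subprocess.run raises subprocess.TimeoutExpired if the tool runs longer
    than timeout_seconds, which stops a runaway step from blocking forever.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge error output into the same log
        text=True,
        timeout=timeout_seconds,
    )
    return StepResult(command, completed.returncode, completed.stdout)


# Example: any tool with a command line can be wrapped the same way.
result = run_step([sys.executable, "-c", "print('ok')"])
```

Because the wrapper knows nothing about the tool it runs, the same function serves Make, Ant, an SCM client or a home grown script. Tool-specific knowledge (what counts as a warning, which files were produced) then has to live in separate integrations built on top of it.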

How Did we Get Here?


Each of the tasks listed here will require anywhere from a few days to a few months to develop. This adds up to several engineer years of work, plus significant ongoing work to maintain the system (see cost estimate below). Is this where you want to invest your development resources? These estimates assume that the system is built with a solid architecture for adding all of the new features. However, when you start off building your own it's unlikely that you will invest in the kind of architecture needed to handle everything here. As a result, you'll end up trying to add fixes to a weak and quirky system. As the fixes accumulate, each new feature gets harder and harder to add. You're unlikely to get anywhere near the top of this curve without a major re-architecting of your system.


As features and capabilities are added to the build and release system, the level of investment required to develop and maintain the system grows dramatically.

The Cost
To understand the size of the effort required to implement the type of build and release system outlined in this paper, it is possible to make some rough estimates of the cost of implementing the features. Start by assuming that the team has already put in place a solution based around either their own scripts or an open source project like CruiseControl, and that it gives them basic scheduled builds, continuous integration builds and a simple web-based interface for build status with email notifications. The following is a rough estimate of the effort (in person months) of implementing a fully-functioning build and release system.


Feature: Time (person months)

The Basics
  On-demand builds (developers can launch builds): 1
  Incremental builds (build system can do incremental and full builds to save time): 1
  Add control of individual build steps so that builds can be broken down and managed as discrete chunks of work: 1
  Dynamic resource allocation to cope with the demands of differing builds on differing build hardware/software: 2
  Add resource access control so that different build types (e.g. developer vs. nightly) get different build resources: 1
  Total for The Basics: 6 months

Security
  Integrate with standard authentication mechanism (e.g. LDAP) and authenticate use of build resources: 2
  Add role-based access control (e.g. who's allowed to start/stop builds, edit build scripts): 2
  Support secure remote access: 1
  Total for Security: 5 months

Flexibility
  Modify build scripts to make them easily maintainable and reusable: 1
  Build log management and searching for better build reporting and drill-down: 1
  Add project level management of builds to isolate different projects and allow better management of build steps and resources: 2
  Total for Flexibility: 4 months

Advanced
  Add APIs so that the build system can be scripted and enhanced externally (e.g. a REST interface, Perl module or other interface): 2
  Make the build system agnostic towards tools used (such as the compilers and linkers) so that heterogeneous environments are supported: 1
  Total for Advanced: 3 months

TOTAL: 18 months

That's a total of 18 person months to put in place a flexible build system suitable for enterprise software development. In addition to the development cost of creating the system, a system of that size will require a full-time person to maintain it going forward. If the system is to be enhanced over time, additional development resources will be needed. And that's not to mention costs associated with ongoing maintenance of the system, knowledge lost if/when the developer who built the system leaves the company, etc.

A Commercial Alternative: ElectricCommander


The challenges outlined above can be met by an extensive development project with ongoing maintenance. But no organization would dream of doing that if the subject were SCM, so why spend the time and money on a build development project? Electric Cloud offers a commercial build and release management solution called ElectricCommander. ElectricCommander is an enterprise-class solution for automating software production processes. It helps teams to make software build, package, test, and deployment tasks more repeatable, more visible, and more efficient. Instead of building your own system, you can build around the ElectricCommander platform and avoid the maintenance hassle, brittleness and potential slowness of a homegrown solution. At its core, ElectricCommander is a Web-based system for automating and managing the build and release process. It provides a scalable solution to some of the biggest challenges of managing these "back end" software development tasks, including:

- Multiple, disconnected build and test systems across locations
    o Results in redundant work and an inability to share/reuse across teams
    o Difficult to reproduce or audit processes
    o Becomes painful to manage build and test data and resources
    o Little or no management visibility or reporting
    o Directly impacts release predictability and time-to-market
- Home-built systems that are brittle, error-prone, and don't scale
- Slow overall build and release cycles

ElectricCommander tackles these problems with a three-tier architecture, an AJAX-powered Web interface, and first-of-its-kind build and release analytics capabilities for reporting and compliance. With this solution, your developers, release engineers, build managers, QA teams, and managers gain:


- A shared platform for disseminating best practices & reusing common procedures
- Centralized control for improved auditability
- Faster throughput and more efficient hardware utilization
- Improved ability to support geographically distributed teams
- Continuous integration and greater agility
- Visibility/reporting for better project predictability

What Makes ElectricCommander Unique?


Most scalable solution available

Only ElectricCommander provides enterprise-class scalability for build and release management. It's simple to set up and use on a simple build, yet scales to support the largest and most complex build and test processes. Other commercial solutions continuously poll the management server for available resources, causing degraded performance as the number of jobs or tasks increases. Only ElectricCommander's multi-threaded Java server provides efficient synchronization even under high job volume.

Improves project predictability with reports and analytics

ElectricCommander's unique analytics provide valuable insight into the details of the build, not just success or failure. After each step in a process completes, a postprocessor is invoked to extract information from the step's log file; this information is kept as persistent properties, providing easy access to diagnostic information for real-time and trend reporting. This allows you to collect pinpoint statistics (such as number of compilations, number of tests run, and number of test failures) and to gain visibility into important productivity metrics such as trends in error rates. ElectricCommander leverages the Eclipse BIRT reporting framework, so you have the ability to extend the reporting capabilities to suit your teams and extract information for use in the reporting tool of your choice. You gain the power to make informed decisions and supply a repeatable and well-defined production process.

Easy to adopt and deploy

The reality is that home grown systems are expensive to maintain and difficult to standardize across an enterprise. ElectricCommander makes it simple to achieve a scalable, optimal software production environment across the organization, without throwing out your existing scripts and processes. Unlike other solutions that require your development teams to learn new languages and processes, ElectricCommander is easy to adopt and roll out across teams with a highly interactive Web interface and simple techniques for migrating existing scripts. Its extensible SCM integrations (IBM Rational ClearCase, Perforce, AccuRev, Telelogic Synergy, etc.) allow it to fit seamlessly into your current environment, and to provide traceability between versioned source code and packaged executables.
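The general idea of post-processing a step's log into named properties, as described in this section, can be illustrated with a generic sketch. This is not ElectricCommander's postprocessor or its API: the log markers, the property names and the extract_properties function are all assumptions invented for the example.

```python
import re
from typing import Dict

# Hypothetical log markers; a real tool chain prints something different.
PATTERNS = {
    "compiles": re.compile(r"^Compiling ", re.MULTILINE),
    "tests_run": re.compile(r"^TEST (?:PASS|FAIL) ", re.MULTILINE),
    "test_failures": re.compile(r"^TEST FAIL ", re.MULTILINE),
}


def extract_properties(log_text: str) -> Dict[str, int]:
    """Count occurrences of each known marker in one step's log."""
    return {name: len(pattern.findall(log_text))
            for name, pattern in PATTERNS.items()}


# Made-up log text for one build step.
sample_log = (
    "Compiling a.c\n"
    "Compiling b.c\n"
    "TEST PASS t1\n"
    "TEST FAIL t2\n"
    "TEST PASS t3\n"
)
properties = extract_properties(sample_log)
```

Storing a small dictionary like this for every step, rather than only a pass or fail flag, is what makes trend reports such as error rates over time possible.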

Conclusion
Implementing enterprise-scale build and release management is a very large development task, but it's become an unavoidable one. External factors (such as distributed development teams, agile methods and market pressures) are forcing teams to address their build and release processes with the same attention as their front-end development practices. Initiating and evolving a do-it-yourself build system may seem simple at first, with easy productivity gains from even the simplest automation approaches. But over time, these systems become brittle, expensive to maintain, and really known only to the people who built them. Although simple open source tools exist to help automate the build and release process, they don't match up to the scale of builds for large, real-world projects: ElectricCommander does.

About Electric Cloud


Electric Cloud is the leading provider of software production management solutions that automate, accelerate, and analyze the build and release processes that follow the check-in of new code. The company's patented and award-winning solutions improve productivity in the face of increasing product complexity and time-to-market pressures for software delivery. In addition to ElectricCommander, Electric Cloud offers the only solution to deliver 20x faster builds through fine-grained parallelization, ElectricAccelerator.


Leading companies such as Qualcomm, Intuit, and PayPal rely on Electric Cloud's Software Production Management solutions to change software production from a liability to a competitive advantage. For customer inquiries please contact Electric Cloud at (650) 968-2950 or www.electriccloud.com.
Electric Cloud, ElectricInsight, ElectricAccelerator, ElectricCommander and Electric Make are trademarks of Electric Cloud. Other company and product names may be trademarks of their respective owners.
