Build and Release Management: Understanding the Costs of Doing It Yourself
Table of Contents
Introduction
Building it Yourself
  1. How it All Begins
  2. Feedback Becomes Essential
  3. Outgrowing a Single Machine
  4. Supporting Multiple, Distributed Machines
  5. Monitoring and Management
  6. Extensibility
How Did we Get Here?
The Cost
A Commercial Alternative: ElectricCommander
What Makes ElectricCommander Unique?
Conclusion
About Electric Cloud
Introduction
If you were starting a new software development team today, could you imagine one of the 'to do' list items being 'build our own source code management system'? It's unlikely that anyone would dream of building their own SCM system today: there are multiple open source and commercial SCM systems available to meet all possible needs. But the same can't be said for build and release management, where most organizations handle these processes with manual, script-driven systems.
It's ironic because the most widely used build tool, Make, was created in the 1970s, around the same time as the first SCM tool, SCCS, yet build and release management has lagged behind. In the intervening 30 years SCM has progressed from simple tools to complex, commercial products as software projects have grown in size and complexity. Build and release management has only recently caught up with the release of suites of tools that simplify and improve the entire process. These tools mean that build managers no longer have to create a build system from scratch, nor are they saddled with ongoing systems maintenance costs for homegrown systems.
This paper looks at the needs of build and release management, how those needs are met by open source and commercial tools that go way beyond the likes of Make or Ant, and the real costs of creating and maintaining a homegrown system.
Building it Yourself
1. How it All Begins
Making your own build and release system looks easy. At first a small team can get away with a simple build script. The script often grows out of the software's Makefile or Ant script by wrapping them with a little bit of Perl or some other scripting language.
While the Makefile will show how to build the software, the wrapper script probably incorporates steps like checking out the source code, running smoke or unit tests and 'deploying' the software by copying it to an appropriate directory. Ambitious build managers will have included a mechanism to send an email when the build ends, perhaps even including details of the success or failure of individual steps in the process (sometimes including snippets of log files relevant to debugging a broken build). The scope of the build script is really only limited by the build manager's imagination and, more importantly, time. Some build scripts might even try to interrogate an SCM to discover who checked in since the last build, and hence might be to blame for a build failure.
Once the relevant steps have been written into the build script it's possible to automate the builds by calling the script from a 'cron' job or similar job scheduler. It's typical to see a team install a dedicated 'build machine' which runs the build script and produces regularly scheduled builds (at least one nightly build).
Teams that decide to do continuous integration [1] (CI) usually start to see problems with homegrown build scripts. To make CI work they build another script that monitors the SCM system for changes to a particular branch, determines when the changes have finished (by waiting for a 'quiet period' of, say, 15 minutes to give developers enough time to commit) and then uses the existing build script to run the build. This means integrating with the SCM system and modifying the build script to build the correct branch of code. Typically, teams doing CI want feedback on these 'stimuli builds' (builds that were the result of some change in SCM) via email, web pages or RSS; this just adds more work to the CI build script wrapper. And doing CI can mean many builds per day, tying up the one 'build machine' that was originally designated.
[1] Defined as: "Continuous Integration is a software development practice where members of a team integrate their work frequently; usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible." Continuous Integration, Martin Fowler, http://www.martinfowler.com/articles/continuousIntegration.html.
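The 'quiet period' check that such a CI wrapper performs can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular tool: the function name `ready_to_build` and its parameters are hypothetical, and obtaining commit timestamps from the SCM system is deliberately left out.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

# The 15-minute figure is the example used in the text; real teams tune it.
QUIET_PERIOD = timedelta(minutes=15)


def ready_to_build(
    commit_times: Sequence[datetime],
    last_build_time: Optional[datetime],
    now: datetime,
    quiet_period: timedelta = QUIET_PERIOD,
) -> bool:
    """Return True when there are new commits and the branch has gone quiet."""
    # Only commits made after the last build count as new changes.
    new_commits = [
        t for t in commit_times
        if last_build_time is None or t > last_build_time
    ]
    if not new_commits:
        return False
    # Trigger only once a full quiet period has passed since the newest commit.
    return now - max(new_commits) >= quiet_period
```

A polling script would call this every minute or so and hand off to the existing build script whenever it returns True.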
Stepping back, it's easy to imagine that all the Makefiles, Ant scripts, Perl wrappers and other assorted pieces of code can easily constitute 5,000 to 10,000 lines of code to be managed by the build team. At this size the code is manageable by a single person, requiring a few hours of work per week to maintain it.
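The kind of wrapper described in this section (run each step in order, stop at the first failure, and produce a summary suitable for an email) might look roughly like the sketch below. It is written in Python rather than Perl purely for illustration. The names `run_build` and `Step` are hypothetical, and the commands for checkout, build, test and deploy are supplied by the caller rather than assumed here.

```python
import subprocess
import sys
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (step name, command line)


def run_build(steps: List[Step]) -> Tuple[bool, str]:
    """Run steps in order, stop at the first failure, return (success, report)."""
    report_lines = []
    for name, command in steps:
        done = subprocess.run(command, capture_output=True, text=True)
        status = "ok" if done.returncode == 0 else f"FAILED (exit {done.returncode})"
        report_lines.append(f"{name}: {status}")
        if done.returncode != 0:
            # Include the last few output lines to help debug the broken build.
            tail = (done.stdout + done.stderr).splitlines()[-5:]
            report_lines.extend("    " + line for line in tail)
            return False, "\n".join(report_lines)
    return True, "\n".join(report_lines)
```

The returned report is plain text, so it could be mailed to the team or written to a status page; both are left out of the sketch.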
2. Feedback Becomes Essential
The idea of CI is that multiple builds occur per day and that developers get feedback on the status of those builds to ensure that the software is building and testing correctly. Email notifications are the most basic form of build feedback, but web-based interfaces for viewing build status, viewing the build queue (to determine when a CI build will run) and debugging broken builds are essential to successful CI. Build feedback gives the vital signs of the software being constructed; with CI, the build feedback becomes a heartbeat of sorts indicating the health of the project. If developers want to know build status, they also need to be able to drill down quickly to discover why a build failed, so that they can get the CI build train back on track.
Martin Fowler's seminal paper 'Continuous Integration' describes build feedback under the heading 'Everyone can see what's happening'. The only way to achieve that is through a web-based (or even RSS-based) system. To build such a system requires a much larger investment than the initial wrapping of the build process with scripts. A web server must be put in place and maintained, and significant amounts of software must be written to dynamically produce web pages showing build status, enable drill down to discover build problems, analyze build logs and isolate relevant sections for display, and store build history in a database (with many builds per day, even the problem of managing just the build logs becomes significant).
One open source project that can help in some situations is CruiseControl. Mike Clark's Pragmatic Project Automation book [2] describes CruiseControl as cron for Ant, but with many bells and whistles. If you are working in a Java/Ant environment and want something more powerful than cron for running builds and getting status, then CruiseControl is a reasonable starting point. It lacks many features that become essential as projects and builds grow, but it's better than writing everything yourself.
Whatever approach has been taken, the build system now starts to look like a real software development project and probably requires its own section in the company's SCM system. The total size of the scripts associated with the build system will have grown, meaning that the system requires maintenance and upgrading (as software projects change and grow) at least one day a week.
[2] Pragmatic Project Automation, Mike Clark, Pragmatic Bookshelf, July 2004.
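As one small piece of the drill-down problem, isolating the relevant sections of a build log can be sketched as below. The error markers in `ERROR_PATTERN` and the function name `relevant_sections` are assumptions made for illustration; real compilers and test tools each have their own message formats, and a production system would need per-tool patterns.

```python
import re
from typing import List

# Assumed markers of trouble; real tools differ and need their own patterns.
ERROR_PATTERN = re.compile(r"\b(error|failed|fatal)\b", re.IGNORECASE)


def relevant_sections(log_lines: List[str], context: int = 2) -> List[str]:
    """Return the log lines within `context` lines of any error-like line."""
    keep = set()
    for i, line in enumerate(log_lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            end = min(len(log_lines), i + context + 1)
            keep.update(range(start, end))
    # Preserve the original order of the selected lines.
    return [log_lines[i] for i in sorted(keep)]
```

Even this simple filter has to be maintained as tools and their output formats change, which is part of the ongoing cost described above.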
3. Outgrowing a Single Machine
Another way multiple machines become important is when builds are too slow. Build managers often try to speed up builds by buying faster hardware, but reach a limit and have to resort to parallelism. They can parallelize the build either by manually breaking the build up into discrete blocks that can be safely built in parallel, or by using an open source or commercial tool that parallelizes the build across a cluster of machines (such as Electric Cloud's ElectricAccelerator). Whatever the cause, builds typically outgrow a single build machine quite rapidly as the 'build matrix' of builds and resources needed by each build expands.
4. Supporting Multiple, Distributed Machines
Moving from one machine to many machines introduces a number of problems that the build scripts have to handle:
a) Matching Build Resources: Since it's unlikely that all builds can run on all machines in the build cluster (because of different operating systems or processor types) the build system has to be smart enough to know where it can run any given build. That means creating some repository of configuration information that can match a build to specific build resources. It's unlikely that this is a one-to-one mapping: a single build may need to be run on a number of target platforms to constitute a full build and results from all platforms need to be merged for reporting purposes.
b) Resource Allocation: The single build machine's resources were fairly simple to manage. Either the machine was running a build or it was not (and discovering this was as easy as looking at the build monitoring web site, or even running 'top'). With multiple builds, getting an overview of which machines are available is much more difficult and assigning appropriate machines to the queue of builds to be run becomes a resource allocation task. If you couple the available hardware, operating systems and processors with the different builds and their resource demands, it's easy to see that resource allocation is a complex optimization problem that must be solved if build machines are to be used efficiently.
c) Remote Invocation: Multiple machines mean either building remotely (invoking an entire build on a remote machine) or running individual build steps remotely. Either way, the build system needs to support remote command invocation across differing platforms. This could be as simple as 'rsh', or more complex if a mix of Unix and non-Unix (e.g. Windows) platforms are involved. In addition, the remote invocation system must be able to return generated files, command status, log output and console messages to the invoking machine so that the build can be assembled from parts built remotely, and reports can be generated from failing build steps.
d) Access Control: That's easy with a single machine, but not with many. Controlling access to the cluster of machines becomes important because disturbances in the build cluster caused by unauthorized access can significantly affect running builds. Secondly, keeping the build machines at a specific patch level or operating system revision may be vital to ensuring that the software builds correctly. At the same time the large build infrastructure may require multiple people to have access (for example, there may be different maintainers for different platforms, or developers may need access to debug build problems). These requirements mean that the build cluster itself needs appropriate access control mechanisms. Also, some build organizations will put in place self-service builds where any member of the organization can request a build. Once again, such a system requires access control to ensure that access is only granted to appropriate parts of the source tree.
e) Prioritization: A large build cluster inevitably means that multiple builds are running and that, in turn, means that build prioritization becomes important. With a single build machine, build prioritization is pretty simple: when there's something more important to do than the current build you just stop it and start the important build. With multiple build machines tied up with multiple builds spread across the cluster, prioritization is a top-level feature. Not only must different levels of CI builds be scheduled (for example, to prevent a specific team from starving out the build resources because of rapid changes), but sudden changes in priority must be available to deal with special situations (such as a patch release that has to go out 'today' for an important customer, or a rebuild of a release that is close to its release date because of a late-breaking change).
f) System Failure: Inevitably systems or software will fail fairly frequently when a large number of machines of different types are joined together in a build cluster. Dealing with system failure needs to be a feature of a multi-machine build system. Failures can come in a number of forms (outright hardware failure causing a machine to crash, software failure where a compiler or other tool crashes, or a runaway process on a remote machine). The build scripts need to be able to detect and report such failures; it's vital that a runaway process, for example, is not able to hold up the entire build cluster: build scripts need to be either multithreaded or event-driven. Ideally, a build system would be able to work around a failure in real time by reassigning the build or build tasks from a failing machine to another machine in the cluster. Such live reassignment would have to operate in conjunction with prioritization, access control, and resource allocation processes.
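Detecting a runaway step while still returning its status and output to the caller can be sketched as follows. This is an illustration only: it runs the command locally, so the remote transport (rsh or anything else) is out of scope, and the names `StepResult` and `run_step` are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepResult:
    command: List[str]
    exit_code: Optional[int]  # None when the step was killed for running too long
    output: str               # combined stdout and stderr
    timed_out: bool


def run_step(command: List[str], timeout_seconds: float) -> StepResult:
    """Run one build step, capture status and output, and stop it if it runs away."""
    try:
        done = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_seconds,
        )
        return StepResult(command, done.returncode, done.stdout, False)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child process before raising TimeoutExpired.
        partial = exc.output or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return StepResult(command, None, partial, True)
```

A scheduler could treat `timed_out` as a signal to report the failure and reassign the step to another machine. Crashed hosts and lost network connections need separate handling that this sketch does not attempt.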
Implementing a build system that has these features goes well beyond the typical build scripts and well beyond the capabilities of open source solutions like CruiseControl. Such a build system requires a significant software development project with the inherent need for release management and maintenance.
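To give a sense of scale, here is a deliberately small sketch of just two of those features, matching builds to suitable machines and honoring priorities. Everything in it is an assumption made for illustration: the class names, the single `platform` label per machine, and the rule that a lower number means a more urgent build. It ignores access control, failure handling and builds that span several platforms.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Machine:
    name: str
    platform: str  # illustrative label such as "linux" or "windows"


@dataclass(order=True)
class BuildRequest:
    priority: int                       # lower number = more urgent
    sequence: int                       # tie-breaker: first come, first served
    name: str = field(compare=False)
    platform: str = field(compare=False)


class BuildQueue:
    def __init__(self, machines: List[Machine]) -> None:
        self._idle: List[Machine] = list(machines)
        self._pending: List[BuildRequest] = []
        self._counter = 0

    def submit(self, name: str, platform: str, priority: int) -> None:
        request = BuildRequest(priority, self._counter, name, platform)
        heapq.heappush(self._pending, request)
        self._counter += 1

    def dispatch(self) -> List[Tuple[str, str]]:
        """Assign pending builds to idle machines, most urgent first."""
        assignments = []
        waiting = []
        while self._pending:
            request = heapq.heappop(self._pending)
            machine = next(
                (m for m in self._idle if m.platform == request.platform), None
            )
            if machine is None:
                waiting.append(request)  # no suitable machine is idle right now
                continue
            self._idle.remove(machine)
            assignments.append((request.name, machine.name))
        for request in waiting:
            heapq.heappush(self._pending, request)
        return assignments

    def release(self, machine: Machine) -> None:
        """Mark a machine as idle again once its build has finished."""
        self._idle.append(machine)
```

For example, with one Linux and one Windows machine, an urgent Linux patch build submitted after a nightly Linux build is dispatched first, and the nightly build waits until the Linux machine is released.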
6. Extensibility
All build systems need the ability to integrate with a very diverse range of tools. A typical build system might use a Make or Ant tool from one provider, an SCM system from another, a separate defect database for tracking, and a special set of tools for running tests, and it probably has to integrate with homegrown tools written in the scripting language most popular in the organization. Since many integrations will be common (such as integrations with SCM or testing tools) a build management system should have out-of-the-box integrations available for common tools. But ready-made integrations are not enough. A build management system needs the flexibility to integrate with almost any tool. That makes well-defined, and open, APIs essential; simple interfaces for command-line or web-service based tools make a build management system easy to extend.
As features and capabilities are added to the build and release system, the level of investment required to develop and maintain the system grows dramatically.
The Cost
To understand the size of the effort required to implement the type of build and release system outlined in this paper, it is possible to make some rough estimates of the cost of implementing the features. Start by assuming that the team has already put in place a solution based around either their own scripts (or an open source project like CruiseControl) that gives them basic scheduled builds, plus continuous integration builds and a simple web-based interface for build status with email notifications. The following is a rough estimate of the effort (in person months) of implementing a fully-functioning build and release system.
Feature
The Basics
  On-demand builds (developers can launch builds):
  Incremental builds (build system can do incremental and full builds to save time):
  Add control of individual build steps so that builds can be broken down and managed as discrete chunks of work:
  Dynamic resource allocation to cope with the demands of differing builds on differing build hardware/software:
  Add resource access control so that different build types (e.g. developer vs. nightly) get different build resources:
  Total for The Basics
Security
  Integrate with standard authentication mechanism (e.g. LDAP) and authenticate use of build resources:
  Add role-based access control (e.g. who's allowed to start/stop builds, edit build scripts):
  Support secure remote access:
  Total for Security
Flexibility
  Modify build scripts to make them easily maintainable and reusable:
  Build log management and searching for better build reporting and drill-down:
  Add project level management of builds to isolate different projects and allow better management of build steps and resources:
  Total for Flexibility
Advanced
  Add APIs so that the build system can be scripted and enhanced externally (e.g. a REST interface, Perl module or other interface):
  Make the build system agnostic towards tools used (such as the compilers and linkers) so that heterogeneous environments are supported:
  Total for Advanced
TOTAL
1 1 1 2 1
That's a total of 18 person months to put in place a flexible build system suitable for enterprise software development. In addition to the development cost of creating the system, a system of that size will require a full-time person to maintain it going forward. If the system is to be enhanced over time, additional development resources will be needed. And that's not to mention costs associated with ongoing maintenance of the system, knowledge lost if/when the developer who built the system leaves the company, etc.
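The cumulative effort implied by those two figures can be worked out with a few lines of arithmetic. The sketch assumes that one full-time maintainer equals 12 person months per year and leaves out enhancement work and the cost of lost knowledge; the function name is hypothetical.

```python
DEVELOPMENT_PERSON_MONTHS = 18           # total estimated in the text
MAINTENANCE_PERSON_MONTHS_PER_YEAR = 12  # assumption: one full-time maintainer


def cumulative_effort(years_in_service: float) -> float:
    """Person months spent on the system after a given number of years in service."""
    return (
        DEVELOPMENT_PERSON_MONTHS
        + MAINTENANCE_PERSON_MONTHS_PER_YEAR * years_in_service
    )
```

Under these assumptions, three years in service comes to 18 + 3 x 12 = 54 person months.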
o Results in redundant work and an inability to share/reuse across teams
o Difficult to reproduce or audit processes
o Becomes painful to manage build and test data and resources
o Little or no management visibility or reporting
o Directly impacts release predictability and time-to-market
ElectricCommander tackles these problems with a three-tier architecture, an AJAX-powered Web interface, and first-of-its-kind build and release analytics capabilities for reporting and compliance. With this solution, your developers, release engineers, build managers, QA teams, and managers gain:
o A shared platform for disseminating best practices & reusing common procedures
o Centralized control for improved auditability
o Faster throughput and more efficient hardware utilization
o Improved ability to support geographically distributed teams
o Continuous integration and greater agility
o Visibility/reporting for better project predictability
across the organization, without throwing out your existing scripts and processes. Unlike other solutions that require your development teams to learn new languages and processes, ElectricCommander is easy to adopt and roll out across teams with a highly interactive Web interface and simple techniques for migrating existing scripts. Its extensible SCM integrations (IBM Rational ClearCase, Perforce, AccuRev, Telelogic Synergy, etc.) allow it to fit seamlessly into your current environment, and to provide traceability between versioned source code and packaged executables.
Conclusion
Implementing enterprise-scale build and release management is a very large development task, but it's become an unavoidable one. External factors (such as distributed development teams, agile methods and market pressures) are forcing teams to address their build and release processes with the same attention as their front-end development practices. Initiating and evolving a do-it-yourself build system may seem simple at first, with easy productivity gains from even the simplest automation approaches. But over time, these systems become brittle, expensive to maintain, and understood only by the people who built them. Although simple open source tools exist to help automate the build and release process, they don't match up to the scale of builds for large, real-world projects: ElectricCommander does.
Leading companies such as Qualcomm, Intuit, and PayPal rely on Electric Cloud's Software Production Management solutions to change software production from a liability to a competitive advantage. For customer inquiries please contact Electric Cloud at (650) 968-2950 or www.electriccloud.com.
Electric Cloud, ElectricInsight, ElectricAccelerator, ElectricCommander and Electric Make are trademarks of Electric Cloud. Other company and product names may be trademarks of their respective owners.