It’s been posted on slideshare, check out CI In The Enterprise: Lessons Learned!
Recently, I was asked how one could speed up a build. In this particular scenario, cost wasn’t a constraint (ahh, to dream), but the content is still pretty relevant.
One thing to note – often a long-running build is not a problem in itself, but a symptom of an architecture issue. Meaning: if some particular component “used to” build pretty quickly, and now it’s taking forever, then it’s likely time to refactor and decouple some stuff into smaller components or services or whatever. Note that this is a different kind of “breaking things into smaller pieces” than the build pipelining described below…
That being said, here are some tips for speeding up a build:
1. Hardware investment
2. Dependency management
3. Establish build pipeline, or “chained” builds
Hardware – assuming budget isn’t a concern, the first thing I’d do is provide more processing power for these builds and invest in faster disks. This is one of those rare cases where “throwing money at the problem” can actually help. I’d also purchase tons of RAM and set up our builds to run on a RAM disk – which would really speed things up. With 1500 builds in a day, that’s just over one build per minute, so we’d need to evaluate what hardware it’s running on now to assess/estimate how much we’d gain from more processing power (how much of that ~1 minute can we realistically shave off?), but it’s one of the simplest, quickest things we could consider. How fast would the build finish on a Cray?
Dependency management – Introduce some sort of dependency management so that we don’t need to build each and every artifact for each and every build. Now, this one may not be ideal for you, as you didn’t state “we build only those components that have changed” as an assumption, but I’m going to go out on a limb and assume that you don’t need to rebuild a component/artifact to which no changes have been made. If that’s a safe assumption, you can make huge gains by using a dependency management tool and a shared artifact repository (like Artifactory or Nexus) to manage and publish versioned artifacts. Some tools that provide this ability are Ivy (with ANT) and Maven, though Maven has other features/uses too.
So, for example, say we have 2 components/projects: A and B, where A depends on B. When initiating a build, rather than simply building B, then A, we define the dependency between them and let the build tool resolve it for us. Additionally, we could specify the version of the B module that A depends on (essentially peg A to a fixed version of B, say 1.0), and then just run a build of A. The build script now knows that A depends on v1.0 of B, and checks for its existence in the shared repository. If it finds it, it’ll simply grab that artifact and use it to compile A rather than rebuilding B. Alternatively, we could tell A to always use the “latest” version of B, in which case it’ll just grab the most recently built version of B.
Ivy and Maven make this possible by publishing metadata about the resulting artifacts along with the actual artifacts themselves. When a build is initiated, it first maps out its dependencies, and attempts to resolve those dependencies with existing (pre-built) artifacts in the shared repository.
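To make that concrete, here’s a minimal sketch of what project A’s Ivy file might look like (the org/module names, and the assumption that B is published to a shared repo, are all hypothetical):

```xml
<!-- ivy.xml for project A (module names are made up for illustration) -->
<ivy-module version="2.0">
  <info organisation="com.example" module="A"/>
  <dependencies>
    <!-- peg A to the published 1.0 artifact of B; Ivy resolves it
         from the shared repository instead of rebuilding B -->
    <dependency org="com.example" name="B" rev="1.0"/>
    <!-- or, to always pull the most recently published B, use
         rev="latest.integration" instead of a fixed revision -->
  </dependencies>
</ivy-module>
```

At resolve time, Ivy reads this file, finds B-1.0 (and its published metadata) in the repo, and hands the pre-built jar to A’s compile step.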
This approach will reduce overall build times by simply skipping those components that haven’t changed. If we assume that all 1500 projects have been changed since the last build and need to be rebuilt, then I’m afraid that this approach wouldn’t help much.
Build Pipeline – break the build into smaller discrete chunks, and run them in a “pipeline” or a “build chain”. Again, I’m not sure this is an assumption I can make, as you stated a “build” consists of only compiling and linking, but there are often things you can do to “move around the load” if not outright reduce it. So, for example, you may have a “quick” build that simply checks out the source, compiles/links, and runs unit tests, whereas running a “full, clean build” will do much more (delete local source, check out, compile/link, run unit tests, package, run integration tests, deploy, regress, etc).
We could define a “pipeline” that code moves through on its way from source to deployment/release, and each stage of the pipeline would be its own discrete activity, providing different types of feedback (e.g., if compile passes, move on to assembly/packaging; if assembly/packaging passes, move on to deployment; and so on).
If the goal is to provide more rapid feedback to developers, we might look at what feedback is most useful to a developer and try to isolate it (maybe something like “your code won’t compile as is!”) and provide it faster – e.g., make that the “quick build”.
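In ANT terms, the quick/full split can be sketched with chained targets – something like this (target names, paths, and the test setup are all hypothetical):

```xml
<!-- build.xml sketch: "quick" for fast feedback, "full" for the whole chain -->
<project name="app" default="quick">
  <target name="compile">
    <javac srcdir="src" destdir="build/classes" includeantruntime="false"/>
  </target>
  <target name="unit-test" depends="compile">
    <junit haltonfailure="yes">
      <classpath path="build/classes"/>
      <batchtest>
        <fileset dir="build/classes" includes="**/*Test.class"/>
      </batchtest>
    </junit>
  </target>
  <!-- the "quick" build: compile + unit tests only -->
  <target name="quick" depends="compile,unit-test"/>
  <target name="package" depends="quick">
    <jar destfile="dist/app.jar" basedir="build/classes"/>
  </target>
  <!-- the "full, clean build": wipe everything, then run the whole chain -->
  <target name="clean">
    <delete dir="build"/>
    <delete dir="dist"/>
  </target>
  <target name="full" depends="clean,quick,package"/>
</project>
```

A CI server could then run the “quick” target on every commit for fast developer feedback, and schedule the “full” target nightly or per-release.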
Excellent article to help evaluate your CI fu:
Got an email from a buddy about versions and source control tagging.. thought I’d share:
I was just wondering, in writing code deployment scripts, is there a compelling reason to use a separate or proprietary “tagging” system rather than rely on source control tags? For example creating code release versioning that is independent from source control tagged versions, and using the release versions when specifying what code to deploy.
I’m curious because my old company did this and I wonder if that abstraction is useful or necessary with more complicated code deployment schemes.
Interesting question. As with anything like this, the answer is “it depends”.
If I get what you’re asking, you’re wondering about the usefulness/necessity of separating out “versioned” artifacts for deployment – e.g., having a versioning scheme for “deployable” artifacts that deviates from the “tagging” convention you use in your svn.
This sounds like something that you see a lot with Maven – the “maven way” almost requires the shoving off of artifacts to a shared location, to be picked up and deployed at a later time (“snapshots”, “releases”, etc), which sort of mandates a way of managing/naming these artifacts separately from svn tags. The very concept of an artifact repository is central to the “maven way”.
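A pom.xml fragment makes the point concrete (the coordinates and version numbers here are made up): the consumer declares a dependency on a versioned artifact in the shared repository, and that version number doesn’t have to line up with any svn tag at all:

```xml
<!-- pom.xml fragment; coordinates/versions are hypothetical -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>core-services</artifactId>
  <!-- a "release" version: a fixed, certified artifact in the repo -->
  <version>1.4.2</version>
  <!-- during active development you might instead depend on
       1.5.0-SNAPSHOT, i.e. the latest unreleased build -->
</dependency>
```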
In general, yes it is useful, though maybe not always necessary. One compelling reason is the “rollback” scenario – it’s really handy to have an archive of “certified” deployable artifacts readily available when you gotta abort a deployment and roll back to a previous version (rather than having to wait to re-build/package off of a tag, which in large systems could take a long time).
Obviously, there are lots of approaches to dealing with this scenario, but this seems to work pretty well. A side benefit is that you can readily deploy any particular historical version of the app for, say, QA developers to identify/isolate a particular bug in a particular version.
Also, the concept of a “build pipeline” is very powerful, and is most useful when the different stages of build/package/test/deploy are performed using the exact same artifacts – so you may have a build step that creates a war, then another step picks up that specific war and deploys it, tests against it, etc., and then further down the line you take that same exact artifact and deploy it where it needs to go (staging, prod, whatever). This helps minimize the risk of inadvertently introducing unknown/undesired code and/or property changes as the code moves through its lifecycle.
One more thing this helps with is speeding up build times (both locally and on a build server) for large complex systems in a shared/distributed development environment through better management of dependencies. Say, for example, you’re working on a module in a project with dozens (or hundreds perhaps) of other, shared modules. As a best practice, you should be compiling and running unit tests several times a day against the stuff you’re changing. If you need to build the entire stack, on every change, prior to every commit, that could get a little out of control and may discourage frequent local builds. However, if you toss in an “artifact repository”, where you can keep fixed versions of all sorts of shared modules (your project dependencies), then you don’t need to compile (or even keep that source locally) every single thing in order to get a full project. You can just grab the pre-compiled, “versioned” binaries from the shared repo, and you’re set. The tradeoff of developer time for a little storage and network traffic is usually a no-brainer.
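With Ivy wired into ANT, grabbing those pre-built dependencies becomes a single resolve/retrieve step rather than a rebuild of the whole stack. A sketch (project name, paths, and layout assumed for illustration):

```xml
<!-- build.xml fragment: fetch pinned, pre-built dependencies from the
     shared repository into lib/ before compiling the local module -->
<project name="my-module" xmlns:ivy="antlib:org.apache.ivy.ant">
  <target name="resolve">
    <!-- pulls the versioned binaries declared in ivy.xml -->
    <ivy:retrieve pattern="lib/[artifact]-[revision].[ext]"/>
  </target>
  <target name="compile" depends="resolve">
    <javac srcdir="src" destdir="build/classes" includeantruntime="false">
      <classpath>
        <fileset dir="lib" includes="*.jar"/>
      </classpath>
    </javac>
  </target>
</project>
```

The developer only compiles the module they’re actually changing; everything else arrives as a pre-compiled, versioned jar.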
An actual e-mail with interview pre-screen questions about build and CI. Note the answer to #7, classic!
Below are the seven prequalifying questions that <name removed> requested you answer.
- What factors influence the opening of a feature branch?
- What is the purpose of continuous integration for a development team?
- Describe a branch structure for a highly iterative web product?
- Describe a set of release criteria for web calculator application?
- When are you available to start a new position?
- What are your top 3 strengths?
- What are your top 3 weaknesses?
Hi Ms. Recruiter chick,
Please see my responses below:
1) What factors influence the opening of a feature branch?
Typically, feature branches are created in cases where the new feature or enhancement has broad-sweeping changes to the code base such that introducing them in the trunk may be too disruptive. Also, feature branches may be used for prototyping or proof-of-concept for code that may never end up in trunk.
2) What is the purpose of continuous integration for a development team?
The primary purpose of CI is to provide regular, fast feedback to developers as they commit changes to the shared code repository (VCS).
The idea being that we’re always integrating our code on commit, so that when conflicts arise, they can be addressed more quickly and easily than if the changes had been made days, weeks, or even months earlier.
3) Describe a branch structure for a highly iterative web product?
A common branching structure would be as follows: one primary line of development, called “trunk”. All iteration work should be done in trunk, except for cases as described above where branches are appropriate. For those cases, a private/feature branch is created, and the “owners” of that branch take responsibility for merging trunk into said branch periodically (at least weekly, maybe more, depends on churn/code change volume) so as to avoid too much drift.
Upon completion of the initiative which required a branch, those changes are merged back into trunk in time to perform regression testing ahead of the next production release.
Prior to production release, a “release branch” should be created. This is effectively the “release candidate”. Access to commit to the release branch should be limited and relatively controlled. Any changes made to the release branch should be merged back into trunk as well.
Builds, deployments, and regression testing should be performed against artifacts built off of the release branch. After successful release (“go-live”, “launch”), the branch gets locked down again and is kept around in case a high-sev bug is found in production and needs to be fixed asap.
4) Describe a set of release criteria for web calculator application?
Release criteria are fairly subjective and vary a lot depending on the organization and the app. That being said, software should not be released without the following:
- an understanding of the purpose of the release
- all high severity bugs closed or deferred
- release notes with a list of detailed fixes/enhancements (bug #’s perhaps?) and instructions for any server config or settings changes
- sign-off from a QA person (preferably some type of “lead”)
5) When are you available to start a new position?
6) What are your top 3 strengths?
Communication/collaboration, ANT scripting, troubleshooting/optimizing Java builds
7) What are your top 3 weaknesses?
I, like Chuck Norris, have no weaknesses.
Check out this nice write up about build automation. In particular, note the bit about keeping tabs on code quality.
I think it’s often overlooked that “quality injection” is a huge benefit of CI. Yes, it’s all well and good that your code compiles, but that doesn’t really tell you much about the quality or give you any useful metrics you can act on.
There’s a handful of utilities out there that you can tie into your build to collect info about your codebase (checkstyle, coverity, simian, findbugs to name a few).
Point is – when you start thinking about how you can leverage your automated build to inject quality into your process, things can get really interesting.
One of the challenges of investing the time and effort into pimping out your build and CI setup (or more generally, CM processes) is how to measure success. Where’s the ROI in having your top dude spend days writing ANT scripts?
Check out this great post about how to measure your success with change management.
We just released a redesign of justinlittle.com, check it out and let me know what you think!
As always, it’s a work in progress, but at least I’ve got a decent base to work with now…
If you like the design/layout, check in with Sleepless Media out of Santa Cruz, CA. They did the design for me, and they’re a great team to work with. I hacked up the html/css a bit (hey, what can I say, I’m not a design guy); their original stuff was even tighter.
They do really, really nice stuff, check out their portfolio of work.
This sounds like a pretty good match-up:
This is hilarious. Agile Hitler.