Archive for November, 2010

Meetup talk on CI

November 22, 2010

I presented an overview of CI in the enterprise at last week’s Large Scale Production Engineering meetup.

It’s been posted on SlideShare – check out CI In The Enterprise: Lessons Learned!


Speeding up a build

November 18, 2010

Recently, I was asked how one could speed up a build. In this particular scenario, cost wasn’t a constraint (ahh, to dream), but the content is still pretty relevant.

One thing to note – often a long-running build is not necessarily a problem in itself, but may be more of an architecture issue. Meaning: if some particular component “used to” build pretty quickly, and now it’s taking forever, then it’s likely time to refactor and decouple some stuff into smaller components or services. Note that this is a different kind of “breaking things into smaller pieces” than is described below as build pipelining…

That being said, here are some tips for speeding up a build:

1. Hardware investment
2. Dependency management
3. Establish a build pipeline, or “chained” builds

Hardware – assuming budget isn’t a concern, the first thing I’d do is provide more processing power for these builds, and invest in faster disks. This is one of those rare cases where “throwing money at the problem” can actually help. I’d also purchase tons of RAM, and set up our builds to run on a RAM disk – which would really speed things up. With 1500 builds in a day, that’s just under one minute per build if they run serially, so we’d need to evaluate what hardware it’s running on now to assess/estimate how much we’d gain from more processing power (how much of that ~1 minute can we realistically shave off?), but it’s one of the simplest, quickest things we could consider. How fast would the build finish on a Cray?

Dependency management – introduce some sort of dependency management so that we don’t need to build each and every artifact for each and every build. Now, this one may not be ideal for you, as you didn’t state “we build only those components that have changed” as an assumption, but I’m going to go out on a limb and assume that you don’t need to rebuild a component/artifact to which no changes have been made. If that’s a safe assumption, you can make huge gains by using a dependency management tool and a shared artifact repository (like Artifactory or Nexus) to manage and publish versioned artifacts. Tools that provide this ability include Ivy (used with Ant) and Maven, though Maven has other features/uses too.

So, for example, say we have two components/projects, A and B, where A depends on B. When initiating a build, rather than simply building B, then A, we define the dependency between them and let the build tool resolve it for us. Additionally, we could specify the version of the B module that A depends on (essentially pegging A to a fixed version of B, say 1.0), and then just run a build of A. The build script now knows that A depends on v1.0 of B, and checks for its existence in the shared repository. If it finds it, it’ll simply grab that artifact and use it to compile A rather than rebuilding it. Alternatively, we could tell A to always use the “latest” version of B, in which case it’ll just grab the most recently built version of B.
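In Ivy, for instance, both the pegged-version and latest-version cases might look like this in A’s ivy.xml (the org/module names here are made up):

```xml
<ivy-module version="2.0">
  <info organisation="com.example" module="A"/>
  <dependencies>
    <!-- peg A to a fixed version of B -->
    <dependency org="com.example" name="B" rev="1.0"/>
    <!-- or: always take the most recently published build of B -->
    <!-- <dependency org="com.example" name="B" rev="latest.integration"/> -->
  </dependencies>
</ivy-module>
```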

Ivy and Maven make this possible by publishing metadata about the resulting artifact along with the artifact itself. When a build is initiated, it first maps out its dependencies, and attempts to resolve those dependencies with existing (pre-built) artifacts in the shared repository.
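Here’s a toy sketch of that resolve-or-build decision, with a plain directory standing in for the shared repository (real tools do this with published metadata, of course – the names below are made up):

```shell
# If B-1.0 is already published, reuse it; otherwise build and publish it.
set -e
REPO=./repo
NAME=B
REV=1.0
if [ -f "$REPO/$NAME-$REV.jar" ]; then
    echo "resolved $NAME-$REV from shared repository"   # skip the rebuild entirely
else
    echo "building $NAME-$REV"
    mkdir -p "$REPO"
    touch "$REPO/$NAME-$REV.jar"                        # "publish" the freshly built artifact
fi
```

Run it twice and the second run takes the “resolved” branch – which is exactly the build time you save.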

This approach will reduce overall build times by simply skipping those components that haven’t changed. If we assume that all 1500 projects have been changed since the last build and need to be rebuilt, then I’m afraid that this approach wouldn’t help much.

Build Pipeline – break the build into smaller discrete chunks, and run them in a “pipeline” or a “build chain”. Again, I’m not sure this is an assumption I can make, since you stated a “build” consists of only compiling and linking, but there are often things you can do to “move the load around” if not outright reduce it. So, for example, you may have a “quick” build that simply checks out the source, compiles/links, and runs unit tests, whereas a “full, clean build” does much more (delete local source, check out, compile/link, run unit tests, package, run integration tests, deploy, regress, etc).

We could define a “pipeline” that code moves through on its way from source to deployment/release, where each stage of the pipeline is its own discrete activity, providing its own type of feedback (e.g. if compile passes, move on to assembly/packaging; if assembly/packaging passes, move on to deployment; and so on).
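Expressed as a sketch in shell – each function is a stand-in for a real build step, and the `&&` chaining gives the fail-fast behavior, so a stage only runs if everything before it passed:

```shell
# Each stage emits its own feedback; a failure stops the chain right there.
# The echo bodies are placeholders for real ant/maven invocations.
compile()  { echo "compile: OK"; }
assemble() { echo "assemble: OK"; }
deploy()   { echo "deploy: OK"; }

compile && assemble && deploy && echo "pipeline: OK"
```

CI servers let you model the same thing as separate jobs triggered in sequence, so each stage can report (and fail) independently.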

If the goal is to provide more rapid feedback to developers, we might look at what’s most useful to a developer, isolate that feedback (maybe something like “your code won’t compile as is!”), and provide it faster – e.g. make that the “quick build”, so developers get feedback sooner.