Repository Structures and CI

Everyone has their own ideas on how the world should be organized. Today I am going to share mine with, well, the world of course.

How to organize a repository is very important to teams practicing Continuous Integration (CI). Granted, it is important to all repository users -- but I think certain themes have particular importance if your team is practicing CI, and are less noticeable (if not less important) if it is not. CI teams use the repository differently than other teams: things change frequently, and everyone is encouraged to get the "entire" tree when they work (remember that word "entire"). That means certain usage patterns can cause you unnecessary pain if you try to apply them to a CI team's repository.

One thing I know for sure is that on every project I've done using CI, getting the repository to play nice with the daily development cycle has initially been a challenge, and once solved, provided a huge boost in productivity.

As an aside (or perhaps a further aside; we were talking about the world a minute ago), here's the basic daily cycle for individuals on a team practicing CI (and presumably Test Driven Development). Starting at the beginning of the day and going to the end, it looks something like this:

First thing: check to ensure that the last build was successful. Broken code is useless to a team practicing CI. Getting that build machine passing (usually this means compiling and testing properly for CI-practicing teams) comes first.

OK, build's working? Next, grab the latest version of the entire code base (it's the definition of "entire" that this blog post is really about. I promise I will get back to it).

Run the build locally to make sure all is well.

Do the TDD test/code/refactor cycle.

Periodically sync up. To do that, you essentially repeat the pattern: get latest, run the build locally with your changes and, if all's well, check in. If you are using a build tool like CruiseControl or CC.NET, make sure the build passes there as well.

At the end of the day you should have done this larger cycle several times. Finish up with one last check-out/check-in/watch-the-build-pass cycle, and head home happy with the knowledge that you made progress today.
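
The sync step of that cycle can be sketched as a small function. This is only a minimal sketch in Python: `get_latest`, `run_local_build` and `check_in` are hypothetical callables standing in for whatever your version control and build tools actually provide, not the API of any real tool.

```python
from typing import Callable


def sync_and_check_in(
    get_latest: Callable[[], None],
    run_local_build: Callable[[], bool],
    check_in: Callable[[], None],
) -> str:
    """One pass of the sync cycle: get latest, build locally, then check in.

    All three callables are hypothetical placeholders for your own
    version-control and build commands.
    """
    get_latest()                 # bring everyone else's changes into your tree
    if not run_local_build():    # compile and test with your changes included
        return "fix-locally"     # never check in on top of a failing local build
    check_in()
    return "checked-in"          # next: watch the build machine until it passes


if __name__ == "__main__":
    steps = []

    def fake_build() -> bool:
        # Stand-in for a real local build that happens to succeed.
        steps.append("build")
        return True

    result = sync_and_check_in(
        get_latest=lambda: steps.append("get latest"),
        run_local_build=fake_build,
        check_in=lambda: steps.append("check in"),
    )
    print(result, steps)
```

The point of returning early on a failed local build is the same as the point of the whole cycle: broken code never reaches the rest of the team.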

OK, but there's that small bit about the entire code base. Getting this right can be tricky.

Josh MacKenzie, one of my fellow ThoughtWorkers, said it best: "When I am on a project I want the entire world in the repository, so that there is no question of where I need to go to get everything." And that's how I feel too -- but exactly how the world gets put into the repository sometimes gets us into trouble.

(See, I told you this was about my vision of world order.)

Documentation seems to be the big culprit here. Storing docs in the repository makes sense, but having lots of docs in the same tree as the code makes for hefty check-outs, difficulty in cleaning (try checking out several MB of Word documents every time you do a clean build) and false positives on when to rebuild (a check-in that touches only documents can still look like a reason to build).

So, the principle that we seem to veer toward is: "Separate the things needed to build the code from the things that are not needed." Simple, obvious -- too obvious for a long-winded blog entry. But perhaps a bit too simple.

On my latest project, one more refinement to "entire" has presented itself. Among the things needed to build the code base are the development tools. Tools like CruiseControl and Ant, or an xUnit tool like NUnit, are often needed to build the code and often must be versioned along with it, and thus often end up in the repository.

But (and finally, I arrive at my point) tools don't need to be included in that definition of "entire." These meta-build artifacts change relatively infrequently, and so can be given their own home in the repository.

Phew, glad I got that off my chest. OK, here's my current notion of how the world should be organized:

  • Source and dependent third-party binaries go in a tree or sub-tree.
  • At the root of this tree goes the master build script, solution file, makefile, or whatever you use.
  • This tree is the basis of builds (and what gets deleted for clean builds).
  • Tools go in their own parallel tree that is visible to the build scripts, but not under that root tree.
  • Tools get checked out and integrated when they are changed -- but separately from the code itself.
  • Everything else (documents, schedules, etc.) gets put into its own parallel (or higher) tree.
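
To make that layout a bit more concrete, here is a rough sketch in Python. The directory names `src`, `tools` and `docs` are my own placeholders for illustration (the list above does not prescribe names), and `classify` and `needs_rebuild` are hypothetical helpers, not part of CruiseControl or any other tool. It shows the payoff of the split: only changes under the build tree count as a reason to rebuild, so a check-in that touches only documents is no longer a false positive.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical top-level trees; the names are assumptions for illustration.
BUILD_TREE = "src"    # source, third-party binaries, master build script
TOOLS_TREE = "tools"  # build tools, versioned but integrated separately


def classify(path: str) -> str:
    """Classify a repository-relative path as 'build', 'tools' or 'other'."""
    parts = PurePosixPath(path).parts
    top = parts[0] if parts else ""
    if top == BUILD_TREE:
        return "build"
    if top == TOOLS_TREE:
        return "tools"
    return "other"    # documents, schedules, and everything else


def needs_rebuild(changed_paths: Iterable[str]) -> bool:
    """True only if at least one changed file lives under the build tree."""
    return any(classify(p) == "build" for p in changed_paths)


if __name__ == "__main__":
    print(needs_rebuild(["docs/schedule.doc"]))
    print(needs_rebuild(["docs/spec.doc", "src/build.xml"]))
```

A clean build under this layout only ever deletes and re-fetches the build tree; the tools and document trees stay where they are until someone deliberately updates them.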

At least those are my current thoughts on the subject of world order -- and see, I didn't mention George W. Bush (or his father) once. Err, well, OK, twice then.