Monday, July 28, 2008

Last Week in Bazaar

Well, I'm late this week, so I'm officially marking this post as Last Week in Bazaar. In my defense, I got busy last Thursday, and then my cohort (Paul Hummer) flew off to New Zealand for a work-related sprint. So today, I (John Arbash Meinel, a developer on Bazaar) get to exercise full control over the content.


Keyword Expansion

People often request the ability to expand keywords, like they are used to in SVN and CVS. We've sort of postponed the implementation, because probably 90% of the time, it isn't really the right solution to the problem users are having. Also, they are kind of a mess in CVS anyway. Where I used to work we tried to use $Id$ style expansion, only to find out that they conflict on every attempt at merging, and we started working hard to strip them out of our files. In a distributed VCS, you usually merge at least an order-of-magnitude more often, which also tends to reveal this problem.

SVN at least works around the problem, in that when you commit, it actually strips the texts of their expanded keywords, so that the repository never stores the expanded form. And merges are also done on the non-expanded form. Which fixes that little problem. Though it introduces a couple others. Specifically, what you have on disk is not what sits in the repository, nor is it exactly what you will get out of a fresh checkout. The biggest reason is that if you commit revno 1354, it will update the tags of files that are touched. But if you checkout revno 1354 it will update the tags of *all* files. (I'm not positive on this, but I know there was a bug which was causing problems for people trying to do conversions. Because they couldn't quite find the right invocation to have 'update -r 1354' (from 1353) give the exact same tree as 'checkout -r 1354').

The other reason keyword expansion is not usually what you want, is because it expands only for the given file. If you make a commit to 5 other files, the *tree* is at revno 1359, but the file with your:

my_version = "$Version$";

Tag is still pointing at 1354. (Again, if 'svn update' would force all the tags to get re-expanded it might work correctly, though you run into performance problems expanding every keyword in every file on every update.) Bazaar has supported the bzr version-info command for a while, which lets you generate a version file (possibly from a template) which can store all the real details. Including the last-modified version for every file, whether any files in the working tree have been modified since commit, etc.

The only case that I've really heard a good reason for keyword expansion is for a Website. Where each individual file is spread out into the world. So having a little "last modified" at the bottom can be pretty convenient. You also don't tend to have a "build" process which lets you generate the version information at that time.

However, as Bazaar is meant to be a flexible system, Ian Clatworthy has done a wonderful job of adding the ability to support content munging via plugins. And has continued on to write a plugin specifically for expanding keywords.

http://bazaar-vcs.org/KeywordExpansion

So for all those people who feel they really need keyword expansion, look it up.I would imagine that once people get a good feel for it, and it matures a bit, it has a good chance to be brought into core code. Or at least make it into the release tarball as an "official" plugin.


Open Source, Python, and Counting My Blessings

Now onto something a bit more personal. This last week I had cause to re-visit an old library I had written, and try to get it up and running again. (Specifically, the project was pydcmtk, python wrappers for the Dicom Toolkit.)
It took me several hours times several days just to get it to build and run the test suite again. All without changing any of the code. It was simply a build-step problem.
Which revealed a couple wonderful things about my current work:

  1. I get to work in Python. Which is a nice language, flexible, and *doesn't* need a build step. Not having to deal with C/C++ and all the complexities of getting dependencies built, with the right version of the compiler, and the right flags to the compiler.
  2. Microsoft has a much harder time on their hands than Open Source does, at least when it comes to compatibility. Specifically, each version of their compiler comes with a different runtime. And code compiled for Visual Studio 7.1 doesn't like to work with the 8.0 objects nor the 9.0 objects. And they all have different msvcrtX.X.dll files. However, because the official method for getting your program to users is in binary (object) form, they have to provide ways to support your binary files for a long time. So in VS 8.0 they introduced a new step, which is to post-process your linked binaries with a manifest, declaring what runtimes they use. Further complicating this is that if you try to run a 8.0 compiled dll, it just gives an opaque "This process has tried to access the runtime incorrectly."
    Not realizing this, I spent a long time comparing the exact compiler flags with other examples to fix it. (The boost build tool, bjam, knows how to do it, but there was a line "if exists foo.manifest: do stuff", which I originally read as "if not exists foo.manifest: create the manifest.")
  3. Open source has generally handled the binary compatibility issue by punting and requesting software compatibility. And then you have a whole bunch of groups that spend their time recompiling everything for you (distributions like Ubuntu or Red Hat). And then they give you all the dependencies with a few simple commands (apt-get install zlib-dev dcmtk-dev boost-dev). On Windows, if you want to switch to developer mode, you generally have to grab the source code for all of those dependencies, and recompile them for your exact configuration.
    Software-level compatibility is *much* easier to handle. Not the least of which because if something becomes incompatible you can fix it. (I remember a Microsoft memory issue, where they had to switch in bug-for-bug compatibility because fixing it broke SimCity, how much better if they could have just patched SimCity.)
    Binary compatibility (for C/C++) means that you can't even add members to structs, because then the size changes and malloc starts failing (plus members are referenced by offset, so adding something in the *middle* is a big no-no).
    Source-level turns this way down into not removing things people are using. And, if something does change, with source-level you can even write a patch to fix the code. This does make it quite a bit harder for people who want to release binary-only packages, that they then don't have to modify for years. (Though when updating is a simple process, people are willing to do it more.)

1.6rc1 soon to come

We are working on putting the final polish to stacked branches. We are trying to release something that people can feel comfortable using right away, and there are a few tricks to get there. (For example, bzr has a general policy of always preserving the source format when you do 'bzr branch'. It helps maintain compatibility within a project that hasn't chosen to upgrade to a newer format yet. However if you do 'bzr branch --stacked' that indicates you want to use the new feature, so we have to work out logic to create an upgraded target at the right time. This also turns out to conflict a bit with bzr-svn, which had its own logic to trick 'bzr branch' into not copying the source format.)

You can already play with the Stacked Branches feature in the beta releases, but they'll appear much more polished in the final rc.

Thursday, July 17, 2008

This Week in Bazaar

Welcome back to the terrarium of the Bazaar distributed version control system. Written by co-authors John Arbash Meinel, one of the primary developers on Bazaar, and Paul Hummer, who works on integrating Bazaar into Launchpad as he refines his plans for world domination from his shiny new lair.

Bazaar 1.6b3 released

The next beta release of Bazaar has just been cut, and is available at your local PPA:
https://launchpad.net/~bzr/+archive

The Windows installers should be available later today. This release provides lots of the shiny things that we've been talking about, like Stacked Branches, Real Weave Merge, more hooks for server-side operation, and lots of bug fixes and general polishing. The full UI for using stacked branches still needs a little bit of polishing, so the feature is not enabled by default. The functionality is all there, and if you are interested, we'd love to hear from you (kudos and complaints are equally welcome).


New updates to Gnome Bazaar Playground

Coming back from a very productive trip to Guadec, Tim Penhey has been overseeing some customizations to the Bazaar Playground for Gnome. All of the branches created at the local server in Turkey for Guadec have been added to the public playground. The Loggerhead installation has received some TLC by way of customizations to the UI. Accerciser's playground page is a good demonstration af the UI changes that have been made. The playground is actively being used by applications such as Brasero, jhbuild, Metacity and more.

One of the fun results of meeting with people at Guadec, is that it showed ways to improve Loggerhead when dealing with lots of projects and lots of branches. Work is continuing to make customizing Loggerhead's look-and-feel easier, and providing better tools for creating these "Bazaar Playgrounds" to use in evaluating Bazaar. The Bazaar developers are committed with making tools easier to use, and making the process as simple and powerful as possible.

Up and Coming Repository Format Updates

Robert Collins has been hard at work to refine how Bazaar stores its history information. We all like to have deep context, but we don't like to have to pay the penalty of downloading all of that context. Because Bazaar has a flexible repository structure, Robert has been able to play with changing the on-disk structure without major surgery to the rest of the code.

First is a change to how indexes are written, switching from a bisectable list to a btree structure. This paged structure allows us to compress the indexes, making them smaller, and faster to process remotely. It also reduces the number of lookups to find a key. (On average, a bisect search is log2N, while the btree is closer to log100N.) At the moment, he is testing this with a shared repository containing all of the projects available from in the Ubuntu apt repositories. This weighs in at around 13k branches, and somewhere around 20GB of disk space used.

Second is an update to how texts are stored. At the moment we use a simple format which places fulltexts periodically, and then stores deltas against those fulltexts. It has served us rather well, but can be improved upon. With his Group compress work, we can see a savings of as much as 2x-3x. Further, the data is stored such that you can do simple linear reads to get the base fulltext and all deltas necessary to generate a given fulltext. This reduces the pressure on indices, as you don't have to search for base texts. (Instead you just store a pointer to the start, and give the total length that needs to be read.)

These are still in development phase, but a format that uses them will likely appear in the next release (bzr 1.7).

Community Agile

Ian Clatworthy has recently released a wonderful document describing the workflow we (generally) use at Canonical. It describes how basic practices are similar to, and different from, other systems like Agile. The biggest (IMO) being a recognition that the community surrounding your project is one of the strongest and most important pieces. This has always been true in software development, but it has traditionally been somewhat hidden. Open Source has exposed just how powerful the community can be. For people interested in how software can be developed, rather than just what, I certainly recommend it.

Thursday, July 10, 2008

This Week in Bazaar

Here we are again, bringing you the gossip and dirty secrets in the development world of the Bazaar distributed version control system. In this, the 10th week, the series is now under new management, with co-authors John Arbash Meinel, one of the primary developers on Bazaar, and Paul Hummer, who works on integrating Bazaar into Launchpad.


Bundle Buggy

Aaron Bentley has once again been improving his wonderful Bundle Buggy. He just introduced support for multiple projects using a single instance of Bundle Buggy. There are now 5 Bazaar projects using the main bundle buggy instance. (Bazaar, bzr-gtk, Bundle Buggy itself, Bzrtools, and PQM.) Of course, Daniel Watkins has made excellent use of his time, and has managed to crank out lots of updates for PQM. At this point it is code clean up, reducing the dependencies making it easier to set up and install.

Bazaar playground for Gnome


Originally, John Carr set up Bazaar mirrors of all the Gnome modules, which people could then use as a starting point for publishing code and collaborating. This week, the Bazaar playground for gnome was created so that any Gnome developers could be involved in pushing, branching, and sharing code through bazaar. This new server runs Loggerhead for viewing the code committed to these Bazaar branches. Damned Lies is also set up on the playground. This server was also reproduced locally at GUADEC because of the flaky internet connection at the conference, and all those local branches will be moved to the playground shortly.


Weave merging and handling "interesting" history

One of the great things about having a large project like MySQL using your software is that they push and stretch you in ways that you haven't necessarily encountered before. Specifically, their branch workflow looks a bit like a pile of spaghetti. With several long-term maintenance branches, team branches based off of that, and individual developer branches based off of that. Patches have a tendency to travel in unexpected ways (you may go user => team => release 1 => release 2, or you might go release 1 => team => team-2 => release 2, etc). They also are very fond of 'null merging' patches that aren't relevant to the next release. They merge the change and revert the text changes and commit.

Bazaar supports all of this, but it exposes weaknesses in simple 3-way merge logic. Because patches don't flow in anything considered orderly, you don't have the opportunity to select a "clean" base very often. Bazaar has long had an option for doing a "--weave" merge. It didn't receive much attention for a while, and had become rather slow. It turned out to be a good fit for MySQL's workflow, so John has spent a bit of time recently to make the functionality efficient and correct in some specific edge cases. Expect the improvements to show up in the next release.

Thursday, July 3, 2008

This Week in Bazaar

This is the 9th in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, unlicensed health professional. This week we are joined by Paul Hummer, who works on integrating Bazaar into Launchpad.

How to integrate bzr into your build and release process

Once you are happily using bzr on your project, the next step is some basic integration into your build process. A common desire is getting revision number to store during build process, so that you can tell what revision your program was built with. This is easy to do with 'bzr revno', which prints the current revision number. Thats not very exciting though.

There is a much more sophisticated command in bzr called version-info. For example, running:
 bzr version-info --custom \
--template="#define VERSION_INFO \"Project 1.2.3 (r{revno})\"\n"

Will produce a C header file with formatted string containing the current revision number. Other supported variables in the templates are: date, build date, revno, revision id, branch nickname, and clean (which shows whether the tree contained uncommitted changes). This makes integrating into make or another build system very easy. The templates make it very easy to generate a version file for whatever language you are writing in.

What else could be automated other than version info? The bzr-stats plugin has a credits command. This is useful for getting a list of contributors to fill out a credits page, easter egg, etc. Also, changelogs can be generated with the gnulog plugin.

Andrew Bennetts has been working on a new server side push hook that can be used to run tests before allowing a push to complete. Wow, this could replace PQM! Well, not quite. This is more of a poor-man's PQM. It doesn't scale as well, but would work for smaller teams that don't necessarily need PQM. Blocking push while tests are running is not a good idea if you have a very long test suite, and PQM will merge and commit, making it easier to deal with multiple people trying to merge changes at the same time. If you're working in a very small group (1-3 people) with a smaller test suite, using these hooks might be just the trick, but for a larger work group you should still set up PQM.

Right now PQM is a fair amount of work to set up, but that should be changing soon. Daniel Watkins has started work on making PQM easier to set up and use, and others have been submitting cleanup patches too.

Finally, if you are using bzr on a project that builds .deb packages, check out the builddeb plugin. It would be great to have plugins for other packaging tools as well! RPM, MSI, JAR, WAR, etc.