Monday, July 28, 2008

Last Week in Bazaar

Well, I'm late this week, so I'm officially marking this post as Last Week in Bazaar. In my defense, I got busy last Thursday, and then my cohort (Paul Hummer) flew off to New Zealand for a work-related sprint. So today, I (John Arbash Meinel, a developer on Bazaar) get to exercise full control over the content.


Keyword Expansion

People often request the ability to expand keywords, like they are used to in SVN and CVS. We've sort of postponed the implementation, because probably 90% of the time, it isn't really the right solution to the problem users are having. Also, they are kind of a mess in CVS anyway. Where I used to work we tried to use $Id$ style expansion, only to find out that they conflict on every attempt at merging, and we started working hard to strip them out of our files. In a distributed VCS, you usually merge at least an order-of-magnitude more often, which also tends to reveal this problem.

SVN at least works around the problem, in that when you commit, it actually strips the texts of their expanded keywords, so that the repository never stores the expanded form. And merges are also done on the non-expanded form. Which fixes that little problem. Though it introduces a couple others. Specifically, what you have on disk is not what sits in the repository, nor is it exactly what you will get out of a fresh checkout. The biggest reason is that if you commit revno 1354, it will update the tags of files that are touched. But if you checkout revno 1354 it will update the tags of *all* files. (I'm not positive on this, but I know there was a bug which was causing problems for people trying to do conversions. Because they couldn't quite find the right invocation to have 'update -r 1354' (from 1353) give the exact same tree as 'checkout -r 1354').

The other reason keyword expansion is not usually what you want, is because it expands only for the given file. If you make a commit to 5 other files, the *tree* is at revno 1359, but the file with your:

my_version = "$Version$";

Tag is still pointing at 1354. (Again, if 'svn update' would force all the tags to get re-expanded it might work correctly, though you run into performance problems expanding every keyword in every file on every update.) Bazaar has supported the bzr version-info command for a while, which lets you generate a version file (possibly from a template) which can store all the real details. Including the last-modified version for every file, whether any files in the working tree have been modified since commit, etc.

The only case that I've really heard a good reason for keyword expansion is for a Website. Where each individual file is spread out into the world. So having a little "last modified" at the bottom can be pretty convenient. You also don't tend to have a "build" process which lets you generate the version information at that time.

However, as Bazaar is meant to be a flexible system, Ian Clatworthy has done a wonderful job of adding the ability to support content munging via plugins. And has continued on to write a plugin specifically for expanding keywords.

http://bazaar-vcs.org/KeywordExpansion

So for all those people who feel they really need keyword expansion, look it up.I would imagine that once people get a good feel for it, and it matures a bit, it has a good chance to be brought into core code. Or at least make it into the release tarball as an "official" plugin.


Open Source, Python, and Counting My Blessings

Now onto something a bit more personal. This last week I had cause to re-visit an old library I had written, and try to get it up and running again. (Specifically, the project was pydcmtk, python wrappers for the Dicom Toolkit.)
It took me several hours times several days just to get it to build and run the test suite again. All without changing any of the code. It was simply a build-step problem.
Which revealed a couple wonderful things about my current work:

  1. I get to work in Python. Which is a nice language, flexible, and *doesn't* need a build step. Not having to deal with C/C++ and all the complexities of getting dependencies built, with the right version of the compiler, and the right flags to the compiler.
  2. Microsoft has a much harder time on their hands than Open Source does, at least when it comes to compatibility. Specifically, each version of their compiler comes with a different runtime. And code compiled for Visual Studio 7.1 doesn't like to work with the 8.0 objects nor the 9.0 objects. And they all have different msvcrtX.X.dll files. However, because the official method for getting your program to users is in binary (object) form, they have to provide ways to support your binary files for a long time. So in VS 8.0 they introduced a new step, which is to post-process your linked binaries with a manifest, declaring what runtimes they use. Further complicating this is that if you try to run a 8.0 compiled dll, it just gives an opaque "This process has tried to access the runtime incorrectly."
    Not realizing this, I spent a long time comparing the exact compiler flags with other examples to fix it. (The boost build tool, bjam, knows how to do it, but there was a line "if exists foo.manifest: do stuff", which I originally read as "if not exists foo.manifest: create the manifest.")
  3. Open source has generally handled the binary compatibility issue by punting and requesting software compatibility. And then you have a whole bunch of groups that spend their time recompiling everything for you (distributions like Ubuntu or Red Hat). And then they give you all the dependencies with a few simple commands (apt-get install zlib-dev dcmtk-dev boost-dev). On Windows, if you want to switch to developer mode, you generally have to grab the source code for all of those dependencies, and recompile them for your exact configuration.
    Software-level compatibility is *much* easier to handle. Not the least of which because if something becomes incompatible you can fix it. (I remember a Microsoft memory issue, where they had to switch in bug-for-bug compatibility because fixing it broke SimCity, how much better if they could have just patched SimCity.)
    Binary compatibility (for C/C++) means that you can't even add members to structs, because then the size changes and malloc starts failing (plus members are referenced by offset, so adding something in the *middle* is a big no-no).
    Source-level turns this way down into not removing things people are using. And, if something does change, with source-level you can even write a patch to fix the code. This does make it quite a bit harder for people who want to release binary-only packages, that they then don't have to modify for years. (Though when updating is a simple process, people are willing to do it more.)

1.6rc1 soon to come

We are working on putting the final polish to stacked branches. We are trying to release something that people can feel comfortable using right away, and there are a few tricks to get there. (For example, bzr has a general policy of always preserving the source format when you do 'bzr branch'. It helps maintain compatibility within a project that hasn't chosen to upgrade to a newer format yet. However if you do 'bzr branch --stacked' that indicates you want to use the new feature, so we have to work out logic to create an upgraded target at the right time. This also turns out to conflict a bit with bzr-svn, which had its own logic to trick 'bzr branch' into not copying the source format.)

You can already play with the Stacked Branches feature in the beta releases, but they'll appear much more polished in the final rc.

No comments: