John Arbash Meinel's Bazaar Blog: 2008

Thursday, August 14, 2008

This Week in Bazaar

Ah, to take a break from reporting to the world, but now we are back. This used to be a completely weekly series of posts about the on-going events in the world of Bazaar (and may be yet again). Written by co-authors John Arbash Meinel, one of the primary developers on Bazaar, and Paul Hummer, who works on integrating Bazaar into Launchpad.

Bazaar 1.6rc3 Released

With Martin Pool going on vacation for the next two weeks, John has stepped up to marshal 1.6 out the door. And he started with not 1 but 2 release candidates in 2 days. We're trying hard to get back into a time-based release schedule. The problem with sneaking in a feature-based release is that it always ends up slipping, as everyone tries to get "one more thing" into the delayed release. However, with RC3, we've actually gotten the list of things that must be in 1.6 down to 0, so there is a very good chance it will become 1.6-final next week.

Since it has been a delayed release, there are lots of goodies inside to partake of. Stacked Branches, improved Weave merge, significantly faster 'bzr log --short', improvements to the Windows installation, better server side hooks, and the list goes on. Most of this we have mentioned in previous "This Weeks", the big difference is that it is available in a release, rather than just in the bzr.dev trunk.

The Windows install is one of the major changes, in that it will now (by default) bundle TortoiseBzr as part of the standalone install. TortoiseBzr still needs work before it is as much of a joy to work with as the rest of the system, but this release is mostly about testing our ability to bundle them together.

Looking forward to Bazaar 1.7

As 1.6 nears its official release, the development community has started planning the 1.7 development process. As it stands now, bzr 1.7 has a planned release date of September 8th. This means there are two whole weeks to get various bugfixes and contributions into Bazaar before getting down to release time (mentoring available).

Among the proposed potential features, there are a few that really stand out. Mark Hammond has been polishing Bazaar on Windows, and there is much desire for someone to help getting the bazaar test suite to run cleanly in Mac OS X. These features will greatly add to the existing portability strengths of Bazaar. While the majority of changes needed are actually in the test suite, and not the core functionality, the community could really use someone who could step up, and learn how to do unit testing in Python. Bazaar 1.7 will also see some increased merge flexibilities, especially with criss cross merges.

Improvements to the indexing layer are likely to land in 1.7, though as always, not on the default format. (We want at least 1 release supporting a format before we suggest it as the default, to give people time for compatibility.) The new b+tree layout for indexes makes them smaller (by approx 2:1) and makes them faster to search (eg, bzr log file being 3x faster).

We also have a chance to land Group Compress, which has shown to compress repositories by as much as 3:1 over their current size. This change needs a bit more tweaking, though. There are generally tradeoffs between how much time you spend compressing, and how small the result is. And we want to make sure that we make the right tradeoffs. It is currently being evaluated as a test plugin.

Bazaar Bug Day

As Bazaar development speeds up, so do the incoming bugs. There are currently 1062 open bugs in Launchpad, and 287 of them have a "New" status, meaning they have not yet been triaged and categorized. At a past Bazaar sprint, a "bug day" was talked about, and it has been brought up again on the mailing list. Often, we fix many bugs and just haven't gotten around to marking them fixed. This is a great opportunity for members of the community who use Bazaar but don't directly develop it to contribute back to the Bazaar community. You can help out by verifying that bugs have been fixed or no longer exist, or by confirming that they still exist and providing more information on them. Come give your karma a boost, and help us squish some bugs!

Monday, July 28, 2008

Last Week in Bazaar

Well, I'm late this week, so I'm officially marking this post as Last Week in Bazaar. In my defense, I got busy last Thursday, and then my cohort (Paul Hummer) flew off to New Zealand for a work-related sprint. So today, I (John Arbash Meinel, a developer on Bazaar) get to exercise full control over the content.

Keyword Expansion

People often request the ability to expand keywords, like they are used to in SVN and CVS. We've sort of postponed the implementation, because probably 90% of the time, it isn't really the right solution to the problem users are having. Also, they are kind of a mess in CVS anyway. Where I used to work we tried to use $Id$ style expansion, only to find out that they conflict on every attempt at merging, and we started working hard to strip them out of our files. In a distributed VCS, you usually merge at least an order-of-magnitude more often, which also tends to reveal this problem.

SVN at least works around the problem, in that when you commit, it actually strips the texts of their expanded keywords, so that the repository never stores the expanded form. And merges are also done on the non-expanded form. Which fixes that little problem. Though it introduces a couple others. Specifically, what you have on disk is not what sits in the repository, nor is it exactly what you will get out of a fresh checkout. The biggest reason is that if you commit revno 1354, it will update the tags of files that are touched. But if you checkout revno 1354 it will update the tags of *all* files. (I'm not positive on this, but I know there was a bug which was causing problems for people trying to do conversions. Because they couldn't quite find the right invocation to have 'update -r 1354' (from 1353) give the exact same tree as 'checkout -r 1354').
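To make the strip-on-commit idea concrete, here is a minimal Python sketch of collapsing and expanding keywords. It is only an illustration under stated assumptions: the function names, the regular expression, and the choice of the two keywords Id and Revision are mine, and this is not how SVN (or the Bazaar plugin) actually implements it.

```python
import re

# Matches both the bare form ($Revision$) and the expanded form
# ($Revision: 1354 $). Only two keywords are handled, for illustration.
KEYWORD = re.compile(r"\$(Id|Revision)(?::[^$\n]*)?\$")


def collapse_keywords(text):
    """Reduce every keyword to its bare form, as would be stored on commit."""
    return KEYWORD.sub(lambda match: "$%s$" % match.group(1), text)


def expand_keywords(text, values):
    """Fill in keywords from the values mapping, as would happen on checkout."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            return "$%s$" % name
        return "$%s: %s $" % (name, values[name])
    return KEYWORD.sub(replace, text)


# Two working copies that differ only in their expanded keyword...
ours = 'my_version = "$Revision: 1353 $";\n'
theirs = 'my_version = "$Revision: 1354 $";\n'

# ...are identical once collapsed, so a merge sees nothing to conflict on.
print(collapse_keywords(ours) == collapse_keywords(theirs))
print(expand_keywords(collapse_keywords(ours), {"Revision": "1354"}))
```

The design point is that merging always happens on the collapsed text, which is why the $Id$ conflicts described above go away, at the cost of the on-disk file no longer matching the stored one.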

The other reason keyword expansion is not usually what you want, is because it expands only for the given file. If you make a commit to 5 other files, the *tree* is at revno 1359, but the file with your:

my_version = "$Version$";

tag is still pointing at 1354. (Again, if 'svn update' would force all the tags to get re-expanded it might work correctly, though you run into performance problems expanding every keyword in every file on every update.) Bazaar has supported the bzr version-info command for a while, which lets you generate a version file (possibly from a template) which can store all the real details, including the last-modified version for every file, whether any files in the working tree have been modified since commit, etc.

The only case that I've really heard a good reason for keyword expansion is for a Website. Where each individual file is spread out into the world. So having a little "last modified" at the bottom can be pretty convenient. You also don't tend to have a "build" process which lets you generate the version information at that time.

However, as Bazaar is meant to be a flexible system, Ian Clatworthy has done a wonderful job of adding the ability to support content munging via plugins. And has continued on to write a plugin specifically for expanding keywords.

http://bazaar-vcs.org/KeywordExpansion

So for all those people who feel they really need keyword expansion, look it up. I would imagine that once people get a good feel for it, and it matures a bit, it has a good chance to be brought into core code. Or at least make it into the release tarball as an "official" plugin.

Open Source, Python, and Counting My Blessings

Now onto something a bit more personal. This last week I had cause to re-visit an old library I had written, and try to get it up and running again. (Specifically, the project was pydcmtk, python wrappers for the Dicom Toolkit.)
It took me several hours times several days just to get it to build and run the test suite again. All without changing any of the code. It was simply a build-step problem.
Which revealed a couple wonderful things about my current work:

I get to work in Python. Which is a nice language, flexible, and *doesn't* need a build step. Not having to deal with C/C++ and all the complexities of getting dependencies built, with the right version of the compiler, and the right flags to the compiler.
Microsoft has a much harder time on their hands than Open Source does, at least when it comes to compatibility. Specifically, each version of their compiler comes with a different runtime. And code compiled for Visual Studio 7.1 doesn't like to work with the 8.0 objects nor the 9.0 objects. And they all have different msvcrtX.X.dll files. However, because the official method for getting your program to users is in binary (object) form, they have to provide ways to support your binary files for a long time. So in VS 8.0 they introduced a new step, which is to post-process your linked binaries with a manifest, declaring what runtimes they use. Further complicating this is that if you try to run an 8.0 compiled dll, it just gives an opaque "This process has tried to access the runtime incorrectly."
Not realizing this, I spent a long time comparing the exact compiler flags with other examples to fix it. (The boost build tool, bjam, knows how to do it, but there was a line "if exists foo.manifest: do stuff", which I originally read as "if not exists foo.manifest: create the manifest.")
Open source has generally handled the binary compatibility issue by punting and requesting software compatibility. And then you have a whole bunch of groups that spend their time recompiling everything for you (distributions like Ubuntu or Red Hat). And then they give you all the dependencies with a few simple commands (apt-get install zlib-dev dcmtk-dev boost-dev). On Windows, if you want to switch to developer mode, you generally have to grab the source code for all of those dependencies, and recompile them for your exact configuration.
Software-level compatibility is *much* easier to handle. Not the least of which because if something becomes incompatible you can fix it. (I remember a Microsoft memory issue, where they had to switch in bug-for-bug compatibility because fixing it broke SimCity, how much better if they could have just patched SimCity.)
Binary compatibility (for C/C++) means that you can't even add members to structs, because then the size changes and code built against the old layout allocates too little memory (plus members are referenced by offset, so adding something in the *middle* is a big no-no).
Source-level turns this way down into not removing things people are using. And, if something does change, with source-level you can even write a patch to fix the code. This does make it quite a bit harder for people who want to release binary-only packages, that they then don't have to modify for years. (Though when updating is a simple process, people are willing to do it more.)
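The struct point is easy to see for yourself. Here is a small Python sketch using ctypes to mimic two versions of a C struct; the struct and field names are made up for the example.

```python
import ctypes


class PointV1(ctypes.Structure):
    # The layout an old binary was compiled against (hypothetical struct).
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class PointV2(ctypes.Structure):
    # The same struct after someone adds a member in the *middle*.
    _fields_ = [
        ("x", ctypes.c_int32),
        ("flags", ctypes.c_int32),
        ("y", ctypes.c_int32),
    ]


# The total size changes, so a caller allocating the old size is too small.
print(ctypes.sizeof(PointV1), ctypes.sizeof(PointV2))

# The offset of y moves, so old code reading y now reads flags instead.
print(PointV1.y.offset, PointV2.y.offset)
```

With source-level compatibility a recompile picks up the new size and offsets automatically, which is exactly the step a binary-only release cannot take.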

1.6rc1 soon to come

We are working on putting the final polish to stacked branches. We are trying to release something that people can feel comfortable using right away, and there are a few tricks to get there. (For example, bzr has a general policy of always preserving the source format when you do 'bzr branch'. It helps maintain compatibility within a project that hasn't chosen to upgrade to a newer format yet. However if you do 'bzr branch --stacked' that indicates you want to use the new feature, so we have to work out logic to create an upgraded target at the right time. This also turns out to conflict a bit with bzr-svn, which had its own logic to trick 'bzr branch' into not copying the source format.)

You can already play with the Stacked Branches feature in the beta releases, but they'll appear much more polished in the final rc.

Thursday, July 17, 2008

This Week in Bazaar

Welcome back to the terrarium of the Bazaar distributed version control system. Written by co-authors John Arbash Meinel, one of the primary developers on Bazaar, and Paul Hummer, who works on integrating Bazaar into Launchpad as he refines his plans for world domination from his shiny new lair.

Bazaar 1.6b3 released

The next beta release of Bazaar has just been cut, and is available at your local PPA:
https://launchpad.net/~bzr/+archive

The Windows installers should be available later today. This release provides lots of the shiny things that we've been talking about, like Stacked Branches, Real Weave Merge, more hooks for server-side operation, and lots of bug fixes and general polishing. The full UI for using stacked branches still needs a little bit of polishing, so the feature is not enabled by default. The functionality is all there, and if you are interested, we'd love to hear from you (kudos and complaints are equally welcome).

New updates to Gnome Bazaar Playground

Coming back from a very productive trip to Guadec, Tim Penhey has been overseeing some customizations to the Bazaar Playground for Gnome. All of the branches created at the local server in Turkey for Guadec have been added to the public playground. The Loggerhead installation has received some TLC by way of customizations to the UI. Accerciser's playground page is a good demonstration of the UI changes that have been made. The playground is actively being used by applications such as Brasero, jhbuild, Metacity and more.

One of the fun results of meeting with people at Guadec is that it showed ways to improve Loggerhead when dealing with lots of projects and lots of branches. Work is continuing to make customizing Loggerhead's look-and-feel easier, and to provide better tools for creating these "Bazaar Playgrounds" to use in evaluating Bazaar. The Bazaar developers are committed to making tools easier to use, and making the process as simple and powerful as possible.

Up and Coming Repository Format Updates

Robert Collins has been hard at work to refine how Bazaar stores its history information. We all like to have deep context, but we don't like to have to pay the penalty of downloading all of that context. Because Bazaar has a flexible repository structure, Robert has been able to play with changing the on-disk structure without major surgery to the rest of the code.

First is a change to how indexes are written, switching from a bisectable list to a btree structure. This paged structure allows us to compress the indexes, making them smaller, and faster to process remotely. It also reduces the number of lookups to find a key. (On average, a bisect search is log₂N, while the btree is closer to log₁₀₀N.) At the moment, he is testing this with a shared repository containing all of the projects available in the Ubuntu apt repositories. This weighs in at around 13k branches, and somewhere around 20GB of disk space used.
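To get a feel for the log₂N versus log₁₀₀N difference, here is a back-of-the-envelope Python sketch. It counts idealized worst-case steps rather than measuring the real index code, and the fan-out of 100 is simply taken from the figure above.

```python
import math


def lookups_needed(num_keys, fan_out):
    """Count the steps to isolate one key when each step narrows the
    candidates by a factor of fan_out (2 for bisection)."""
    steps = 0
    remaining = num_keys
    while remaining > 1:
        remaining = math.ceil(remaining / fan_out)
        steps += 1
    return steps


for num_keys in (1_000, 1_000_000, 100_000_000):
    print(num_keys, lookups_needed(num_keys, 2), lookups_needed(num_keys, 100))
```

Over a slow remote connection each step can be a round trip, so going from about 20 steps to 3 for a million keys matters far more than the raw arithmetic suggests.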

Second is an update to how texts are stored. At the moment we use a simple format which places fulltexts periodically, and then stores deltas against those fulltexts. It has served us rather well, but can be improved upon. With his Group compress work, we can see a savings of as much as 2x-3x. Further, the data is stored such that you can do simple linear reads to get the base fulltext and all deltas necessary to generate a given fulltext. This reduces the pressure on indices, as you don't have to search for base texts. (Instead you just store a pointer to the start, and give the total length that needs to be read.)
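For readers who haven't seen a fulltext-plus-delta store before, here is a toy Python sketch of the general idea. Everything in it is an assumption made for illustration: the class name, the interval of 3, the choice to delta against the previous version, and the use of difflib. It is not the actual Bazaar storage format, and it is not Group compress.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Describe new_lines as (start, end, replacement) edits to old_lines."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return [
        (i1, i2, new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old_lines, delta):
    """Rebuild the newer text from the older text plus a delta."""
    result = list(old_lines)
    # Apply edits back to front so earlier indexes stay valid.
    for start, end, replacement in reversed(delta):
        result[start:end] = replacement
    return result


class ToyTextStore:
    """Stores every Nth version as a fulltext and the rest as deltas."""

    def __init__(self, fulltext_interval=3):
        self.fulltext_interval = fulltext_interval
        self.records = []  # ("fulltext", lines) or ("delta", edits)

    def add(self, lines):
        if len(self.records) % self.fulltext_interval == 0:
            self.records.append(("fulltext", list(lines)))
        else:
            previous = self.get(len(self.records) - 1)
            self.records.append(("delta", make_delta(previous, lines)))
        return len(self.records) - 1

    def get(self, index):
        # Walk back to the nearest fulltext, then replay deltas forward.
        base = index
        while self.records[base][0] != "fulltext":
            base -= 1
        lines = list(self.records[base][1])
        for _kind, delta in self.records[base + 1:index + 1]:
            lines = apply_delta(lines, delta)
        return lines


store = ToyTextStore()
versions = [["a\n"], ["a\n", "b\n"], ["a\n", "c\n"], ["c\n"], ["c\n", "d\n"]]
for text in versions:
    store.add(text)
print([kind for kind, _ in store.records])
print(store.get(4))
```

Notice that get() has to walk backwards to find its base before it can read forward. Laying the base and its deltas out contiguously, as described above, is what turns that into one linear read.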

These are still in development phase, but a format that uses them will likely appear in the next release (bzr 1.7).

Community Agile

Ian Clatworthy has recently released a wonderful document describing the workflow we (generally) use at Canonical. It describes how basic practices are similar to, and different from, other systems like Agile. The biggest (IMO) being a recognition that the community surrounding your project is one of the strongest and most important pieces. This has always been true in software development, but it has traditionally been somewhat hidden. Open Source has exposed just how powerful the community can be. For people interested in how software can be developed, rather than just what, I certainly recommend it.

Thursday, July 10, 2008

This Week in Bazaar

Here we are again, bringing you the gossip and dirty secrets in the development world of the Bazaar distributed version control system. In this, the 10th week, the series is now under new management, with co-authors John Arbash Meinel, one of the primary developers on Bazaar, and Paul Hummer, who works on integrating Bazaar into Launchpad.

Bundle Buggy

Aaron Bentley has once again been improving his wonderful Bundle Buggy. He just introduced support for multiple projects using a single instance of Bundle Buggy. There are now 5 Bazaar projects using the main Bundle Buggy instance. (Bazaar, bzr-gtk, Bundle Buggy itself, Bzrtools, and PQM.) Of course, Daniel Watkins has made excellent use of his time, and has managed to crank out lots of updates for PQM. At this point it is code cleanup: reducing the dependencies and making it easier to set up and install.

Bazaar playground for Gnome

Originally, John Carr set up Bazaar mirrors of all the Gnome modules, which people could then use as a starting point for publishing code and collaborating. This week, the Bazaar playground for gnome was created so that any Gnome developers could be involved in pushing, branching, and sharing code through bazaar. This new server runs Loggerhead for viewing the code committed to these Bazaar branches. Damned Lies is also set up on the playground. This server was also reproduced locally at GUADEC because of the flaky internet connection at the conference, and all those local branches will be moved to the playground shortly.

Weave merging and handling "interesting" history

One of the great things about having a large project like MySQL using your software is that they push and stretch you in ways that you haven't necessarily encountered before. Specifically, their branch workflow looks a bit like a pile of spaghetti. With several long-term maintenance branches, team branches based off of that, and individual developer branches based off of that. Patches have a tendency to travel in unexpected ways (you may go user => team => release 1 => release 2, or you might go release 1 => team => team-2 => release 2, etc). They also are very fond of 'null merging' patches that aren't relevant to the next release. They merge the change and revert the text changes and commit.

Bazaar supports all of this, but it exposes weaknesses in simple 3-way merge logic. Because patches don't flow in anything considered orderly, you don't have the opportunity to select a "clean" base very often. Bazaar has long had an option for doing a "--weave" merge. It didn't receive much attention for a while, and had become rather slow. It turned out to be a good fit for MySQL's workflow, so John has spent a bit of time recently to make the functionality efficient and correct in some specific edge cases. Expect the improvements to show up in the next release.
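To see why the choice of base matters so much, here is a deliberately tiny Python sketch of the three-way rule, applied to a whole value instead of to individual lines. The function name and sample values are made up, and this is not Bazaar's merge code.

```python
CONFLICT = object()  # sentinel meaning "a human has to decide"


def merge3(base, this, other):
    """Classic three-way rule: take whichever side changed relative to base."""
    if this == other:
        return this       # both sides agree
    if base == this:
        return other      # only the other side changed
    if base == other:
        return this       # only our side changed
    return CONFLICT       # both changed, in different ways


this_text = "feature v2"
other_text = "feature v2 + fix"

# With a clean base (the point where the branches last agreed) it is easy.
print(merge3("feature v2", this_text, other_text) == other_text)

# With an older base, both sides look changed and we get a spurious conflict.
print(merge3("feature v1", this_text, other_text) is CONFLICT)
```

When patches wander between branches in the spaghetti pattern described above, the second case is the common one, which is what makes a per-line history-aware approach like weave merge attractive.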

Thursday, July 3, 2008

This Week in Bazaar

This is the 9th in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, unlicensed health professional. This week we are joined by Paul Hummer, who works on integrating Bazaar into Launchpad.

How to integrate bzr into your build and release process

Once you are happily using bzr on your project, the next step is some basic integration into your build process. A common desire is to record the revision number during the build, so that you can tell what revision your program was built from. This is easy to do with 'bzr revno', which prints the current revision number. That's not very exciting though.

There is a much more sophisticated command in bzr called version-info. For example, running:

 bzr version-info --custom \
  --template="#define VERSION_INFO \"Project 1.2.3 (r{revno})\"\n"

will produce a C header file with a formatted string containing the current revision number. Other supported variables in the templates are: date, build date, revno, revision id, branch nickname, and clean (which shows whether the tree contained uncommitted changes). This makes integrating into make or another build system very easy, and lets you generate a version file for whatever language you are writing in.
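If your build is driven from Python rather than make, a small wrapper along these lines might do. Treat it as a sketch: the helper names and the version.h file name are hypothetical, and the only bzr invocation used is the one shown above.

```python
import shutil
import subprocess

# The same template as the shell example. The trailing \n is passed to bzr
# as a literal backslash-n, just as the shell would pass it.
TEMPLATE = '#define VERSION_INFO "Project 1.2.3 (r{revno})"\\n'


def version_info_argv(template=TEMPLATE):
    """Build the argument list for the command shown above."""
    return ["bzr", "version-info", "--custom", "--template=" + template]


def write_version_header(path, tree_dir="."):
    """Write bzr's output to path. Returns False if bzr is not installed."""
    if shutil.which("bzr") is None:
        return False
    result = subprocess.run(
        version_info_argv(),
        cwd=tree_dir,
        check=True,          # raise if bzr exits with an error
        capture_output=True,
        text=True,
    )
    with open(path, "w") as header:
        header.write(result.stdout)
    return True


if __name__ == "__main__":
    # "version.h" is a made-up file name for this example.
    if not write_version_header("version.h"):
        print("bzr not found; skipping version header")
```

Passing the arguments as a list avoids a second round of shell quoting, which is the fiddly part of the one-liner.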

What else could be automated other than version info? The bzr-stats plugin has a credits command. This is useful for getting a list of contributors to fill out a credits page, easter egg, etc. Also, changelogs can be generated with the gnulog plugin.

Andrew Bennetts has been working on a new server side push hook that can be used to run tests before allowing a push to complete. Wow, this could replace PQM! Well, not quite. This is more of a poor-man's PQM. It doesn't scale as well, but would work for smaller teams that don't necessarily need PQM. Blocking push while tests are running is not a good idea if you have a very long test suite, and PQM will merge and commit, making it easier to deal with multiple people trying to merge changes at the same time. If you're working in a very small group (1-3 people) with a smaller test suite, using these hooks might be just the trick, but for a larger work group you should still set up PQM.

Right now PQM is a fair amount of work to set up, but that should be changing soon. Daniel Watkins has started work on making PQM easier to set up and use, and others have been submitting cleanup patches too.

Finally, if you are using bzr on a project that builds .deb packages, check out the builddeb plugin. It would be great to have plugins for other packaging tools as well! RPM, MSI, JAR, WAR, etc.

Thursday, June 26, 2008

This Week in Bazaar

This is the eighth (wow, 2 whole months of solid updates, yipee!) in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who drinks the rain. This week we are joined by Martin Albisetti, talking about Loggerhead, and dreaming of a cold pint.

bzr-search, loggerhead, gnome, and you

Robert Collins recently published his awesome bzr-search plugin, and John Carr has been doing a lot of work on setting up a bzr mirror of Gnome. A neat search module and a bunch of source trees is just begging to be combined in some sort of web interface!

There are a few web front ends for Bazaar at the moment, such as Loggerhead, webserve, viewbzr, and bzrweb. Today we are going to be focusing on Loggerhead (you can also go to its Launchpad project page to watch the development activity). It is probably the one with the most active development at the moment. An installation of the latest stuff in action is available at the bzr mirror of Gnome. Loggerhead shows side-by-side diffs, has RSS feeds, and lets you download specific changes, just like you would expect.

You can get the latest version of it yourself by doing:
bzr branch lp:loggerhead
You'll need python-simpletal and python-paste. Then by running "serve-branches.py" in the directory where your branches live, you should be up and running with your own web interface. Eventually serve-branches.py is expected to become a bzr plugin which will let you easily serve your branches with a single bzr command.

We hinted at it above; recent versions have started integrating with bzr-search. So for branches that you've run "bzr index" on, it can give hints in the search dialog, and quickly find revisions that match your search terms. You can try it yourself by just typing a few letters into the search dialog.

In the coming weeks, Loggerhead will be getting a bit of a face lift with a new theme to make its externals as shiny and new as its internals.

So give it a poke, and send any feedback to either bazaar@lists.canonical.com, or https://bugs.launchpad.net/loggerhead.

Thursday, June 19, 2008

This Week in Bazaar

This is the seventh in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who is sentimental today.

MySQL Switches to Bazaar

Very big news for the Bazaar team today, as MySQL announces switching from Bitkeeper to Bazaar.

One of the things that was important in doing this conversion was doing a very high quality import of all the existing history. John did a great job working on that, and even added a new feature to Bazaar and bzr-gtk to enable this: per-file commit messages. Since per-file commit messages had been used for years in the MySQL code base, it was not acceptable to lose them, and none of the DVCS systems under consideration supported these messages. Although this feature is debated by some, it was important to preserve that history, and so support for per-file commit messages was added to Bazaar in a non-invasive way, where projects who wanted to use them could, but existing projects were not forced to adopt them. At the moment, to enter per-file commit messages you need to use the bzr-gtk GUI commit tool, but we'd love it if someone came up with a clean way to enable this in the standard CLI also.

It was also important to have a smooth transition period that did not interrupt delivering MySQL releases. This meant we needed a stable importer where the imports could be periodically refreshed without causing all of the developers around the world working on the project to re-download all their trees. At one point we were doing continuous imports of over 30 trees.

It's been a fun and challenging project providing support to MySQL during this time. Although we're really excited about this milestone, we still have plenty of work to do. Here are a few things we've learned, where we are working to make Bazaar even better.

Stacked Branches. We've talked previously about stacked branches, and for a project like MySQL this new feature will make uploading a new branch to Launchpad much faster.

Merging - Bazaar has several good merge algorithms, but we still have some ideas to make merging go even smoother, particularly for some of the complicated ancestries that MySQL has. All merge algorithms have their own set of trade offs, edge cases that they handle better or worse than other algorithms.

We also need to continue to add GUI tools, and make further enhancements to existing tools. If you are looking for a valuable way to contribute to Bazaar, try lending a hand to one of the Bazaar GUI projects.

Last week we asked about bzr screencasts, and James Westby told me about a screencast that he recorded - if anyone else is interested in getting involved in producing a series of screencasts, please do let us know.

Wednesday, June 11, 2008

This Week in Bazaar

This is the sixth in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who just wants a nice story and a nap.

1.6 on the way

We decided to change the release process a bit for the bzr 1.6 release. We're introducing a bit more than normal in this release (such as Stacked Branches), so we've decided to delay the final release a couple of weeks to ensure that everything gets an extra coat of polish. We've already had 2 beta releases, which are available in the Source.

Please give it a poke and let us know what you think.

Diff and Merge Tools

When you start working with other people on a project, you need some way of seeing what code has changed, doing code reviews, resolving conflicts, etc. The 'bzr diff' command has a '--using=foo' argument that allows you to plug in your favorite diff/merge tool if you don't want the built-in text based diff. You can also add an alias for your favorite tool. For example, Elliot uses meld all the time, so he has 'alias mdiff=diff --using=meld'. You also might want to install the difftools plugin, which adds some smarts to Bazaar about whether a particular tool understands how to diff a full tree or needs to handle the files one at a time. Here are some of the more interesting diff tools that you might want to try out:

Meld
Kdiff3
vimdiff
Wikipedia lists many more file comparison tools

One technique for easily reviewing a lot of incoming code is to keep around a pristine branch of your project that you use for conducting reviews. You can apply a patch to the tree, then run 'bzr mdiff' (or your own favorite tool), and take a look at all the changes in the patch with a lot more context than is included in the patch itself. This also gives you a spot to run the automated tests for that project, see if it compiles, etc. Once you are done with the review you can simply 'bzr revert' to get back to a clean tree and move on to the next patch to be reviewed.

Another neat trick is to use the 'merge --preview' switch. You might want to use this command to take a look at any conflicts that might have been introduced if there have been changes since the patch was generated. It shows you the patch of exactly what would be merged into the branch at that moment in time, which can sometimes have differences from what you would be reviewing by reading the patch.

Another interesting (but commercial) tool is Changes.app. It is a Mac OS X client which integrates with Finder and provides a comparison tool. It has direct support for Bazaar as well as several other version control systems.

Screencasts

Screencasts are becoming a very popular way to show people how to use your fancy tool, and we'd like to get some volunteers to help with putting together some screencasts explaining how to use various parts of bzr and related tools. If you want to help with this, email elliot at canonical dot com. The great thing about screencasts is that they use a different avenue for conveying information (audio, motion, etc) so while it won't replace a written tutorial, it is a wonderful supplement.

Thursday, June 5, 2008

DVCS Comparison: On mainline merges and fast forwards

DVCS Comparison: On mainline merges and fast forwards has a discussion about whether 'fast forward' is a "better" method for merging in a distributed topology.

I can understand where he is coming from, and we respect that some users prefer other workflows. Bazaar even has direct support for 'fast forward' with 'bzr merge --pull', and with our aliasing functionality, you can set:

[ALIASES]
merge = merge --pull

in ~/.bazaar/bazaar.conf and change the default meaning of 'bzr merge'. However, I still fall on the side of the fence that fast forward should not be the default.
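For anyone unsure what makes a merge eligible to be a fast forward, here is a minimal Python sketch of the check on a toy revision graph. The function names and the graph are invented for the example, and this is not how Bazaar implements 'merge --pull'.

```python
def ancestors(tip, parents):
    """Return every revision reachable from tip, including tip itself."""
    seen = set()
    pending = [tip]
    while pending:
        revision = pending.pop()
        if revision in seen:
            continue
        seen.add(revision)
        pending.extend(parents.get(revision, ()))
    return seen


def can_fast_forward(local_tip, other_tip, parents):
    """A fast forward is possible only if the other branch already
    contains everything we have."""
    return local_tip in ancestors(other_tip, parents)


# Toy graph: revision -> list of parents. C and D both build on B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

print(can_fast_forward("B", "C", parents))  # we are at B, they are at C
print(can_fast_forward("C", "D", parents))  # both sides have new work
```

In the first case nothing would be lost by simply moving our tip to C. In the second, a real merge revision is needed whichever default you prefer.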

I can agree that if you have 2 people collaborating on the same feature that you would want fast forward. Though I would argue that is because they are effectively working on the same branch. For my personal workflow, I have a different alias set:

log = log --short -r -10..-1 --forward

What this means is that when I type 'bzr log' I see just the mainline commits of a branch, without the merge cruft. (Where I define the merge cruft as the individual revisions that make up a feature change, not the 'merge foo' node.)
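The "mainline" here is just the chain you get by always following the first (left-hand) parent. A toy Python sketch, with a made-up graph and function name:

```python
def mainline(tip, parents):
    """Follow first parents from tip back to the start of history."""
    revisions = []
    revision = tip
    while revision is not None:
        revisions.append(revision)
        revision_parents = parents.get(revision, [])
        revision = revision_parents[0] if revision_parents else None
    return revisions


# r3 merges a feature branch (f1, f2) into the trunk (r1, r2).
# The first parent of a merge is the branch it was committed on.
parents = {
    "r1": [],
    "r2": ["r1"],
    "f1": ["r1"],
    "f2": ["f1"],
    "r3": ["r2", "f2"],
}

# Oldest first, like the --forward in the alias above.
print(list(reversed(mainline("r3", parents))))
```

The feature revisions f1 and f2 are still in the history. They just hang off the merge node instead of being interleaved with the trunk, which is the property a fast forward throws away.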

Take this view of bzr.dev:

3466 Canonical.com Patch Queue Manager 2008-06-02 [merge]
    (jam) Give Aaron the benefit of bug #202928

3467 Canonical.com Patch Queue Manager 2008-06-03 [merge]
    (Martin Albisetti) Better message when a repository is locked.

3468 Canonical.com Patch Queue Manager 2008-06-03 [merge]
    (mbp) merge 1.6b1 back to trunk

3469 Canonical.com Patch Queue Manager 2008-06-04 [merge]
    (mbp) Update more users of default file modes from control_files to bzrdir

3470 Canonical.com Patch Queue Manager 2008-06-04 [merge]
    (Jelmer) Move update_revisions() implementation from BzrBranch to
      Branch.

3471 Canonical.com Patch Queue Manager 2008-06-04 [merge]
    (vila) Split a test

3472 Canonical.com Patch Queue Manager 2008-06-04 [merge]
    (jam) Fix bug #235407, if someone merges the same revision twice,
      don't record the second one.

3473 Canonical.com Patch Queue Manager 2008-06-05 [merge]
    Isolate the test HTTPServer from chdir calls (Robert Collins)

3474 Canonical.com Patch Queue Manager 2008-06-05 [merge]
    Add the 'alias' command (Tim Penhey)

3475 Canonical.com Patch Queue Manager 2008-06-05 [merge]
    (mbp) #234748 fix problems in final newline on Knit add_lines and
      get_lines

You get to see a nice short summary of everything that has been happening (in proper chronological order.) Admittedly, seeing "Patch Queue Manager" on each of those commits is less than optimal (which is why we add the author names.) That is just a temporary limitation of our PQM. Bazaar already supports setting an --author separate from the committer; we just need to teach our integration bot to use it.

The big difference, IMO, is whether you are bringing in someone else's changes to enhance your work, or whether you are collaborating on the same item. I would argue that collaborating on the same item is slightly less common. It also depends what you do with the merge commits. Just saying "merge from branch A" is certainly not helpful. But when you can say "merge Jerry's changes to Command.foo", it can indeed be helpful when tracking back through and figuring out where and when "foo" changed, without being lost in the forest for having too many trees.
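The difference between the two styles can be sketched with a toy revision graph. This is only an illustration in Python, not Bazaar code: the revision names are invented, and mainline and can_fast_forward are hypothetical helpers. The one thing taken from Bazaar itself is the convention that the mainline is the chain of first (left-hand) parents.

```python
# Each revision maps to its list of parents. The first parent is the
# "left-hand" parent, i.e. the branch the commit was made on.
# All revision names are invented for illustration.
GRAPH = {
    "main-1": [],
    "main-2": ["main-1"],
    "feat-a": ["main-1"],
    "feat-b": ["feat-a"],
    "main-3": ["main-2", "feat-b"],  # merge commit, e.g. "merge Jerry's changes"
}


def mainline(graph, tip):
    """Follow first parents from tip back to the root, oldest first."""
    history = []
    rev = tip
    while rev is not None:
        history.append(rev)
        parents = graph[rev]
        rev = parents[0] if parents else None
    history.reverse()
    return history


def can_fast_forward(graph, local_tip, other_tip):
    """True when local_tip is already an ancestor of (or equal to) other_tip."""
    pending, seen = [other_tip], set()
    while pending:
        rev = pending.pop()
        if rev == local_tip:
            return True
        if rev not in seen:
            seen.add(rev)
            pending.extend(graph[rev])
    return False


print(mainline(GRAPH, "main-3"))  # history as seen after a real merge
print(mainline(GRAPH, "feat-b"))  # history as seen if main-1 had fast-forwarded
```

With the merge commit, the short log shows only main-1, main-2, main-3 and the feature revisions stay tucked underneath main-3. With a fast forward, the feature's individual revisions become the mainline.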

This Week in Bazaar

This is the fifth in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot "fresh needle" Murphy. The two topics for this week are not related, but it's our blog and we get to write what we want.

Hosting of Bazaar branches

One of the first questions people ask when moving to Bazaar is "Where can I host my branches?" Even with distributed revision control, it is often handy to have a shared location where you publish your code, and merge code from others. Canonical has put a lot of work into making launchpad.net an excellent place to host code, but there are many other options available.

Because bazaar supports "dumb" transports like sftp, you can publish your branches anywhere that you can get write access to some disk space. For example, sourceforge.net gives projects some web space with sftp access, and you can easily push branches up over sftp. It's also easy to use bzr on any machine that you have ssh access to; you don't even need to install bazaar on the remote machine.

As bazaar is a GNU project, we've been working with the Savannah team to enable bazaar hosting on Savannah also.

Another option is serving bazaar branches over HTTP. You can do this for both read and write access, and there is a great HOWTO in the bazaar documentation. Do you know of anywhere else that is offering Bazaar hosting? Let us know in the comments!

Bazaar review and integration process

How do you ensure high quality code, when working on a fast moving codebase in a widely distributed team? Here are some things that we've been doing with the Bazaar project, and we think they are useful practices for most projects.

Automated Test Suite

One very important key towards having a stable product is proper testing. As people say "untested code is broken code". In the Bazaar project, we recommend that developers use Test Driven Development as much as possible. However, what we *require* is that all new code has tests. The reason it is important for the tests to be automated, is because it transfers knowledge about the code base between developers. I can run someone else's test, and know if I conformed to their expectations about what this piece of code should do.

This actually frees up development tremendously. Especially when you are doing large changes. With a good test suite, you can be confident that after your 2000 line refactoring, everything still works as before.

Code Review

Having other people look at your changes is a great way to catch the little things that aren't always directly testable. It also helps transfer knowledge between developers. So one developer can spend a couple weeks experimenting with different changes, and then there is at least one other person who is aware of what those are.

The basic rules for getting code merged into Bazaar are that:

It doesn't reduce code clarity
It improves on the previous code
It doesn't reduce test coverage
It must be approved by 2 developers who are familiar with the code base.

We try to apply those rules to avoid hitting the rule "The code must be perfect before it is merged", and the associated project stagnation. Code review is a very powerful tool, but you have to be cautious of "oh, and while you are there, fix this, and that, and this thing over here." Sometimes that is useful to catch things that are easy (drive-by fixes). It can also lead to long delays before you actually get to see the improvements from someone's work, and long delays are demotivating.

Item number 3 is a pragmatic way to approach how much testing is required. In general, the test coverage should improve, just like the code quality. But that doesn't mean you have to test all code 100 different ways.

Integration Robot (PQM)

Now that you have a good automated test suite, and proper code reviews, the next step is to make sure that you have a version of the code base which has all the tests passing. Often when developing a new feature, it is quite reasonable to "break" things for a bit, while you work out just how everything should fit together. Requiring the tests to pass at each level of development puts an undue burden on developers, preventing them from publishing their changes (to get feedback, to snapshot their progress, etc.) Very often I commit when things are still somewhat broken, as it gives me a way to 'bzr revert' back to something I wanted to try.

However, you don't want the official releases of your project to have all of these broken bits in them. The Bazaar project uses a "Patch Queue Manager", which is simply a program that responds to requests to merge changes. When your patch has passed the review stages, you submit it to the PQM, which grabs your changes, applies them, runs the full test suite, and commits the changes to "mainline" if everything is clean.

The reasons to use a robot are:

Humans are very tempted to say "ah, this is a trivial fix, I'll just merge it", without realizing there is a subtle typo or far-reaching effect. When you have a large test suite, it can often take a while to run all the tests (the bzr.dev test suite runs in 5-10 minutes, but some projects have test suites that take hours.) Having a program doing the work means a human is relieved of the tedium of checking it.
There is generally only a single mainline, but there may be 50 developers doing work on different branches. When they all want to merge things, it isn't feasible to require the users to run the test suite with the latest development tip. If the development pace is fast enough versus the time to run the test suite, you can get into an "infinite loop", where you merge in the latest tip, wait for the tests to pass, and by the time you go to mark this as the new mainline tip, someone else has beaten you to it, and you go around again. PQM does this for you in a fairly natural way.
Running in a "clean" environment is a safety net for when you forget about a new dependency that you added.

There are similar ways to do this, such as Cruise Control for Subversion. There is one key difference, though. With Cruise Control, you find out after the fact that your mainline has broken. With PQM, we know that every commit to the mainline of Bazaar passed the test suite at that time. This helps a lot when tracking down bugs. It also helps with "dogfooding"...
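The gatekeeping idea can be sketched in a few lines of Python. This is a toy model rather than the real PQM: the tree is just a dict of file name to status, and run_tests and process_queue are hypothetical stand-ins for the real test suite and queue runner.

```python
import copy


def run_tests(tree):
    """Stand-in for the full test suite: every file must be marked 'ok'."""
    return all(status == "ok" for status in tree.values())


def process_queue(mainline, requests):
    """Apply each request to a scratch copy of mainline, one at a time.

    A request is (description, changes), where changes maps file -> status.
    Only requests whose merged result passes run_tests() become the new
    mainline; the rest are rejected and mainline is left untouched.
    """
    log = []
    for description, changes in requests:
        candidate = copy.deepcopy(mainline)  # always start from the current tip
        candidate.update(changes)            # "merge" the submitted branch
        if run_tests(candidate):
            mainline = candidate
            log.append((description, "merged"))
        else:
            log.append((description, "rejected"))
    return mainline, log


requests = [
    ("add b", {"b.py": "ok"}),
    ("trivial fix", {"a.py": "broken"}),
    ("add c", {"c.py": "ok"}),
]
final_tree, log = process_queue({"a.py": "ok"}, requests)
print(log)
```

Because requests are handled one at a time against the current tip, nobody has to chase the tip by hand, and the "trivial fix" that breaks a.py never reaches mainline.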

Dogfooding

If you want people to do regular testing of the development version, it must be easy to run different versions of the project without needing a complex install. Bazaar does this by being runnable directly out of the source tree, without any need to set $PYTHONPATH or mess around with installing different versions. You can also easily change out the plugins that are loaded using the BZR_PLUGIN_PATH env variable. This means that developers can run the latest development version, and easily switch to a particular version when trying to reproduce a bug or help a user.

By having the PQM running the test suite, developers can run on the bleeding edge, and know that they won't get random breakage. It is always possible that something will break, but the chance is quite low. (In 2000 or so commits since we started using PQM, I believe bzr.dev has never been completely unusable, and has had < 5 revisions which we would not recommend people use.)

It also means that you can be fairly confident in creating a release directly from your integration branch (mainline).

Thursday, May 29, 2008

This Week in Bazaar

This is the fourth in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, imaginary boy and part-time impostor.

Stacked branches

Some projects are very big with lots of files and lots of history. Many projects want to maintain the policy that development is done on independent branches, which are then merged back when the development is complete. However, the overhead of downloading, branching, and uploading the full history is prohibitive. There are a couple of different ways to solve this problem.

Dealing with a large branch can be split into two problems: downloading and uploading.

Bazaar has had a storage optimization called shared repositories for quite a while. This serves to dramatically reduce the amount of data downloaded for the second, third, etc branches of a project. A shared repository is a big pool of revisions which multiple branches point to. When you grab a new branch into a shared repository, bzr figures out how much of the history it already has, and only downloads the new revisions. So the first branch of a large project transfers most of the data, and grabbing additional branches is very cheap. In extreme cases, like working on a multi-gigabyte project from a 56k dial-up connection, you could even do things like distribute the initial data on a DVD to prime the shared repository, and then the user only needs to download incremental changes.
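As a rough illustration of why the second branch is cheap, here is a toy model in Python. It is not how bzr stores data on disk: SharedRepository and fetch_branch are hypothetical names, and revisions are reduced to plain ids.

```python
class SharedRepository:
    """A pool of revisions that several branches point into."""

    def __init__(self):
        self.revisions = set()
        self.branches = {}  # branch name -> tip revision id

    def fetch_branch(self, name, remote_history):
        """Add a branch, 'downloading' only revisions not already pooled.

        remote_history is the list of revision ids in the remote branch,
        oldest first. Returns the revisions that had to be transferred.
        """
        missing = [rev for rev in remote_history if rev not in self.revisions]
        self.revisions.update(missing)
        self.branches[name] = remote_history[-1]
        return missing


repo = SharedRepository()
trunk = ["r1", "r2", "r3"]
feature = trunk + ["f1"]

first = repo.fetch_branch("trunk", trunk)       # transfers everything
second = repo.fetch_branch("feature", feature)  # transfers only the new revision
print(first, second)
```

Priming the pool from a DVD would amount to filling repo.revisions before the first fetch, so that even the first branch only transfers what is new.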

This technique can also be used for solving the uploading problem. If the upload location uses a shared repository, then uploading a new branch only needs to copy the new data. The problem is that once you introduce multiple users, they may not want to give other people access to push data into their repository.

Another approach to minimizing the data uploaded is called server side forking, and you can see a nice implementation of this on github.com. The user places a request with the code host to do the copy for them, and when it finishes, they have their own location already primed with the current branch.

The Bazaar project is approaching it in a different way. If some data is already public, then you can just reference the other public location when you start uploading your new branch. The first steps in this direction are being termed "Stacked Branches". Basically, instead of requiring all branches to contain the full history, you are allowed to "Stack" a branch on top of another. Because the uploader does not have write access to the lower levels of the stack, this addresses the security risks of shared repositories.

Stacking also opens up possibilities for the "download" side of the equation. Many users don't need a very deep copy of history to get their work done. If there is a location that can be trusted to be available when they need it, they can copy just the tip revisions, which would allow them to do most of their work (commit, diff, etc) without consulting the remote host. And when they need more information (such as during a complex merge), the bzr client is able to fall back to the original source to get any needed information.
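The fallback behaviour can be sketched as a lookup that tries the local store first and then walks down the stack. This is a simplified Python model under our own assumptions, not the bzr implementation: the Repository class, its stacked_on attribute, and the revision ids are all invented for illustration.

```python
class Repository:
    """Revision store that may be stacked on another repository."""

    def __init__(self, stacked_on=None):
        self.texts = {}               # revision id -> content held locally
        self.stacked_on = stacked_on  # fallback repository, or None

    def add(self, rev_id, content):
        """Store a revision locally; lower levels of the stack are never written."""
        self.texts[rev_id] = content

    def get(self, rev_id):
        """Look locally first, then fall back down the stack."""
        if rev_id in self.texts:
            return self.texts[rev_id]
        if self.stacked_on is not None:
            return self.stacked_on.get(rev_id)
        raise KeyError(rev_id)


# The public trunk holds the full history.
public = Repository()
for rev in ("r1", "r2", "r3"):
    public.add(rev, "trunk content " + rev)

# A new branch stacked on trunk stores only its own revisions.
mine = Repository(stacked_on=public)
mine.add("f1", "my feature work")

print(sorted(mine.texts))  # only the new revision is stored in the stacked branch
print(mine.get("r2"))      # older history is fetched from the fallback
```

Note that adding to the stacked repository never touches the one below it, which is the property that sidesteps the write-access concern with shared repositories.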

The goal of all this is to make it very easy to start working with a large project, while still making all the history available in a meaningful way. The bulk of this work has been completed, and it is likely that it will land in bzr 1.6 (to be released in a couple of weeks.)

Thursday, May 22, 2008

This Week in Bazaar

This is the third in an amazingly regular weekly series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, Launchpad developer and relentless agitator. This week we have a special guest, Jelmer Vernooij, Samba developer, and author of the Bazaar Subversion plugin.

In last week's episode, our fearless explorers braved the new world of plugins. Today we will focus on a specific plugin, and talk about how you can use Bazaar with Subversion. Earlier this week there was a very nice blog post about using Git with the Subversion servers on Google Code Hosting, and plenty of interesting discussion afterwards.

Rationale

If you have Bazaar installed, why would you want to work with Subversion? Well, it's nice not to have to force the whole world to change at once. Bazaar-Subversion integration allows you to use Bazaar without any changes required from the project administrators to the central Subversion server.

There are three general cases, where you would want to use bzr-svn:

Upstream uses Subversion, and you don't yet have commit access. With bzr-svn, you are able to still make your improvements with all the benefits of a great VCS.
Project has chosen to use Subversion, you want something better, but still want to play nice with your fellow developers. You can commit to your local Bazaar branch, and push those changes back into Subversion. You can even do "bzr commit" in your Subversion checkout and have it commit those changes to the Subversion server.
Migration from Subversion to Bazaar. Often when migrating from one VCS to another, there is a period of time where people are adjusting to the new system. bzr-svn allows you to continue allowing people to commit to Subversion; it's just another branch with changes to be merged.

Overview

Currently the bzr-svn dependencies can be a bit tricky to install on some platforms, but that should be much easier once Subversion 1.5 is released. Once you get things installed, it's pretty amazing what you can do. On most Debian-based systems, it is a simple "apt-get install bzr-svn" away.

Once you have bzr-svn installed, you can start using Subversion branches as though they were regular Bazaar branches.

General usage

Now that you have bzr-svn installed, how do you get a local copy of your Subversion project? Generally, it is just a "bzr checkout URL" away.

$ bzr checkout svn+https://your-project.googlecode.com/svn/trunk

This will create a local checkout of your project that contains a local copy of the history present remotely.

You should now be able to use this branch like any regular Bazaar branch. Since this is a bound branch, any commits you make will also show up in the Subversion repository.

It is possible to create new local branches from this branch, for example for feature branches:

$ bzr branch trunk feature-1

And to merge the branch back into Subversion once it is finished, you can use merge like you would with any ordinary Bazaar branch:

$ bzr merge ../feature-1
$ bzr ci -m "Merge feature-1"

In addition to the code changes, bzr-svn will write metadata about the history of the new commit into Subversion. This means that your merge history is available, so when someone else comes along and grabs a copy of the branch using Bazaar, they can see what happened. To a normal Subversion client this is transparent; the custom properties are simply ignored.

It is also possible to push directly from the feature branch into Subversion:

$ bzr push http://subversion/project

This will preserve all of the history from the branch you are pushing - there is no need to rebase your local branch after pushing.

Since bzr-svn allows access to Subversion protocols and file formats using the standard Bazaar API, it is possible to use most standard Bazaar commands directly on Subversion formats and URLs. Commands like "bzr missing", "bzr log", or even "bzr viz" work out of the box.

Miscellaneous

Some bits and pieces to pique your interest in bzr-svn.

Subversion 1.5 introduces custom revision properties - this should allow bzr-svn to hide the properties used to store merge information. (At the moment, the file properties used show up in commit emails.)
Bazaar will soon be introducing shallow (stacked) branches. This will allow you to have a fully functioning local branch (including offline commits, etc), without needing to download the complete history to your local machine.
Bzr for GNOME developers is a quick guide for people who want to use Bazaar for developing with the Subversion Gnome repository.
Bazaar branches of Python are available. They are currently using bzr-svn to mirror the Subversion branches, allowing their developers to see what life is like developing with Bazaar.

For more information, check out the bzr-svn home page, FAQ, bug tracker, or join us on the Bazaar mailing list.

Next week: how to print money with Bazaar.

Thursday, May 15, 2008

This Week in Bazaar

This is the second in a mostly-every-week series of posts about what's been happening in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, Launchpad developer and compulsive conflict avoider.

Plugins

One of the nice things about Bazaar is the API, which enables new features to be added with plugins. Once a feature is polished and proves widely useful, it can move from a plugin into core bazaar. Most of the plugins are hosted/mirrored on Launchpad, and are a simple "bzr branch lp:bzr-plugin ~/.bazaar/plugins/plugin" away. For the rest, they are indexed at http://bazaar-vcs.org/BzrPlugins. Here's a quick summary of some of the plugins we are using on our laptops right now:

bookmarks: This allows me to store an alias for a branch location, so it is easier to branch/push to a common location. So I can type 'bzr get bm:work/foo' instead of 'bzr get bzr+ssh://server.example.com/dev/stuff/foo'

bzrtools: a collection of commands which provide extended functionality, such as 'bzr cdiff' to display colored diffs and 'bzr shelve' to temporarily revert sections of changes.

difftools and extmerge: These plugins let me view differences in meld or kdiff3 (or anything that you want to configure, really), and do merges via meld.

email: Keep people informed of what you are working on by sending an email after every commit.

fastimport: This plugin allows me to import code from my friend's Mercurial repository and push it to Launchpad.

git: this gives me read access to a local git repository

gtk: This is the Bazaar Gtk GUI, which has some nice tools like visualize and gcommit.

htmllog: Useful for generating html formatted logs for publishing on the web.

loom: Allows me to manage several "layers" of development in a single branch, and collaborate on those layers with other people.

notification: Gives a GUI popup when a pull or push completes

pqm: This formats a merge request to PQM. PQM then takes my branch, merges to main, runs tests, and commits the merge if all was well. This ensures that we always have passing tests in the main tree!

push_and_update: This updates the working tree when I push my branch to a remote server. Very useful for doing website updates.

removable: I try to keep all branches very small for easier review, so I have a lot of branches at one time. This tells me which branches have already been merged to the main tree (and thus can be removed). It can also let me know why something is not ready to be removed.

stats: Provides 'bzr stats' which gives a simple view of how many people have committed to your project and how many commits each has done.

update_mirrors: 'bzr update-mirrors' recursively scans for Bazaar branches and updates them to their latest upstream.

vimdiff: Adds the commands 'bzr vimdiff' and 'bzr gvimdiff', which open vim in side-by-side mode, showing you your changes.

qbzr: Another great GUI for bzr, this one is written using Qt.

1.5rc1, 1.5 this Friday

Continuing our pattern of having time-based releases, bzr 1.5rc1 was released last Friday, and 1.5 final should be released tomorrow. Ever wonder how we churn out releases so regularly? The biggest factor enabling us to make consistent releases is our use of a Patch Queue Manager. It ensures that all of our 11,724 unit tests pass before allowing any merge into mainline. Even when lots of changes are landing, the trunk can be considered release quality. Most of the developers use the tip of mainline for their day-to-day work, which means that any changes get immediate use, rather than waiting for a release candidate.

By releasing every month, we have reduced the tendency to rush patches, trying to sneak them in before the next release. We know that there will be another release just around the corner, so we can land complex patches right after a release. For each release cycle, we have 3 weeks of "open" development, where any approved (peer reviewed) patch can be merged. Then we have a feature freeze week, where only bug fixes are supposed to be merged. At the end of the freeze week, we release RC1 and reopen mainline for development. If no regressions are found in RC1, it is tagged as final and released after one week.
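The arithmetic of one cycle is simple enough to write down. The following Python sketch just encodes the week counts from the paragraph above; release_cycle is a hypothetical helper, and the start date in the example is an assumption chosen so that RC1 falls on a Friday (May 9, 2008), not a record of when a real cycle opened.

```python
from datetime import date, timedelta


def release_cycle(open_date):
    """Return the milestone dates for one cycle starting on open_date."""
    freeze = open_date + timedelta(weeks=3)  # 3 weeks of open development
    rc1 = freeze + timedelta(weeks=1)        # 1 week of feature freeze
    final = rc1 + timedelta(weeks=1)         # RC1 is tested for one week
    return {
        "freeze": freeze,
        "rc1": rc1,
        "next_cycle_opens": rc1,  # mainline reopens as soon as RC1 is out
        "final": final,
    }


# Assumed start date, for illustration only.
cycle = release_cycle(date(2008, 4, 11))
for name, day in cycle.items():
    print(name, day.isoformat())
```

Since mainline reopens at RC1 rather than at the final release, consecutive cycles overlap by a week, which is how a five-week path from open to final still yields a release roughly every four weeks.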

The bzr-1.5 release is mostly focused on fixing small ui bugs, a couple of performance improvements, and some documentation updates.

(edit: 2008-05-16, the merged plugin changed and is now called bzr-removable)

Wednesday, May 14, 2008

Creating a new Launchpad Project (redux)

A while back I posted about how to set up a new launchpad project. At the time it took quite a few steps to set everything up that you wanted. I'm happy to report that a lot of those steps have been streamlined, so I am posting a new step-by-step instruction for setting up your project in Launchpad.

Make sure the project isn't already registered. A lot of upstream projects have already been registered in Launchpad, as it is used to track issues in Ubuntu. So it is always good to start on the main page and use the search box "Is your project registered yet?".
If you don't find your project, there will be a link to Register a new project
The form for filling out your project details has been updated a bit, but you should know the answers. (I still use 'bazaar' as the "part of" super-project, and bzr-plugin-name for my plugins)
This is where things start to get easier. After you have registered the project you can follow the Change Details link. This is generally https://launchpad.net/PROJECT/+edit. It was the same before, but now more information is on a single page, so you can set up more at once. Here I always set the bug tracker to be Launchpad, and I click the boxes to opt-in for extra launchpad features.
Optionally you can assign the project to a shared group. Follow the "Change Maintainer" link (https://launchpad.net/PROJECT/+reassign). I generally assign them to the bzr group, because I don't want to be the only one who can update information.
At this point you should be able to push up a branch to be used as the mainline using:
bzr push lp:///~GROUP/PROJECT/BRANCH
in my example, this is lp:///~bzr/PROJECT/trunk. (You may need to run 'bzr launchpad-login' so that bzr knows who to connect as, rather than using anonymous http:// urls)
You now want to associate your mainline branch with the project, so that people can use the nice lp:///PROJECT urls. You can follow the link on your project page for the "trunk" release series (usually this is https://launchpad.net/PROJECT/trunk). On that page is an "Edit Source" link, or https://launchpad.net/PROJECT/trunk/+source.
Set the official release series branch to your new ~GROUP/PROJECT/BRANCH.

See, now it is only 7 steps instead of 11. (Though only really one or two steps have actually changed.)

Thursday, May 8, 2008

This Week In Bazaar First Edition

This is the first in a mostly-every-week series of posts about what's been happening in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, Launchpad developer and wanted criminal.

We get to talk about anything we want. This week:

What's been happening for a better GUI on Windows
What's new in the 1.4 release
Importing from other VCS's with bzr fast-import

... details ...

GUI on Windows

We found this guy named Mark Hammond who claims to know how to make Python stuff work well on Windows. There is an existing GUI tool for Bazaar on Windows called TortoiseBZR now, modeled after TortoiseSVN. If you haven't used a Tortoise before, they are extensions that integrate into Windows Explorer, allowing you to see and control the versioning of your files without needing to change to a separate tool.

Mark has taken a look and proposed a series of enhancements to make the tool work even better. Bazaar already works very well from the Windows command prompt, but we want to provide excellent GUI tools as well. Take a look at the TortoiseBZR web page for screenshots of it in action.

What's new in the 1.4 release

The Bazaar team releases a new version of Bazaar just about every month, with both bugfixes and new features. The bzr-1.4 release came out last Thursday, May 1st.

The major changes for 1.4 include improvements in performance of 'log' and 'status', and a new Branch hook called post-change-branch-tip, which will trigger any time a Branch is modified (push, commit, etc). This should enable server generated emails whenever somebody publishes their changes. Write something cool with it and tell us what you did!

The full list of changes for 1.4 can be found at: https://launchpad.net/bzr/1.4/1.4
The list of all changes is at http://doc.bazaar-vcs.org/bzr.dev/en/release-notes/NEWS.html

bzr fast-import

Bazaar fast-import is a plugin for bazaar that allows you to import from many different version control systems. The fast-import stuff is intended to support any system that can use the fast-export format. This format was originated by git developers, and quickly adopted elsewhere. So if a source format can generate a "fast-import" stream, you should be able to import it into Bazaar.

CVS
To convert from cvs, you currently use the cvs2svn converter, which has a flag to generate a "fast-import" stream.
Mercurial
There is a script called hg-fast-export.py bundled with the plugin (in the exporters/ directory).
SVN
The svn-fast-export script is also bundled with the bzr-fastimport plugin.
git
Bundled with the standard git distribution is the git-fast-export command.
Your own exotic system here.

Give fast-import a try. It's mostly designed for 1-time conversions, rather than mirroring, but there are already some rudimentary mirroring capabilities.
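If your own exotic system has no exporter yet, the stream itself is not hard to generate. Here is a minimal Python sketch that emits one file and one commit using the blob, mark, data, commit, committer and M commands as documented for git fast-import. The helper names (blob, commit), the committer identity, the timestamp and the file name are all made up for the example, and it only covers the simplest case (no renames, deletes, or author lines).

```python
def blob(mark, content):
    """Emit a blob command carrying one file's bytes."""
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(content), content)


def commit(mark, ref, committer, when, message, files, parent=None):
    """Emit a commit command.

    files maps path -> blob mark; when is seconds since the epoch (UTC).
    """
    msg = message.encode("utf-8")
    out = [
        b"commit %s\n" % ref.encode("utf-8"),
        b"mark :%d\n" % mark,
        b"committer %s %d +0000\n" % (committer.encode("utf-8"), when),
        b"data %d\n%s\n" % (len(msg), msg),
    ]
    if parent is not None:
        out.append(b"from :%d\n" % parent)  # link to the previous commit's mark
    for path, blob_mark in sorted(files.items()):
        out.append(b"M 100644 :%d %s\n" % (blob_mark, path.encode("utf-8")))
    out.append(b"\n")
    return b"".join(out)


# Hypothetical content and identity, for illustration only.
stream = blob(1, b"hello world\n") + commit(
    2,
    "refs/heads/master",
    "Jane Doe <jane@example.com>",
    1210204800,
    "Initial import",
    {"hello.txt": 1},
)
print(stream.decode("utf-8"))
```

The data lengths are byte counts, which is why everything is built as bytes rather than text. A real exporter would walk the source system's history and emit one commit per revision, passing the previous commit's mark as parent.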

That's all for the first installment of "This Week in Bazaar".

(edited for formatting)
