Thursday, June 26, 2008

This Week in Bazaar

This is the eighth (wow, 2 whole months of solid updates, yipee!) in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who drinks the rain. This week we are joined by Martin Albisetti, talking about Loggerhead, and dreaming of a cold pint.

bzr-search, loggerhead, gnome, and you

Robert Collins recently published his awesome bzr-search plugin, and John Carr has been doing a lot of work on setting up a bzr mirror of Gnome. A neat search module and a bunch of source trees is just begging to be combined in some sort of web interface!

There are a few web front ends for Bazaar at the moment, such as Loggerhead, webserve, viewbzr, and bzrweb. Today we are going to be focusing on Loggerhead (you can also go to its Launchpad project page to watch the development activity). It is probably the one with the most active development at the moment. An installation of the latest stuff in action is available at the bzr mirror of Gnome. Loggerhead shows side-by-side diffs, has RSS feeds, and lets you download specific changes, just like you would expect.

You can get the latest version of it yourself by doing:
bzr branch lp:loggerhead
You'll need python-simpletal and python-paste. Then by running "serve-branches.py" in the directory where you're branches live, you should be up and running with your own web interface. Eventually serve-branches.py is to expected to become a bzr plugin which will let you easily serve your branches with a single bzr command.

We hinted at it above; recent versions have started integrating with bzr-search. So for branches that you've run "bzr index" on, it can give hints in the search dialog, and quickly find revisions that match your search terms. You can try it yourself by just typing a few letters into the search dialog.

In the coming weeks, Loggerhead will be getting a bit of a face lift with a new theme to make its externals as shiny and new as its internals.

So give it a poke, and send any feedback to either bazaar@lists.canonical.com, or https://bugs.launchpad.net/loggerhead.

Thursday, June 19, 2008

This Week in Bazaar

This is the seventh in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who is sentimental today.

MySQL Switches to Bazaar

Very big news for the Bazaar team today, as MySQL announces switching from Bitkeeper to Bazaar.

One of the things that was important in doing this conversion was doing a very high quality import of all the existing history. John did a great job working on that, and even added a new feature to Bazaar and bzr-gtk to enable this: per-file commit messages. Since per-file commit messages had been used for years in the MySQL code base, it was not acceptable to lose them, and none of the DVCS systems under consideration supported these messages. Although this feature is debated by some, it was important to preserve that history, and so support for per-file commit messages was added to Bazaar in a non-invasive way, where projects who wanted to use them could, but existing projects were not forced to adopt them. At the moment, to enter per-file commit messages you need to use the bzr-gtk GUI commit tool, but we'd love it if someone came up with a clean way to enable this in the standard CLI also.

It was also important to have a smooth transition period that did not interrupt delivering MySQL releases. This meant we needed a stable importer where the imports could be periodically refreshed without causing all of the developers around the world working on the project to re-download all their trees. At one point we were doing continuous imports of over 30 trees.

It's been a fun and challenging project providing support to MySQL during this time. Although we're really excited about this milestone, we still have plenty of work to do. Here are a few things we've learned, where we are working to make Bazaar even better.

Stacked Branches. We've talked previously about stacked branches, and for a project like MySQL this new feature will make uploading a new branch to Launchpad much faster.

Merging - Bazaar has several good merge algorithms, but we still have some ideas to make merging go even smoother, particularly for some of the complicated ancestries that MySQL has. All merge algorithms have their own set of trade offs, edge cases that they handle better or worse than other algorithms.

We also need to continue to add GUI tools, and make further enhancements to existing tools. If you are looking for a valuable way to contribute to Bazaar, try lending a hand to one of the Bazaar GUI projects.

Last week we asked about bzr screencasts, and James Westby told me about a screencast that he recorded - if anyone else is interested in getting involved in producing a series of screencasts, please do let us know.

Wednesday, June 11, 2008

This Week in Bazaar

This is the sixth in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot Murphy, who just wants a nice story and a nap.

1.6 on the way

We decided to change the release process a bit for the bzr 1.6 release. We're introducing a bit more than normal in this relase (such as Stacked Branches), so we've decided to delay the final release a couple of weeks to ensure that everything gets an extra coat of polish. We've already had 2 beta releases, which are available in the Source.

Please give it a poke and let us know what you think.


Diff and Merge Tools

When you start working with other people on a project, you need some way of seeing what code has changed, doing code reviews, resolving conflicts, etc. The 'bzr diff' command has a '--using=foo' argument that allows you to plug in your favorite diff/merge tool if you don't want the built-in text based diff. You can also add an alias for your favorite tool. For example, Elliot uses meld all the time, so he has 'alias mdiff=diff --using=meld'. You also might want to install the difftools plugin, which adds some smarts to Bazaar about whether a particular tool understands how to diff a full tree or needs to handle the files one at a time. Here are some of the more interesting diff tools that you might want to try out:


One technique for easily reviewing a lot of incoming code is to keep around a pristine branch of your project that you use for conducting reviews. You can apply a patch to the tree, then run 'bzr mdiff' (or your own favorite tool), and take a look at all the changes in the patch with a lot more context than is included in the patch itself. This also gives you a spot to run the automated tests for that project, see if it compiles, etc. Once you are done with the review you can simply 'bzr revert' to get back to a clean tree and move on to the next patch to be reviewed.

Another neat trick is to use the 'merge --preview' switch. You might want to use this command to take a look at any conflicts that might have been introduced if there have been changes since the patch was generated. It shows you the patch of exactly what would be merged into the branch at that moment in time, which can sometimes have differences from what you would be reviewing by reading the patch.

Another interesting (but commercial) tool is Changes.app. It is a Mac OS X client which integrates with Finder and provides a comparison tool. It has direct support for Bazaar as well as several other version control systems.


Screencasts

Screencasts are becoming a very popular way to show people how to use your fancy tool, and we'd like to get some volunteers to help with putting together some screencasts explaining how to use various parts of bzr and related tools. If you want to help with this, email elliot at canonical dot com. The great thing about screencasts is that they use a different avenue for conveying information (audio, motion, etc) so while it won't replace a written tutorial, it is a wonderful supplement.

Thursday, June 5, 2008

DVCS Comparison: On mainline merges and fast forwards

DVCS Comparison: On mainline merges and fast forwards has a discussion about whether 'fast forward' is a "better" method for merging in a distributed topology.

I can understand where he is coming from, and we respect that some users prefer other workflows. Bazaar even has direct support for 'fast forward' with 'bzr merge --pull', and with our aliasing functionality, you can set:

[ALIASES]

merge = merge --pull

In ~/.bazaar/bazaar.conf and change the default meaning of 'bzr merge'. However, I still fall of the side of the fence that fast forward should not be the default.

I can agree that if you have 2 people collaborating on the same feature that you would want fast forward. Though I would argue that is because they are effectively working on the same branch. For my personal workflow, I have a different alias set:

log = log --short -r -10..-1 --forward

What this means is that when I type 'bzr log' I see just the mainline commits of a branch, without the merge cruft. (Where I define the merge cruft as the individual revisions that make up a feature change, not the 'merge foo' node.)

Take this view of bzr.dev:
3466 Canonical.com Patch Queue Manager 2008-06-02 [merge]
(jam) Give Aaron the benefit of bug #202928

3467 Canonical.com Patch Queue Manager 2008-06-03 [merge]
(Martin Albisetti) Better message when a repository is locked.

3468 Canonical.com Patch Queue Manager 2008-06-03 [merge]
(mbp) merge 1.6b1 back to trunk

3469 Canonical.com Patch Queue Manager 2008-06-04 [merge]
(mbp) Update more users of default file modes from control_files to bzrdir

3470 Canonical.com Patch Queue Manager 2008-06-04 [merge]
(Jelmer) Move update_revisions() implementation from BzrBranch to
Branch.

3471 Canonical.com Patch Queue Manager 2008-06-04 [merge]
(vila) Split a test

3472 Canonical.com Patch Queue Manager 2008-06-04 [merge]
(jam) Fix bug #235407, if someone merges the same revision twice,
don't record the second one.

3473 Canonical.com Patch Queue Manager 2008-06-05 [merge]
Isolate the test HTTPServer from chdir calls (Robert Collins)

3474 Canonical.com Patch Queue Manager 2008-06-05 [merge]
Add the 'alias' command (Tim Penhey)

3475 Canonical.com Patch Queue Manager 2008-06-05 [merge]
(mbp) #234748 fix problems in final newline on Knit add_lines and
get_lines
You get to see a nice short summary of everything that has been happening (in proper chronological order.) Admittedly, seeing "Patch Queue Manager" on each of those commits is less optimal (which is why we add the author names.) That is just a temporary limitation of our PQM. Bazaar already supports setting an --author, separate from the committer, we just need to teach our integration bot to use it.

The big difference, IMO, is whether you are bringing in someone else's changes to enhance your work, or whether you are collaborating on the same item. I would argue that collaborating on the same item is slightly less common. It also depends what you do with the merge commits. Just saying "merge from branch A" is certainly not helpful. But when you can say "merge Jerry's changes to Command.foo", it can indeed be helpful when tracking back through and figuring out where and when "foo" changed, without being lost in the forest for having too many trees.

This Week in Bazaar

This is the fifth in a series of posts about current topics in the development world of the Bazaar distributed version control system. The series is co-authored by John Arbash Meinel, one of the primary developers on Bazaar, and Elliot "fresh needle" Murphy. The two topics for this week are not related, but it's our blog and we get to write what we want.

Hosting of Bazaar branches

One of the first questions people ask when moving to Bazaar is "Where can I host my branches?" Even with distributed revision control, it is often handy to have a shared location where you publish your code, and merge code from others. Canonical has put a lot of work into making launchpad.net an excellent place to host code, but there are many other options available.

Because bazaar supports "dumb" transports like sftp, you can publish your branches anywhere that you can get write access to some disk space. For example, sourceforce.net gives projects some web space with sftp access, and you can easily push branches up over sftp. It's also easy to use bzr on any machine that you have ssh access to, you don't even need to install bazaar on the remote machine.

As bazaar is a GNU project, we've been working with the Savannah team to enable bazaar hosting on Savannah also.

Another option is serving bazaar branches over HTTP. You can do this for both read and write access, and there is a great HOWTO in the bazaar documentation. Do you know of anywhere else that is offering Bazaar hosting? Let us know in the comments!


Bazaar review and integration process

How do you ensure high quality code, when working on a fast moving codebase in a widely distributed team? Here are some things that we've been doing with the Bazaar project, and we think they are useful practices for most projects.


Automated Test Suite

One very important key towards having a stable product is proper testing. As people say "untested code is broken code". In the Bazaar project, we recommend that developers use Test Driven Development as much as possible. However, what we *require* is that all new code has tests. The reason it is important for the tests to be automated, is because it transfers knowledge about the code base between developers. I can run someone else's test, and know if I conformed to their expectations about what this piece of code should do.

This actually frees up development tremendously. Especially when you are doing large changes. With a good test suite, you can be confident that after your 2000 line refactoring, everything still works as before.


Code Review

Having other people look at your changes is a great way to catch the little things that aren't always directly testable. It also helps transfer knowledge between developers. So one developer can spend a couple weeks experimenting with different changes, and then there is at least one other person who is aware of what those are.

The basic rules for getting code merged into Bazaar is that:
  1. It doesn't reduce code clarity
  2. It improves on the previous code
  3. It doesn't reduce test coverage
  4. It must be approved by 2 developers who are familiar with the code base.
We try to apply those rules to avoid hitting the rule "The code must be perfect before it is merged", and the associated project stagnation. Code review is a very powerful tool, but you have to be cautious of "oh, and while you are there, fix this, and that, and this thing over here." Sometimes that is useful to catch things that are easy (drive-by fixes). It can also lead to long delays before you actually get to see the improvements from someone's work, and long delays are demotivating.

Item number 3 is a pragmatic way to approach how much testing is required. In general, the test coverage should improve, jsut like the code quality. But that doesn't mean you have to test all code 100 different ways.


Integration Robot (PQM)
Now that you have a good automated test suite, and proper code reviews, the next step is to make sure that you have a version of the code base which has all the tests passing. Often when developing a new feature, it is quite reasonable to "break" things for a bit, while you work out just how everything should fit together. Requiring the tests to pass at each level of development puts an undue burden on developers, preventing them from publishing their changes (to get feedback, to snapshot their progress, etc.) Very often I commit when things are still somewhat broken, as it gives me a way to 'bzr revert' back to something I wanted to try.

However, you don't want the official releases of your project to have all of these broken bits in them. The Bazaar project uses a "Patch Queue Manager". Which is simply a program that responds to requests to merge changes. When your patch has passed the review stages, you submit it to the PQM, which grabs your changes, applies them, runs the full test suite, and commits the changes to "mainline" if everything is clean.

The reasons to use a robot are:
  1. Humans are very tempted to say "ah, this is a trivial fix, I'll just merge it". Without realizing there is a subtle typo or far-reaching effect. When you have a large test suite, it can often take a while to run all the tests (the bzr.dev test suite runs in 5-10 minutes, but some projects have test suites that take hours.) Having a program doing the work means a human is relieved of the tedium of checking it.
  2. There is generally only a single mainline, but there may be 50 developers doing work on different branches. When they all want to merge things, it isn't feasible to require the users to run the test suite with the latest development tip. If the development pace is fast enough versus the time to run the test suite, you can get into an "infinite loop". Where you merge in the latest tip, wait for the tests to pass, and by the time you go to mark this as the new mainline tip, someone else beat you to it. And you go around again. PQM does this for you in a fairly natural way.
  3. Running in a "clean" environment is a safety net for when you forget about a new dependency that you added.
  4. There are similar ways to do this, such as Cruise Control for Subversion. There is one key difference, though. With Cruise Control, you find out after the fact that your mainline has broken. With PQM, we know that every commit to the mainline of Bazaar passed the test suite at that time. This helps a lot when tracking down bugs. It also helps with "dogfooding"...

Dogfooding


If you want people to do regular testing of the development version, it must be easy to run different versions of the project without needing a complex install. Bazaar does this by being runnable directly out of the source tree, without any need to set $PYTHONPATH or mess around with installing different versions. You can also easily change out the plugins that are loaded using the BZR_PLUGIN_PATH env variable. This means that developers can run the latest development version, and easily switch to a particular version when trying to reproduce a bug or help a user.

By having the PQM running the test suite, developers can run on the bleeding edge, and know that they won't get random breakage. It is always possible that something will break, but the chance is quite low. (In 2000 or so commits since we started using PQM, I believe bzr.dev has never been completely unusable, and has had < 5 revisions which we would not recommend people use.)

It also means that you can be fairly confident in creating a release directly from your integration branch (mainline).