Wednesday, October 17, 2007

Bazaar vs Subversion

Every so often someone comes along wanting to know which VCS they should use. I won't claim to be an impartial observer, but this is a list of things I put together for the last discussion, which I thought I would share here.
  1. SVN requires all commits to go to a central location, and tends to favor having multiple people working on the same branch.

    This is both a positive and a negative depending on what you are trying to do.

    When you have a bunch of developers who don't know much about version control, it simplifies things for them. They don't have to worry about branches; they just do their work and check it in. The disadvantage is that they can tread on each other's toes (by committing a change that breaks someone else's work), and their work immediately gets mixed together and can't be integrated separately.

    Bazaar has chosen to address this with workflows. You can explicitly have a branch set up to send all commits to a central location (bzr checkout), just as you do with SVN. Also, if two people check out the same branch, they must stay in sync. (Bazaar actually has a stronger restriction here than SVN does, because SVN only complains if they modify the same files, whereas Bazaar requires that the whole tree be up to date.)

    However, with a Bazaar checkout you always have the option to either run bzr unbind or just bzr commit --local when you are on a plane, or when you simply want to record in-progress work before integrating it into the master branch.
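
    To make the checkout rules above concrete, here is a toy Python model. It is not bzrlib's API; the class and method names are made up for illustration. It assumes a branch is just a linear list of revision messages, that a bound commit must land on the master first and is refused when the checkout is not in sync with it, and that commit(local=True) plays the role of bzr commit --local.

```python
class OutOfDate(Exception):
    """Raised when a bound commit is attempted from a stale checkout."""


class Branch:
    """Toy branch: a linear list of revision messages."""

    def __init__(self, revisions=None):
        self.revisions = list(revisions or [])


class Checkout:
    """Toy bound checkout of a master branch (not bzrlib's real API)."""

    def __init__(self, master):
        self.master = master
        self.local = Branch(master.revisions)  # local copy of the history
        self.bound = True

    def update(self):
        # Bring the local branch back in sync with the master.
        self.local.revisions = list(self.master.revisions)

    def unbind(self):
        # After unbinding, every commit is a local commit.
        self.bound = False

    def commit(self, message, local=False):
        if self.bound and not local:
            # The whole branch must match the master, not just touched files.
            if self.local.revisions != self.master.revisions:
                raise OutOfDate("checkout is not in sync with master; update first")
            self.master.revisions.append(message)
        self.local.revisions.append(message)


trunk = Branch(["initial import"])
alice, bob = Checkout(trunk), Checkout(trunk)
alice.commit("fix parser")
try:
    bob.commit("add docs")          # refused: trunk moved on since bob's checkout
except OutOfDate:
    bob.update()
    bob.commit("add docs")
bob.commit("wip on a plane", local=True)  # recorded only in bob's branch
print(trunk.revisions)  # ['initial import', 'fix parser', 'add docs']
```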

  2. SVN has a lot more 3rd party support.

    SVN has simply been around longer, and is pretty much the dominant open source centralized VCS. There are a lot of DVCSes at the moment, all doing things a little bit differently. Competition is good, but it makes it harder to pick one over another, and 3rd party tool authors aren't going to build support for every one of them.

    However, Bazaar already has several good third party tools. For viewing changes to a single file, bzr gannotate can show when each line was modified, and what the associated commit message was. It even allows drilling back in history to prior versions of the file. For viewing the branch history (showing all the merged branches, etc.) there is bzr viz. There are both GTK and Qt GUIs, and a Patch Queue Manager (PQM) for managing an integration branch (where the test suite must always pass or the patch is rejected). There is even basic Windows Shell integration (TortoiseBzr), a Visual Studio plugin, and an Eclipse plugin.

  3. Bazaar is generally much easier to set up.

    SVN can only really be set up by an administrator, someone who has a better idea of what they are doing. Setting up WebDAV over http is easier than it used to be, but it isn't something you would ask just anyone to do. Getting a project started with Bazaar is usually as simple as bzr init; bzr add; bzr commit -m "initial import".

    You can push and pull over simple transports (ftp, sftp, http).

    Because SVN is centralized, you only really set it up one time anyway, so as long as you have one competent person on your team, you can probably get started.

  4. It is easier to get 3rd party contributions.

    If you give a user commit access to your SVN repository, then you have their changes available whenever they commit. But usually this also means that they have access to change things that you don't really want them to touch. (Yes, there are ACLs that you can set up, but I don't know many projects that go to that trouble for casual contributors.)

    If you haven't given them commit access, then they have to work on their own, and the VCS doesn't give you a direct way to collaborate with them. You are back to using something like diff+patch.

    Because Bazaar supports intelligent merging between "repositories", integrating other people's work is usually just a bzr merge away. SVN 1.5 is supposed to address the merge issue, but at best it helps within a repository. So if someone is developing stuff on their own side, you are still stuck with diff+patch.

    Just to reiterate, Bazaar can make it much easier for users to give "drive-by" contributions, which can be a good stepping stone towards growing your development community.
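
    Merging beats diff+patch here because a merge records which revisions it brought in, so the next merge only needs what is new. A minimal Python sketch, assuming history is a graph mapping each (hypothetical) revision id to its parent ids. It shows only the "what still needs merging" bookkeeping, not Bazaar's actual merge code, which also has to combine the file texts.

```python
def ancestry(graph, tip):
    """All revisions reachable from tip (tip included)."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph[rev])
    return seen


def revisions_to_merge(graph, this_tip, other_tip):
    """Revisions OTHER has that THIS has not merged yet."""
    return ancestry(graph, other_tip) - ancestry(graph, this_tip)


# Hypothetical history: revision id -> list of parent ids.
graph = {
    "r1": [],
    "r2": ["r1"],        # your mainline
    "c1": ["r1"],        # a contributor branches from r1
    "c2": ["c1"],
    "m1": ["r2", "c2"],  # you merge their work
    "c3": ["c2"],        # they keep working
}
print(sorted(revisions_to_merge(graph, "r2", "c2")))  # ['c1', 'c2']
print(sorted(revisions_to_merge(graph, "m1", "c3")))  # ['c3']
```

    The second merge asks only for c3. With diff+patch you would have to work out by hand which of the contributor's changes you had already applied.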

  5. Subversion's model is a giant versioned filesystem. Bazaar uses the concept of a Tree.

    I have little doubt that this made merge tracking more difficult in SVN, since there isn't a clear 'top' that has been merged with the other 'top'.

    It also means that SVN commits aren't atomic in the same way that Bazaar commits are. In Bazaar, when you commit, you are guaranteed to be able to get back to exactly that revision. With SVN, if people are working on different files, both can commit, and when you check out the final tree, it will not match either side. This has implications for ensuring that the test suite passes on a given branch: the test suite can pass on my machine and on their machine, but after we both commit, it won't necessarily pass on a fresh checkout.
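
    The scenario above can be sketched with trees modeled as Python dicts mapping path to contents. svn_style_commit is an assumption-laden model of a per-file out-of-date check (it ignores deletions and does not reproduce SVN's real protocol), and tree_style_commit models requiring the whole tree to be current. Alice changes a function signature and its only caller; Bob, working from the same base, adds a new caller in a separate file.

```python
def svn_style_commit(server, base, mine):
    """Per-file check: refuse only if a file *I* changed moved on the server.

    Trees are dicts of path -> contents; deletions are ignored for brevity.
    """
    changed = {p for p in mine if mine[p] != base.get(p)}
    for path in changed:
        if server.get(path) != base.get(path):
            raise RuntimeError(f"{path} is out of date")
    new_tree = dict(server)
    new_tree.update({p: mine[p] for p in changed})
    return new_tree


def tree_style_commit(server, base, mine):
    """Whole-tree check: refuse unless the server is still at my base."""
    if server != base:
        raise RuntimeError("tree is out of date; update and re-test first")
    return dict(mine)


base = {"api.py": "def f(x): ...", "caller.py": "f(1)"}
alice = {"api.py": "def f(x, y): ...", "caller.py": "f(1, 2)"}
bob = {**base, "user.py": "f(3)"}  # tested against the old f(x)

server = svn_style_commit(base, base, alice)   # Alice commits first
server = svn_style_commit(server, base, bob)   # accepted: user.py is new
print(server == alice, server == bob)          # False False

try:
    tree_style_commit(alice, base, bob)        # Bob must update first
except RuntimeError as err:
    print(err)
```

    The server ends up with a tree that calls f(3) against the two-argument f(x, y), and neither Alice nor Bob ever ran the tests on it.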

  6. SVN supports partial checkouts better than Bazaar does.

    This is mostly a consequence of the above point, rather than a deliberate design decision. But because SVN doesn't label anything as a special Tree, you can check out project/doc just as easily as project.

    We are looking into ways to at least fake this with Bazaar (we secretly check out the whole tree, but hide the bits that you don't care about), because we are aware of use cases where it is important (a documentation team that doesn't want or need to see all the code, etc.).
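
    The "check out everything, hide the rest" idea could look something like this. The function name and the dict-of-paths tree are hypothetical; this only illustrates filtering a full tree by path prefix, not an existing Bazaar command.

```python
def filtered_view(tree, include):
    """Return only the paths under one of the `include` directories.

    The full tree is still present; this just hides what you don't care about.
    """
    prefixes = tuple(p.rstrip("/") + "/" for p in include)
    return {path: data for path, data in tree.items() if path.startswith(prefixes)}


tree = {
    "doc/index.txt": "...",
    "doc/api.txt": "...",
    "src/main.c": "...",
    "README": "...",
}
print(sorted(filtered_view(tree, ["doc"])))  # ['doc/api.txt', 'doc/index.txt']
```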

  7. SVN stores history on the server.

    In the standard workflows, Bazaar has you copy the full project history to your local machine. For most projects, this isn't a big deal, because the delta-compressed history is only a small multiple of a checked-out tree. (Plus, an SVN working copy always keeps 2 copies of every file anyway: the working file and a pristine copy under .svn.)
    But there are times when people abuse the VCS, and check in a CD ISO (which gets deleted shortly thereafter). Suddenly you have more garbage data in your repository than you have desirable data.

    Bazaar does have support for "lightweight checkouts", which are SVN-style working directories: all the history is on the server, and only the working tree is local. Of course, if you do this you lose some flexibility (offline commits), but you get to choose when that fits your needs.

    We also have "shared repositories" which can be used to share storage between branches. So even though you have 10 branches, you only have 1 copy of the history.

    We are working on having a Shallow Branch/History Horizon which should be a very good compromise between the two. The basic idea is that it can pull down data that you are using, without needing the full history.
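
    A toy model of why a shared repository keeps only one copy of the history: branches become little more than names pointing at a tip revision in a common store. The class and revision ids here are invented for illustration and say nothing about Bazaar's on-disk format.

```python
class SharedRepository:
    """Toy shared repository: each revision is stored exactly once."""

    def __init__(self):
        self.revisions = {}  # revision id -> (parent ids, snapshot)
        self.branches = {}   # branch name -> tip revision id

    def add_revision(self, rev_id, parents, snapshot):
        # Adding a revision that is already stored is a no-op.
        self.revisions.setdefault(rev_id, (tuple(parents), snapshot))

    def set_branch_tip(self, name, rev_id):
        # A branch is only a name pointing into the shared store.
        self.branches[name] = rev_id


repo = SharedRepository()
repo.add_revision("r1", [], "initial import")
repo.add_revision("r2", ["r1"], "second commit")
for i in range(10):
    repo.set_branch_tip(f"feature-{i}", "r2")
repo.add_revision("r3", ["r2"], "work on feature-0")
repo.set_branch_tip("feature-0", "r3")
print(len(repo.branches), len(repo.revisions))  # 10 3
```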

  8. Storage of Binary Files

    At the moment SVN's delta algorithm for binary files is able to give smaller deltas than ours does. This is likely to change in coming releases, but at the moment there will be times when SVN requires less disk space for binary files that you modify often. For binary files that change infrequently, or for compressed ones, there is likely to be less of a difference. (Most compressed formats don't delta well because a small change causes ripples in the compressed stream.)
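
    The parenthetical about compressed formats is easy to check with Python's zlib (deflate), used here as a stand-in for compressed formats in general. Changing one byte of the input changes one byte of the raw data, but many bytes of the compressed stream. Counting differing bytes at the same offsets is a crude measure, not how a real delta algorithm works, but it shows the ripple effect.

```python
import zlib


def differing_bytes(a, b):
    """Count byte positions that differ, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


original = "".join(f"line {i}: value {i * i}\n" for i in range(2000)).encode()
edited = b"L" + original[1:]  # change a single byte near the start

print(differing_bytes(original, edited))  # 1
print(differing_bytes(zlib.compress(original), zlib.compress(edited)))
```

    The raw texts differ in exactly one byte, while the compressed streams differ in many more, so a delta between the compressed versions has much less to work with.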

  9. Handling large files

    At the moment, Bazaar expects that you can fit a small number of copies of the contents of any file in memory. (The merge algorithm needs a BASE, THIS, and OTHER copy.) So when you need to version 1GB movies, etc., SVN is probably a better choice at the moment, though you might consider whether a VCS is actually the right way to handle those files.

    We are certainly considering changing some parts of our code to be able to read only parts of files, but it is lower on our list of priorities.
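
    Why file size matters for merging: a three-way merge compares BASE, THIS and OTHER side by side, so all three versions are in hand at once. This sketch simplifies heavily by assuming only in-place line edits (no inserted or deleted lines), so line i lines up across all three versions; real merge code first aligns the versions with a diff. It is not Bazaar's merge implementation.

```python
def merge3_lines(base, this, other):
    """Three-way merge of line lists, assuming only in-place line edits.

    All three versions are held in memory at once, which is why very large
    files are a problem for this style of merge.
    """
    if not len(base) == len(this) == len(other):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, t, o) in enumerate(zip(base, this, other)):
        if t == o:        # both sides agree (or neither changed the line)
            merged.append(t)
        elif t == b:      # only OTHER changed it
            merged.append(o)
        elif o == b:      # only THIS changed it
            merged.append(t)
        else:             # both changed it differently: keep THIS, flag it
            merged.append(t)
            conflicts.append(i)
    return merged, conflicts


print(merge3_lines(["a", "b", "c"], ["A", "b", "c"], ["a", "b", "C"]))
# (['A', 'b', 'C'], [])
print(merge3_lines(["a", "b"], ["x", "b"], ["y", "b"]))
# (['x', 'b'], [0])
```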

  10. Building up a project out of subprojects

    At the moment SVN's externals handle more use cases than we do.
    We are working on more complete support with Nested Trees. The internal data structures are present, but not all of the push/pull/merge/etc commands have been updated.

    We already have good support for merging one project into another, so you get 1 large tree. You can then continue to merge new changes from upstream, and they will apply to the correct files. However, once you have aggregated a project, it is harder to send your own changes upstream independently of all the other files. (It is possible to do so, but it requires you to cherry-pick the changes and keep track of which files you modified when.)

    Also, Nested Trees are designed to let you easily check out an exact copy of the full project at the exact revision of every subproject, while still allowing you to 'bzr update' them to the current version of all the subprojects.
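
    The difference between an exact checkout and updating everything can be modeled with an outer tree that records which revision of each subproject it was committed with. This is only a sketch of the idea; the names (Subproject, OuterTree, pins, the revision ids) are hypothetical and do not reflect the actual Nested Trees data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Subproject:
    history: list  # revision ids, oldest first


@dataclass
class OuterTree:
    # subtree path -> subproject revision recorded when the outer tree committed
    pins: dict = field(default_factory=dict)

    def exact_checkout(self):
        """Reproduce every subproject at exactly the recorded revision."""
        return dict(self.pins)

    def update(self, subprojects):
        """Move every subtree to the latest revision of its subproject."""
        return {path: subprojects[path].history[-1] for path in self.pins}


libs = {"lib/foo": Subproject(["foo-1", "foo-2"]), "lib/bar": Subproject(["bar-1"])}
release = OuterTree({"lib/foo": "foo-1", "lib/bar": "bar-1"})
libs["lib/bar"].history.append("bar-2")  # upstream keeps moving

print(release.exact_checkout())  # {'lib/foo': 'foo-1', 'lib/bar': 'bar-1'}
print(release.update(libs))      # {'lib/foo': 'foo-2', 'lib/bar': 'bar-2'}
```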

  11. Clarity of "log"

    One major difficulty with CVS is just figuring out what has been changing. With Bazaar, you can do a simple bzr log and it shows you what has been changing for the whole branch. SVN has a similar svn log, which shows you what has been changing underneath the current directory. (So they are approximately the same if you are in the root of an SVN branch.)

    However, if you develop in feature branches and then merge them into an integration branch (trunk), with Bazaar you can run bzr log --short, which shows only the mainline revisions. In this case, that would be just the integration summary messages. So you see a single "merged feature X" message, rather than the 50 small commit messages that built up that feature.
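
    The mainline view works because a merge revision's first (left-hand) parent is the previous tip of the branch being merged into, while the later parents point at the merged-in work. Following only first parents from the tip therefore yields just the integration messages. A minimal sketch with made-up revision ids, using plain dicts rather than bzrlib:

```python
def mainline(parents, tip):
    """Follow only the first (left-hand) parent of each revision."""
    revs = []
    while tip is not None:
        revs.append(tip)
        tip = parents[tip][0] if parents[tip] else None
    return revs


messages = {
    "t1": "initial import",
    "f1": "start feature X",
    "f2": "more feature X",
    "f3": "finish feature X",
    "t2": "merged feature X",
}
parents = {
    "t1": [],
    "f1": ["t1"],
    "f2": ["f1"],
    "f3": ["f2"],
    "t2": ["t1", "f3"],  # merge: left parent is the old trunk tip
}
for rev in mainline(parents, "t2"):
    print(messages[rev])
# merged feature X
# initial import
```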

  12. Plugin Architecture

    One of Bazaar's main strengths is the ability for third party developers to add commands or customization through plugins. Plugins can provide simple extensions (a different log format to conform to a company's particular style expectations), new commands (history introspection, extra patch management, integration with the PQM), or even support for a different repository format (at the moment, bzr-svn provides a way to treat an SVN repository as just another Bazaar branch, allowing you to push, pull and merge).

    While not every user is going to want to write a plugin, it does provide ways for administrators to customize the behavior of Bazaar, so that the tool can be slimmed down to provide just the basics, or expanded to provide specific workflows customized to the situation.
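
    Real Bazaar plugins are Python modules that subclass bzrlib.commands.Command and call bzrlib.commands.register_command. So that it runs without bzrlib installed, the sketch below uses a tiny stand-in registry that only mimics that shape. COMMANDS, run_command, the naming rule and the cmd_log_oneline plugin are all hypothetical.

```python
COMMANDS = {}  # hypothetical registry: command name -> command class


class Command:
    """Stand-in base class; a plugin overrides run()."""

    def run(self, *args):
        raise NotImplementedError


def register_command(cls):
    # cmd_log_oneline -> "log-oneline" (naming rule chosen for this sketch)
    name = cls.__name__
    if name.startswith("cmd_"):
        name = name[len("cmd_"):]
    COMMANDS[name.replace("_", "-")] = cls


def run_command(name, *args):
    return COMMANDS[name]().run(*args)


# --- what a small plugin module might contain ---
class cmd_log_oneline(Command):
    """A different log format, e.g. to match a company's style guide."""

    def run(self, *messages):
        return "\n".join(f"* {m}" for m in messages)


register_command(cmd_log_oneline)
print(run_command("log-oneline", "fix parser", "add docs"))
# * fix parser
# * add docs
```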

  13. Rename support

    This is another place where SVN is much better than CVS, but Bazaar is even better still.
    SVN has support for the basic concept of renaming, though it is implemented as a copy+delete pair. "copy" allows 2 files to share the same history prior to the point of copying, which means commands like svn log and svn annotate use the full history of the file. But there is more that can be done.

    One reason projects hesitate to rename files is that it then becomes difficult to accept changes from elsewhere: suddenly the change has nowhere to go, because the target file is not there anymore. This is where Bazaar has a distinct advantage over SVN. When you rename a file, Bazaar knows that any patches to that file belong in the new destination, which means that when you need to refactor your code to clean up the overall structure, you can still merge changes that were created before the restructuring. I didn't realize how differently I worked with my code before I had the ability to fix simple naming errors. (This file is 'Bars.c' when it should just be 'bar.c', etc.)
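
    The key is that Bazaar tracks each file by a persistent file id rather than by its path, so a change recorded against a file can still find it after a rename. A toy illustration, with a made-up id format and class, not bzrlib's inventory API:

```python
import itertools


class Tree:
    """Toy tree that tracks files by a stable id rather than by path."""

    _ids = itertools.count(1)

    def __init__(self):
        self.path_of = {}  # file id -> current path
        self.text_of = {}  # file id -> contents

    def add(self, path, text):
        file_id = f"file-{next(Tree._ids)}"  # made-up id format
        self.path_of[file_id] = path
        self.text_of[file_id] = text
        return file_id

    def rename(self, file_id, new_path):
        # Only the path changes; the id (and so the file's identity) stays.
        self.path_of[file_id] = new_path

    def apply_change(self, file_id, new_text):
        # Changes are addressed by file id, so they follow renames.
        self.text_of[file_id] = new_text
        return self.path_of[file_id]


trunk = Tree()
fid = trunk.add("Bars.c", "int bar(void);")
change = (fid, "int bar(int x);")      # made on another branch before the rename
trunk.rename(fid, "bar.c")
print("Bars.c" in trunk.path_of.values())  # False: a path-based patch has nowhere to go
print(trunk.apply_change(*change))         # bar.c
```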



  14. I also wanted to point to a pretty good blog post about Subversion and the rest of the world here. A lot of that is why Bazaar has a centralized workflow you can use, and why we are trying to make sure things like bzr-gtk (which is the parent project for Olive and TortoiseBzr) are fully functional.

In summary, SVN may be a better choice if you have large binary files or projects with subprojects, or if you need partial checkout support or more mature integration with 3rd party tools than Bazaar currently has. OTOH, if workflow flexibility is important, if collaborating with others and increasing community participation matter, if low administration is appealing, or if you care about quality branching/merging and correct rename handling, then Bazaar can help make life more enjoyable and ought to be seriously considered, either now or in the future depending on how comfortable you are with its maturity.