Wednesday, October 17, 2007

Bazaar vs Subversion

Every so often someone comes along wanting to know which VCS they should use. I won't claim to be an impartial observer, but here is a list of points I put together for the last such discussion, which I thought I would share.
  1. SVN requires all commits to go to a central location, and tends to favor having multiple people working on the same branch.

    This is both a positive and a negative depending on what you are trying to do.

    When you have a bunch of developers who don't know much about version control, it simplifies things for them. They don't have to worry about branches; they just do their work and check it in. The disadvantage is that they can tread on each other's toes (by committing a change that breaks someone else's work), and their work immediately gets mixed together and can't be integrated separately.

    Bazaar has chosen to address this with workflows. You can explicitly set up a branch to send all commits to a central location (bzr checkout), just as you do with SVN. Also, if two people check out the same branch, they must stay in sync. (Bazaar actually has a stronger restriction here than SVN does: SVN only complains if both people modify the same files, whereas Bazaar requires that the whole tree be up to date.)

    However, with a Bazaar checkout you can always run bzr unbind, or just bzr commit --local, when you are on a plane or simply want to record in-progress work before integrating it into the master branch.
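
    To make the bound/local distinction concrete, here is a toy Python model. Bazaar itself is written in Python, but none of these class or method names are bzrlib's API; they are hypothetical. A checkout shares the master's history. A normal commit must land on the master first and is refused if the checkout is stale. A local commit, or any commit after unbinding, only extends the checkout's own history. Reconciling the two afterwards (what bzr update does) is deliberately left out.

```python
class OutOfDate(Exception):
    """Raised when a bound commit is attempted from a stale checkout."""


class Checkout:
    """Toy checkout: histories are plain lists of revision ids (hypothetical model)."""

    def __init__(self, master):
        self.master = master        # the shared central branch (same list object for everyone)
        self.local = list(master)   # this checkout's own copy of the history
        self.bound = True

    def commit(self, rev_id, local=False):
        if self.bound and not local:
            # A bound commit goes to the master first, so the checkout
            # must be exactly at the master's tip.
            if self.local != self.master:
                raise OutOfDate("checkout is out of date; update first")
            self.master.append(rev_id)
        # Local commits (and commits after unbind) only touch local history.
        self.local.append(rev_id)

    def unbind(self):
        self.bound = False


master = ["r1"]
alice, bob = Checkout(master), Checkout(master)

alice.commit("r2")                       # lands on the master
try:
    bob.commit("r3")                     # bob's checkout is stale, so this is refused
except OutOfDate:
    bob.commit("r3-wip", local=True)     # record the work locally instead

bob.unbind()                             # e.g. on a plane
bob.commit("r4")                         # the master is untouched
```

    In this model, "stale" compares whole histories rather than individual files. That mirrors the whole-tree restriction described above.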

  2. SVN has a lot more 3rd party support.

    SVN has simply been around longer, and is pretty much the dominant open source centralized VCS. There are a lot of DVCSes at the moment, all doing things a little bit differently. Competition is good, but it makes it harder to pick one over another, and third-party tool authors aren't going to build for all of them.

    However, Bazaar already has several good third-party tools. For viewing changes to a single file, bzr gannotate shows when each line was modified and what the associated commit message was. It even allows drilling back in history to prior versions of the file.
    For viewing the branch history (showing all the merged branches, etc.) there is bzr viz.
    There are both GTK and Qt GUIs, and a Patch Queue Manager (PQM) for managing an integration branch (where the test suite must always pass or the patch is rejected).
    There is even basic Windows shell integration (TortoiseBzr), a Visual Studio plugin, and an Eclipse plugin.
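
    The core idea behind an annotate view such as bzr gannotate is to credit each line of the current file to the revision that introduced it. The sketch below illustrates that idea with Python's difflib. It is not Bazaar's actual annotate algorithm, and the revision names are made up.

```python
import difflib


def annotate(versions):
    """Return (revision, line) pairs for the newest version in `versions`.

    `versions` is a list of (revision_name, list_of_lines), oldest first.
    Each surviving line keeps the revision that last introduced it.
    """
    first_rev, first_lines = versions[0]
    result = [(first_rev, line) for line in first_lines]
    for rev, lines in versions[1:]:
        old_lines = [line for _, line in result]
        matcher = difflib.SequenceMatcher(None, old_lines, lines, autojunk=False)
        new_result = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                new_result.extend(result[i1:i2])  # unchanged: keep older attribution
            elif tag in ("replace", "insert"):
                new_result.extend((rev, line) for line in lines[j1:j2])
            # "delete": the lines are gone, nothing to carry forward
        result = new_result
    return result


history = [
    ("r1", ["a", "b", "c"]),
    ("r2", ["a", "B", "c"]),
    ("r3", ["a", "B", "c", "d"]),
]
annotated = annotate(history)
```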

  3. Bazaar is generally much easier to set up.

    SVN can really only be set up by an administrator, someone who has a better idea of what they are doing. Setting up WebDAV over HTTP is easier than it used to be, but it isn't something you would ask just anyone to do. Putting a project under Bazaar is usually as simple as bzr init; bzr add; bzr commit -m "initial import".

    You can push and pull over simple transports (ftp, sftp, http).

    Because SVN is centralized, you only really set it up one time anyway, so as long as you have one competent person on your team, you can probably get started.

  4. It is easier to get 3rd party contributions.

    If you give a user commit access to your SVN repository, then you have their changes available whenever they commit. But usually this also means that they have access to change things that you don't really want them to touch. (Yes, there are ACLs that you can set up, but I don't know many projects that go to that trouble for casual contributors.)

    If you haven't given them commit access, then they have to work on their own, and the VCS doesn't give you a direct way to collaborate with them. You are back to using something like diff+patch.

    Because Bazaar supports intelligent merging between "repositories", integrating other people's work is usually just a bzr merge away. SVN 1.5 is supposed to address the merge-tracking issue, but at best it helps within a single repository. So if someone is developing on their own, you are still stuck with diff + patch.

    Just to reiterate, Bazaar can make it much easier for users to give "drive-by" contributions, which can be a good stepping stone toward growing your development community.
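
    A DVCS can do better than diff+patch because each revision has a globally unique id and records its parents. A merge can therefore work out which of the contributor's revisions you already have. Below is a toy Python sketch of that bookkeeping. The revision ids and graph are hypothetical. Real Bazaar also computes a merge base and merges file contents, which is omitted here.

```python
def ancestry(graph, tip):
    """All revision ids reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph[rev])
    return seen


def revisions_to_merge(graph, this_tip, other_tip):
    """Revisions in the other branch's history that this branch lacks."""
    return ancestry(graph, other_tip) - ancestry(graph, this_tip)


# revision id -> parent ids; ids are global, so both branches share one graph
graph = {
    "trunk-1": [],
    "trunk-2": ["trunk-1"],
    "contrib-1": ["trunk-1"],              # contributor branched from trunk-1
    "contrib-2": ["contrib-1"],
    "trunk-3": ["trunk-2", "contrib-2"],   # first merge of the contributor's work
    "contrib-3": ["contrib-2"],            # contributor keeps working
}

first_merge = revisions_to_merge(graph, "trunk-2", "contrib-2")
second_merge = revisions_to_merge(graph, "trunk-3", "contrib-3")
```

    The second merge brings in only the one new revision, because the earlier ones are already in trunk's ancestry. A bare patch carries no such record.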

  5. Subversion's model is a giant versioned filesystem. Bazaar uses a concept of a Tree.

    I have little doubt that this made merge tracking more difficult in SVN, since there isn't a clear 'top' that has been merged with the other 'top'.

    It also means that SVN commits aren't atomic in the same way that Bazaar commits are. In Bazaar, when you commit, you are guaranteed to be able to get back to exactly the tree you committed. With SVN, if two people are working on different files, both can commit, and when you check out the final tree, it will not match either side.
    This has implications for ensuring that the test suite passes on a given branch: the test suite can pass on my machine and on yours, and still fail on a fresh checkout after we have both committed.
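
    Here is a toy Python model of the two out-of-date checks. In the scenario, you change a function's signature in function.h and I add a call to it in program.c, both starting from revision 1. The model is hypothetical and uses a single revision counter. Real SVN tracks base revisions per path, and Bazaar compares branch tips.

```python
class Repo:
    """Toy central repository: a revision counter plus the latest file contents."""

    def __init__(self, files):
        self.rev = 1
        self.files = dict(files)                       # path -> content at HEAD
        self.changed_in = {path: 1 for path in files}  # path -> revision of last change

    def commit_per_file(self, base_rev, edits):
        """SVN-style check: only the files being committed must be current."""
        for path in edits:
            if self.changed_in[path] > base_rev:
                raise RuntimeError(f"{path} is out of date")
        self._apply(edits)

    def commit_whole_tree(self, base_rev, edits):
        """Bazaar-style check: the committer's whole tree must be at the tip."""
        if base_rev != self.rev:
            raise RuntimeError("tree is out of date")
        self._apply(edits)

    def _apply(self, edits):
        self.rev += 1
        for path, content in edits.items():
            self.files[path] = content
            self.changed_in[path] = self.rev


START = {"function.h": "int function(int arg);",
         "program.c": "int main(void) { return 0; }"}
NEW_HEADER = {"function.h": "int function(int arg, char *arg2);"}
NEW_CALLER = {"program.c": "int main(void) { return function(10); }"}

svn_style = Repo(START)
svn_style.commit_per_file(1, NEW_HEADER)   # you commit r2
svn_style.commit_per_file(1, NEW_CALLER)   # I commit r3 from my r1 tree: accepted

bzr_style = Repo(START)
bzr_style.commit_whole_tree(1, NEW_HEADER)
try:
    bzr_style.commit_whole_tree(1, NEW_CALLER)  # refused: my tree is at r1, tip is r2
    bzr_rejected = False
except RuntimeError:
    bzr_rejected = True
```

    After the SVN-style sequence, HEAD holds the new header together with the old-style call. Neither committer ever compiled that combination.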

  6. SVN supports partial checkouts better than Bazaar does.

    This is mostly a consequence of the previous point rather than a deliberate feature. Because SVN doesn't label anything as a special Tree, you can check out project/doc just as easily as project.

    We are looking into ways to at least fake this with Bazaar (we secretly check out the whole tree, but hide the bits you don't care about), because we are aware of use cases where it is important (a documentation team that doesn't want or need to see all the code, etc.).
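
    In the "fake it" approach, the whole tree is present but only some of it is shown, which boils down to filtering paths by prefix. Below is a hypothetical Python sketch. The function name and paths are made up, and the sketch says nothing about how Bazaar will eventually implement this.

```python
def filtered_view(tree_paths, include_prefixes):
    """Return the paths a user should see; everything else stays hidden on disk."""
    def visible(path):
        # Match whole directory components, so "project/doc" does not
        # accidentally expose "project/docs-old".
        return any(path == prefix or path.startswith(prefix.rstrip("/") + "/")
                   for prefix in include_prefixes)
    return sorted(p for p in tree_paths if visible(p))


full_tree = ["project/doc/index.txt", "project/doc/faq.txt",
             "project/docs-old/notes.txt", "project/src/main.c"]
doc_team_view = filtered_view(full_tree, ["project/doc"])
```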

  7. SVN stores history on the server.

    In the standard workflows, Bazaar has you copy the full project history to your local machine. For most projects this isn't a big deal, because the delta-compressed history is only a small multiple of the size of a checked-out tree. (Plus, an SVN working copy always stores 2 copies of each file anyway: the working file and a pristine copy.)
    But there are times when people abuse the VCS, and check in a CD ISO (which gets deleted shortly thereafter). Suddenly you have more garbage data in your repository than you have desirable data.

    Bazaar does support "lightweight checkouts", which are SVN-style working directories: all the history is on the server, and only the working tree is local. Of course, if you do this you lose some flexibility (offline commits), but you get to choose when that fits your needs.

    We also have "shared repositories" which can be used to share storage between branches. So even though you have 10 branches, you only have 1 copy of the history.

    We are working on having a Shallow Branch/History Horizon which should be a very good compromise between the two. The basic idea is that it can pull down data that you are using, without needing the full history.
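
    A shared repository saves space because branches are little more than pointers into one common store of revisions. The hypothetical Python model below (not Bazaar's on-disk format) has one 100-revision trunk and 10 feature branches.

```python
class SharedRepository:
    """Toy store of revisions shared by every branch created in it."""

    def __init__(self):
        self.revisions = {}  # revision id -> (parent ids, payload)

    def add(self, rev_id, parents, payload):
        # Adding a revision that is already stored is a no-op, so history
        # common to several branches is kept only once.
        self.revisions.setdefault(rev_id, (tuple(parents), payload))


class Branch:
    """A branch here is just a reference to a repository plus a tip revision."""

    def __init__(self, repo, tip=None):
        self.repo, self.tip = repo, tip

    def commit(self, rev_id, payload):
        parents = [self.tip] if self.tip is not None else []
        self.repo.add(rev_id, parents, payload)
        self.tip = rev_id


repo = SharedRepository()
trunk = Branch(repo)
for i in range(1, 101):
    trunk.commit(f"trunk-{i}", f"change {i}")

features = [Branch(repo, trunk.tip) for _ in range(10)]
for n, branch in enumerate(features):
    branch.commit(f"feature-{n}", "feature work")
```

    The store ends up with 110 revisions. With standalone branches instead, each of the 10 feature branches would carry its own copy of the 100 trunk revisions.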

  8. Storage of Binary Files

    At the moment SVN's delta algorithm for binary files is able to give smaller deltas than ours does. This is likely to change in coming releases, but at the moment there will be times when SVN requires less disk space for binary files that you modify often. For binary files that change infrequently, or for compressed ones, there is likely to be less of a difference. (Most compressed formats don't delta well because a small change causes ripples in the compressed stream.)

  9. Handling large files

    At the moment, Bazaar has the expectation that you can fit a small number of copies of the contents of any file in memory. (The merge algorithm needs a BASE, THIS, and OTHER copy.)
    So when you need to version 1GB movies, etc., SVN is probably the better choice at the moment. You might also consider whether a VCS is really the right way to handle those files.

    We are certainly considering changing parts of our code so that it can read only portions of a file, but that is lower on our list of priorities.
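
    Here is a deliberately simplified three-way merge in Python, to show why a merge wants all three versions at hand. It assumes the versions differ only by in-place line edits, with no inserted or deleted lines. Real merge code, Bazaar's included, does not make that assumption, and the function name is hypothetical. BASE, THIS and OTHER are all held in memory as full lists, which is the requirement described above for very large files.

```python
def merge3_lines(base, this, other):
    """Toy three-way merge for versions that differ only by in-place line edits.

    Returns (merged_lines, indexes_of_conflicting_lines).
    """
    if not len(base) == len(this) == len(other):
        raise ValueError("this toy merge needs equal-length inputs")
    merged, conflicts = [], []
    for n, (b, t, o) in enumerate(zip(base, this, other)):
        if t == o:        # both sides agree (changed or not)
            merged.append(t)
        elif t == b:      # only OTHER changed this line
            merged.append(o)
        elif o == b:      # only THIS changed this line
            merged.append(t)
        else:             # both changed it, differently: a conflict
            conflicts.append(n)
            merged.extend(["<<<<<<< THIS", t, "=======", o, ">>>>>>> OTHER"])
    return merged, conflicts


clean, clean_conflicts = merge3_lines(["a", "b", "c"], ["A", "b", "c"], ["a", "b", "C"])
clash, clash_conflicts = merge3_lines(["x"], ["y"], ["z"])
```

    Without BASE, the merge could not tell "THIS changed line 1" apart from "OTHER changed line 1", so all three copies are needed.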

  10. Building up a project out of subprojects

    At the moment SVN's externals handle more use cases than we do.
    We are working on more complete support with Nested Trees. The internal data structures are present, but not all of the push/pull/merge/etc commands have been updated.

    We already have good support for merging one project into another, so you get one large tree. You can then continue to merge new changes from upstream, and they will apply to the correct files. However, once you have aggregated a project, it is harder to send any of your own changes upstream independently of all the other files. (It is possible, but it requires you to cherry-pick the changes and keep track of which files you modified and when.)

    Also, Nested Trees are designed to let you easily check out an exact copy of the full project at the exact revision of every subproject, while still allowing you to 'bzr update' them to the current version of all the subprojects.
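
    The two behaviours described above can be stated compactly. The containing project records a revision for each subproject. A checkout reproduces that recorded combination, and an update moves each subproject to its current tip. Below is a hypothetical Python sketch of just that rule. The names and revision ids are made up, and it is not the Nested Trees implementation.

```python
def resolve_subtrees(pins, tips, mode):
    """Choose which revision of each subproject to put in the working tree.

    pins: subproject path -> revision recorded by the containing project
    tips: subproject path -> current tip of that subproject's branch
    """
    if mode == "checkout":
        return dict(pins)                           # exact recorded combination
    if mode == "update":
        return {path: tips[path] for path in pins}  # latest of every subproject
    raise ValueError(f"unknown mode: {mode!r}")


pins = {"lib/foo": "foo-12", "lib/bar": "bar-7"}
tips = {"lib/foo": "foo-15", "lib/bar": "bar-7"}
checked_out = resolve_subtrees(pins, tips, "checkout")
updated = resolve_subtrees(pins, tips, "update")
```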

  11. Clarity of "log"

    One major difficulty with CVS is just figuring out what has been changing. With Bazaar, a simple bzr log shows you what has been changing for the whole branch. SVN has a similar svn log, which shows you what has been changing underneath the current directory. (So they are approximately the same if you are in the root of an SVN branch.)

    However, if you develop in feature branches and then merge them into an integration branch (trunk), with Bazaar you can run bzr log --short, which shows only the mainline revisions. In this case, that would be just the integration summary messages, so you see a single "merged feature X" message rather than the 50 small commit messages that built up that feature.
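
    In Bazaar, a merge revision's first (left-hand) parent is the branch's own previous tip, and the merged work comes in through later parents. The mainline is therefore what you get by following first parents only. The Python sketch below walks a made-up revision graph this way; it is not bzrlib code.

```python
def mainline(graph, tip):
    """Walk first parents from `tip`, returning the integration branch's own revisions."""
    revs, rev = [], tip
    while rev is not None:
        revs.append(rev)
        parents = graph[rev]
        rev = parents[0] if parents else None
    return revs


def full_history(graph, tip):
    """Every revision reachable from `tip`, including merged feature commits."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph[rev])
    return seen


graph = {
    "trunk-1": [],
    "feat-1": ["trunk-1"],             # small commits on a feature branch
    "feat-2": ["feat-1"],
    "feat-3": ["feat-2"],
    "trunk-2": ["trunk-1", "feat-3"],  # "merged feature X"
}
messages = {"trunk-1": "initial import", "trunk-2": "merged feature X"}
short_log = [messages[rev] for rev in mainline(graph, "trunk-2")]
```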

  12. Plugin Architecture

    One of Bazaar's main strengths is the ability for third-party developers to add commands or customizations through plugins. Plugins can provide simple extensions (a different log format to conform to a company's particular style expectations), new commands (history introspection, extra patch management, integration with the PQM), or even support for a different repository format (at the moment bzr-svn provides a way to treat an SVN repository as just another Bazaar branch, allowing you to push, pull and merge).

    While not every user is going to want to write a plugin, it does provide ways
    for administrators to customize the behavior of Bazaar, so that the tool can be
    slimmed down to provide just the basics, or expanded to provide specific
    workflows customized to the situation.
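
    The mechanism behind this is essentially a registry that plugins add entries to when they are loaded. The Python sketch below uses a hypothetical registry and decorators to show the idea. Bazaar's real plugin API (bzrlib) looks different, so treat every name here as made up.

```python
# Hypothetical registries; not bzrlib's API.
COMMANDS = {}
LOG_FORMATS = {}


def command(name):
    def register(func):
        COMMANDS[name] = func
        return func
    return register


def log_format(name):
    def register(func):
        LOG_FORMATS[name] = func
        return func
    return register


# The core tool ships one log format.
@log_format("short")
def short_format(rev):
    return f"{rev['revno']} {rev['message']}"


# A "plugin" could add a company-specific format and a new command.
@log_format("acme")
def acme_format(rev):
    return f"[{rev['author']}] r{rev['revno']}: {rev['message']}"


@command("count-revisions")
def count_revisions(history):
    return len(history)


def show_log(history, fmt="short"):
    return [LOG_FORMATS[fmt](rev) for rev in history]


history = [{"revno": 1, "author": "jam", "message": "initial import"},
           {"revno": 2, "author": "jam", "message": "add rename support"}]
default_lines = show_log(history)
acme_lines = show_log(history, "acme")
total = COMMANDS["count-revisions"](history)
```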

  13. Rename support

    This is another place where SVN is much better than CVS, but Bazaar is even better still.
    SVN supports the basic concept of renaming, though it is implemented as a copy+delete pair. A "copy" allows 2 files to share the same history prior to the point of copying, which means commands like svn log and svn annotate use the full history of the file. But there is more that can be done.

    One reason projects hesitate to rename files is that it then becomes difficult to accept changes from elsewhere: suddenly a change has nowhere to go, because the target file is not there anymore. This is where Bazaar has a distinct advantage over SVN. When you rename a file, Bazaar knows that any patches to that file belong at the new destination. That means when you need to refactor your code to clean up the overall structure, you can still merge changes that were created before the restructuring. I know I didn't realize how differently I worked with my code before I had the ability to fix simple naming errors (this file is 'Bars.c' when it should just be 'bar.c', etc.).
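
    The difference comes down to what a change is attached to. A patch that names a path has nowhere to go once the path is gone. A patch that names a file identity surviving renames can simply be redirected; Bazaar records such a file id for every versioned file. Below is a toy Python illustration with made-up ids and helper names.

```python
def rename(tree, ids, old_path, new_path):
    """Move a file in `tree` (path -> content) and keep `ids` (file id -> path) in sync."""
    tree[new_path] = tree.pop(old_path)
    for file_id, path in list(ids.items()):
        if path == old_path:
            ids[file_id] = new_path


def apply_by_path(tree, path, new_content):
    """Path-based patching: fails once the target path no longer exists."""
    if path not in tree:
        raise KeyError(f"{path}: no such file (renamed?)")
    tree[path] = new_content


def apply_by_id(tree, ids, file_id, new_content):
    """ID-based patching: find wherever the file lives now."""
    tree[ids[file_id]] = new_content


tree = {"Bars.c": "int bar(void);"}
ids = {"file-0001": "Bars.c"}
rename(tree, ids, "Bars.c", "bar.c")  # the clean-up rename

# A change made elsewhere before the rename, aimed at the old file.
patched_content = "int bar(void); /* fixed */"
try:
    apply_by_path(tree, "Bars.c", patched_content)
    path_patch_applied = True
except KeyError:
    path_patch_applied = False
apply_by_id(tree, ids, "file-0001", patched_content)
```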



  14. I also wanted to point to a pretty good blog post about Subversion and the rest of the world here. A lot of that is why Bazaar has a centralized workflow you can use, and why we are trying to make sure things like bzr-gtk (which is the parent project for Olive and TortoiseBzr) are fully functional.

In summary, SVN may be the better choice if you have large binary files or projects built from subprojects, or if you need partial checkouts or more mature third-party tool integration than Bazaar currently has. On the other hand, Bazaar may suit you better if:

- workflow flexibility is important;
- collaborating with others and increasing community participation matter;
- low administration is appealing; or
- you care about quality branching and merging and correct rename handling.

In that case Bazaar can make life more enjoyable, and it ought to be seriously considered, either now or in the future, depending on how comfortable you are with its maturity.

9 comments:

Anonymous said...

It also means that SVN commits aren't atomic in the same way that Bazaar commits are. In Bazaar, when you commit, you are guaranteed to be able to get back to that same revision. With SVN, if people are working on different files, both can commit, and when you checkout the final tree, it will not match either side.
This has some implications for assuring that the test suite passes on a given branch,
since the test suite can pass on my machine, and on their machine, but after we both commit, it won't pass after doing a checkout.


I don't understand your comment here, presuming you are talking about people committing to the same branch.

With SVN, the second committer is forced to update (and receive the first committer's changes) before they can commit. Thus the second committer will have to check the combined code against the test suite.

jam said...

The last time I used SVN, this was only true if the person changed the same file.

So if you edit 'foo' and commit revision 10, and I edit 'bar' and commit 11, then a checkout of revision 11 from the repository will have both the updated 'foo' and the updated 'bar', even though that combination never existed on my local system.

For a specific example...
you update 'function.h' and 'function.c' to change

int function(int arg);

to be

int function(int arg, char *arg2);

And commit that change.

I change "program.c" to add a new line:

...
/* And now we should run function */
if (function(10) == 1) {
    printf("success\n");
}
...

When you compile and test the program it will succeed. So you commit.

When I compile and test the program it will succeed. So I commit.

When someone does a checkout of the program, it will fail to compile.
Even though the changes are in physically separate files, they are not logically separate.

So Bazaar requires that all files be up to date before you can commit, not just the file you are editing.

It does this at a "Branch" granularity, so you don't need to have all of your branches up to date.

Berto said...

As of version: svn, version 1.4.6 (r28521)

jam's comments are correct.

I created a repository, added a couple of files, checked out two working copies, edited one file in one working copy, then committed.

In the second working copy I edited the other file and committed without getting an error about updating first.

John said...

I am trying to find out how to use/test the NestedTrees feature, as I use externals heavily in Subversion. Do you have any info or docs on this feature, or do you know someone who does?

Anonymous said...

It would be nice during these discussions if it were made clear the type of application and development team that is assumed. What has been stated as a plus for Bazaar, such as regarding atomic commits and automatic passing of some test suite, is not always desirable or even possible. In that situation Bazaar is not feasible, as far as I can see, simply because it does appear to be targeted currently at integrated applications like a website, or some more monolithic application than many business systems I've worked on.

For instance, my day job as a consultant involves many developers working on a large healthcare insurance application involving over 2,000 separate pieces of code - Cobol programs (each one many 1,000's of lines), 4GL screens, report writer, PL/SQL procedures, shell scripts run as batch jobs (again, some are 1,000's of lines). In this environment each developer does not need or want all the code from a branch on their machine, and can not possibly run a full test suite themselves. Developers only perform unit tests on the code/process they are modifying. There is a QA department that tests the application as a whole (and has to set up data and maybe run processes in a particular order). Even a fully automated test of the application running every job and fully testing every screen, which I haven't ever seen for this type of system, would run for 24-48 hours. This is typical for many (most?) large corporations running large internal applications (accounting, manufacturing, marketing, and healthcare are some of the ones I've worked on).

For that type of system, Subversion is at least possible, now that it has sparse directories. While I like some of the features of Bazaar a lot, and its flexibility of versioning local changes in particular, to help me keep track when I'm working on an enhancement or bug, it can't be used easily at all in the type of environment I have at work.

Anyway, I'm just saying it'd be nice if people that are comparing VCS systems keep in mind at least the one other type of application/enterprise system that many of the programmers I know have been working on for years.

Thanks for taking the time to write the review, for me at least the comparisons do save time and help to clarify things.

Bob

Anonymous said...

I'd just like to say, after using SVN and BZR for a while: Bazaar bites. Sorry. It's just extremely frustrating to use. Sometimes it makes it nearly impossible for someone to get the latest code.

Say you change a file on your machine, but you don't want to commit it; you want to delete it and get the latest update. Good luck. Bazaar will make you work and do run-arounds to get the latest code, because it tells the user "You have an up-to-date copy of revision 12" when it's really revision 18 or so. Now you have to do a revert, then a merge, no, that didn't work, still at r12... now give up, delete everything and get a new branch out. Don't even get me started on using a revert on a subdirectory: it will revert your ENTIRE tree and fry any other code changes. WTF, Bazaar? This program is definitely not easy to use, because there are way too many obscure commands needed to do simple things. There just is no bzr equivalent to "svn update", and that is a huge, major, glaring omission.

SVN wins the ease-of-use category, hands down.

Granted, I haven't used Bazaar on a very large project, and I'm guessing that's where its strength lies. Otherwise, I hate the stupid thing.

Eric J. Schwarzenbach said...

Though this is quite an old blog post, it is still one of the first search results you get if you google Subversion vs Bazaar, and the Canonical page on the same topic links to it.

Given that, I think it would be nice to see either an update posted here indicating how much of this is still true and what has changed, or if more appropriate a new blog post made with a link here to that post.
