Tuesday, March 27, 2007

Test DRIVEN Development

For the Bazaar project we have a general goal that all code should be tested. We have an explicit development workflow that all new code must have associated tests before it can be merged into mainline.

Our latest release (0.15, rc3 is currently out the door, final should happen next week), introduces a new disk format, and a plethora of new tests. ('bzr selftest' in 0.14 has approx 4400 tests, and 5900 in 0.15). A lot of these are interface tests. Since we support multiple working tree, branch, and repository formats, we want to make sure that they all work the same way. (So only 1 tests is written, but it may be run against 4 or 5 different formats).

It means that we have a very good feeling that our code is doing what we expect it to (all of the major developers dogfood on the current mainline). However, it comes at a bit of a cost. In that running the full test suite gets slower and slower.

Further, I personally follow more of a 'test-after-development'. And I'm trying to get into the test driven development mindset. I don't know how I feel just yet, but I was reading this. And whether you agree with all of it, it makes it pretty clear how different the mindset can be. It goes through several iterations of testing, coding, and refactoring before it ends up anywhere I consider "realistic". And a lot of that comes at the 'refactoring' step, not at the coding step.

I have a glimpse at how it could be useful, as the idea is to have very small iterations. Such that it can be done in the 3-5 minute range. And every 3-5 minutes you should have a new test which passes. It means that you frequently have hard-coded defaults, since that is all the tests require at this point. But it might also help you design an interface, without worrying about actually implementing everything.

He also makes comments about keeping a TODO list. Which was part that made the most sense to me. Because you can't every write all the code fast enough to get all the ideas out of your head. So you keep a TODO so you don't forget, and also so you don't feel like you need to track down that path right now.

The other points that stuck with me are that most tests should be "unit tests". Which by his definition means they are memory only very narrow in scope. And that the test suite should be fast to run, because once it gets under a threshold (his comment was around 10 seconds, not minutes) then you can actually run all of them, all the time.

And since a development 'chunk' is supposed to be 3-5 minutes, it is pretty important that the test suite only take seconds to run. The 10s mark is actually reasonable, because it is about as long as you would be willing to give to that single task. Any longer and you are going to be context switching (email, more code, IRC, whatever).

The next level of test-size that he mentions is an "integration" test. I personally prefer the term "functional" test. But the idea is that a "unit" test should be testing the object (unit) under focus, and nothing else. Versus a functional test that might make use of other objects, and disk, database, whatever. And then the top level is doing an "end-to-end" test. Where you do the whole setup, test, and tear down. And these have purpose (like for conformance testing, or use case testing), but they really shouldn't be the bulk of your tests. If there is a problem here, it means your lower level tests are incomplete. They are good from a "the customer wants to be able to do 'X', this test shows that we do it the way they want" viewpoint.

I think I would like to try real TDD sometime, just to get the experience of it. I'll probably try it out on my next plugin, or some other small script I write. I have glimpses of how these sorts of things could be great. Often I'm not sure how to proceed while developing because the idea hasn't solidified in my head. One possibility here is "don't worry about it", create a test for what you think you want, stub out what you have to, and get something working.

Of course, the more I read, the more questions spring up. For example, there is a lot of discussion about test frameworks. Python comes with 'unittest', which is based on the general JUnit (or is it SUnit) framework. Where you subclass from a TestCase base class, and have a setUp(), and tearDown(), and a bunch of test_foo() tests.

But there is also nose and py.test, which both try to overcome unittest's limitations. And through reading about them, there is a discussion that python 3000 will actually have a slightly different default testing library. (For a sundry of technical and political reasons).

And then there is the mock versus stub debate. As near as I can tell, it seems to fall around how to create a unit test when the object under test depends on another object. And which method is more robust, easier to maintain, and easier to understand. That link lends some interesting thought about Mock objects. That instead of testing the state of objects, you are actually making an explicit assertion that the object being tested will make specific calls on the dependency.

I'm not settled on my decision, there. Because it feels like you are testing an exact implementation, rather than testing the side effect (interface). Some of what I read says "yes, that is what you are doing, and that is the point." I can understand testing side-effects. I guess part of it is how comfortable are you with having your test suite evolve. At least some tests need to be there to say that the interface hasn't changed since the previous release. (Or that a bug hasn't been reintroduced). If that edge case was tested by a particular test, and that test gets refactored, do you have confidence you didn't re-introduce the bug?

I guess you could have specific conventions about what tests are testing the current implementation, versus the overall interface of a function or class. I can understand that you want your test suite to evolve and stay maintainable. But at the other end, it is meant to test that things are conforming to some interface, so if you change the test suite, you are potentially breaking what you meant to maintain.

Maybe it just means you need several tiers of tests, each one less likely to be refactored.

No comments: