Tests gone bad (part 2) – code coverage cargo cult

15.04.2013

Automated testing is mandatory for successful software development in an agile process. However, after a few person-years have been poured into a project, teams often find themselves slowed down by a tangle of tests that no one understands any more, that cause builds to take a long time, and that fail randomly for no conceivable reason. In this series of blog posts, I’m going to dig up some anti-patterns, and discuss possible remedies.

Part 1: Make sure it does what it does

Part 3: Set-up Overkill

This time, we’ll take a look at how you can use the code coverage metric to hurt your quality.

Code coverage is a metric that tells you which parts of the code are executed (“covered”) by a test¹. It lets developers see which parts of the code are being tested, and, more importantly, which parts are not. The metric can also be aggregated over a class, a module, or a whole project as a percentage: a project with 0% code coverage has no automated tests at all, while 100% means every part of the code is executed by some test.
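To illustrate, here is a tiny method with one branch (the Sign method and its name are invented for this example):

public static string Sign(int value)
{
    if (value < 0)
        return "negative";     // executed only when some test passes a negative value
    return "non-negative";     // executed by e.g. Sign(42)
}

A lone test calling Sign(42) executes the if check and the second return, but never the first return – two of three statements, so roughly 67% statement coverage for this method. A second test with a negative argument would bring it to 100%.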

Mission: achieve perfect code coverage

The equation “high code coverage equals good quality” sounds plausible enough. 100% is better than 0%, so surely 90% is better than 70% too, right?

The logical next step is to set up a policy that enforces automated testing, and to use the code coverage metric to measure how well a developer or a team is performing. For example, a team can agree on a code coverage of 90% in their Definition of Done. Or a development manager can tie a developer’s bonus payment to the average code coverage of the code they own. In any case, a strong motivation to write automated tests will raise software quality – or won’t it?

There are two kinds of developers who usually turn such a policy into a disaster: the lazy ones, and the eager ones.

How to lazily achieve perfect code coverage

A developer who perceives writing tests as wasteful, or who is simply under high pressure to finish a feature, may resort to adopting a pattern for tests like this:

[Fact]
public void Achieve_perfect_code_coverage()
{
    var subject = new Subject();
    subject.DoSomeWork();
}

This test runs the method DoSomeWork. As long as that method has a cyclomatic complexity of 1 (simplified: there are no if statements), all of its code is covered by the test. Great! We’ve got 100% code coverage!
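To make that concrete, here is one hypothetical body for Subject (the real code isn’t shown in this post, so this sketch is invented for illustration): a single if statement raises the cyclomatic complexity to 2, and the assertion-free test above stops covering everything.

public class Subject
{
    private bool _retryNeeded; // hypothetical flag; false unless something sets it

    public bool WorkDone { get; private set; }

    public void DoSomeWork()
    {
        WorkDone = true;   // complexity 1 so far: every call executes this line

        if (_retryNeeded)  // one branch raises the complexity to 2 ...
            Retry();       // ... and the lazy test never reaches this line
    }

    private void Retry() { }
}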

Now, what the test obviously does not do is test anything – there is not even an assertion. The whole purpose of the test is to game the metric; nothing is gained for the project. With enough tests like this, the code coverage metric itself becomes unusable, which is a shame, because if used right, the metric can be useful indeed.
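For contrast, a minimal sketch of what a real test could look like, assuming Subject exposes an observable outcome such as the hypothetical WorkDone property from the sketch above:

[Fact]
public void DoSomeWork_marks_the_work_as_done()
{
    var subject = new Subject();

    subject.DoSomeWork();

    Assert.True(subject.WorkDone); // verifies observable behaviour, not mere execution
}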

In addition, such assertion-free tests are a useless burden on your project. Someone will have to review the test suite to comb this kind of test out, and there’ll be some fall-out that’s bad for the team’s morale. Absolutely no one is going to be happy.

How to eagerly achieve perfect code coverage

A developer eager to reach a code coverage close to 100% will pour all her skills and motivation into the task, and when all lines show up green, it brings the great satisfaction of work well done.

This sounds great, and it mostly is great. The problem is that the last few percentage points are much harder to achieve. Getting the last 8% is hard work, and the last 2% are close to impossible. A lot of work can be lost when a senior developer takes three days to pull off the feat of writing a test against that one line where there is this side effect in the error handler of a static method in a sealed third-party class… where with a little less dedication to the code coverage high-score trophy, the tricky problem could have been circumvented altogether, or a different way of ensuring quality could have been found².
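One way such a spot can be circumvented is the classic “wrap the untestable dependency behind an interface you own” move – a hedged sketch with entirely invented names (ThirdPartyLib stands in for the sealed class from the anecdote):

using System;

// Stand-in for the sealed third-party class with the pesky side effect.
public sealed class ThirdPartyLib
{
    public static void LogError(Exception error) { /* side effect lives here */ }
}

// Your own seam: the code under test depends on this interface,
// not on the sealed class directly – a fake can replace it in tests.
public interface IErrorReporter
{
    void Report(Exception error);
}

// A thin adapter with no logic of its own; leaving it uncovered costs little.
public class ThirdPartyErrorReporter : IErrorReporter
{
    public void Report(Exception error) => ThirdPartyLib.LogError(error);
}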

In addition to the wasted effort, the eager hunt for code coverage is likely to produce a slew of tests that mirror the exact structure of the code (“make sure it does what it does”).

Measure relative, not absolute

Too great a focus on absolute code coverage numbers distracts from the key function of automated testing – to reliably build high-quality software.

However, taking a look at relative numbers can give hints for some really interesting questions:

  • Why is so much more code covered in module X than in module Y? Does this point to an architectural inconsistency? Are the tests structured in a different way?
  • Did this Sprint’s work increase or decrease code coverage? If we see a strong decrease, we might want to look at the code structure, or maybe we’re not Done yet. If there’s an increase, is there something to learn from how it was achieved? (See the sketch after this list.)
  • Do teams or individual developers consistently achieve a higher or lower code coverage? If so, in which way do they work differently? Is there something to learn to become better as a whole?
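As a minimal sketch of that trend idea: compare each module’s coverage against the previous Sprint and flag big swings. The module names and percentages below are made up, and in practice they would come from your coverage tool’s report (this assumes a modern .NET project where System and System.Collections.Generic are implicitly available):

// The interesting signal is the change, not the absolute number.
var lastSprint = new Dictionary<string, double> { ["ModuleX"] = 82.4, ["ModuleY"] = 55.1 };
var thisSprint = new Dictionary<string, double> { ["ModuleX"] = 83.0, ["ModuleY"] = 48.7 };

foreach (var (module, coverage) in thisSprint)
{
    var delta = coverage - lastSprint.GetValueOrDefault(module);
    if (Math.Abs(delta) >= 5.0) // arbitrary threshold: big swings deserve a conversation
        Console.WriteLine($"{module}: coverage changed by {delta:+0.0;-0.0} points – worth a look");
}

A strong drop (ModuleY above) is not a verdict; it is the starting point for one of the questions from the list.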

The difference is: looking at the metric is the beginning of the analysis, not its end. The metric does not provide value as an absolute number; it hints at a problem or an opportunity. It is smoke, not fire.

Conclusion

Bringing absolute code coverage values into focus is likely to have a negative impact on a test suite’s quality. It creates pressure to game the metric, and to waste resources racing for high coverage to the detriment of thoughtfulness.

The metric is, however, extremely helpful as an indicator of problems and possible improvements.

And: teams that consistently practice Test-Driven Development will achieve high code coverage without ever looking at the metric.


¹ There are different ways to record code coverage (by function, by statement, by decision) that yield different granularities of what counts as covered – but for the sake of this article, they are all the same.
² For example, a function not covered by a unit test could be included in an automated UI test. And sometimes, having a manual test plan isn’t so bad, especially when the requirements are still unstable and the pesky part may just go away.
