Mutation Testing — Who will test the tests?

Oliver Martin-Hirsch
Dunelm Technology
4 min read · Apr 10, 2024


No one wants to break the production environment on purpose (well, at least no one will admit that they want to). Automating the regression testing of code is a fundamental practice that gives developers as much confidence as possible when launching their shiny new feature. Writing code and testing it go hand-in-hand, but how far do we take this concept of testing? How do we know if the unit tests written to cover the logic are actually testing it, and are of high quality themselves? Well, there are obviously more people than just me losing sleep over this conundrum, as there are libraries out there that exist to test tests.

I bet this is the exact scenario Plato philosophised about

Enter ‘mutation testing’. Test coverage is a quick way of seeing where the unit tests might be lacking in terms of raw coverage, but mutation testing, in theory, lets us assess whether the tests that are already there are actually testing the logic correctly, highlighting issues before they get into production. It’s worth noting that test coverage can sometimes be misleading, as 100% coverage just means all the code was executed, not necessarily tested!
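
To labour that last point with a contrived sketch (Jest-style syntax assumed, and isEven is a made-up example function): the test below executes every line of the function, so the coverage report happily says 100%, yet it never actually checks the behaviour.

// A hypothetical test that produces 100% line coverage but proves very little
const isEven = (number) => number % 2 === 0;

test('runs isEven', () => {
  const result = isEven(4);
  expect(result).toBeDefined(); // the line ran, but the result is never really checked
});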

The general concept is fairly straightforward: the mutation library scans over your code and makes small changes here and there, such as swapping operators, inverting Boolean values and changing the return values of functions, all of which directly affect how the logic of the application works. The existing unit tests are then run against this altered, ‘mutated’ code.

// original code

function hasOnlyOddNumbers(array) {
  return array.every(number => number % 2 === 1);
}

// 'mutated' code

function hasOnlyOddNumbers(array) {
  return array.every(number => number % 2 !== 1); // <--- mutated operator
}

These introduced bugs are the ‘mutants’ that your tests should ‘kill’ (this is all very violent, but it’s the terminology used in the actual documentation and logging output of some libraries). The bit that pains me on some deep, existential level is that you then want your tests to fail here, as a failure shows the tests are actually useful at finding subtle bugs in your code. If the unit tests still pass with the mutations present in the code, the mutants are deemed to have ‘survived’, and therefore your tests aren’t actually testing the code properly.
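
To make that concrete, a test along these lines (Jest-style assertions assumed) would kill the operator mutant from the example above: it passes against the original code but fails the moment === is flipped to !==.

// A test that 'kills' the mutated operator, assuming hasOnlyOddNumbers is
// exported from the module under test
test('returns true only when every number is odd', () => {
  expect(hasOnlyOddNumbers([1, 3, 5])).toBe(true);
  expect(hasOnlyOddNumbers([1, 2, 5])).toBe(false);
});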

Of course, if we were to do this by hand every time we wanted to test our tests it would be extremely time-consuming as well as potentially dangerous (‘oops, I accidentally committed the bugs I put in there on purpose’), which is why libraries such as Stryker Mutator exist. Stryker is fairly easy to install and implement on a simple codebase, and there are several existing plugins to add compatibility for various testing frameworks such as Jest and Vitest, as well as some established support for mutating TypeScript. The library is also quite configurable; you are able to easily exclude certain mutants you deem acceptable via patterns or explicit exclusion.
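
For reference, the quick-start path at the time of writing is roughly: install @stryker-mutator/core plus the runner plugin for your test framework (or let npm init stryker scaffold things for you), drop a config file in the repo root and run npx stryker run. Below is a minimal sketch of what that config might look like for a Jest project; the option names are taken from the Stryker docs, but double-check them against the version you actually install.

// stryker.conf.js: a minimal sketch for a Jest-based project; adjust the
// globs and runner to suit your own repo
module.exports = {
  testRunner: 'jest',
  reporters: ['clear-text', 'progress', 'html'],
  coverageAnalysis: 'perTest',
  mutate: ['src/**/*.js', '!src/**/*.test.js'],
  thresholds: { high: 80, low: 60, break: null },
};

The mutate globs double as the ‘patterns’ mentioned above, and the docs also describe in-source // Stryker disable comments for switching off individual mutants you have decided to live with.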


There are a few drawbacks; jamming a new library into a repo isn’t always plain sailing. I mentioned it being easy to implement on a simple codebase; if your repository has multiple workspaces with multiple package.json files, it becomes a little more unwieldy to set up and get running properly. When attempting to add it to one of our bigger projects as a proof of concept, I ended up giving up due to the sheer amount of messing around and referring to the FAQs I had to do. It seems the appetite for this kind of testing isn’t really there yet, as there are very few resources other than the official documentation and a few posts floating around the internet. The other major drawback is, of course, that it adds more bloat to your pipeline and makes runs take longer overall. It may be anecdotal, but when the mutation tests did run they took quite a long time, though I am guessing this is mostly codebase-dependent. A more effective method may be to apply these kinds of tests sparingly to the areas of a codebase that need them the most, rather than taking a blanket approach of mutating all of the code.
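
If you do go down that targeted route, narrowing the mutate globs is probably the simplest lever; something along these lines (the pricing paths are purely illustrative) confines the mutants to one high-value module instead of the whole codebase.

// stryker.conf.js: only mutate the module that matters most right now,
// so the run stays fast and the results stay focused
module.exports = {
  testRunner: 'jest',
  mutate: ['src/pricing/**/*.js', '!src/pricing/**/*.test.js'],
};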

In conclusion, anything that gets your dev teams talking about how effective the tests they are writing actually are is a win in my book — mutation testing might not be suitable to add to every project for the long term, but it can highlight gaps in your testing and promote a culture where people aim for quality over quantity, which should persist without the need for mutating your code in the future!
