Testing

I'm going to dispense with the terminology around testing schools¹ Which is worth a look-up and a deep dive if you're curious about the different philosophies and their backgrounds. Instead I'm just going to give you my spark notes version for writing robust tests that can inspire confidence in your ability to change the implementation details while protecting the features of the system.

Implementation Detail vs Feature

Ok. lets pretend that we're building a calculator. We've sat down to discuss the in-scope features of a calculator and you start listing "Addition, subtraction ..." and I interject with "Polish notation, and a stack". You would probably, rightly, look at me like I've lost my mind because those are implementation details not features.

But I've already started on the test 'pushOperatorsToStack', and am asserting that given an operation with arity two is performed, when you access the stack it has three members, and the first member is an instance of 'operator' and the next two are instances of 'number', (I note that we'll need a follow on test to expand the functionality to allow that to be 'operator', but this is a degenerate first test so it's no big deal).

So what does this test do?

It dictates that there must be a stack
It guarantees that the system must represent operators and numbers in a particular way
It also dictates that there must be a direct accessor method for the stack

Consider that all of those things are implementation details. None of those are worth testing unless we're under contractual obligation to use Polish notation and a stack and now our tests are tightly coupled to those implementation details that you may or may not even fully understand yet.

We continue writing our tests and get to the point of implementing all the sensible features we can think of, and build some spy objects that test to make sure things flow through the stack correctly.

A Sea of Change

Pretend you do some research and discover Hewlett-Packard calculators in 1968 used reverse Polish notation²and convince me that we should update our solution to follow that precedent. Or worse, perhaps, you decide that such things aren't really useful and we should use infix representations because it's easier for our junior devs to reason about.

Ugh. All those tests we wrote have to be updated and rewritten. Yuck.

Then word comes in that we're on the hook to expand our implementation to accommodate surreal numbers³for the game theory geeks who want to use our calculator so we have to tweak our number implementation to embrace infinitesmals.

Now all of the spies and asserts have to be refactored so our commit just becomes an unreadable ocean of little point changes touching everything.

Sadly, just the cost of doing testing, right? Wrong, merely the cost of tightly coupling your tests to implementation details!

On Your Best Behavior

The solution, is to not test implementation details. Easier said than done? Kinda. The key is behavior driven design. Put on the hat of a user of the system and express a need, or a use case:

As a calculator admin, I want to be able to lock down certain functions so student's can't use them on tests
As a calculator user, I want to be able to perform addition so that I can get arithmetic sums
As a calculator developer, I want to see diagnostic data so that I can streamline bottlenecks

All of those are behaviors. There is no user who is going to have a story that dictates a stack. If you take all of those stories "As a [user] I want to [action] so that I can [reason]", and put in a test for them at the level of the API (not necessarily the UI), you have tests validating everything a user can do.

Life on the Edge

Maybe you miss some edge cases, or there's a bug somewhere and you need to add some cases:

As a calculator user, I want to recieve a sensible error when I add two numbers that result in an integer overflow so that I am not confused by a large negative number

But that's still at the level of behavior. Asserting just that, and keeping your tests at the 'user' level you avoid coupling them to implementation details.

It does not matter to anyone if the system is using prefix, infix or postfix notation. The calculator could be using an absurdly large look-up table that codifies every possible combination of operators and numbers and as long as it returns the correct result who cares?

Now that might not be performant, and performance is an important requirement. But you can write a behavioral test for that which shapes the implementation details without being coupled to them:

As a calculator user, I want any operation I type in to return in less than five seconds so that I can perform operations quickly

So you write a test to time some operations and now have a test that pushes you to write an implemenation. Perhaps we discover our monstrous abomination of a look-up table meets that requirement, or perhaps we have to do the sane thing and instead of hardcoding every possible input combination we have to implement operators.

Or perhaps we discover later that the hardware we're on doesn't meet that benchmark for really large linear algebra equations so we relax it to six seconds and add a test capturing that new requirement, or we find a clever representation of numbers in matricies that we uniquely use for linear algebra and make it under the bar.

One Reason to Change

One thing we know is that if all of our tests passed before we wrote this failing test, and our test suite covers all of the behavior that we know we care about, we can change every implementation detail without needing to touch our other existing tests.

To put it another way, each of our tests now has only one reason to change:

We only have to touch tests when we change the end-user behavior they capture

Which at the end of the day, is exactly what we want our tests to flag: Unintentional changes to end-user visible behavior. We don't want our tests to have to change if only developer visible implementation details are changing.

Footnotes

1: Roughly, and briefly there are two 'schools' of test philosophy, London (where your unit tests test a unit of code) and Detroit or Classical school (where your unit tests test a unit of behavior). My test strategy is more in line with the Classical school of test design as one can probably glean above.^return

^hp_calculators]: [https://en.wikipedia.org/wiki/HP_calculators#Characteristics

^surreal_numbers]: A good friend of mine, who is a mathematician that is an engineer because it pays a livable wage introduced me to these not too long ago [https://en.wikipedia.org/wiki/Surreal_number