Using types to unit-test in Haskell
Object-oriented programming languages make unit testing easy by providing obvious boundaries between units of code in the form of classes and interfaces. These boundaries make it easy to stub out parts of a system to test functionality in isolation, which makes it possible to write fast, deterministic test suites that are robust in the face of change. When writing Haskell, it can be unclear how to accomplish the same goals: even inside pure code, it can become difficult to test a particular code path without also testing all its collaborators.
Fortunately, by taking advantage of Haskell’s expressive type system, it’s possible to not only achieve parity with object-oriented testing techniques, but also to provide stronger static guarantees as well. Furthermore, it’s all possible without resorting to extra-linguistic hacks that static object-oriented languages sometimes use for mocking, such as dynamic bytecode generation.
First, an aside on testing philosophy
Testing methodology is a controversial topic within the larger programming community, and there are a multitude of different approaches. This blog post is about unit testing, an already nebulous term with a number of different definitions. For the purposes of this post, I will define a unit test as a test that stubs out collaborators of the code under test in some way. Accomplishing that in Haskell is what this is primarily about.
I want to be clear that I do not think that unit tests are the only way to write tests, nor the best way, nor even always an applicable way. Depending on your domain, rigorous unit testing might not even make sense, and other forms of tests (end-to-end, integration, benchmarks, etc.) might fulfill your needs.
In practice, though, implementing those other kinds of tests seems to be well-documented in Haskell compared to pure, object-oriented style unit testing. As my Haskell applications have grown, I have found myself wanting a more fine-grained testing tool that allows me to both test a piece of my codebase in isolation and also use my domain-specific types. This blog post is about that.
With that disclaimer out of the way, let’s talk about testing in Haskell.
Drawing seams using types
One of the primary attributes of unit tests in object-oriented languages, especially statically-typed ones, is the concept of “seams” within a codebase. These are internal boundaries between components of a system. Some boundaries are obvious—interactions with a database, manipulation of the file system, and performing I/O over the network, to name a few examples—but others are more subtle. Especially in larger codebases, it can be helpful to isolate two related but distinct pieces of functionality as much as possible, which makes them easier to reason about, even if they’re actually part of the same codebase.
In OO languages, these seams are often marked using interfaces, whether explicitly (in the case of static languages) or implicitly (in the case of dynamic ones). By programming to an interface, it’s possible to create “fake” implementations of that interface for use in unit tests, effectively making it possible to stub out code that isn’t directly relevant to the code being tested.
In Haskell, representing these seams is a lot less obvious. Consider a fairly trivial function that reverses a file’s contents on the file system:
reverseFile :: FilePath -> IO ()
reverseFile path = do
contents <- readFile path
writeFile path (reverse contents)
This function is impossible to test without testing against a real file system. It simply performs I/O directly, and there’s no way to “mock out” the file system for testing purposes. Now, admittedly, this function is so trivial that a unit test might not seem worth the cost, but consider a slightly more complicated function that interacts with a database:
renderUserProfile :: Id User -> IO HTML
renderUserProfile userId = do
user <- fetchUser userId
posts <- fetchRecentPosts userId
return $ div
[ h1 (userName user <> "’s Profile")
, h2 "Recent Posts"
, ul (map (li . postTitle) posts)
]
It might now be a bit more clear that it could be useful to test the above function without running a real database and doing all the necessary context setup before each test case. Indeed, it would be nice if a test could just provide stubbed implementations for fetchUser
and fetchRecentPosts
, then make assertions about the output.
One way to solve this problem is to pass the results of those two functions to renderUserProfile
as arguments, turning it into a pure function that could be easily tested. This becomes obnoxious for functions of even just slightly more complexity, though (it is not unreasonable to imagine needing a handful of different queries to render a user’s profile page), and it requires significantly restructuring code simply because the tests need it.
The above code is not only difficult to test, however—it has another problem, too. Specifically, both functions return IO
values, which means they can effectively do anything. Haskell has a very strong type system for typing terms, but it doesn’t provide any guarantees about effects beyond a simple yes/no answer about function purity. Even though the renderUserProfile
function should really only need to interact with the database, it could theoretically delete files, send emails, make HTTP requests, or do any number of other things.
Fortunately, it’s possible to solve both problems—a lack of testability and a lack of type safety—using the same general technique. This approach is reminiscent of the interface-based seams of object-oriented languages, but unlike most object-oriented approaches, it provides additional type safety guarantees without the need to explicitly modify the code to support some kind of dependency injection.
Making implicit interfaces explicit
Statically typed, object-oriented languages provide interfaces as a language construct to encode certain kinds of contracts into the type system, and Haskell has something similar. Typeclasses are, in many ways, an analog to OO interfaces, and they can be used in a similar way. In the above case, let’s write down interfaces that the reverseFile
and renderUserProfile
functions can use:
class Monad m => MonadFS m where
readFile :: FilePath -> m String
writeFile :: FilePath -> String -> m ()
class Monad m => MonadDB m where
fetchUser :: Id User -> m User
fetchRecentPosts :: Id User -> m [Post]
The really nice thing about these interfaces is that our function implementations don’t have to change at all to take advantage of them. In fact, all we have to change is their types:
reverseFile :: MonadFS m => FilePath -> m ()
reverseFile path = do
contents <- readFile path
writeFile path (reverse contents)
renderUserProfile :: MonadDB m => Id User -> m HTML
renderUserProfile userId = do
user <- fetchUser userId
posts <- fetchRecentPosts userId
return $ div
[ h1 (userName user <> "’s Profile")
, h2 "Recent Posts"
, ul (map (li . postTitle) posts)
]
This is pretty neat, since we haven’t had to alter our code at all, but we’ve managed to completely decouple ourselves from IO
. This has the direct effect of both making our code more abstract (we no longer rely on the “real” file system or a “real” database, which makes our code easier to test) and restricting what our functions can do (just from looking at the type signatures, we know what side-effects they can perform).
Of course, since we’re now coding against an interface, our code doesn’t actually do much of anything. If we want to actually use the functions we’ve written, we’ll have to define instances of MonadFS
and MonadDB
. When actually running our code, we’ll probably still use IO
(or some monad transformer stack with IO
at the bottom), so we can define trivial instances for that existing use case:
instance MonadFS IO where
readFile = Prelude.readFile
writeFile = Prelude.writeFile
instance MonadDB IO where
fetchUser = SQL.fetchUser
fetchRecentPosts = SQL.fetchRecentPosts
Even if we go no further, this is already incredibly useful. By restricting the sorts of effects our functions can perform at the type level, it becomes a lot easier to see which code is interacting with what. This can be invaluable when working in a part of a moderately large codebase that you are unfamiliar with. Even if the only instance of these typeclasses is IO
, the benefits are immediately apparent.
Of course, this blog post is about testing, so we’re going to go further and take advantage of these seams we’ve now drawn. The question is: how?
Testing with typeclasses: an initial attempt
Given that we now have functions depending on an interface instead of IO
, we can create separate instances of our typeclasses for use in tests. Let’s start with the renderUserProfile
function. We’ll create a simple wrapper around the Identity
type, since we don’t actually care much about the “effects” of our MonadDB
methods:
import Data.Functor.Identity
newtype TestM a = TestM (Identity a)
deriving (Functor, Applicative, Monad)
unTestM :: TestM a -> a
unTestM (TestM (Identity x)) = x
Now, we’ll create a trivial instance of MonadDB
for TestM
:
instance MonadDB TestM where
fetchUser _ = return User { userName = "Alyssa" }
fetchRecentPosts _ = return
[ Post { postTitle = "Metacircular Evaluator" } ]
With this instance, it’s now possible to write a simple unit test of the renderUserProfile
function that doesn’t need a real database running at all:
spec = describe "renderUserProfile" $ do
it "shows the user’s name" $ do
let result = unTestM (renderUserProfile (intToId 1234))
result `shouldContainElement` h1 "Alyssa’s Profile"
it "shows a list of the user’s posts" $ do
let result = unTestM (renderUserProfile (intToId 1234))
result `shouldContainElement` ul [ li "Metacircular Evaluator" ]
This is pretty nice, and running the above tests reveals a nice property of these kinds of isolated test cases: the test suite runs really, really fast. Communicating with a database, even in extremely simple ways, takes a measurable amount of time, especially with dozens of tests. In contrast, even with hundreds of tests, our unit test suite runs in less than a tenth of a second.
This all seems to be successful, so let’s try and apply the same testing technique to reverseFile
.
Testing side-effectful code
Looking at the type signature for reverseFile
, we have a small problem:
reverseFile :: MonadFS m => FilePath -> m ()
Specifically, the return type is ()
. Making any assertions against the result of this function would be completely worthless, given that it’s guaranteed to be the same exact thing each time. Instead, reverseFile
is inherently side-effectful, so we want to be able to test that it properly interacts with the file system in the correct way.
In order to do this, a simple wrapper around Identity
won’t be enough, but we can replace it with something more powerful: Writer
. Specifically, we can use a writer monad to “log” what gets called in order to test side-effects. We’ll start by creating a new TestM
type, just like last time:
newtype TestM a = TestM (Writer [String] a)
deriving (Functor, Applicative, Monad, MonadWriter [String])
logTestM :: TestM a -> [String]
logTestM (TestM w) = execWriter w
Using this slightly more powerful type, we can write a useful instance of MonadFS
that will track the argument given to writeFile
:
instance MonadFS TestM where
readFile _ = return "hello"
writeFile _ contents = tell [contents]
Again, the instance is quite simple, but it now enables us to write a straightforward unit test for reverseFile
:
spec = describe "reverseFile" $
it "reverses a file’s contents on the filesystem" $ do
let calls = logTestM (reverseFile "foo.txt")
calls `shouldBe` ["olleh"]
Again, quite simple to both implement and use, and the test itself is blindingly fast. There’s another problem, though, which is that we have technically left part of reverseFile
untested: we’ve completely ignored the path
argument.
In this contrived example, it may seem silly to test something so trivial, but in real code, it’s quite possible that one would care very much about testing multiple different aspects about a single function. When testing renderUserProfile
, this was not hard, since we could reuse the same TestM
type and MonadDB
instance for both test cases, but in the reverseFile
example, we’ve ignored the path entirely.
We could adjust our MonadFS
instance to also track the path provided to each method, but this has a few problems. First, it means every test case would depend on all the various properties we are testing, which would mean updating every test case when we add a new one. It would also be simply impossible if we needed to track multiple types—in this particular case, it turns out that String
and FilePath
are actually the same type, but in practice, there may be a handful of disparate, incompatible types.
Both of the above issues could be fixed by creating a sum type and manually filtering out the relevant elements in each test case, but a much more intuitive approach would be to simply have a separate instance for each case. Unfortunately, in Haskell, creating a new instance means creating an entirely new type. To illustrate how much duplication that would entail, we could create the following type and instance for testing proper propagation of the path
argument:
newtype TestM' a = TestM' (Writer [FilePath] a)
deriving (Functor, Applicative, Monad, MonadWriter [FilePath])
logTestM' :: TestM' a -> [FilePath]
logTestM' (TestM' w) = execWriter w
instance MonadFS TestM' where
readFile path = tell [path] >> return ""
writeFile path _ = tell [path]
Now it’s possible to add an extra test case that asserts that the proper path is provided to the two filesystem functions:
spec = describe "reverseFile" $ do
it "reverses a file’s contents on the filesystem" $ do
let calls = logTestM (reverseFile "foo.txt")
calls `shouldBe` ["olleh"]
it "operates on the file at the provided path" $ do
let paths = logTestM' (reverseFile "foo.txt")
paths `shouldBe` ["foo.txt", "foo.txt"]
This works, but it’s ultimately unacceptably complicated. Our test harness code is now significantly larger than the actual tests themselves, and the amount of boilerplate is frustrating. Verbose test suites are especially bad, since forcing programmers to jump through hoops just to implement a single test reduces the likelihood that people will actually write good tests, if they write tests at all. In contrast, if writing tests is easy, then people will naturally write more of them.
The above strategy to writing tests is not good enough, but it does reveal a particular problem: in Haskell, typeclass instances are not first-class values that can be manipulated and abstracted over, they are static constructs that can only be managed by the compiler, and users do not have a direct way to modify them. With some cleverness, however, we can actually create an approximation of first-class typeclass dictionaries, which will allow us to dramatically simplify the above testing mechanism.
Creating first-class typeclass instances
In order to provide an easy way to construct instances, we need a way to represent instances as ordinary Haskell values. This is not terribly difficult, given that instances are conceptually just records containing a collection of functions. For example, we could create a datatype that represents an instance of the MonadFS
typeclass:
data MonadFSInst m = MonadFSInst
{ _readFile :: FilePath -> m String
, _writeFile :: FilePath -> String -> m ()
}
To avoid namespace clashes with the actual method identifiers, the record fields are prefixed with an underscore, but otherwise, the translation is remarkably straightforward. Using this record type, we can easily create values that represent the two instances we defined above:
contentInst :: MonadWriter [String] m => MonadFSInst m
contentInst = MonadFSInst
{ _readFile = \_ -> return "hello"
, _writeFile = \_ contents -> tell [contents]
}
pathInst :: MonadWriter [FilePath] m => MonadFSInst m
pathInst = MonadFSInst
{ _readFile = \path -> tell [path] >> return ""
, _writeFile = \path _ -> tell [path]
}
These two values represent two different implementations of MonadFS
, but since they’re ordinary Haskell values, they can be manipulated and even extended like any other records. This can be extremely useful, since it makes it possible to create a sort of “base” instance, then have individual test cases override individual pieces of functionality piecemeal.
Of course, although we’ve written these two instances, we have no way to actually use them. After all, Haskell does not provide a way to explicitly provide typeclass dictionaries. Fortunately, we can create a sort of “proxy” type that will use a reader to thread the dictionary around explicitly, and the instance can defer to the dictionary’s implementation.
Creating an instance proxy
To represent our proxy type, we’ll use a combination of a Writer
and a ReaderT
; the former to implement the logging used by instances, and the latter to actually thread around the dictionary. Our type will look like this:
newtype TestM log a =
TestM (ReaderT (MonadFSInst (TestM log)) (Writer log) a)
deriving ( Functor, Applicative, Monad
, MonadReader (MonadFSInst (TestM log))
, MonadWriter log
)
logTestM :: MonadFSInst (TestM log) -> TestM log a -> log
logTestM inst (TestM m) = execWriter (runReaderT m inst)
This might look rather complicated, and it is, but let’s break down exactly what it’s doing.
The
TestM
type includes two type parameters. The first is the type of value that will be logged (hence the namelog
), which corresponds to the argument toWriter
from previous incarnations ofTestM
. Unlike those types, though, we want this version to work with anyMonoid
, so we’ll make it a type parameter. The second parameter is simply the type of the current monadic value, as before.The type itself is defined as a wrapper around a small monad transformer stack, the first of which is
ReaderT
. The state threaded around by the reader is, in this case, the instance dictionary, which isMonadFSInst
.However, recall that
MonadFSInst
accepts a type variable—the type of a monad itself—so we must provideTestM log
as an argument toMonadFSInst
. This slight bit of indirection allows us to tie the knot between the mutually dependent instances and proxy type.The base monad in the transformer stack is
Writer
, which is used to actually implement the logging functionality, just like in prior cases. The only difference now is that thelog
type parameter now determines what the writer actually produces.Finally, as before, we use
GeneralizedNewtypeDeriving
to derive all the relevantmtl
classes, adding the somewhat wordyMonadReader
constraint to the list.
Using this single type, we can now implement a MonadFS
instance that defers to the dictionary carried around within TestM
’s reader state:
instance Monoid log => MonadFS (TestM log) where
readFile path = do
f <- asks _readFile
f path
writeFile path contents = do
f <- asks _writeFile
f path contents
This may seem somewhat boilerplate-y, and it is to some extent, but the important consideration is that this boilerplate only needs to be written once. With this in place, it’s now possible to write an arbitrary number of first-class instances that use the above mechanism without extending the mechanism at all.
To see what actually using this code would look like, let’s update the reverseFile
tests to use the new TestM
implementation, as well as the contentInst
and pathInst
dictionaries from earlier:
spec = describe "reverseFile" $ do
it "reverses a file’s contents on the filesystem" $ do
let calls = logTestM contentInst (reverseFile "foo.txt")
calls `shouldBe` ["olleh"]
it "operates on the file at the provided path" $ do
let paths = logTestM pathInst (reverseFile "foo.txt")
paths `shouldBe` ["foo.txt", "foo.txt"]
We can do a little bit better, though. Really, the definitions of contentInst
and pathInst
are specific to each test case. With ordinary typeclass instances, we cannot scope them to any particular block, but since MonadFSInst
is just an ordinary Haskell datatype, we can manipulate them just like any other Haskell values. Therefore, we can just inline those instances’ definitions into the test cases themselves to keep them closer to the actual tests.
spec = describe "reverseFile" $ do
it "reverses a file’s contents on the filesystem" $ do
let contentInst = MonadFSInst
{ _readFile = \_ -> return "hello"
, _writeFile = \_ contents -> tell [contents]
}
let calls = logTestM contentInst (reverseFile "foo.txt")
calls `shouldBe` ["olleh"]
it "operates on the file at the provided path" $ do
let pathInst = MonadFSInst
{ _readFile = \path -> tell [path] >> return ""
, _writeFile = \path _ -> tell [path]
}
let paths = logTestM pathInst (reverseFile "foo.txt")
paths `shouldBe` ["foo.txt", "foo.txt"]
This is pretty good. We’re now able to create inline instances of our MonadFS
typeclass, which allows us to write extremely concise unit tests using ordinary Haskell typeclasses as system seams. We’ve managed to cut down on the boilerplate considerably, though we still have a couple problems. For one, this example only uses a single typeclass containing only two methods. A real MonadFS
typeclass would likely have at least a dozen methods for performing various filesystem operations, and writing out the instance dictionaries for every single method, even the ones that aren’t used within the code under test, would be pretty frustratingly verbose.
This problem is solvable, though. Since instances are just ordinary Haskell records, we can create a “base” instance that just throws an exception whenever the method is called:
baseInst :: MonadFSInst m
baseInst = MonadFSInst
{ _readFile = error "unimplemented instance method ‘_readFile’"
, _writeFile = error "unimplemented instance method ‘_writeFile’"
}
Then code that only uses readFile
could only override that particular method, for example:
let myInst = baseInst { _readFile = ... }
Normally, of course, this would be a terrible idea. However, since this is all just test code, it can be extremely useful in quickly figuring out what methods need to be stubbed out for a particular test case. Since all the code actually gets run at test time, attempts to use unimplemented instance methods will immediately raise an error, informing the programmer which methods need to be implemented to make the test pass. This can also help to significantly cut down on the amount of effort it takes to implement each test.
Another problem is that our approach is specialized exclusively to MonadFS
. What about functions that use both MonadFS
and MonadDB
, for example? Fortunately, that is not hard to solve, either. We can adapt the MonadFSInst
type to include fields for all of the typeclasses relevant to our system, turning it into a generic test fixture of sorts:
data FixtureInst m = FixtureInst
{ -- MonadFS
_readFile :: FilePath -> m String
, _writeFile :: FilePath -> String -> m ()
-- MonadDB
, _fetchUser :: Id User -> m User
, _fetchRecentPosts :: Id User -> m [Post]
}
Updating TestM
to use FixtureInst
instead of MonadFSInst
is trivial, and all the rest of the infrastructure still works. However, this means that every time a new typeclass is added, three things need to be updated:
Its methods need to be added to the
FixtureInst
record.Those methods need to be given error-raising defaults in the
baseInst
value.An actual instance of the typeclass needs to be written for
TestM
that defers to theFixtureInst
value.
Furthermore, most of this manual manipulation of methods is required every time a particular typeclass changes, whether that means adding a method, removing a method, renaming a method, or changing a method’s type. This is especially frustrating given that all this code is really just mechanical boilerplate that could all be derived by the set of typeclasses being tested.
That last point is especially important: aside from the instances themselves, every piece of boilerplate above is obviously possible to generate from existing types alone. With that piece of information in mind, we can do even better: we can use Template Haskell.
Removing the boilerplate using test-fixture
The above code was not only rather boilerplate-heavy, it was pretty complicated. Fortunately, you don’t actually have to write it. Enter the library test-fixture
:
import Control.Monad.TestFixture
import Control.Monad.TestFixture.TH
mkFixture "FixtureInst" [''MonadFS, ''MonadDB]
spec = describe "reverseFile" $ do
it "reverses a file’s contents on the filesystem" $ do
let contentInst = def
{ _readFile = \_ -> return "hello"
, _writeFile = \_ contents -> log contents
}
let calls = logTestFixture (reverseFile "foo.txt") contentInst
calls `shouldBe` ["olleh"]
it "operates on the file at the provided path" $ do
let pathInst = def
{ _readFile = \path -> log path >> return ""
, _writeFile = \path _ -> log path
}
let paths = logTestFixture (reverseFile "foo.txt") pathInst
paths `shouldBe` ["foo.txt", "foo.txt"]
That’s it. The above code automatically generates everything you need to write fast, simple, deterministic unit tests in Haskell. The mkFixture
function is a Template Haskell macro that expands into a definition quite similar to the FixtureInst
type we wrote by hand, but since it’s automatically generated from the typeclass definitions, it never needs to be updated.
The logTestFixture
function replaces the logTestM
function we wrote by hand, but it works exactly the same. The Control.Monad.TestFixture
library also exports a log
function that is a synonym for tell . singleton
, but using tell
directly still works if you prefer.
The mkFixture
function also generates a Default
instance, which replaces the baseInst
value defined earlier. It functions the same way, though, producing useful error messages that refer to the names of unimplemented typeclass methods that have not been stubbed out.
This blog post is not a test-fixture
tutorial—indeed, it is much more complicated than a test-fixture
tutorial would be, since it covers what the library is really doing under the hood—but if you’re interested, I would highly recommend you take a look at the test-fixture
documentation on Hackage.
Conclusion, credits, and similar techniques
This blog post came about as the result of a need my coworkers and I found when writing Haskell code; we wanted a way to write unit tests quickly and easily, but we didn’t find much advice from the rest of the Haskell ecosystem. The test-fixture
library is the result of that exploratory work, and we currently use it to test a significant portion of our Haskell code.
It would be extremely unfair to suggest that I was the inventor of this technique or the inventor of the library. Two of my coworkers, Joe Vargas and Greg Wiley, came up with the general approach and wrote Control.Monad.TestFixture
, and I simply wrote the Template Haskell macro to eliminate the boilerplate. With that in mind, I think I can say with some fairness that I think this technique is a joy to use when unit testing is a desirable goal, and I would definitely recommend it if you are interested in doing isolated testing in Haskell.
The general technique of using typeclasses to emulate effects was in part inspired by the well-known mtl
library. An alternate approach to writing unit-testable Haskell code is using free monads, but overall, I prefer this approach over free monads because the typeclass constraints add type safety in ways that free monads do not (at least not without additional boilerplate), and this approach also lends itself well to static analysis-based boilerplate reduction techniques. It has its own tradeoffs, though, so if you’ve had success with free monads, then I certainly make no claim this is a superior approach, just one that I’ve personally found pleasant.
As a final note, if you do check out test-fixture
, feel free to leave feedback by opening issues on the GitHub issue tracker—even things like confusing documentation are worth a bug report.