Typeclass induction and developing a QuickCheck-like library
What is typeclass induction
In this post we’ll approach a technique called Typeclass induction, which allows us to take polymorphism to an entirely new level! Now, a typical typeclass will create a function that is able to handle a finite number of types - all the types that instantiate the typeclass. Typeclass induction, however, allows us to create a system in which an infinite amount of types could instantiate our typeclass (limited only by arbitrary compiler and memory limits, of course)!
Now, why would we want something like this, anyway? Of course, there’s no utility in having an infinite amount of types instantiating a typeclass, but this same mechanism allows us to get things like types that admit a “recursive shape” to instantiate a typeclass. But what do I mean by recursive shape? Multiparameter functions admit a recursive shape, for example:
twoArgumentFunction :: Int -> Char -> Bool
We know that this is the same as:
twoArgumentFunction :: Int -> (Char -> Bool)
Since (->) :: * -> * -> *
is a constructor for functions, twoArgumentFunction
can also be seen as:
twoArgumentFunction :: (->) Int ((->) Char Bool)
The line above is valid Haskell, go ahead and try it!
Now, do you see the recursive shape? We could take it a step further and get something like threeArgumentFunction :: (->) Integer ((->) Int ((->) Char Bool))
. For an N-argument function, we’ll always have a shape such as
nArgumentFunction :: (->) a1 ((->) a2 ((->) a3 ((->) ... ((->) aN r))))...)
And with typeclass induction we could create a Function
typeclass that is instantiated by twoArgumentFunction
, threeArgumentFunction
and all possible nArgumentFunction
s!
Practical utility
So why would we want to do this? Well, being able to pass an N-argument function is something we do when using QuickCheck, for instance! QuickCheck runs our test functions (with any number of arguments) generating random arguments and testing a property.
Suppose we write a function called even
which returns True
for even numbers and we want to test it with some values to make sure it’s correct. We know that any integer times two is even and all numbers of the form 2n − 1 are odd, so we could test our function with
-> even (n * 2))
test (\n -> not (even (n * 2 - 1))) test (\n
With typeclass induction we could create mechanisms to call these two functions with many random integers while making sure it passes all checks, helping us increase our confidence that our implementation is sound.
Coding a QuickCheck-like solution
You could go ahead and look at the code for QuickCheck, but here I’d like to show you my first efforts in trying to create a Testable
typeclass, so that we can learn two ways to do this and the differences between them. Either way, our final solution will be much, much simpler than QuickCheck - of course -, but it will still be impressive how little code we have to write in Haskell to get something reasonably powerful.
Enough talk: let’s give it a shot. The first attempt won’t generate random input data when testing a function; it will only know how to generate Int
values of 37
. We want Bool
-returning functions to instantiate Testable
.
Our first attempt
class Gen a where
gen :: a
instance Gen Int where
= 37
gen
class Testable a where
test :: a -> Bool
instance Gen a => Testable ((->) a Bool) where
= f gen test f
All right, but so far only one-argument functions can be used. That first instance of Testable
is our base case, so let us code the inductive step.
instance (Gen a, Testable b) => Testable ((->) a b) where
= let x = f gen in test x test f
This looks really nice! It takes some time to digest it but try to pay attention to the recursive shape of this last instance. We generate data with gen
and apply f
, a one-parameter function, to it, then get the result of that, which is Testable
itself and test it! This should be enough to capture N-argument functions. Let’s try loading this on GHCI:
• Illegal instance declaration for ‘Testable (a -> Bool)’
(All instance types must be of the form (T a1 ... an)
where a1 ... an are *distinct type variables*,
and each type variable appears at most once in the instance head.
Use FlexibleInstances if you want to disable this.)
• In the instance declaration for ‘Testable ((->) a Bool)’
Ok, so we need to use FlexibleInstances if we want to declare instances not in the strict form described in the error (for more context on this, see https://prime.haskell.org/wiki/FlexibleInstances). It seems to be a benign extension. Let’s just add {-# LANGUAGE FlexibleInstances #-}
, reload it and let’s test a simple function with it.
*First> let equals37 n = n == (37 :: Int)
*First> test equals37
<interactive>:6:1: error
• Overlapping instances for Testable (Int -> Bool)
arising from a use of ‘test’
Matching instances
instance [safe] (Gen a, Testable b) => Testable (a -> b)
-- Defined at First.hs:16:10
instance [safe] Gen a => Testable (a -> Bool)
-- Defined at First.hs:13:10
• In the expression: test equals37
In an equation for ‘it’: it = test equals37
Oh no.. it’s saying it doesn’t know which of our instances to apply to equals37
. But it should be able to! Because for equals37 :: Int -> Bool
to match our instance (Gen a, Testable b) => Testable ((->) a b)
instance would mean a ~ Int
and b ~ Bool
, but Bool
is not Testable
!
So why is this happening? Well, according to GHC’s manual, GHC’s instance resolution does not consider contexts (the contraints before =>
) when resolving which instance to apply. The two instances look like instance Testable ((->) a Bool)
and instance Testable ((->) a b)
to GHC, and because of that they both match equals37
.
To be honest, I was a little frustrated by this and it took me some time to find the instance resolution algorithm’s specifics that explain the error. Of course, Haskell’s developers are really smart, so there must be either a historical reason or some non-termination issue with instance resolution if it were to consider contexts (or something else, who knows). But at least I had the opportunity to learn about {-# Overlapping #-}
, which is almost self-explanatory: should the compiler find more than one matching instance, choose the one marked with Overlapping. Let’s change our instance to the following:
instance {-# Overlapping #-} Gen a => Testable ((->) a Bool) where
= f gen test f
Now reload it on GHCI and try again. It will finally work, and it will work with N-argument functions too, as one can easily see:
*First> let threeArgFunction a b c = a == (37 :: Int) && b /= a && c == a
*First> :t threeArgFunction
threeArgFunction :: Int -> Int -> Int -> Bool
*First> test threeArgFunction
Doing this without using {-# Overlapping #-}
Using {-# Overlapping #-}
doesn’t feel so good, of course. Is there another way?
It turns out there is, but our Testable
type class will no longer apply only to Bool
-returning functions. It will also apply to Bool
itself:
instance Testable Bool where
= x
test x
instance (Gen a, Testable b) => Testable ((->) a b) where
= let f' = f gen in test f' test f
Change only these two instances and there’s no longer any need for FlexibleInstances
nor {-# Overlapping #-}
, but now you can also test Bool
if you want, something I wanted to avoid since it’s kind of nonsensical to me (but not such a terrible consequence, to be honest).
Anyways, let’s add some random numbers to get our final solution! We’ll throw away the Gen
type class and use Random
directly instead. I wouldn’t recommend this approach as a good solution, since you’d need to create orphan instances for types such as String
or others that don’t instantiate Random
; we’re using it here just for the sake of brevity.
module ZabaCheck where
import System.Random
class Testable a where
testWith :: RandomGen g => g -> a -> (Bool, g)
instance Testable Bool where
= (x, g)
testWith g x
instance (Random a, Testable b) => Testable ((->) a b) where
= let (param, nextG) = random g in testWith nextG (f param)
testWith g f
test :: Testable a => a -> Bool
= and $ fst $ foldl (\(resList, g) _ -> let (res, g') = testWith g f in (res : resList, g')) ([], mkStdGen 0) [1..100] test f
Our test
function tests the input function 100 times, weaving RandomGen
through every randomly generated parameter. And this, my friends, is a 14-line property checking library (counting empty lines) written using typeclass induction. Again, it always amazes me how Haskell can be concise and extremely powerful at the same time.
I hope you liked this post. Suggestions and corrections are very very welcome.