Typeclass induction and developing a QuickCheck-like library

Posted on August 13, 2018

What is typeclass induction

In this post we’ll approach a technique called Typeclass induction, which allows us to take polymorphism to an entirely new level! Now, a typical typeclass will create a function that is able to handle a finite number of types - all the types that instantiate the typeclass. Typeclass induction, however, allows us to create a system in which an infinite amount of types could instantiate our typeclass (limited only by arbitrary compiler and memory limits, of course)!

Now, why would we want something like this, anyway? Of course, there’s no utility in having an infinite amount of types instantiating a typeclass, but this same mechanism allows us to get things like types that admit a “recursive shape” to instantiate a typeclass. But what do I mean by recursive shape? Multiparameter functions admit a recursive shape, for example:

twoArgumentFunction :: Int -> Char -> Bool

We know that this is the same as:

twoArgumentFunction :: Int -> (Char -> Bool)

Since (->) :: * -> * -> * is a constructor for functions, twoArgumentFunction can also be seen as:

twoArgumentFunction :: (->) Int ((->) Char Bool)

The line above is valid Haskell, go ahead and try it!

Now, do you see the recursive shape? We could take it a step further and get something like threeArgumentFunction :: (->) Integer ((->) Int ((->) Char Bool)). For an N-argument function, we’ll always have a shape such as

nArgumentFunction :: (->) a1 ((->) a2 ((->) a3 ((->) ... ((->) aN r))))...)

And with typeclass induction we could create a Function typeclass that is instantiated by twoArgumentFunction, threeArgumentFunction and all possible nArgumentFunctions!

Practical utility

So why would we want to do this? Well, being able to pass an N-argument function is something we do when using QuickCheck, for instance! QuickCheck runs our test functions (with any number of arguments) generating random arguments and testing a property.

Suppose we write a function called even which returns True for even numbers and we want to test it with some values to make sure it’s correct. We know that any integer times two is even and all numbers of the form 2n − 1 are odd, so we could test our function with

test (\n -> even (n * 2))
test (\n -> not (even (n * 2 - 1)))

With typeclass induction we could create mechanisms to call these two functions with many random integers while making sure it passes all checks, helping us increase our confidence that our implementation is sound.

Coding a QuickCheck-like solution

You could go ahead and look at the code for QuickCheck, but here I’d like to show you my first efforts in trying to create a Testable typeclass, so that we can learn two ways to do this and the differences between them. Either way, our final solution will be much, much simpler than QuickCheck - of course -, but it will still be impressive how little code we have to write in Haskell to get something reasonably powerful.

Enough talk: let’s give it a shot. The first attempt won’t generate random input data when testing a function; it will only know how to generate Int values of 37. We want Bool-returning functions to instantiate Testable.

Our first attempt

class Gen a where
    gen :: a

instance Gen Int where
    gen = 37

class Testable a where
    test :: a -> Bool

instance Gen a => Testable ((->) a Bool) where
    test f = f gen

All right, but so far only one-argument functions can be used. That first instance of Testable is our base case, so let us code the inductive step.

instance (Gen a, Testable b) => Testable ((->) a b) where
    test f = let x = f gen in test x

This looks really nice! It takes some time to digest it but try to pay attention to the recursive shape of this last instance. We generate data with gen and apply f, a one-parameter function, to it, then get the result of that, which is Testable itself and test it! This should be enough to capture N-argument functions. Let’s try loading this on GHCI:

• Illegal instance declaration for ‘Testable (a -> Bool)’
    (All instance types must be of the form (T a1 ... an)
     where a1 ... an are *distinct type variables*,
     and each type variable appears at most once in the instance head.
     Use FlexibleInstances if you want to disable this.)
• In the instance declaration for ‘Testable ((->) a Bool)’

Ok, so we need to use FlexibleInstances if we want to declare instances not in the strict form described in the error (for more context on this, see https://prime.haskell.org/wiki/FlexibleInstances). It seems to be a benign extension. Let’s just add {-# LANGUAGE FlexibleInstances #-}, reload it and let’s test a simple function with it.

*First> let equals37 n = n == (37 :: Int)
*First> test equals37

<interactive>:6:1: error
• Overlapping instances for Testable (Int -> Bool)
    arising from a use of ‘test’
  Matching instances
    instance [safe] (Gen a, Testable b) => Testable (a -> b)
      -- Defined at First.hs:16:10
    instance [safe] Gen a => Testable (a -> Bool)
      -- Defined at First.hs:13:10
• In the expression: test equals37
  In an equation for ‘it’: it = test equals37

Oh no.. it’s saying it doesn’t know which of our instances to apply to equals37. But it should be able to! Because for equals37 :: Int -> Bool to match our instance (Gen a, Testable b) => Testable ((->) a b) instance would mean a ~ Int and b ~ Bool, but Bool is not Testable!

So why is this happening? Well, according to GHC’s manual, GHC’s instance resolution does not consider contexts (the contraints before =>) when resolving which instance to apply. The two instances look like instance Testable ((->) a Bool) and instance Testable ((->) a b) to GHC, and because of that they both match equals37.

To be honest, I was a little frustrated by this and it took me some time to find the instance resolution algorithm’s specifics that explain the error. Of course, Haskell’s developers are really smart, so there must be either a historical reason or some non-termination issue with instance resolution if it were to consider contexts (or something else, who knows). But at least I had the opportunity to learn about {-# Overlapping #-}, which is almost self-explanatory: should the compiler find more than one matching instance, choose the one marked with Overlapping. Let’s change our instance to the following:

instance {-# Overlapping #-} Gen a => Testable ((->) a Bool) where
    test f = f gen

Now reload it on GHCI and try again. It will finally work, and it will work with N-argument functions too, as one can easily see:

*First> let threeArgFunction a b c = a == (37 :: Int) && b /= a && c == a
*First> :t threeArgFunction
threeArgFunction :: Int -> Int -> Int -> Bool
*First> test threeArgFunction

Doing this without using `{-# Overlapping #-}`

Using {-# Overlapping #-} doesn’t feel so good, of course. Is there another way?

It turns out there is, but our Testable type class will no longer apply only to Bool-returning functions. It will also apply to Bool itself:

instance Testable Bool where
    test x = x

instance (Gen a, Testable b) => Testable ((->) a b) where
    test f = let f' = f gen in test f'

Change only these two instances and there’s no longer any need for FlexibleInstances nor {-# Overlapping #-}, but now you can also test Bool if you want, something I wanted to avoid since it’s kind of nonsensical to me (but not such a terrible consequence, to be honest).

Anyways, let’s add some random numbers to get our final solution! We’ll throw away the Gen type class and use Random directly instead. I wouldn’t recommend this approach as a good solution, since you’d need to create orphan instances for types such as String or others that don’t instantiate Random; we’re using it here just for the sake of brevity.

module ZabaCheck where
import System.Random

class Testable a where
    testWith :: RandomGen g => g -> a -> (Bool, g)

instance Testable Bool where
    testWith g x = (x, g)

instance (Random a, Testable b) => Testable ((->) a b) where
    testWith g f = let (param, nextG) = random g in testWith nextG (f param)

test :: Testable a => a -> Bool
test f = and $ fst $ foldl (\(resList, g) _ -> let (res, g') = testWith g f in (res : resList, g')) ([], mkStdGen 0) [1..100]

Our test function tests the input function 100 times, weaving RandomGen through every randomly generated parameter. And this, my friends, is a 14-line property checking library (counting empty lines) written using typeclass induction. Again, it always amazes me how Haskell can be concise and extremely powerful at the same time.

I hope you liked this post. Suggestions and corrections are very very welcome.