Software development

More GADTs in PureScript

Topics
functional programming, haskell, purescript, types, technology

Some time ago, someone asked how to encoded GADTs, Generalized algebraic datatypes in PureScript. That's actually a very interesting question!

Warning: lot of code with higher order types ahead!

Let's start from the very beginning. What are generalised abstract datatypes, GADTs? Ordinary algebraic data types can be parametrised by a type parameter, for example Maybe or List. However GADTs gives us ability to be less parametric, so that some constructors can produce only a value with some type argument.

The often used example is an expression type, and we will use one later. But let's begin with a request type to do interact with GitHub:

data GitHubRequest
    = Repositories OwnerName
    | Issues OwnerName RepoName

and we want to execute this request:

exec :: GitHubRequest -> IO ???

We hit a problem we cannot state the type. One solution is to have separate execution functions, like

repositories :: OwnerName -> IO [Repository]
issues :: OwnerName -> RepoName -> IO [Issue]

but it's not optimal. We cannot abstract over the requests. We could end up with lots of similar functions, normal, one using authentication, limiting the response.

repositories   :: OwnerName -> IO [Repository]
repositories'  :: Auth -> OwnerName -> IO [Repository]
repositories'' :: Auth -> Maybe MaxSize -> IO [Repositories]

The next step, still Haskell98 is to use phantom types:

data GitHubRequest a =
    = Repositories OwnerName
    | Issues OwnerName RepoName

the only change is the additional type parameter, which isn't used anywhere. Now we can write function with type signature

repositoriesReq :: OwnerName -> GitHubRequest [Repository]

This is an improvement. Now we can combine GitHubRequest with Auth to perform authenticated requests But somehow this is still not good enough. We need to make Request type opaque (e.g. use smart constructors), otherwise someone might construct a request value of the meaningless type. And the compiler won't catch that:

Issues ownerName repoName :: GitHubRequest [Repositories]

GADTs gives us ability to restrict the type of constructors:

data GitHubRequest a where
    Repositories :: OwnerName -> GitHubRequest [Repository]
    Issues       :: OwnerName -> RepoName -> GitHubRequest [Issue]

Using this type, it's very straightforward to implement execution function

execRequest :: GitHubRequest a -> IO a

This pattern of defining "an expression language", in this case language of GitHub requests and separate interpreter, the execRequest, is very common pattern in (typed) functional programming.

For the rest of the gist, we'll however concentrate on a different expression language. THis way we can have the code self contained, and hopefully help concentrate on actual type juggling. One can mechanically apply the ideas the GitHubRequest type, if you want so.

As this post is born as a literate Haskell file, first we need few language extensions and imports:

{-# LANGUAGE GADTs, RankNTypes, TypeOperators, ScopedTypeVariables, EmptyCase #-}

-- Leibniz equality
import Data.Eq.Type         as Leibniz
import Data.Type.Equality   as Eq
import Data.Functor.Product

GADT

Let's start with GADTs as we do them in Haskell nowadays:

data GExpr a where
    GZero :: GExpr Int                         -- 0
    GSucc :: GExpr Int -> GExpr Int            -- succ
    GBool :: Bool -> GExpr Bool                -- True / False
    GEq   :: GExpr a -> GExpr a -> GExpr Bool  -- (==), this clause include existentials!

GZero produces an expression of type Int, and so on. In this example we also have an existential quantification in GEq constructor. The type variable a is not visible outside. Yet both parameters of GEq must have the same index.

So how to encode this in a language without support for GADTs? The only thing we need is RankNTypes (and some sort of impredicativity).

First let's talk about type equality. With GADTs we can have an actual witness of (a ~ b) type equality constraint:

data a :~: b where
    Refl :: a :~: a

but this is formulated as a GADT itself! Luckily we can use a different encoding of the equality, It seems a bit strange at first:

data a := b = Refl { subst :: forall c. c a -> c a }

Leibnizian equality states that two things are equal if you can substitute one for the other in all contexts.

The all is essential here.

Then we can encode GADTs as ADT by providing equality witness explicitly. For example the GZero constructor is implemented under the hood as

GZero :: Int ~ a => GExpr a

so by dictionary translation (https://skillsmatter.com/skillscasts/8978-checking-and-translating-type-classes) we get

LZero :: (Int := a) -> LExpr a

Cool isn't it?

Exists

The second complication, is that we need a way to encode existential quantification. Fortunately it's possible to encode using universal one. This is a real mind twister: The key idea is that f a appears in the negative position. As actual type a is existentially hidden, we need to provide functions which work with all possible a to make use of the value.

-- | Existential quantification using the universal one.
newtype Exists f = MkExists { runExists :: forall r. (forall a. f a -> r) -> r }
-- For example existential list is isomorphic to integer
mkExists :: forall f a. f a -> Exists f
mkExists x = MkExists $ \f -> f x

existsEx :: Exists []
existsEx = mkExists ("foobar" :: [Char])

existsEx2 :: Exists [] -> Int
existsEx2 e = runExists e length

-- | This is 6
existsEx3 :: Int
existsEx3 = existsEx2 existsEx

GADT as ADT

Now we have all needed ingredients to write GExpr as an "ordinary" ADT.

data LExpr a
    = LZero (Int := a)
    | LSucc (Int := a) (LExpr a)
    | LBool (Bool := a) Bool
    | LEq   (Bool := a) (Exists (LExpr :*: LExpr))

-- | Infix operators are handy!
type (:*:) = Product

-- To make work with this form a bit easier, we define smart constructors:
lzero :: LExpr Int
lzero = LZero refl

lsucc :: LExpr Int -> LExpr Int
lsucc = LSucc refl

lbool :: Bool -> LExpr Bool
lbool = LBool refl

leq :: LExpr a -> LExpr a -> LExpr Bool
leq u v = LEq refl (mkExists $ Pair u v)

And to proof that the forms are equivalent we can write functions transforming from one to another.

From GADT to ADT direction is easy, GHC does magic for us.

fromGExpr :: GExpr a -> LExpr a
fromGExpr GZero     = lzero
fromGExpr (GSucc n) = lsucc (fromGExpr n)
fromGExpr (GBool b) = lbool b
fromGExpr (GEq u v) = leq (fromGExpr u) (fromGExpr v)

The reverse direction is more tricky as now we need to use 'subst' explicitly. Also to go subst in opposite direction we need to manually construct another proof.

toGExpr :: LExpr a -> GExpr a
toGExpr (LZero proof)   = subst proof $ GZero
toGExpr (LSucc proof n) = subst proof $ GSucc $ subst (symm proof) $ toGExpr n -- symm :: a := b -> b := a
toGExpr (LBool proof b) = subst proof $ GBool b
toGExpr (LEq proof uv)  = subst proof $ runExists uv $ \(Pair u v) ->
    GEq (toGExpr u) (toGExpr v)

Using the GADT as ADT

Now we have encoded our types, let's use them. Not surprisingly using GADT version is quite convenient. For example getting a Int value out:

gToInt :: GExpr Int -> Int
gToInt GZero = 0
gToInt (GSucc n) = succ (gToInt n)

GHC is smart enough to not complain about GBool and GEq cases, as they cannot happen, because we have GExpr Int, not GExpr Bool.

Let's try with a LExpr version!

lToInt :: LExpr Int -> Int
lToInt (LZero _proof)    = 0
lToInt (LSucc _proof n)  = succ (lToInt n)
-- But now exhaustiveness checker complains about LBool and LEq cases?
-- Well, as we have Bool := Int "proof", we'll just use it!
lToInt (LBool proof b)   = coerce proof b  -- coerce :: a := b -> a -> b
lToInt l@(LEq proof _uv) = coerce proof $ lToBool $ subst (symm proof) l

Here we went a bit ahead, and used not yet defined lToBool. Let's define it's GADT counter part first, as it's simpler.

gType :: GExpr a -> Either (a :~: Int) (a :~: Bool)
gType GZero {} = Left Eq.Refl
gType GSucc {} = Left Eq.Refl
gType GBool {} = Right Eq.Refl
gType GEq {}   = Right Eq.Refl

gToBool :: GExpr Bool -> Bool
gToBool (GBool b) = b
gToBool (GEq u v) = case gType u of
    Left Eq.Refl -> gToInt u == gToInt v
    Right Eq.Refl -> gToBool u == gToBool v
-- Here we learn what existential parameter is, Int or Bool so we can use
-- the right equality!

Manual approach is actually not magical at all. We just extract the proof from inside the data type:

lType :: LExpr a -> Either (a := Int) (a := Bool)
lType (LZero proof)   = Left $ symm proof
lType (LSucc proof _) = Left $ symm proof
lType (LBool proof _) = Right $ symm proof
lType (LEq proof _ )  = Right $ symm proof

-- Then the lToBool is straight-forward, though tedious, to write:
lToBool :: LExpr Bool -> Bool
lToBool e@(LZero proof)   = coerce proof $ lToInt $ subst (symm proof) e
lToBool e@(LSucc proof _) = coerce proof $ lToInt $ subst (symm proof) e
lToBool (LBool proof b)   = coerce proof b
lToBool (LEq _proof e)    = runExists e $ \(Pair u v) -> case lType u of
    Left proof  -> lToInt (subst proof u) == lToInt (subst proof v)
    Right proof -> lToBool (subst proof u) == lToBool (subst proof v)

Interlude

We have shown how to encode GADTs with existentials using only RankNTypes. It's possible. However, there is non-trivial programming cost, as well, as probably non-trivial run-time performance cost as well.

The type equality constraints, and GADts do really help. We are able to express more problems in less space. Everything seems to be possible with Leibniz equality and Exists, but at what cost!

Few words about falsehood

As we have general recursion, we can cheat and construct a proof of any types being equal, for example

falsehood :: Int := Bool
falsehood = Leibniz.Refl proof
  where
    proof x = proof x

but this isn't very useful, obviously. Meta theory prevents us (or at least should!) from constructing terminating terms of such type.

On the other hand, with GADT equality definition, we can use ex falso quadlibet principle directly (we need EmptyCase extension for this):

boom :: Int :~: Bool -> a
boom p = case p of {}
bazinga :: Int := Bool -> a
bazinga (Leibniz.Refl _f) = undefined -- ???

With built-in type-equality, the compiler knows that Int ins't Bool, but we give it a proof that it is. The principle ex falso quodlibet P /\ ~P => _|_ says that if we know "P" and not "P" then any proposition can be proven (or value of any type constructed).

In the RankNType framework (i.e. System F), the falsehood is encoded as "value of any type": forall a. a. With GADTs GHC gives us the proof, but here we need to construct it ourselves. And it's something we cannot do.

Comparing expressions

One advanced exercise is heterogeonous equality. Given GExpr a and GExpr b we'd like to compare them. We have already seen that, we can recover the type argument via gType, but direct comparison is still interesting:

In the definition we use do-notation, Maybe is a monad! The trick I recently learn in Simon Peyton Jones keynote as Haskell eXchange There, by the way, similar function was defined: typeEq :: TypeRep a -> TypeRep -> Maybe (a :~: b) (There is a type class with such operation: GEq)

gequals :: GExpr a -> GExpr b -> Maybe (a :~: b)
gequals GZero GZero         = Just Eq.Refl
gequals (GSucc n) (GSucc m) = do
    Eq.Refl <- gequals n m
    pure Eq.Refl
gequals (GBool a) (GBool b)
    | a == b                = Just Eq.Refl
    | otherwise             = Nothing
gequals (GEq x y) (GEq u v) = do
    Eq.Refl <- gequals x u
    Eq.Refl <- gequals y v
    pure Eq.Refl
gequals _ _                 = Nothing -- otherwise we give up

The Leibniz variant is far more complicated, as you could guessed.

lequals :: LExpr a -> LExpr b -> Maybe (a := b)
lequals (LZero p)   (LZero q)     = Just (Leibniz.trans (symm p) q)
lequals (LSucc p n) (LSucc q m)   = do
    _proof <- lequals n m
    pure (Leibniz.trans (symm p) q)
lequals (LBool p a) (LBool q b)
    | a == b                      = Just (Leibniz.trans (symm p) q)
    | otherwise                   = Nothing
lequals (LEq p xy) (LEq q uv)
    | existsEquals lequals' xy uv = Just (Leibniz.trans (symm p) q)
    | otherwise                   = Nothing
lequals _ _                       = Nothing

lequals' :: (LExpr :*: LExpr) a -> (LExpr :*: LExpr) b -> Maybe (a := b)
lequals' (Pair x y) (Pair u v) = do
    proof  <- lequals x u
    _proof <- lequals y v -- it's irrelevant which proof we pick!
    pure proof

We used a little helper to compare two existentials:

existsEquals
    :: (forall a b. f a -> f b -> Maybe (a := b))
    -> Exists f -> Exists f
    -> Bool
existsEquals eq ex ey =
    runExists ex $ \x ->
    runExists ey $ \y ->
    case eq x y of
        Just _  -> True
        Nothing -> False

-- >>> gequals (GEq GZero GZero) (GEq GZero GZero)
-- Just Refl
-- >>> lequals (leq lzero lzero) (leq lzero lzero)
-- Just Leibniz.Refl

Before the conclusion, we used an orphan instance for above:

instance Show (a := b) where
    showsPrec _ _ = showString "Leibniz.Refl"

Conclusion

There are familiar posts: Approximating GADTs in PureScript and referred paper GADTless Programming in Haskell 98 by Martin Sulzmann and Meng Wang. Both however miss existentials, which I important for the encoding of many GADts.

There aren't many reddit comments for the post. I agree, it's great that it's great you can solve problems without extending the language. On the other hand, I have no idea how inconvenient it is to work with GADTs encoded this way in the long run. It helps when the compiler (= machine) does at least some work for you!

Writing the GADTs encoding in PureScript felt a bit like writing type-level trickery in Haskell. With latter you might tray with Agda to convince yourself it's possible, and then encode using features available in GHC Haskell. Direct exploratory programming is quite hard, you have to know what you are doing.

Also there were concerns about writing binary functions. Hopefully the last equals example shows it's possible. Even you cannot write unsafeCoerce from a Int := Bool fact, you can still coerce non-matching types to each other:

sum :: forall a . T a -> T a -> T a
sum (B b1 p) (B b2 _) = B (b1 || b2) p
sum (I n1 p) (I n2 _) = I (n1 + n2) p
sum (I n p) (B b q) _ = I (n + b') p
  where
    b' = coerce (trans ...) b

However the RankNTypes approach has limitations. SPJ mentioned the heterogeneous type equality: data (a :: k) :~~: (a :: k'). I'm not sure what's the story with polymorphic kinds in PureScript, but if at some point you want to have that, you cannot, as we'd need a way to abstract over kinds:

data _≅′_ {k : Set} (a : k) {k′ : Set} (b : k′) : Set₁ where
    subst' : (keq : (kc : Set  Set)  kc k  kc k′)
            (c : k′  Set)  (c (keq  t  t) a)  c b)
            a ≅′ b

And before I stop, you might wonder: this all is in Haskell, is it actually possible in PureScript? Yes it is. Check the gist: https://gist.github.com/phadej/1e3cbf35495c2951860cd9effb7723e1.