Writing a small llvm compiler front end in haskell

Posted on July 28, 2017

LLVM is an awesome framework, and there is a very nice tutorial out there that explains how to use llvm with haskell to write a compiler frontend. Said tutorial covers all aspects of using LLVM from haskell, starting from writing a lexer and parser. While this is certainly very good to have and the end result is a complete (albeit small) language, it was too much for my taste to get started with llvm. I wanted something simple, that would fit on a few pages and shows how LLVM is used from haskell, without having to understand the semantics of a particular language and so forth.

So I decided to write the smallest possible llvm front end in haskell that I can think of. The first idea was to calculate the birth year of Adam Weishaupt (hence the variable names) but I decided I’d keep it even simpler and just add a few values and emit the result as characters to the console by calling the libc function putchar(). My work is based on said tutorial and stripped down to the bare minimum. If you want more, go read that (excellent) tutorial, and the documentation of the llvm framework. :-)

I’m not using a state monad, which makes the code a little less convenient than the C++ equivalent, because we explicitly need to keep track of the state, as you will see in the code. Anyway, let’s get started.

First, the boring part, the imports:

{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Int
import Foreign.Ptr
import LLVM
import LLVM.Analysis
import LLVM.AST
import LLVM.AST.CallingConvention
import LLVM.AST.Global
import LLVM.AST.Linkage
import LLVM.Context
import LLVM.Module
import LLVM.PassManager
import qualified LLVM.AST.Constant as ASTC
import qualified LLVM.AST.Operand as ASTO
import qualified LLVM.ExecutionEngine as EE
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString.Char8 as S8
import LLVM.AST.Instruction

Okay, now that that’s done, we need to define some types. We’ll be working with 64 bit integers. The putchar function has the following type signature in C:

void putchar(int character);

So, let’s define a 64 bit integer type and a type for the putchar function.

int64type :: Type
int64type = IntegerType 64
printT :: Type
printT = FunctionType VoidType [int64type] False

Next, some configuration for the llvm optimization pipeline and JIT compiler.

jit :: Context -> (EE.MCJIT -> IO a) -> IO a
jit c = EE.withMCJIT c optlevel model pretlim fastins
  where
    optlevel    = Just 3
    model       = Nothing
    pretlim     = Nothing
    fastins     = Nothing

Since we use a JIT and run the generated machine code directly from our haskell program, we need to import the generated main function (which does not even exist at this time):

foreign import ccall "dynamic" haskFun :: FunPtr (IO Int64) -> (IO Int64)

And a haskell helper function to call into that foreign function:

run :: FunPtr a -> IO Int64
run fn = haskFun (castFunPtr fn :: FunPtr (IO Int64))

Next comes a big main function. We start by defining some constants:

main :: IO ()
main = do
    let adam        = ASTC.Int 64 67
        adam2       = ASTC.Int 64 10
        weishaupt   = ASTC.Int 64 23
        mul         = ASTC.Int 64 100
        success     = ASTC.Int 64 1723

Next, let’s perform a multiplication and an addition. As I mentioned earlier, because I don’t use a state monad, this is a little awkward. First, we define the instruction itself and then an “unnamed” label to reference it. This would be abstracted away in a real world example as is done in the tutorial I mentioned earlier.

tmp1        = Mul False False (ASTO.ConstantOperand adam)
    (ASTO.ConstantOperand mul) []
nameTmp1    = UnName 1
sum         = Add False False (LocalReference int64type nameTmp1)
    (ASTO.ConstantOperand weishaupt) []
nameSum     = UnName 2
success     = ASTC.Int 64 1723

Now it’s time to define the external print function (putchar(), as defined by stdlibc) and call it.

printFun    = ASTC.GlobalReference printT $ Name "putchar"
printit     = Call Nothing C [] (Right $ ASTO.ConstantOperand printFun)
                    [(LocalReference int64type nameTmp1, [])] [] []
printItLabel = UnName 5
printit2    = Call Nothing C [] (Right $ ASTO.ConstantOperand printFun)
                    [(ASTO.ConstantOperand adam2, [])] [] []
printItLabel2 = UnName 6

Where did all the instructions go? Well, we need to put them all in a list and assign them to their respective “anonymous” labels:

instrs      = [ (nameTmp1 := tmp1), (nameSum := sum)
              , (printItLabel := printit)
              , (printItLabel2 := printit2) ]

Finally, we define the main function:

mainblock   = BasicBlock (Name "hello") instrs
    (UnName 3 := Ret (Just $ ASTO.ConstantOperand success) [])

As you can see, we specify a list of instructions as the body of the function and an instruction that computes the function’s return value.

Now we need to put all that into llvm’s structures.

moduleDef   = GlobalDefinition $ functionDefaults {
                    name        = Name "haskmain"
                  --, linkage     = AvailableExternally
                  , parameters  = ([], False)
                  , returnType  = int64type
                  , basicBlocks = [mainblock] }
printDef    = GlobalDefinition $ functionDefaults {
                    name        = Name "putchar"
                  , linkage     = External
                  , parameters  = ([Parameter int64type (Name "val") []], False)
                  , returnType    = int64type }
fun         = defaultModule { moduleName = "HelloWorld"
                            , moduleDefinitions = [moduleDef, printDef] }

Almost there. The final step is to call llvm, pass it these structures and call the resulting just-in-time compiled function.

withContext $ \context ->
    withModuleFromAST context fun $ \m ->
        withPassManager passes $ \pm -> do
            putStrLn "Verifying IR..."
            verify m
            putStrLn "Optimizing IR..."
            runPassManager pm m
            s <- moduleLLVMAssembly m
            S8.putStrLn s
            putStrLn "Generating object code..."
            optMod <- moduleAST m
            jit context $ \ee ->
                EE.withModuleInEngine ee m $ \eee -> do
                    putStrLn "Loading function into executable memory..."
                    mainfn <- EE.getFunction eee (Name "haskmain")
                    case mainfn of
                        Just fn -> do
                            putStrLn "Running!"
                            res <- run fn
                            putStrLn $ "Retval: " ++ show res
                        Nothing -> do
                            putStrLn "no fun"
            print "ok"

That’s it. Please do contact me with any questions, bug reports or toy languages you have discovered. :-)

Update: I’ve updated the source to the github.