I'm working through Types and Programming Languages, by Benjamin Pierce.
I'm up to somewhere around chapter 13, References, but it's starting not to make sense. Which means it's time to back up and do more of the work, instead of just nodding as though I really understand it.
One of the things he does is build typecheckers for the languages he describes, in the language ocaml, or Objective Caml, a version of the language ML. For a variety of reasons, I'm trying to implement in Haskell. Mostly to give me something concrete to work on that isn't too large in scope to learn the language a little bit. aSomething more than 'Hello, World!', strip_spaces, or traverse a linked list, but less than a 'real application'.
For those not familiar, a typechecker is somewhere between a compiler and a grqammar driven parser. A type checker inspects what you give it for type correctness and consitency. It makes sure that you don't assign a Foo to a Bar, unless there's a rule in the typesystem that says you can. It may, in the process of typechecking, do some steps that a compiler also does, like reduce expressions, but it does this in the interest of determining what the type, not the value, is. Of course, if I were writing a compiler, it would make sense not to throw away that information, and do a bit of both at once.
That does lead me to a bit of a sub-rant. The first step in the process I'm working on is parsing the textual expression. Which means using a parsing library. (Pierce does, so it isn't cheating) Haskell has two; happy, a yacc analogue, and parsec, a 'monadic parser combinator' library. Since the point of doing this in Haskell is to get a better idea what phrases like 'monadic parser combinator' libraries mean, I was a bit biased towards Parsec. I already know and loathe yacc.
So I start in on the documentation. At least it has some. That's a nice benfit of the fact that Haskell grew up in the acadamic community. They need to publish or perish, and the publication serves as documentation. Somewhat, sort of. Although Parsec doesn't really suffer in that respect. The docs are pretty clear. They just suffer from the same problem that all parser lib docs do. They want to show that you can implement an entire compiler inside the parser.
And that's usually a bad idea.
The docs show you how you can attach interesting semantic actions to events in the parser, like evaluating the expression that's currently being parsed. However, in practice, that's hardly ever what you want to do. You want the parse to return something that abstracts the textual expression into a data structure, usually some kind of abstract syntax tree, where the nodes in the tree reflect elements in the grammar. Then you can write other stuff that accepts the AST and processes it in interesting ways, like compiling or typechecking it.
That's certainly what Pierce's code does in ML. And I'm trying to avoid being too inventive.
In any case, it turned out to be pretty trivial to return an AST from a Parsec parser, and in fact, all the examples of real parsers that come with Parsec take that approach. Which gave me some comfort about being on the right track.
Now, the arith language that we're starting with is pretty primitive. It has booleans and non-negative integers, aka natural numbers . And the latter are all of the form successor 0' or 'successor successor 0', meaning 1 and 2, respectively. Not a real language, but a place to start. Complicated enough that a few theorems could be proved non-trivially, but not so complicated you couldn't easily work everything by hand.
The language's syntax can be described with the following grammar
t ::=
true
false
if t then t else t
0
succ t
pred t
isZero t
data ArithExpr
= TmTrue
| TmFalse
| TmIfExpr ArithExpr ArithExpr ArithExpr
| TmZero
| TmSucc ArithExpr
| TmPred ArithExpr
| TmIsZero ArithExpr
deriving (Show, Eq)
arithExpr :: Parser ArithExpr
arithExpr =
trueExpression
falseExpression
ifExpression
zeroExpression
succExpression
predExpression
isZeroExpression
parens arithExpr
"expression"
trueExpression =
do{ reserved "true"
; return TmTrue
}
falseExpression =
do{ reserved "false"
; return TmFalse
}
ifExpression :: Parser ArithExpr
ifExpression =
do{ reserved "if"
; condition <- arithExpr
; reserved "then"
; thenClause <- arithExpr
; reserved "else"
; elseClause <- arithExpr
; return (TmIfExpr condition thenClause elseClause)
}
t ::=
true
false
if t then t else t
v ::=
true
false
if true then t2 else t3 -> t2
if false then t2 else t3 -> t3
t1 -> t1'
----------
if t1 then t2 else t3 -> if t1' then t2 else t3
t ::= ...
0
succ t
pred t
iszero t
v ::= ...
nv
nv ::=
0
succ nv
t1 -> t1'
---------
succ t1 -> succ t1'
pred 0 -> 0
pred (succ nv1) -> nv1
t1 -> t1'
--------
pred t1 -> pred t1'
iszero 0 -> true
iszero (succ nv1) -> false
t1 -> t1'
--------
iszero t1 -> iszero t1'
eval1 :: ArithExpr -> Maybe ArithExpr
eval1 (TmIfExpr TmTrue t _) = Just t
eval1 (TmIfExpr TmFalse _ t) = Just t
eval1 (TmIfExpr t1 t2 t3) =
let t1' = eval1 t1
in case t1' of
{ Just t1'' -> Just $ TmIfExpr t1'' t2 t3
; Nothing -> Nothing
}
eval t =
let t' = eval1 t
in case t' of
{ Just t'' -> eval t''
; Nothing -> t
}
Here's all the code so far:
ArithParser.hs
module ArithParser ( parseArith
, parseArithFromFile
, arithExpr
, ParseError
) where
import Char
import Monad
import Arith
-- Parsec
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellStyle)
parseArithFromFile :: String -> IO (Either ParseError ArithExpr)
parseArithFromFile fname =
parseFromFile arithExpr fname
parseArith sourceName source =
parse arithExpr sourceName source
arithExpr :: Parser ArithExpr
arithExpr =
trueExpression
falseExpression
<|> ifExpression
<|> zeroExpression
<|> succExpression
<|> predExpression
<|> isZeroExpression
<|> parens arithExpr
> "expression"
trueExpression =
do{ reserved "true"
; return TmTrue
}
falseExpression =
do{ reserved "false"
; return TmFalse
}
zeroExpression :: Parser ArithExpr
zeroExpression =
do{ reserved "0"
; return TmZero
}
ifExpression :: Parser ArithExpr
ifExpression =
do{ reserved "if"
; condition <- arithExpr
; reserved "then"
; thenClause <- arithExpr
; reserved "else"
; elseClause <- arithExpr
; return (TmIfExpr condition thenClause elseClause)
}
succExpression =
do{ reserved "succ"
; expr <- arithExpr
; return (TmSucc expr)
}
predExpression =
do{ reserved "pred"
; expr <- arithExpr
; return (TmPred expr)
}
isZeroExpression =
do{ reserved "iszero"
; expr <- arithExpr
; return (TmIsZero expr)
}
-----------------------------------------------------------
-- Tokens
-- Use qualified import to have token parsers on toplevel
-----------------------------------------------------------
tokenParser = P.makeTokenParser haskellStyle
parens = P.parens tokenParser
braces = P.braces tokenParser
semiSep1 = P.semiSep1 tokenParser
whiteSpace = P.whiteSpace tokenParser
symbol = P.symbol tokenParser
identifier = P.identifier tokenParser
reserved = P.reserved tokenParser
natural = P.natural tokenParser
charLiteral = P.charLiteral tokenParser
stringLiteral = P.stringLiteral tokenParser
Arith.hs
module Arith where
data ArithExpr
= TmTrue
| TmFalse
| TmIfExpr ArithExpr ArithExpr ArithExpr
| TmZero
| TmSucc ArithExpr
| TmPred ArithExpr
| TmIsZero ArithExpr
deriving (Show, Eq)
isNumericalVal :: ArithExpr -> Bool
isNumericalVal TmZero = True
isNumericalVal (TmSucc t) = isNumericalVal t
isNumericalVal (TmPred t) = isNumericalVal t
isNumericalVal _ = False
isVal :: ArithExpr -> Bool
isVal TmTrue = True
isVal TmFalse = True
isVal t
| isNumericalVal t = True
| not (isNumericalVal t) = False
isVal _ = False
eval1 :: ArithExpr -> Maybe ArithExpr
eval1 (TmIfExpr TmTrue t _) = Just t
eval1 (TmIfExpr TmFalse _ t) = Just t
eval1 (TmIfExpr t1 t2 t3) =
let t1' = eval1 t1
in case t1' of
{ Just t1'' -> Just $ TmIfExpr t1'' t2 t3
; Nothing -> Nothing --Just $ TmIfExpr t1 t2 t3
}
eval1 (TmSucc t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmSucc t''
; Nothing -> Nothing --Just $ TmSucc t
}
eval1 (TmPred TmZero) = Just TmZero
eval1 (TmPred (TmSucc t))
| isNumericalVal t = Just t
eval1 (TmPred t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmPred t''
; Nothing -> Nothing -- Just $ TmPred t
}
eval1 (TmIsZero TmZero) = Just TmTrue
eval1 (TmIsZero (TmSucc t))
| isNumericalVal t = Just TmFalse
eval1 (TmIsZero t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmIsZero t''
; Nothing -> Nothing -- Just $ TmIsZero t
}
eval1 _ = Nothing
eval t =
let t' = eval1 t
in case t' of
{ Just t'' -> eval t'' --if (t /= t'') then eval t'' else t
; Nothing -> t
}