
Adventure in parserland – parsing lambda expressions in F# – Part IV

Luca Bolognese


Let’s now look at the parser. First, let’s review the grammar:

(*
        <expression> ::= <name> | <function> | <application>
        <name> ::= non-blank character sequence
        <function> ::= \ <name> . <body>
        <body> ::= <expression>
        <application> ::= ( <function expression> <argument expression> )
        <function expression> ::= <expression>
        <argument expression> ::= <expression>
    *)

And the data type to represent it:

type Name = string
and Body = Expression
and Function = Name * Expression
and FunctionExpression = Expression
and ArgumentExpression = Expression
and Expression =
| EName of string
| Function of Expression * Body
| Application of FunctionExpression * ArgumentExpression
| EOT

In essence, the data type needs to store all the information needed for subsequent stages of computation (i.e. beta reductions and such). The closer it is to the grammar, the better. In this case it looks pretty close.
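
To make the mapping concrete, here is a sketch of how a small term is represented with this data type. The parameter of a Function is stored as an EName expression, mirroring what parseFunction builds further down.

// (\x.x y): the identity function applied to the name y
let identityAppliedToY =
    Application(
        Function(EName "x", EName "x"),   // \x.x
        EName "y")                        // y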

Remember the main goal of our parser:

let parseTextReader: TextReader -> seq<Expression> =
                    textReaderToLazyList >> tokenStream >> parseExpressions
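
As a usage sketch (assuming textReaderToLazyList and tokenStream from the previous parts are in scope, and that the lexer splits names at the special characters), the whole pipeline can be driven from a string:

open System.IO

let printAll (text: string) =
    use reader = new StringReader(text)
    parseTextReader (reader :> TextReader) |> Seq.iter (printfn "%A")

printAll @"(\x.x y)"  // should print the Application/Function tree, followed by a final EOT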

We have already looked at textReaderToLazyList and tokenStream. Now it is time to look at parseExpressions. Its goal is to parse the LazyList of tokens and return a sequence of expressions. The choice of returning a sequence at this point is to make parseTextReader, the main function in the program, return a more ‘standard’ type.

and parseExpressions tokens = seq {
   let tokens = parseOptionalWs tokens
   let expr, tokens = parseExpr tokens
   let tokens = parseOptionalWs tokens
   match expr with
    | EOT   -> yield EOT
    | exp   -> yield exp; yield! parseExpressions tokens }

parseOptionalWs simply skips ahead whatever whitespace it finds.

and parseOptionalWs tokens = match tokens with
                                | LazyList.Nil -> LazyList.empty
                                | LazyList.Cons(h, t) ->
                                    match h with
                                       | Ws _ -> parseOptionalWs t
                                       | _ -> tokens

parseExpr is more interesting. It is the main switch that creates the different kinds of expressions.

let rec parseExpr tokens = match tokens with
                            | LazyList.Nil -> EOT, LazyList.empty
                            | LazyList.Cons(h, t) ->
                                match h with
                                    | EOF -> parseEOF tokens
                                    | Name _ -> parseName  tokens
                                    | Lambda -> parseFunction tokens
                                    | OpenParens -> parseApplication tokens
                                    | token -> errorAtStart "Expression" token

parseEOF is not.

and parseEOF tokens = EOT, LazyList.empty

parseName just returns an EName, unwrapping it from the Name token.

and parseName tokens = EName (head tokens |> unwrapName), tail tokens

unwrapName just unwraps it.

let unwrapName = function
    | Name(s) -> s
    | tok -> errorExpecting "a Name" <| writeToken tok

parseFunction just consumes a Lambda, a name, a Dot token and a body (i.e. \x.x) and assembles them into a Function:

and parseFunction tokens =
    let tokens = consumeToken Lambda tokens
    let name, tokens = parseName tokens
    let tokens = consumeToken Dot tokens
    let body, tokens = parseExpr tokens
    Function(name, body), tokens
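
Because the body is itself parsed with parseExpr, nested lambdas fall out for free. A curried term like \x.\y.x, for instance, ends up as nested Function nodes (shown here at the level of the data type):

// \x.\y.x  ==>  a function whose body is another function
let constFunction =
    Function(EName "x",
             Function(EName "y", EName "x"))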

consumeToken tries to consume a token, generating an error if it doesn’t find it:

let consumeToken token =
    genericConsumeToken (fun token' tokens' -> errorExpecting (writeToken token') (writeToken (LazyList.head tokens'))) token

genericConsumeToken is just a generalization of the function above:

let genericConsumeToken noMatch token = function
    | LazyList.Nil -> LazyList.empty
    | LazyList.Cons(h, t) as originalTokens ->
        match h with
        | tok when tok = token -> t
        | tok -> noMatch token originalTokens
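
The noMatch parameter is what makes genericConsumeToken reusable. As a hypothetical example (not part of the original code), a non-failing variant could simply hand back the untouched token stream when the expected token is not there:

// Hypothetical helper: like consumeToken, but leaves the stream
// unchanged instead of raising an error when the token does not match.
let tryConsumeToken token =
    genericConsumeToken (fun _ originalTokens -> originalTokens) token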

The last thing left to parse is an application, which has the form (func arg):

and parseApplication tokens =
    let tokens = consumeToken OpenParens tokens
    let funExpr, tokens = parseExpr tokens
    let tokens = parseOptionalWs tokens
    let argExpr, tokens = parseExpr tokens
    let tokens = consumeToken CloseParens tokens
    Application(funExpr, argExpr), tokens
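
Because parseExpr is called recursively for both the function and the argument position, nested applications come out naturally. Assuming the lexer splits names at the special characters, a term like ((\f.f \y.y) z) would be represented as:

// ((\f.f \y.y) z)
let nestedApplication =
    Application(
        Application(
            Function(EName "f", EName "f"),    // \f.f
            Function(EName "y", EName "y")),   // \y.y
        EName "z")                             // z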

Various error and utility functions are defined below:

let errorEOF expecting = failwith ("Expected " + expecting + ", got EOF")
let errorExpecting expecting gotToken = failwith ("Expected " + expecting + ", got " + gotToken)
let errorAtStart expecting gotToken = failwith ("Expected " + expecting + " which cannot start with " + writeToken gotToken)
let tail = LazyList.tail
let head = LazyList.head

And that is the parser, all 100+ lines of it. As you can tell, it is rather formulaic to go from a grammar to a lexer and a parser, which is why you shouldn’t do it by hand; instead, let a tool generate the code for you from the grammar, or use FParsec.

We have written 200+ lines of code and I don’t think we can be too proud of our achievement. It is:

  • Certainly buggy
  • Primitive in error handling
  • Not tail recursive (big text is likely to blow up our stack)
  • Probably inefficient

So let’s look next at a better way to do it.
