LLite : language friendly literate programming - Luca Bolognese

LLite : language friendly literate programming

Luca -

☕☕☕ 16 min. read

Main ideas

My in­ter­est in lit­er­ate pro­gram­ming comes from some re­al­iza­tions on my part:

  • When I go back to code that I have writ­ten some time ago, I don’t re­mem­ber my rea­son­ing
  • When I write a blog post, my code seems to be bet­ter. Perhaps ex­plain­ing things to peo­ple en­cour­ages me to be more pre­cise
  • I like to think top down, but the com­piler forces me to write code bot­tom up, start­ing from de­tails and go­ing to higher level con­cepts

Unhappiness with ex­ist­ing tools

Many of the ex­ist­ing lit­er­ate pro­gram­ming tools work sim­i­larly to the orig­i­nal CWeb.

  • They have a tan­gle pro­gram that goes over your file and ex­tract some­thing that the com­piler can un­der­stand
  • They have a weave pro­gram that ex­tracts from your file some­thing that the doc­u­ment gen­er­a­tor can un­der­stand

This scheme has the un­for­tu­nate lim­i­ta­tion of break­ing your code ed­i­tor. Given that your file is not a valid code file any­more, the ed­i­tor starts mis­be­hav­ing (i.e. in­tel­lisense breaks). The de­bug­ger starts to get con­fused (albeit peo­ple tried to re­me­di­ate that with cleaver use of #line. If your lan­guage has an in­ter­ac­tive con­sole, that would not work ei­ther.

A dif­fer­ent in­ter­pre­ta­tion

The main idea of this pro­gram is to add your nar­ra­tive to the com­ment part of a code file by ex­tend­ing the com­ment tag (i.e. in C you could use /** ). This keeps ed­i­tor, de­bug­ger and in­ter­ac­tive con­sole work­ing.

The weave phase as been re­tained and what you are read­ing is the pro­gram that goes over your code file and ex­tracts a nicely for­mat­ted (for this pro­gram in markdown’ for­mat) file that can then be trans­lated to HTML, PDF, la­tex, etc…

You got that? The doc­u­ment you are read­ing now is the pro­gram.

Multi-language, multi-doc­u­ment for­mat

LLite works for any pro­gram­ming lan­guage, as­sum­ing it has open and close com­ment char­ac­ter se­quences, and any doc­u­men­ta­tion for­mat, as­sum­ing it has open and close code char­ac­ter se­quences (aka al­lows you to de­lim­i­tate your code some­how), or it needs the code to be in­dented. This doc­u­ment uses mark­down (with Pandoc extensions to gen­er­ate table of con­tents and ti­tles).

Usage

You in­voke the pro­gram as doc­u­mented be­low. The first set of pa­ra­me­ters lets you choose the sym­bols that de­lim­i­tate your lan­guage com­ments (or the de­fault sym­bols be­low). The sec­ond set of pa­ra­me­ters lets you choose how your tar­get doc­u­men­ta­tion lan­guage treats code. Either it de­lim­its it with some sym­bols or it in­dents it.

module LLite

let langParamsTable = [ "fsharp", ("(*" + "*", "*" + "*)") // The + is not to confuse the parser
"c", ("/**", "**/")
"csharp", ("/**", "**/")
"java", ("/**", "**/")] |> Map.ofList

let languages = langParamsTable |> Map.fold (fun state lang _ -> state + lang + " ") ""

let usage = sprintf @"
Usage: llite inputFile parameters
where:
One of the following two sets of parameters is mandatory
-no string : string opening a narrative comment
-nc string : string closing a narrative comment
or
-l language: where language is one of (%s)

One of the following two sets of parameters is mandatory
-co string : string opening a code block
-cc string : string closing a code block
or
-indent N : indent the code by N whitespaces

The following parameters are optional:
-o outFile : defaults to the input file name with mkd extension"
languages


let getLangNoNC lang =
match Map.tryFind lang langParamsTable with
| Some(no, nc) -> no, nc
| None -> failwith (lang + " is not a valid programming language")

Programming Languages lim­i­ta­tions

One of the main tenets of lit­er­ate pro­gram­ming is that the code should be writ­ten in the or­der that fa­cil­i­tates ex­po­si­tion to a hu­man reader, not in the or­der that makes the com­piler happy. This is very im­por­tant.

If you have writ­ten a blog post or tried to ex­plain a code­base to a new joiner, you must have no­ticed that you don’t start from the top of the file and go down, but jump here and there try­ing to bet­ter ex­plain the main con­cepts. Literate pro­gram­ming says that you should write your code the same way. But in our ver­sion of it, the com­piler needs to be kept happy be­cause the lit­er­ate file is the code file.

Some in­ge­nu­ity is re­quired to achieve such goal:

  • In C and C++ you can for­ward de­clare func­tions and classes, also class mem­bers can be in any or­der
  • In C#, Java, VB.NET, F# (the ob­ject ori­ented part) you can write class mem­bers in any or­der
  • In the func­tional part of F# you do have a prob­lem (see later in this doc)

The F# trick be­low is used in the rest of the pro­gram. You’ll un­der­stand its us­age nat­u­rally by just read­ing the code

let declare<'a>  = ref Unchecked.defaultof<'a>

Implementation

At the core, this pro­gram is a sim­ple trans­la­tor that takes some code text and re­turn a valid mark­down/​what­ever text. We need to know:

  • The strings that start and end a nar­ra­tive com­ment (input sym­bols)
  • How to trans­late a code block into a doc­u­ment. We sup­port these vari­a­tions:
    • Indented: in­dent them by N spaces
    • Surrounded by start­Code/​end­Code strings
type CodeSymbols =
| Indent of int // indentation level in whitespaces
| Surrounded of string * string // start code * end code

type Options = {
startNarrative : string
endNarrative : string
codeSymbols : CodeSymbols
}

let translate = declare<Options -> string -> string>

Going over the parse tree

We need a func­tion that takes a string and re­turns a list with the var­i­ous blocks. We can then go over each block, per­form some op­er­a­tions and, in the end, trans­form it back to text

type Block =
| Code of string
| Narrative of string

let blockize = declare<Options -> string -> Block list>

I could have used reg­u­lar ex­pres­sions to parse the pro­gram, but it seemed ugly. I could also have used FsParsec, but that brings with it an ad­di­tional dll. So I de­cided to roll my own parser. This has sev­eral prob­lems:

  • It is prob­a­bly very slow
  • It does­n’t al­low nar­ra­tive com­ments in­side com­ments, in par­tic­u­lar it does­n’t al­low the open­ing com­ment
  • It does­n’t al­low open­ing com­ments in the pro­gram code (not even in­side a string)

The lat­ter in par­tic­u­lar is trou­ble­some. You’ll need to use a trick in the code (i.e. con­cate­nat­ing strings) to foul this pro­gram in not see­ing an open­ing com­ment, but it is in­con­ve­nient.

With all of that, it works.

TODO: con­sider switch­ing to FsParsec

###Lexer

The lexer is go­ing to process list of char­ac­ters. We need func­tions to check if a list of char­ac­ters starts with cer­tain chars and to re­turn the re­main­ing list af­ter hav­ing re­moved such chars.

BTW: these func­tions are poly­mor­phic and tail re­cur­sive

let rec startWith startItems listToCheck =
match startItems, listToCheck with
| [], _ -> true
| _ , [] -> false
| h1::t1, h2::t2 when h1 = h2 -> startWith t1 t2
| _, _ -> false

let rec remove itemsToRemove listToModify =
match itemsToRemove, listToModify with
| [], l -> l
| _ , [] -> failwith "Remove not defined on an empty list"
| h1::t1, h2::t2 when h1 = h2 -> remove t1 t2
| _, _ -> failwith "itemsToRemove are not in the list"

let isOpening options = startWith (List.ofSeq options.startNarrative)
let isClosing options = startWith (List.ofSeq options.endNarrative)
let remainingOpen options = remove (List.ofSeq options.startNarrative)
let remainingClose options = remove (List.ofSeq options.endNarrative)

This is a pretty ba­sic to­k­enizer. It just an­a­lyzes the start of the text and re­turns what it finds. It also keeps track of the line num­ber for the sake of re­port­ing it in the er­ror mes­sage.

let NL = System.Environment.NewLine

type Token =
| OpenComment of int
| CloseComment of int
| Text of string

let tokenize options source =

let startWithNL = startWith (Seq.toList NL)

let rec text line acc = function
| t when isOpening options t -> line, acc, t
| t when isClosing options t -> line, acc, t
| c :: t as full ->
let line' = if startWithNL full then line + 1 else line
text line' (acc + c.ToString()) t
| [] -> line, acc, []
let rec tokenize' line acc = function
| [] -> List.rev acc
| t when isOpening options t -> tokenize' line
(OpenComment(line)::acc) (remainingOpen options t)
| t when isClosing options t -> tokenize' line
(CloseComment(line)::acc) (remainingClose options t)
| t ->
let line, s, t'= text line "" t
tokenize' line (Text(s) :: acc) t'

tokenize' 1 [] (List.ofSeq source)

###Parser

The parse tree is just a list of Chunks, where a chunk can be a piece of nar­ra­tive or a piece of code.

type Chunk =
| NarrativeChunk of Token list
| CodeChunk of Token list

let parse options source =

let rec parseNarrative acc = function
| OpenComment(l)::t ->
failwith ("Don't open narrative comments inside narrative comments at line "
+ l.ToString())
| CloseComment(_)::t -> acc, t
| Text(s)::t -> parseNarrative (Text(s)::acc) t
| [] -> failwith "You haven't closed your last narrative comment"

let rec parseCode acc = function
| OpenComment(_)::t as t' -> acc, t'
| CloseComment(l)::t -> parseCode (CloseComment(l)::acc) t
| Text(s)::t -> parseCode (Text(s)::acc) t
| [] -> acc, []
let rec parse' acc = function
| OpenComment(_)::t ->
let narrative, t' = parseNarrative [] t
parse' (NarrativeChunk(narrative)::acc) t'
| Text(s)::t ->
let code, t' = parseCode [Text(s)] t
parse' (CodeChunk(code)::acc) t'
| CloseComment(l)::t ->
failwith ("Don't insert a close narrative comment at the start of your program at line "
+ l.ToString())
| [] -> List.rev acc

parse' [] (List.ofSeq source)

###Flattener

The flat­ten­ing part of the al­go­rithm is a bit un­usual. At this point we have a parse tree that con­tains to­kens, but we want to re­duce it to two sim­ple node types con­tain­ing all the text in string form.

TODO: con­sider man­ag­ing nested com­ments and com­ments in strings (the lat­ter has to hap­pen in ear­lier phases)

let flatten options chunks =
let tokenToStringNarrative = function
| OpenComment(l) | CloseComment(l) -> failwith ("Narrative comments cannot be nested at line "
+ l.ToString())
| Text(s) -> s

let tokenToStringCode = function
| OpenComment(l) -> failwith ("Open narrative comment cannot be in code at line"
+ l.ToString()) +
". Perhaps you have an open comment in" +
" a code string before this comment tag?"
| CloseComment(_) -> string(options.endNarrative |> Seq.toArray)
| Text(s) -> s

let flattenChunk = function
| NarrativeChunk(tokens) ->
Narrative(tokens |> List.fold (fun state token -> state + tokenToStringNarrative token) "")
| CodeChunk(tokens) ->
Code(tokens |> List.fold (fun state token -> state + tokenToStringCode token) "")

chunks |> List.fold (fun state chunk -> flattenChunk chunk :: state) [] |> List.rev

We are get­ting there, now we have a list of blocks we can op­er­ate upon

blockize := fun options source -> source |> tokenize options |> parse options |> flatten options

Narrative com­ments phases

Each phase is a func­tion that takes the op­tions and a block list and re­turns a block list that has been processed’ in some way.

type Phase = Options -> Block List -> Block List

let removeEmptyBlocks = declare<Phase>
let mergeBlocks = declare<Phase>
let addCodeTags = declare<Phase>

let processPhases options blockList =
blockList
|> !removeEmptyBlocks options
|> !mergeBlocks options
|> !addCodeTags options

We want to man­age how many new­lines there are be­tween dif­fer­ent blocks, be­cause we don’t trust the pro­gram­mer to have a good view of how many new­line to keep from com­ment blocks and code blocks. We’ll trim all new­lines from the start and end of a block, and then add our own.

let newLines = [|'\n';'\r'|]

type System.String with
member s.TrimNl () = s.Trim(newLines)

###Remove empty blocks

There might be empty blocks (i.e. be­tween two con­sec­u­tive com­ment blocks) in the file. For the sake of for­mat­ting the file beau­ti­fully, we want to re­move them.

let extract = function
| Code(text) -> text
| Narrative(text) -> text

removeEmptyBlocks := fun options blocks ->
blocks |> List.filter (fun b -> (extract b).TrimNl().Trim() <> "")

###Merge blocks

Consecutive blocks of the same kind need to be merged, for the sake of for­mat­ting the over­all text cor­rectly.

TODO: make tail re­cur­sive

let rec mergeBlockList = function
| [] -> []
| [a] -> [a]
| h1::h2::t -> match h1, h2 with
| Code(t1), Code(t2) -> mergeBlockList (Code(t1 + NL + t2)::t)
| Narrative(n1), Narrative(n2) -> mergeBlockList(Narrative(n1 + NL + n2)::t)
| _, _ -> h1::mergeBlockList(h2::t)

mergeBlocks := fun options blocks -> mergeBlockList blocks

###Adding code tags

Each code block needs a tag at the start and one at the end or it needs to be in­dented by N chars.

let indent n (s:string) =
let pad = String.replicate n " "
pad + s.Replace(NL, NL + pad)

addCodeTags := fun options blocks ->
match options.codeSymbols with
| Indent(n) ->
blocks |> List.map (function Narrative(s) as nar -> nar | Code(s) -> Code(indent n s))
| Surrounded(s, e) ->
blocks |> List.map (function
| Narrative(text) -> Narrative(NL + text.TrimNl() + NL)
| Code(text) -> Code(NL + s + NL + text.TrimNl() + NL + e + NL))

###Flatten again

Once we have the ar­ray of blocks, we need to flat­ten them (transform them in a sin­gle string), which is triv­ial, and then fi­nally im­ple­ment our orig­i­nal trans­late func­tion.

let sumBlock s b2 = s + extract b2

let flattenB blocks = (blocks |> List.fold sumBlock "").TrimStart(newLines)

translate := fun options text -> text |> !blockize options |> processPhases options |> flattenB

Parsing com­mand line ar­gu­ments

Parsing com­mand lines in­volves writ­ing a func­tion that goes from a se­quence of strings to an in­put file name, out­put file name and Options record

let parseCommandLine = declare<string array -> string * string * Options>

To im­ple­ment it, we are go­ing to use a com­mand line parser taken from here. The parseArgs func­tion takes a se­quence of ar­gu­ment val­ues and map them into a (name,value) tu­ple. It scans the tu­ple se­quence and put com­mand name into all sub­se­quent tu­ples with­out name and dis­card the ini­tial (“”,“”) tu­ple. It then groups tu­ples by name and con­verts the tu­ple se­quence into a map of (name,value seq)

For now, I don’t need the value seq’ part as all my pa­ra­me­ters take a sin­gle ar­gu­ment, but I left it in there in case I will need to pass mul­ti­ple args later on.

open  System.Text.RegularExpressions

let (|Command|_|) (s:string) =
let r = new Regex(@"^(?:-{1,2}|\/)(?<command>\w+)[=:]*(?<value>.*)$",RegexOptions.IgnoreCase)
let m = r.Match(s)
if m.Success
then
Some(m.Groups.["command"].Value.ToLower(), m.Groups.["value"].Value)
else
None


let parseArgs (args:string seq) =
args
|> Seq.map (fun i ->
match i with
| Command (n,v) -> (n,v) // command
| _ -> ("",i) // data
)
|> Seq.scan (fun (sn,_) (n,v) -> if n.Length>0 then (n,v) else (sn,v)) ("","")
|> Seq.skip 1
|> Seq.groupBy (fun (n,_) -> n)
|> Seq.map (fun (n,s) -> (n, s |> Seq.map (fun (_,v) -> v) |> Seq.filter (fun i -> i.Length>0)))
|> Map.ofSeq

let paramRetrieve (m:Map<string,_>) (p:string) =
if Map.containsKey p m
then Some(m.[p])
else None

This is the main logic of pa­ra­me­ter pass­ing. Note that we give prece­dence to the -l and -indent pa­ra­me­ters, if pre­sent.

This is a func­tion that goes from the map of com­mand line pa­ra­me­ters to the in­put file name, out­put file name and op­tions. With that we can fi­nally de­fine the orig­i­nal par­seC­om­man­d­Line.

let safeHead errMsg s = if s |> Seq.isEmpty then failwith errMsg else s |> Seq.head 

let paramsToInputs paramsMap =
let single p er = match paramRetrieve paramsMap p with | Some(k) -> Some(k |> safeHead er)
| None -> None
let get p s = match paramRetrieve paramsMap p with |Some(k) -> k |> safeHead s
| None -> failwith s
let defaultP p q er = match paramRetrieve paramsMap p with | Some(k) -> k |> safeHead er
| None -> q

let inputFile = get "" "You need to pass an input file"
let outputFile = defaultP "o"
(System.IO.Path.GetFileNameWithoutExtension(inputFile:string) + ".mkd")
"You must pass a parameter to -o"

let no, nc = match single "l" "You must pass a language parameter to -l" with
| Some(l) -> getLangNoNC l
| None ->
get "no" "The no (narrative open) parameter is mandatory, if no -l specified",
get "nc" "The nc (narrative close) parameter is mandatory, if no -l specified"

let codeSymbs = match single "indent" "You must pass a whitespace indentation number to -indent" with
| Some(n) ->
let success, value = System.Int32.TryParse n
if success
then Indent(value)
else failwith "-i accepts just an integer value as parameter"
| None ->
match single "l" "You must pass a language parameter to -l" with
| Some(l) -> Surrounded("~~~" + l,"~~~")
| None ->
Surrounded(
get "co" "The co (code open) parameter is mandatory, if no -indent specified",
get "cc" "The cc (code close) parameter is mandatory")
inputFile, outputFile, {
startNarrative = no
endNarrative = nc
codeSymbols = codeSymbs
}

parseCommandLine := parseArgs >> paramsToInputs

Main method

We can then write main as the only side ef­fect func­tion in the pro­gram. Here is where the IO monad would live …

let banner  = "LLite : language friendly literate programming\n"

let printMemory () =
let bytesInMeg = 1048576.
let peak = System.Diagnostics.Process.GetCurrentProcess().PeakWorkingSet64
let work = System.Diagnostics.Process.GetCurrentProcess().WorkingSet64
printfn "Peak working set: %A" ((float) peak / bytesInMeg)
printfn "Working set: %A" ((float) peak / bytesInMeg)

let myMain args =
try

let inputFile, outputFile, options = !parseCommandLine args
let input = System.IO.File.ReadAllText inputFile
let output = !translate options input
System.IO.File.WriteAllText (outputFile, output)

//printMemory ()
0
with
| e ->
printfn "%s" "Failure"
printfn "%s" e.Message
printfn "%s" usage
#if DEBUG
printfn "\nDetailed Error Below:\n%A" e
#endif
-1

An aside: for­ward de­clar­ing func­tions in F#

A sim­ple so­lu­tion

You can achieve some­thing some­how sim­i­lar to for­ward de­c­la­ra­tion by the declare dirty trick used in this pro­gram. When­ever you want to do a for­ward de­c­la­ra­tion of a func­tion , or vari­able, you can type:

let testDeclare() =

let add = declare<float -> float>

let ``function where I want to use add without having defined it`` nums = nums |> Seq.map !add

This cre­ates a ref to a func­tion from float to float. It looks a bit like an Haskell type de­c­la­ra­tion. You can then use such func­tion as if it were ac­tu­ally de­fine and de­lay his de­f­i­n­i­tion to a later point in time when you are ready to ex­plain it.

When you are ready to talk about it, you can then de­fine it with:

    add := fun x -> x + 1.

The syn­tax is not too bad. You get that of­ten-sought Haskell like ex­plicit type de­c­la­ra­tion and you can regex the code­base to cre­ate an in­dex at the end of the pro­gram (maybe).

But is it too slow? After all, there is one more in­di­rec­tion call for each func­tion call.

Let’s test it: en­able #time in F# in­ter­ac­tive and ex­e­cute timeNor­malF and timeIndi­rectF vary­ing sleep­Time and how­ManyIter un­til you con­vince your­self that it is ok (or not).

    let sleepTime   = 50
let howManyIter = 100
let normalF x = System.Threading.Thread.Sleep sleepTime
let indirectF = declare<int -> unit>
indirectF := fun x -> System.Threading.Thread.Sleep sleepTime

let timeNormalF = [1..howManyIter] |> List.iter normalF
let timeIndirectF = [1..howManyIter] |> List.iter !indirectF
()

A cor­rect so­lu­tion (but ugly)

Unfortunately, there is a big prob­lem with all of the above: it does­n’t work with generic func­tions and cur­ried func­tion in­vo­ca­tions. The code be­low works in all cases, but it is ugly for the user to use. In this pro­gram I’ve used the beau­ti­ful, but in­cor­rect, syn­tax.

type Literate() =
static member Declare<'a, 'b> (ref : obj ref) (x : 'a) : 'b =
unbox <| (unbox<obj -> obj> !ref) x
static member Define<'a, 'b> (func : 'a -> 'b) (ref : obj ref) (f : 'a -> 'b) =
ref := box (unbox<'a> >> f >> box)

// Declaration
let rec id (x : 'a) : 'a = Literate.Declare idImpl x
and idImpl = ref null

// Usage
let f () = id 100 + id 200

// Definition
Literate.Define id idImpl (fun x -> x)

The tra­di­tional way

The tra­di­tional way of do­ing some­thing like this is by pass­ing the func­tion as a pa­ra­me­ter in a man­ner sim­i­lar to the be­low.

// Declaration and usage intermingled
let calculate' (add: int -> int -> int) x y = add x y * add x y

// Definition
let add x y = x + y

let calculate = calculate' add

To my way of see­ing, this is the most cum­ber­some so­lu­tion and con­cep­tu­ally dis­hon­est be­cause you are not pa­ram­e­triz­ing your func­tion for tech­ni­cal rea­sons, but just for the sake of ex­plain­ing things. In a way, you are chang­ing the sig­na­ture of your func­tions for the sake of writ­ing a book. That can’t be right …

[<EntryPoint>]
let main args = myMain args
0 Webmentions

These are webmentions via the IndieWeb and webmention.io.