Packrat parsing (memoization via laziness) in OCaml - parsing

I'm implementing a packrat parser in OCaml, as per the Master Thesis by B. Ford. My parser should receive a data structure that represents the grammar of a language and parse given sequences of symbols.
I'm stuck with the memoization part. The original thesis uses Haskell's lazy evaluation to accomplish linear time complexity. I want to do this (memoization via laziness) in OCaml, but don't know how to do it.
So, how do you memoize functions by lazy evaluations in OCaml?
EDIT: I know what lazy evaluation is and how to exploit it in OCaml. The question is how to use it to memoize functions.
EDIT: The data structure I wrote that represents grammars is:
type ('a, 'b, 'c) expr =
| Empty of 'c
| Term of 'a * ('a -> 'c)
| NTerm of 'b
| Juxta of ('a, 'b, 'c) expr * ('a, 'b, 'c) expr * ('c -> 'c -> 'c)
| Alter of ('a, 'b, 'c) expr * ('a, 'b, 'c) expr
| Pred of ('a, 'b, 'c) expr * 'c
| NPred of ('a, 'b, 'c) expr * 'c
type ('a, 'b, 'c) grammar = ('a * ('a, 'b, 'c) expr) list
The (not-memoized) function that parse a list of symbols is:
let rec parse g v xs = parse' g (List.assoc v g) xs
and parse' g e xs =
match e with
| Empty y -> Parsed (y, xs)
| Term (x, f) ->
begin
match xs with
| x' :: xs when x = x' -> Parsed (f x, xs)
| _ -> NoParse
end
| NTerm v' -> parse g v' xs
| Juxta (e1, e2, f) ->
begin
match parse' g e1 xs with
| Parsed (y, xs) ->
begin
match parse' g e2 xs with
| Parsed (y', xs) -> Parsed (f y y', xs)
| p -> p
end
| p -> p
end
( and so on )
where the type of the return value of parse is defined by
type ('a, 'c) result = Parsed of 'c * ('a list) | NoParse
For example, the grammar of basic arithmetic expressions can be specified as g, in:
type nt = Add | Mult | Prim | Dec | Expr
let zero _ = 0
let g =
[(Expr, Juxta (NTerm Add, Term ('$', zero), fun x _ -> x));
(Add, Alter (Juxta (NTerm Mult, Juxta (Term ('+', zero), NTerm Add, fun _ x -> x), (+)), NTerm Mult));
(Mult, Alter (Juxta (NTerm Prim, Juxta (Term ('*', zero), NTerm Mult, fun _ x -> x), ( * )), NTerm Prim));
(Prim, Alter (Juxta (Term ('<', zero), Juxta (NTerm Dec, Term ('>', zero), fun x _ -> x), fun _ x -> x), NTerm Dec));
(Dec, List.fold_left (fun acc d -> Alter (Term (d, (fun c -> int_of_char c - 48)), acc)) (Term ('0', zero)) ['1';'2';'3';])]

The idea of using lazyness for memoization is use not functions, but data structures, for memoization. Lazyness means that when you write let x = foo in some_expr, foo will not be evaluated immediately, but only as far as some_expr needs it, but that different occurences of xin some_expr will share the same trunk: as soon as one of them force computation, the result is available to all of them.
This does not work for functions: if you write let f x = foo in some_expr, and call f several times in some_expr, well, each call will be evaluated independently, there is not a shared thunk to store the results.
So you can get memoization by using a data structure instead of a function. Typically, this is done using an associative data structure: instead of computing a a -> b function, you compute a Table a b, where Table is some map from the arguments to the results. One example is this Haskell presentation of fibonacci:
fib n = fibTable !! n
fibTable = [0,1] ++ map (\n -> fib (n - 1) + fib (n - 2)) [2..]
(You can also write that with tail and zip, but this doesn't make the point clearer.)
See that you do not memoize a function, but a list: it is the list fibTable that does the memoization. You can write this in OCaml as well, for example using the LazyList module of the Batteries library:
open Batteries
module LL = LazyList
let from_2 = LL.seq 2 ((+) 1) (fun _ -> true)
let rec fib n = LL.at fib_table (n - 1) + LL.at fib_table (n - 2)
and fib_table = lazy (LL.Cons (0, LL.cons 1 <| LL.map fib from_2))
However, there is little interest in doing so: as you have seen in the example above, OCaml does not particularly favor call-by-need evaluation -- it's reasonable to use, but not terribly convenient as it was forced to be in Haskell. It is actually equally simple to directly write the cache structure by direct mutation:
open Batteries
let fib =
let fib_table = DynArray.of_list [0; 1] in
let get_fib n = DynArray.get fib_table n in
fun n ->
for i = DynArray.length fib_table to n do
DynArray.add fib_table (get_fib (i - 1) + get_fib (i - 2))
done;
get_fib n
This example may be ill-chosen, because you need a dynamic structure to store the cache. In the packrat parser case, you're tabulating parsing on a known input text, so you can use plain arrays (indexed by the grammar rules): you would have an array of ('a, 'c) result option for each rule, of the size of the input length and initialized to None. Eg. juxta.(n) represents the result of trying the rule Juxta from input position n, or None if this has not yet been tried.
Lazyness is a nice way to present this kind of memoization, but is not always expressive enough: if you need, say, to partially free some part of your result cache to lower memory usage, you will have difficulties if you started from a lazy presentation. See this blog post for a remark on this.

Why do you want to memoize functions? What you want to memoize is, I believe, the parsing result for a given (parsing) expression and a given position in the input stream. You could for instance use Ocaml's Hashtables for that.

The lazy keyword.
Here you can find some great examples.
If it fits your use case, you can also use OCaml streams instead of manually generating thunks.

Related

How do I pattern match with Data.Text in Haskell?

I am currently in the process of writing a parser in Haskell. I have the following code.
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text
newtype Parser a = Parser { runParser :: Text -> Either Text (Text, a) }
char1 :: Char -> Parser Char
char1 c = Parser $ \case
(x:xs) | x == c -> Right (xs, x)
_ -> Left "Unexpected character"
It fails to compile with these two errors.
test.hs:12:6: error:
• Couldn't match expected type ‘Text’ with actual type ‘[Char]’
• In the pattern: x : xs
In a case alternative: (x : xs) | x == c -> Right (xs, x)
In the second argument of ‘($)’, namely
‘\case
(x : xs) | x == c -> Right (xs, x)
_ -> Left "Unexpected character"’
|
12 | (x:xs) | x == c -> Right (xs, x)
| ^^^^
test.hs:12:24: error:
• Couldn't match type ‘[Char]’ with ‘Text’
Expected type: Either Text (Text, Char)
Actual type: Either Text ([Char], Char)
• In the expression: Right (xs, x)
In a case alternative: (x : xs) | x == c -> Right (xs, x)
In the second argument of ‘($)’, namely
‘\case
(x : xs) | x == c -> Right (xs, x)
_ -> Left "Unexpected character"’
|
12 | (x:xs) | x == c -> Right (xs, x)
| ^^^^^^^^^^^^^
I can fix the error by replacing the Text data type with String but I would prefer to use the Text data type.
Is there a way to pattern match with the Data.Text type without first explicitly converting it to a string first? Perhaps there is a GHC extension that would allow me to do this?
Thanks in advance.
A a refinement to #DanielWagner's answer, you can combine view patterns and pattern synonyms to do this. You'll need a new constructor in place of :, but it might look like:
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}
import Data.Text
pattern x :> xs <- (uncons -> Just (x, xs))
pattern Empty <- (uncons -> Nothing)
findInText :: (Char -> Bool) -> Text -> Maybe Char
findInText _ Empty = Nothing
findInText p (x :> xs) | p x = Just x
| otherwise = findInText p xs
The idea here is that a pattern x :> xs is a synonym for the pattern uncons -> Just (x, xs) which is a view pattern that operates by applying uncons to the scrutinee and pattern-matching the result with Just (x, xs) to population x and xs for the parent pattern.
As per the comment, there might be some concern about whether this usage ends up calling uncons more than once. With optimization entirely shut off (-O0), the generated core does have multiple uncons calls:
-- unoptimized -O0
findInText
= \ ds ds1 ->
case uncons ds1 of {
Nothing -> Nothing;
Just ipv ->
case uncons ds1 of {
Nothing -> ...
With optimization on (-O or -O2), everything gets inlined and the generated core is incredibly complicated because of the Unicode processing going on. However, if you also define:
findInText' :: (Char -> Bool) -> Text -> Maybe Char
findInText' p txt = case uncons txt of
Nothing -> Nothing
Just (x, xs) | p x -> Just x
| otherwise -> findInText' p xs
it turns out that GHC compiles findInText' to:
findInText' = findInText
so it looks like in this case at least, GHC doesn't do any extra work as a result of the view patterns.
You can match on a call to uncons.
case uncons text of
Just (x, xs) -> ...
Nothing -> ...
View patterns let you do this within the pattern instead of within the scrutinee, but require you to say uncons once for each pattern.
case text of
(uncons -> Just (x, xs)) -> ...
(uncons -> Nothing) -> ...

Taking two streams and combining them in OCaml

I want to take two streams of integers in increasing order and combine them into one stream that contains no duplicates and should be in increasing order. I have defined the functionality for streams in the following manner:
type 'a susp = Susp of (unit -> 'a)
let force (Susp f) = f()
type 'a str = {hd : 'a ; tl : ('a str) susp }
let merge s1 s2 = (* must implement *)
The first function suspends computation by wrapping a computation within a function, and the second function evaluates the function and provides me with the result of the computation.
I want to emulate the logic of how you go about combining lists, i.e. match on both lists and check which elements are greater, lesser, or equal and then append (cons) the integers such that the resulting list is sorted.
However, I know I cannot just do this with streams of course as I cannot traverse it like a list, so I think I would need to go integer by integer, compare, and then suspend the computation and keep doing this to build the resulting stream.
I am at a bit of a loss how to implement such logic however, assuming it is how I should be going about this, so if somebody could point me in the right direction that would be great.
Thank you!
If the the input sequences are sorted, there is not much difference between merging lists and sequences. Consider the following merge function on lists:
let rec merge s t =
match s, t with
| x :: s , [] | [], x :: s -> x :: s
| [], [] -> s
| x :: s', y :: t' ->
if x < y then
x :: (merge s' t)
else if x = y then
x :: (merge s' t')
else
y :: (merge s t')
This function is only using two properties of lists:
the ability to split the potential first element from the rest of the list
the ability to add an element to the front of the list
This suggests that we could rewrite this function as a functor over the signature
module type seq = sig
type 'a t
(* if the seq is non-empty we split the seq into head and tail *)
val next: 'a t -> ('a * 'a t) option
(* add back to the front *)
val cons: 'a -> 'a t -> 'a t
end
Then if we replace the pattern matching on the list with a call to next, and the cons operation with a call to cons, the previous function is transformed into:
module Merge(Any_seq: seq ) = struct
open Any_seq
let rec merge s t =
match next s, next t with
| Some(x,s), None | None, Some (x,s) ->
cons x s
| None, None -> s
| Some (x,s'), Some (y,t') ->
if x < y then
cons x (merge s' t)
else if x = y then
cons x (merge s' t')
else
cons y (merge s t')
end
Then, with list, our implementation was:
module List_core = struct
type 'a t = 'a list
let cons = List.cons
let next = function
| [] -> None
| a :: q -> Some(a,q)
end
module List_implem = Merge(List_core)
which can be tested with
let test = List_implem.merge [1;5;6] [2;4;9]
Implementing the same function for your stream type is then just a matter of writing a similar Stream_core module for stream.

How to avoid stack overflow during CPS conversion?

I'm writing a transformation from Scheme subset to CPS language. It is implemented in F#. On big input programs conversion fails by stack overflow.
I'm using some sort of algorithm described in the paper Compiling with Continuations.
I've tried to increase maximum stack size of the working thread up to 50 MB, then it works.
Maybe there some way to modify the algorithm, so that I won't need to tune stack size?
For example, the algorithm transforms
(foo (bar 1) (bar 2))
to
(let ((c1 (cont (r1)
(let ((c2 (cont (r2)
(foo halt r1 r2))))
(bar c2 2)))))
(bar c1 1))
where halt is a final continuation which finishes the program.
Maybe your actual problems has simple solutions to avoid heavy stack consumption, so please don't mind adding details. However, without more knowledge about your particular code, here is a general approach to reduce the stack consumption in a recursive programs, based on trampolines and continuations.
Walker
Here is a typical recursive function that is not trivially tail-recursive, written in Common Lisp because I don't know F#:
(defun walk (form transform join)
(typecase form
(cons (funcall join
(walk (car form) transform join)
(walk (cdr form) transform join)))
(t (funcall transform form))))
The code is however quite simple, hopefully, and walks a tree made of cons cells:
if the form is a cons-cell, recursively walk on the car (resp. cdr) and join the results
Otherwise, apply a transform on the value
For example:
(walk '(a (b c d) 3 2 (a 2 1) 0)
(lambda (u) (and (numberp u) u))
(lambda (a b) (if a (cons a b) (or a b))))
=> (3 2 (2 1) 0)
The code walks the form, and retain only numbers, but preserves (non-empty) nesting.
Calling trace on walk with the above example shows a maximal depth of 8 nested calls.
Continuations and trampoline
Here is an adapted version, called
walk/then, that walks a form as previously, and when a result is
available, calls then on it. Here then is a continuation.
The function also returns a thunk, i.e. a parameterless closure.
What happens is that when we return the closure, the stack is unwound,
and when we apply the thunk it will
start from a fresh stack, but having advanced in the computation
(I usually picture someone walking up an escalator that goes down).
The fact that we return a thunk to reduce the number of stack frames is part of the trampoline.
The then function takes a value, namely
the result that the current walk eventually will return.
The result is thus passed down the stack, and what is
returned at each step is a thunk function.
Nesting continuations allows to capture the complex behaviour of transform/join, by pushing the remaining parts of the computation in nested continuations.
(defun walk/then (form transform join then)
(typecase form
(cons (lambda ()
(walk/then (car form) transform join
(lambda (v)
(walk/then (cdr form) transform join
(lambda (w)
(funcall then (funcall join v w))))))))
(t (funcall then (funcall transform form)))))
For example, (walk/then (car form) transform join (lambda (v) ...)) reads as follows: walk the car of form with
arguments transform and join, and eventually call (lambda (v) ...) on the result; namely, walk down the cdr, and then join both results; eventually, call the input then on the joined result.
What is missing is a way to continually call the returned thunk until exhaustion; here is it
with a loop, but this could easily be a tail-recursive function:
(loop for res =
(walk/then '(a (b c d) 3 2 (a 2 1) 0)
(lambda (u) (and (numberp u) u))
(lambda (a b) (if a (cons a b) (or a b)))
#'identity)
then (typecase res (function (funcall res)) (t res))
while (functionp res)
finally (return res))
The above returns (3 2 (2 1) 0), and the depth of the trace never goes over 2 when tracing walk/then.
See Eli Bendersky's article for another take at this, in Python.
I've converted algorithm to trampoline form. It looks like FSM.
There is a loop, which looks at the current state, makes some manipulations, and goes to another state. Also it uses two stacks for different kind of continuations.
Here is input language (it is a subset of the language I used originally) :
// Input language consists of only variables and function applications
type Expr =
| Var of string
| App of Expr * Expr list
Here is target language:
// CPS form - each function gets a continuation,
// added continuation definitions and continuation applications
type Norm =
| LetCont of name : string * args : string list * body : Norm * inner : Norm
| FuncCall of func : string * cont : string * args : string list
| ContCall of cont : string * args : string list
Here is original algorithm:
// Usual way to make CPS conversion.
let rec transform expr cont =
match expr with
| App(func, args) ->
transformMany (func :: args) (fun vars ->
let func' = List.head vars
let args' = List.tail vars
let c = fresh()
let r = fresh()
LetCont(c, [r], cont r, FuncCall(func', c, args')))
| Var(v) -> cont v
and transformMany exprs cont =
match exprs with
| e :: rest ->
transform e (fun e' ->
transformMany rest (fun rest' ->
cont (e' :: rest')))
| _ -> cont []
let transformTop expr =
transform expr (fun var -> ContCall("halt", [var]))
Here is modified version:
type Action =
| ContinuationVar of Expr * (string -> Action)
| ContinuationExpr of string * (Norm -> Action)
| TransformMany of string list * Expr list * (string list -> Action)
| Result of Norm
| Variable of string
// Make one action at time and return to top loop
let rec transform2 expr =
match expr with
| App(func, args) ->
TransformMany([], func :: args, (fun vars ->
let func' = List.head vars
let args' = List.tail vars
let c = fresh()
let r = fresh()
ContinuationExpr(r, fun expr ->
Result(LetCont(c, [r], expr, FuncCall(func', c, args'))))))
| Var(v) -> Variable(v)
// We have two stacks here:
// contsVar for continuations accepting variables
// contsExpr for continuations accepting expressions
let transformTop2 expr =
let rec loop contsVar contsExpr action =
match action with
| ContinuationVar(expr, cont) ->
loop (cont :: contsVar) contsExpr (transform2 expr)
| ContinuationExpr(var, contExpr) ->
let contVar = List.head contsVar
let contsVar' = List.tail contsVar
loop contsVar' (contExpr :: contsExpr) (contVar var)
| TransformMany(vars, e :: exprs, cont) ->
loop contsVar contsExpr (ContinuationVar(e, fun var ->
TransformMany(var :: vars, exprs, cont)))
| TransformMany(vars, [], cont) ->
loop contsVar contsExpr (cont (List.rev vars))
| Result(r) ->
match contsExpr with
| cont :: rest -> loop contsVar rest (cont r)
| _ -> r
| Variable(v) ->
match contsVar with
| cont :: rest -> loop rest contsExpr (cont v)
| _ -> failwith "must not be empty"
let initial = ContinuationVar(expr, fun var -> Result(ContCall("halt", [var])))
loop [] [] initial

Converting F# pipeline operators ( <|, >>, << ) to OCaml

I'm converting some F# code to OCaml and I see a lot of uses of this pipeline operator <|, for example:
let printPeg expr =
printfn "%s" <| pegToString expr
The <| operator is apparently defined as just:
# let ( <| ) a b = a b ;;
val ( <| ) : ('a -> 'b) -> 'a -> 'b = <fun>
I'm wondering why they bother to define and use this operator in F#, is it just so they can avoid putting in parens like this?:
let printPeg expr =
Printf.printf "%s" ( pegToString expr )
As far as I can tell, that would be the conversion of the F# code above to OCaml, correct?
Also, how would I implement F#'s << and >> operators in Ocaml?
( the |> operator seems to simply be: let ( |> ) a b = b a ;; )
why they bother to define and use this operator in F#, is it just so they can avoid putting in parens?
It's because the functional way of programming assumes threading a value through a chain of functions. Compare:
let f1 str server =
str
|> parseUserName
|> getUserByName server
|> validateLogin <| DateTime.Now
let f2 str server =
validateLogin(getUserByName(server, (parseUserName str)), DateTime.Now)
In the first snippet, we clearly see everything that happens with the value. Reading the second one, we have to go through all parens to figure out what's going on.
This article about function composition seems to be relevant.
So yes, in a regular life, it is mostly about parens. But also, pipeline operators are closely related to partial function application and point-free style of coding. See Programming is "Pointless", for example.
The pipeline |> and function composition >> << operators can produce yet another interesting effect when they are passed to higher-level functions, like here.
Directly from the F# source:
let inline (|>) x f = f x
let inline (||>) (x1,x2) f = f x1 x2
let inline (|||>) (x1,x2,x3) f = f x1 x2 x3
let inline (<|) f x = f x
let inline (<||) f (x1,x2) = f x1 x2
let inline (<|||) f (x1,x2,x3) = f x1 x2 x3
let inline (>>) f g x = g(f x)
let inline (<<) f g x = f(g x)
OCaml Batteries supports these operators, but for reasons of precedence, associativity and other syntactic quirks (like Camlp4) it uses different symbols. Which particular symbols to use has just been settled recently, there are some changes. See: Batteries API:
val (|>) : 'a -> ('a -> 'b) -> 'b
Function application. x |> f is equivalent to f x.
val ( **> ) : ('a -> 'b) -> 'a -> 'b
Function application. f **> x is equivalent to f x.
Note The name of this operator is not written in stone. It is bound to change soon.
val (|-) : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c
Function composition. f |- g is fun x -> g (f x). This is also equivalent to applying <** twice.
val (-|) : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b
Function composition. f -| g is fun x -> f (g x). Mathematically, this is operator o.
But Batteries trunk provides:
val ( ## ) : ('a -> 'b) -> 'a -> 'b
Function application. [f ## x] is equivalent to [f x].
val ( % ) : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b
Function composition: the mathematical [o] operator.
val ( |> ) : 'a -> ('a -> 'b) -> 'b
The "pipe": function application. [x |> f] is equivalent to [f x].
val ( %> ) : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c
Piping function composition. [f %> g] is [fun x -> g (f x)].
I'm wondering why they bother to define and use this operator in F#, is it just so they can avoid putting in parens like this?
Excellent question. The specific operator you're referring to (<|) is pretty useless IME. It lets you avoid parentheses on rare occasions but more generally it complicates the syntax by dragging in more operators and makes it harder for less experienced F# programmers (of which there are now many) to understand your code. So I've stopped using it.
The |> operator is much more useful but only because it helps F# to infer types correctly in situations where OCaml would not have a problem. For example, here is some OCaml:
List.map (fun o -> o#foo) os
The direct equivalent fails in F# because the type of o cannot be inferred prior to reading its foo property so the idiomatic solution is to rewrite the code like this using the |> so F# can infer the type of o before foo is used:
os |> List.map (fun o -> o.foo)
I rarely use the other operators (<< and >>) because they also complicate the syntax. I also dislike parser combinator libraries that pull in lots of operators.
The example Bytebuster gave is interesting:
let f1 str server =
str
|> parseUserName
|> getUserByName server
|> validateLogin <| DateTime.Now
I would write this as:
let f2 str server =
let userName = parseUserName str
let user = getUserByName server userName
validateLogin user DateTime.Now
There are no brackets in my code. My temporaries have names so they appear in the debugger and I can inspect them and Intellisense can give me type throwback when I hover the mouse over them. These characteristics are valuable for production code that non-expert F# programmers will be maintaining.

Is this a reasonable foundation for a parser combinator library?

I've been working with FParsec lately and I found that the lack of generic parsers is a major stopping point for me. My goal for this little library is simplicity as well as support for generic input. Can you think of any additions that would improve this or is anything particularly bad?
open LazyList
type State<'a, 'b> (input:LazyList<'a>, data:'b) =
member this.Input = input
member this.Data = data
type Result<'a, 'b, 'c> =
| Success of 'c * State<'a, 'b>
| Failure of string * State<'a, 'b>
type Parser<'a,'b, 'c> = State<'a, 'b> -> Result<'a, 'b, 'c>
let (>>=) left right state =
match left state with
| Success (result, state) -> (right result) state
| Failure (message, _) -> Result<'a, 'b, 'd>.Failure (message, state)
let (<|>) left right state =
match left state with
| Success (_, _) as result -> result
| Failure (_, _) -> right state
let (|>>) parser transform state =
match parser state with
| Success (result, state) -> Success (transform result, state)
| Failure (message, _) -> Failure (message, state)
let (<?>) parser errorMessage state =
match parser state with
| Success (_, _) as result -> result
| Failure (_, _) -> Failure (errorMessage, state)
type ParseMonad() =
member this.Bind (f, g) = f >>= g
member this.Return x s = Success(x, s)
member this.Zero () s = Failure("", s)
member this.Delay (f:unit -> Parser<_,_,_>) = f()
let parse = ParseMonad()
Backtracking
Surprisingly it didn't take too much code to implement what you describe. It is a bit sloppy but seems to work quite well.
let (>>=) left right state =
seq {
for res in left state do
match res with
| Success(v, s) ->
let v =
right v s
|> List.tryFind (
fun res ->
match res with
| Success (_, _) -> true
| _ -> false
)
match v with
| Some v -> yield v
| None -> ()
} |> Seq.toList
let (<|>) left right state =
left state # right state
Backtracking Part 2
Switched around the code to use lazy lists and tail-call optimized recursion.
let (>>=) left right state =
let rec readRight lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
let rec readLeft lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) ->
match readRight (right r s) with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> readLeft xs
| Failure (m, s) -> readLeft xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
readLeft (left state)
let (<|>) (left:Parser<'a, 'b, 'c>) (right:Parser<'a, 'b, 'c>) state =
LazyList.delayed (fun () -> left state)
|> LazyList.append
<| LazyList.delayed (fun () -> right state)
I think that one important design decision that you'll need to make is whether you want to support backtracking in your parsers or not (I don't remember much about parsing theory, but this probably specifies the types of languages that your parser can handle).
Backtracking. In your implementation, a parser can either fail (the Failure case) or produce exactly one result (the Success case). An alternative option is to generate zero or more results (for example, represent results as seq<'c>). Sorry if this is something you already considered :-), but anyway...
The difference is that your parser always follows the first possible option. For example, if you write something like the following:
let! s1 = (str "ab" <|> str "a")
let! s2 = str "bcd"
Using your implementation, this will fail for input "abcd". It will choose the first branch of the <|> operator, which will then process first two characters and the next parser in the sequence will fail. An implementation based on sequences would be able to backtrack and follow the second path in <|> and parse the input.
Combine. Another idea that occurs to me is that you could also add Combine member to your parser computation builder. This is a bit subtle (because you need to understand how computation expressions are translated), but it can be sometimes useful. If you add:
member x.Combine(a, b) = a <|> b
member x.ReturnFrom(p) = p
You can then write recursive parsers nicely:
let rec many p acc =
parser { let! r = p // Parse 'p' at least once
return! many p (r::acc) // Try parsing 'p' multiple times
return r::acc |> List.rev } // If fails, return the result

Resources