Generating a parser with `inverse`, with constraints on the grammar - parsing

I recently followed through A Taste of Curry, and afterwards decided to put the trivial arithmetic parser example to test, by writing a somewhat more substantial parser: a primitive but correct and functional HTML parser.
I ended up with a working node2string function to operate on Node (with attributes and children), which I then inversed to obtain a parse function, as exemplified in the article.
The first naive implementation had the mistake that it parsed anything but e.g. the trivial <input/> HTML snippet into exactly one Node representation; everything else nondeterministically yielded invalid things like
Node { name = "input", attrs = [Attr "type" "submit"] }
Node { name = "input type=\"submit\"", attrs = [] }
and so on.
After some initial naive attempts to fix that from within node2string, I realized the point, which I believe all seasoned logic programmers see instantaneously, that parse = inverse node2string was right more right and insightful about the sitatution than I was: the above 2 parse results of <input type="submit"/> indeed were exactly the 2 valid and constructible values of Node that would lead to HTML representations.
I realized I had to constrain Node to only allow passing in alphabetic — well not really but let's keep it simple — names (and of course same for Attr). In a less fundamental setting than a logic program (such as regular Haskell with much more hand written and "instructional" as opposed to purely declarative programming), I would simply have hidden the Node constructor behind e.g. a mkNode sentinel function, but I have the feeling this wouldn't work well in Curry due to how the inference engine or constraint solver work (I might be wrong on this, and in fact I hope I am).
So I ended up instead with the following. I think Curry metaprogramming (or Template Haskell, if Curry supported it) could be used ot clean up the manual boielrplate, but cosmetically dealing is only one way out of the situation.
data Name = Name [NameChar] -- newtype crashes the compiler
data NameChar = A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
name2char :: NameChar -> Char
name2char c = case c of A -> 'a'; B -> 'b'; C -> 'c'; D -> 'd'; E -> 'e'; F -> 'f'; G -> 'g'; H -> 'h'; I -> 'i'; J -> 'j'; K -> 'k'; L -> 'l'; M -> 'm'; N -> 'n'; O -> 'o'; P -> 'p'; Q -> 'q'; R -> 'r'; S -> 's'; T -> 't'; U -> 'u'; V -> 'v'; W -> 'w'; X -> 'x'; Y -> 'y'; Z -> 'z'
name2string :: Name -> String
name2string (Name s) = map name2char s
-- for "string literal" support
nameFromString :: String -> Name
nameFromString = inverse name2string
data Node = Node { nodeName :: Name, attrs :: [Attr], children :: [Node] }
data Attr = Attr { attrName :: Name, value :: String }
attr2string :: Attr -> String
attr2string (Attr name value) = name2string name ++ "=\"" ++ escape value ++ "\""
where escape = concatMap (\c -> if c == '"' then "\\\"" else [c])
node2string :: Node -> String
node2string (Node name attrs children) | null children = "<" ++ name' ++ attrs' ++ "/>"
| otherwise = "<" ++ name' ++ attrs' ++ ">" ++ children' ++ "</" ++ name' ++ ">"
where name' = name2string name
attrs' = (concatMap ((" " ++) . attr2string) attrs)
children' = intercalate "" $ map (node2string) children
inverse :: (a -> b) -> (b -> a)
inverse f y | f x =:= y = x where x free
parse :: String -> Node
parse = inverse node2string
This, in fact, works perfectly (in my judgement):
Parser> parse "<input type=\"submit\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit")] [])
Parser> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit"),(Attr [N,A,M,E] "btn1")] [])
(Curry doesn't have type classes so I wouldn't know yet how to make [NameChar] print more nicely)
However, my question is:
is there a way to use something like isAlpha (or a function more true to the actual HTML spec, of course) to achieve a result equivalent to this, without having to go through the verbose boilerplate that NameChar and its "supporting members" are? There seems to be no way to even place the "functional restriction" anywhere within the ADT.
In a Dependently Typed Functional Logic Programming language, I would just express the constraint at the type level and let the inference engine or constraint solver deal with it, but here I seem to be at a loss.

You can achieve the same results using just Char. As you've already pointed out, you can use isAlpha to define name2char as a partial identity. I changed the following lines of your code.
type NameChar = Char
name2char :: NameChar -> Char
name2char c | isAlpha c = c
The two exemplary expressions then evaluate as follows.
test> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node (Name "input") [(Attr (Name "type") "submit"),(Attr (Name "name") "btn1")] [])
test> parse "<input type=\"submit\"/>"
(Node (Name "input") [(Attr (Name "type") "submit")] [])
As a side-effect, names with non-alpha characters silently fail with nameFromString.
test> nameFromString "input "
Edit: Since you seem to be a fan of function patterns, you can define generators for Nodes and Attrs and use them in your conversion function.
attr :: Name -> String -> Attr
attr name val
| name `elem` ["type", "src", "alt", "name"] = Attr name val
node :: String -> [Attr] -> [Node] -> Node
node name [] nodes
| name `elem` ["a", "p"] = Node name [] nodes
node name attrPairs#(_:_) nodes
| name `elem` ["img", "input"] = Node name attrPairs nodes
node2string :: Node -> String
node2string (node name attrs children)
| null children = "<" ++ name ++ attrs' ++ "/>"
| otherwise = "<" ++ name ++ attrs' ++ ">"
++ children' ++ "</" ++ name' ++ ">"
where
name' = name
attrs' = concatMap ((" " ++) . attr2string) attrs
children' = intercalate "" $ map (node2string) children
attr2string :: Attr -> String
attr2string (attr name val) = name ++ "=\"" ++ escape val ++ "\""
where
escape = concatMap (\c -> if c == '"' then "\\\"" else [c])
This approach has its disadvantages; it works quite well for a specific set of valid names, but fails miserably when you use a predicate like before (e.g., all isAlpha name).
Edit2:
Besides the fact that the solution with the isAlpha condition is quite "prettier" than your verbose solution, it is also defined in a declarative way.
Without your comments, it doesn't become clear (that easily) that you are encoding alphabetic characters with your NameChar data type. The isAlpha condition on the other hand is a good example for a declarative specification of the wanted property.
Does this answer your question? I'm not sure what you are aiming at.

Related

Parse error when using case expression inside guards haskell

I am new to Haskell and I am having issues with syntax. What I want to do is given data and a tree of this datatype, find the path to the corresponding node in the tree. I believe my logic for the function is correct but I am not sure how to make it valid Haskell. I have tried changing tabs to spaces.
-- | Binary trees with nodes labeled by values of an arbitrary type.
data Tree a
= Node a (Tree a) (Tree a)
| End
deriving (Eq,Show)
-- | One step in a path, indicating whether to follow the left subtree (L)
-- or the right subtree (R).
data Step = L | R
deriving (Eq,Show)
-- | A path is a sequence of steps. Each node in a binary tree can be
-- identified by a path, indicating how to move down the tree starting
-- from the root.
type Path = [Step]
pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
| a == b = Just []
| case (pathTo a l) of
Just p -> Just [L:p]
Nothing -> case (pathTo a r) of
Just p -> Just [R:p]
Nothing -> Nothing
This is the error:
parse error (possibly incorrect indentation or mismatched brackets)
The underlying problem here is that this does not look like a guard: a guard is an expression with type Bool, this determines if the guard "fires" or not. Here this is likely `otherwise:
pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
| a == b = Just []
| otherwise = case (pathTo a l) of
Just p -> Just (L:p)
Nothing -> case (pathTo a r) of
Just p -> Just (R:p)
Nothing -> Nothing
This also revealed some extra mistakes: Just [L:p] is a Maybe [[Step]], you likely wanted to use Just (L:p), the same applies for Just [R:p].
You furthermore do not need to use nested cases, you can work with the Alternative typeclass:
import Control.Applicative((<|>))
pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
| a == b = Just []
| otherwise = ((L:) <$> pathTo a l) <|> ((R:) <$> pathTo a r)
Here x <|> y will take x if it is a Just …, and y otherwise. We use (L:) <$> … to prepend the list wrapped in the Just data constructor, or return Nothing in case … is Nothing.

How to parse a boolean expression

I'm pretty new to F#, and I'm trying to use recursion to solve a problem.
The function receives a string, and returns a bool. The string gets parsed, and evaluated. This is bool logic, so
(T|F) returns true
(T&(T&T)) returns true
((T|T)&(T&F)) returns false
(F) = returns false
My idea was that every time I found a ), replace the part of the string from the previous ( to that ) with the result of the Comparison match. Doing this over and over until only T or F remains, to return true or false.
EDIT:
I expect it to take the string, and keep swapping out what is in between the ( and ) with the result of the comparison until it comes down to a T or F. What is happening, is an error about an incomplete structured construct. The error is in the for loop.
As I am so new to this language, I'm not sure what I'm doing wrong. Do you see it?
let ComparisonSolver (comp:string) =
let mutable trim = comp
trim <- trim.Replace("(", "")
trim <- trim.Replace(")", "")
match trim with
| "T" -> "T"
| "F" -> "F"
| "!T" -> "F"
| "!F" -> "T"
| "T&T" -> "T"
| "F&F" -> "T"
| "T&F" -> "F"
| "F&T" -> "F"
| "T|T" -> "T"
| "F|F" -> "F"
| "T|F" -> "T"
| "F|T" -> "T"
| _ -> ""
let rec BoolParser arg =
let mutable args = arg
if String.length arg = 1 then
match arg with
| "T" -> true
| "F" -> false
else
let mutable ParseStart = 0
let endRange = String.length args
for letter in [0 .. endRange]
if args.[letter] = "(" then
ParseStart <- letter
else if args.[letter] = ")" then
args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
BoolParser args
let result = BoolParser "(T)&(F)"
There are a few things you need to correct.
for letter in [0 .. endRange] is missing a do at the end of it - it should be for letter in [0 .. endRange] do
The if comparisons in the for loop are comparing chars with strings. You need to replace "(" and ")" with '(' and ')'
for letter in [0 .. endRange] will go out of range: In F# the array construct [x..y] will go from x to y inclusive. It's a bit like in C# if you had for (int i = 0; i <= array.Length; i++). In F# you can also declare loops like this: for i = 0 to endRange - 1 do.
for letter in [0 .. endRange] will go out of range again: It's going from 0 to endrange, which is the length of args. But args is getting shortened in the for loop, so it will eventually try to get a character from args that's out of range.
Now, the problem with the if..then..else statements, which is what I think you were looking at from the beginning.
if args.[letter] = '(' then
ParseStart <- letter
else if args.[letter] = ')' then
args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
BoolParser args
Let's take the code within the two branches as two separate functions.
The first does ParseStart <- letter, which assigns letter to ParseStart. This function returns unit, which is F# equivalent of void.
The second does:
args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
BoolParser args
This function returns a bool.
Now when you put them together in an if..then..else statement you have in one branch that results a unit and in the other in a bool. In this case it doesn't know which one to return, so it shows an "expression was expected to have type" error.
I strongly suspect that you wanted to call BoolParser args from outside
the for/if loop. But it's been indented so that F# treats it as part of the else if statement.
There are many ways to parse a boolean expression. It might be a good idea to look at the excellent library FParsec.
http://www.quanttec.com/fparsec/
Another way to implement parsers in F# is to use Active Patterns which can make for readable code
https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/active-patterns
It's hard to provide good error reporting through Active Patterns but perhaps you can find some inpiration from the following example:
let next s i = struct (s, i) |> Some
// Skips whitespace characters
let (|SkipWhitespace|_|) struct (s, i) =
let rec loop j =
if j < String.length s && s.[j] = ' ' then
loop (j + 1)
else
next s j
loop i
// Matches a specific character: ch
let (|Char|_|) ch struct (s, i) =
if i < String.length s && s.[i] = ch then
next s (i + 1)
else
None
// Matches a specific character: ch
// and skips trailing whitespaces
let (|Token|_|) ch =
function
| Char ch (SkipWhitespace ps) -> Some ps
| _ -> None
// Parses the boolean expressions
let parse s =
let rec term =
function
| Token 'T' ps -> Some (true, ps)
| Token 'F' ps -> Some (false, ps)
| Token '(' (Parse (v, Token ')' ps)) -> Some (v, ps)
| _ -> None
and opReducer p ch reducer =
let (|P|_|) ps = p ps
let rec loop l =
function
| Token ch (P (r, ps)) -> loop (reducer l r) ps
| Token ch _ -> None
| ps -> Some (l, ps)
function
| P (l, ps) -> loop l ps
| _ -> None
and andExpression ps = opReducer term '&' (&&) ps
and orExpression ps = opReducer andExpression '|' (||) ps
and parse ps = orExpression ps
and (|Parse|_|) ps = parse ps
match (struct (s, 0)) with
| SkipWhitespace (Parse (v, _)) -> Some v
| _ -> None
module Tests =
// FsCheck allows us to get better confidence in that the parser actually works
open FsCheck
type Whitespace =
| Space
type Ws = Ws of (Whitespace [])*(Whitespace [])
type Expression =
| Term of Ws*bool
| And of Expression*Ws*Expression
| Or of Expression*Ws*Expression
override x.ToString () =
let orPrio = 1
let andPrio = 2
let sb = System.Text.StringBuilder 16
let ch c = sb.Append (c : char) |> ignore
let token (Ws (l, r)) c =
sb.Append (' ', l.Length) |> ignore
sb.Append (c : char) |> ignore
sb.Append (' ', r.Length) |> ignore
let enclose p1 p2 f =
if p1 > p2 then ch '('; f (); ch ')'
else f ()
let rec loop prio =
function
| Term (ws, v) -> token ws (if v then 'T' else 'F')
| And (l, ws, r) -> enclose prio andPrio <| fun () -> loop andPrio l; token ws '&' ;loop andPrio r
| Or (l, ws, r) -> enclose prio orPrio <| fun () -> loop orPrio l ; token ws '|' ;loop orPrio r
loop andPrio x
sb.ToString ()
member x.ToBool () =
let rec loop =
function
| Term (_, v) -> v
| And (l, _, r) -> loop l && loop r
| Or (l, _, r) -> loop l || loop r
loop x
type Properties() =
static member ``Parsing expression shall succeed`` (expr : Expression) =
let expected = expr.ToBool () |> Some
let str = expr.ToString ()
let actual = str |> parse
expected = actual
let fscheck () =
let config = { Config.Quick with MaxTest = 1000; MaxRejected = 1000 }
Check.All<Properties> config

Haskell read variable name

I need to write a code that parses some language. I got stuck on parsing variable name - it can be anything that is at least 1 char long, starts with lowercase letter and can contain underscore '_' character. I think I made a good start with following code:
identToken :: Parser String
identToken = do
c <- letter
cs <- letdigs
return (c:cs)
where letter = satisfy isLetter
letdigs = munch isLetter +++ munch isDigit +++ munch underscore
num = satisfy isDigit
underscore = \x -> x == '_'
lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?
ident :: Parser Ident
ident = do
_ <- skipSpaces
s <- identToken
skipSpaces; return $ s
idents :: Parser Command
idents = do
skipSpaces; ids <- many1 ident
...
This function however gives me a weird results. If I call my test function
test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p =
case readP_to_S prog p of
[(j, "")] -> Right j
[] -> Left InvalidParse
multipleRes -> Left (AmbiguousIdents multipleRes)
where
prog :: Parser [Ident]
prog = do
result <- many ident
eof
return result
like this:
test_parseIdents "test"
I get this:
Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
(["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])
Note that Parser is just synonym for ReadP a.
I also want to encode in the parser that variable names should start with a lowercase character.
Thank you for your help.
Part of the problem is with your use of the +++ operator. The following code works for me:
import Data.Char
import Text.ParserCombinators.ReadP
type Parser a = ReadP a
type Ident = String
identToken :: Parser String
identToken = do c <- satisfy lowerCase
cs <- letdigs
return (c:cs)
where lowerCase = \x -> x `elem` ['a'..'z']
underscore = \x -> x == '_'
letdigs = munch (\c -> isLetter c || isDigit c || underscore c)
ident :: Parser Ident
ident = do _ <- skipSpaces
s <- identToken
skipSpaces
return s
test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
[(j, "")] -> Right j
[] -> Left "Invalid parse"
multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
where prog :: Parser [Ident]
prog = do result <- many ident
eof
return result
main = print $ test_parseIdents "test_1349_zefz"
So what went wrong:
+++ imposes an order on its arguments, and allows for multiple alternatives to succeed (symmetric choice). <++ is left-biased so only the left-most option succeeds -> this would remove the ambiguity in the parse, but still leaves the next problem.
Your parser was looking for letters first, then digits, and finally underscores. Digits after underscores failed, for example. The parser had to be modified to munch characters that were either letters, digits or underscores.
I also removed some functions that were unused and made an educated guess for the definition of your datatypes.

An elegant way to parse sexp

sexp is like this: type sexp = Atom of string | List of sexp list, e.g., "((a b) ((c d) e) f)".
I have written a parser to parse a sexp string to the type:
let of_string s =
let len = String.length s in
let empty_buf () = Buffer.create 16 in
let rec parse_atom buf i =
if i >= len then failwith "cannot parse"
else
match s.[i] with
| '(' -> failwith "cannot parse"
| ')' -> Atom (Buffer.contents buf), i-1
| ' ' -> Atom (Buffer.contents buf), i
| c when i = len-1 -> (Buffer.add_char buf c; Atom (Buffer.contents buf), i)
| c -> (Buffer.add_char buf c; parse_atom buf (i+1))
and parse_list acc i =
if i >= len || (i = len-1 && s.[i] <> ')') then failwith "cannot parse"
else
match s.[i] with
| ')' -> List (List.rev acc), i
| '(' ->
let list, j = parse_list [] (i+1) in
parse_list (list::acc) (j+1)
| c ->
let atom, j = parse_atom (empty_buf()) i in
parse_list (atom::acc) (j+1)
in
if s.[0] <> '(' then
let atom, j = parse_atom (empty_buf()) 0 in
if j = len-1 then atom
else failwith "cannot parse"
else
let list, j = parse_list [] 1 in
if j = len-1 then list
else failwith "cannot parse"
But I think it is too verbose and ugly.
Can someone help me with an elegant way to write such a parser?
Actually, I always have problems in writing code of parser and what I could do only is write such a ugly one.
Any tricks for this kind of parsing? How to effectively deal with symbols, such as (, ), that implies recursive parsing?
You can use a lexer+parser discipline to separate the details of lexical syntax (skipping spaces, mostly) from the actual grammar structure. That may seem overkill for such a simple grammar, but it's actually better as soon as the data you parse has the slightest chance of being wrong: you really want error location (and not to implement them yourself).
A technique that is easy and gives short parsers is to use stream parsers (using a Camlp4 extension for them described in the Developping Applications with Objective Caml book); you may even get a lexer for free by using the Genlex module.
If you want to do really do it manually, as in your example above, here is my recommendation to have a nice parser structure. Have mutually recursive parsers, one for each category of your syntax, with the following interface:
parsers take as input the index at which to start parsing
they return a pair of the parsed value and the first index not part of the value
nothing more
Your code does not respect this structure. For example, you parser for atoms will fail if it sees a (. That is not his role and responsibility: it should simply consider that this character is not part of the atom, and return the atom-parsed-so-far, indicating that this position is not in the atom anymore.
Here is a code example in this style for you grammar. I have split the parsers with accumulators in triples (start_foo, parse_foo and finish_foo) to factorize multiple start or return points, but that is only an implementation detail.
I have used a new feature of 4.02 just for fun, match with exception, instead of explicitly testing for the end of the string. It is of course trivial to revert to something less fancy.
Finally, the current parser does not fail if the valid expression ends before the end of the input, it only returns the end of the input on the side. That's helpful for testing but you would do it differently in "production", whatever that means.
let of_string str =
let rec parse i =
match str.[i] with
| exception _ -> failwith "unfinished input"
| ')' -> failwith "extraneous ')'"
| ' ' -> parse (i+1)
| '(' -> start_list (i+1)
| _ -> start_atom i
and start_list i = parse_list [] i
and parse_list acc i =
match str.[i] with
| exception _ -> failwith "unfinished list"
| ')' -> finish_list acc (i+1)
| ' ' -> parse_list acc (i+1)
| _ ->
let elem, j = parse i in
parse_list (elem :: acc) j
and finish_list acc i =
List (List.rev acc), i
and start_atom i = parse_atom (Buffer.create 3) i
and parse_atom acc i =
match str.[i] with
| exception _ -> finish_atom acc i
| ')' | ' ' -> finish_atom acc i
| _ -> parse_atom (Buffer.add_char acc str.[i]; acc) (i + 1)
and finish_atom acc i =
Atom (Buffer.contents acc), i
in
let result, rest = parse 0 in
result, String.sub str rest (String.length str - rest)
Note that it is an error to reach the end of input when parsing a valid expression (you must have read at least one atom or list) or when parsing a list (you must have encountered the closing parenthesis), yet it is valid at the end of an atom.
This parser does not return location information. All real-world parsers should do so, and this is enough of a reason to use a lexer/parser approach (or your preferred monadic parser library) instead of doing it by hand. Returning location information here is not terribly difficult, though, just duplicate the i parameter into the index of the currently parsed character, on one hand, and the first index used for the current AST node, on the other; whenever you produce a result, the location is the pair (first index, last valid index).

Ocaml lexer / parser rules

I wrote a program in ocaml that given an infix expression like 1 + 2, outputs the prefix notation : + 1 2
My problem is I don't find a way to make a rules like : all value, operator and bracket should be always separated by at least one space: 1+ 1 would be wrong 1 + 1 ok. I would like to not use the ocamlp4 grammar.
here is the code:
open Genlex
type tree =
| Leaf of string
| Node of tree * string * tree
let my_lexer str =
let kwds = ["("; ")"; "+"; "-"; "*"; "/"] in
make_lexer kwds (Stream.of_string str)
let make_tree_from_stream stream =
let op_parser operator_l higher_perm =
let rec aux left higher_perm = parser
[<'Kwd op when List.mem op operator_l; right = higher_perm; s >]
-> aux (Node (left, op, right)) higher_perm s
| [< >]
-> left
in
parser [< left = higher_perm; s >] -> aux left higher_perm s
in
let rec high_perm l = op_parser ["*"; "/"] brackets l
and low_perm l = op_parser ["+"; "-"] high_perm l
and brackets = parser
| [< 'Kwd "("; e = low_perm; 'Kwd ")" >] -> e
| [< 'Ident n >] -> Leaf n
| [< 'Int n >] -> Leaf (string_of_int n)
in
low_perm stream
let rec draw_tree = function
| Leaf n -> Printf.printf "%s" n
| Node(fg, r, fd) -> Printf.printf "(%s " (r);
draw_tree fg;
Printf.printf " ";
draw_tree fd;
Printf.printf ")"
let () =
let line = read_line() in
draw_tree (make_tree_from_stream (my_lexer line)); Printf.printf "\n"
Plus if you have some tips about the code or if you notice some error of prog style then I will appreciate that you let it me know. Thanks !
The Genlex provides a ready-made lexer that respects OCaml's lexical convention, and in particular ignore the spaces in the positions you mention. I don't think you can implement what you want on top of it (it is not designed as a flexible solution, but a quick way to get a prototype working).
If you want to keep writing stream parsers, you could write your own lexer for it: define a token type, and lex a char Stream.t into a token Stream.t, which you can then parse as you wish. Otherwise, if you don't want to use Camlp4, you may want to try an LR parser generator, such as menhir (a better ocamlyacc).

Resources