I am new to Haskell and I am having issues with syntax. Given a value and a tree of this datatype, I want to find the path to the corresponding node in the tree. I believe the logic of my function is correct, but I am not sure how to make it valid Haskell. I have tried changing tabs to spaces.
-- | Binary trees with nodes labeled by values of an arbitrary type.
data Tree a
  = Node a (Tree a) (Tree a)
  | End
  deriving (Eq,Show)

-- | One step in a path, indicating whether to follow the left subtree (L)
-- or the right subtree (R).
data Step = L | R
  deriving (Eq,Show)

-- | A path is a sequence of steps. Each node in a binary tree can be
-- identified by a path, indicating how to move down the tree starting
-- from the root.
type Path = [Step]

pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
  | a == b = Just []
  | case (pathTo a l) of
      Just p -> Just [L:p]
      Nothing -> case (pathTo a r) of
        Just p -> Just [R:p]
        Nothing -> Nothing
This is the error:
parse error (possibly incorrect indentation or mismatched brackets)
The underlying problem is that the second clause is not a valid guard: a guard is an expression of type Bool, and it determines whether the guard "fires" or not. Here you most likely want otherwise:
pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
  | a == b = Just []
  | otherwise = case (pathTo a l) of
      Just p -> Just (L:p)
      Nothing -> case (pathTo a r) of
        Just p -> Just (R:p)
        Nothing -> Nothing
Fixing this also reveals another mistake: Just [L:p] has type Maybe [[Step]], whereas you most likely want Just (L:p); the same applies to Just [R:p].
Furthermore, you do not need nested case expressions; you can use the Alternative typeclass instead:
import Control.Applicative((<|>))

pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
  | a == b = Just []
  | otherwise = ((L:) <$> pathTo a l) <|> ((R:) <$> pathTo a r)
Here x <|> y yields x if it is a Just …, and y otherwise. (L:) <$> … prepends L to the list wrapped in the Just data constructor, or returns Nothing if … is Nothing.
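For comparison, the same left-then-right search can be sketched in OCaml. This is an illustrative translation, not part of the answer above: Option.map plays the role of <$>, and a hypothetical helper or_else plays the role of <|> on Maybe. The right subtree is passed lazily so that, as with Haskell's lazy <|>, it is only searched when the left subtree yields None.

```ocaml
(* OCaml counterparts of the Haskell Tree and Step types (illustrative names). *)
type 'a tree = Node of 'a * 'a tree * 'a tree | End
type step = L | R

(* Hypothetical helper mirroring <|> on Maybe: keep the first result if it
   is Some, otherwise force and return the second. *)
let or_else first second =
  match first with
  | Some _ -> first
  | None -> Lazy.force second

(* Path from the root to the first node labelled x, searching left first. *)
let rec path_to x = function
  | End -> None
  | Node (y, l, r) ->
      if x = y then Some []
      else
        or_else
          (Option.map (fun p -> L :: p) (path_to x l))
          (lazy (Option.map (fun p -> R :: p) (path_to x r)))
```

For example, with t = Node (1, Node (2, End, End), Node (3, Node (4, End, End), End)), path_to 4 t evaluates to Some [R; L], and path_to 5 t to None.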
I'm pretty new to F#, and I'm trying to use recursion to solve a problem.
The function receives a string and returns a bool. The string is parsed and evaluated as Boolean logic, so:
(T|F) returns true
(T&(T&T)) returns true
((T|T)&(T&F)) returns false
(F) returns false
My idea was that every time I find a ), I replace the part of the string from the previous ( up to that ) with the result of the ComparisonSolver match, and repeat this until only T or F remains, so that I can return true or false.
EDIT:
I expect it to take the string and keep replacing what is between ( and ) with the result of the comparison until only T or F is left. What actually happens is an error about an incomplete structured construct, reported at the for loop.
As I am so new to this language, I'm not sure what I'm doing wrong. Do you see it?
let ComparisonSolver (comp:string) =
    let mutable trim = comp
    trim <- trim.Replace("(", "")
    trim <- trim.Replace(")", "")
    match trim with
    | "T" -> "T"
    | "F" -> "F"
    | "!T" -> "F"
    | "!F" -> "T"
    | "T&T" -> "T"
    | "F&F" -> "F"
    | "T&F" -> "F"
    | "F&T" -> "F"
    | "T|T" -> "T"
    | "F|F" -> "F"
    | "T|F" -> "T"
    | "F|T" -> "T"
    | _ -> ""

let rec BoolParser arg =
    let mutable args = arg
    if String.length arg = 1 then
        match arg with
        | "T" -> true
        | "F" -> false
    else
        let mutable ParseStart = 0
        let endRange = String.length args
        for letter in [0 .. endRange]
            if args.[letter] = "(" then
                ParseStart <- letter
            else if args.[letter] = ")" then
                args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
                BoolParser args

let result = BoolParser "(T)&(F)"
There are a few things you need to correct.
for letter in [0 .. endRange] is missing a do at the end of it - it should be for letter in [0 .. endRange] do
The if comparisons in the for loop are comparing chars with strings. You need to replace "(" and ")" with '(' and ')'
for letter in [0 .. endRange] will go out of range: in F#, the range expression [x .. y] produces a list from x to y inclusive. It's a bit like writing for (int i = 0; i <= array.Length; i++) in C#. In F# you can also write the loop as for i = 0 to endRange - 1 do.
for letter in [0 .. endRange] will also go out of range for a second reason: it runs from 0 to endRange, which is the original length of args. But args gets shorter inside the for loop, so the loop will eventually try to read a character beyond the end of args.
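The inclusive upper bound is easy to trip over. OCaml's for loop has the same inclusive semantics as F#'s for i = 0 to n do, so the last index to visit is String.length s - 1. A small sketch (count_char is a hypothetical name used only for illustration):

```ocaml
(* Counts occurrences of c in s. Both loop bounds are inclusive, so the
   upper bound must be String.length s - 1; for an empty string the loop
   runs from 0 to -1 and executes zero times. *)
let count_char c s =
  let n = ref 0 in
  for i = 0 to String.length s - 1 do
    if s.[i] = c then incr n
  done;
  !n
```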
Now for the problem with the if..then..else expression, which I think is what you were looking at from the beginning.
if args.[letter] = '(' then
    ParseStart <- letter
else if args.[letter] = ')' then
    args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
    BoolParser args
Let's treat the code in each of the two branches as a separate function.
The first does ParseStart <- letter, which assigns letter to ParseStart. This returns unit, the F# equivalent of void.
The second does:
args <- args.Replace(args.[ParseStart .. letter], ComparisonSolver args.[ParseStart .. letter])
BoolParser args
This function returns a bool.
When you put them together in an if..then..else expression, one branch evaluates to unit and the other to bool. Both branches of an if..then..else must have the same type, so the compiler reports an "expression was expected to have type" error.
I strongly suspect that you wanted to call BoolParser args after the for loop, outside the if. But it is indented so that F# treats it as part of the else if branch.
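Putting these fixes together, the questioner's idea (repeatedly replace the innermost parenthesised group with its value) can be made to work. The following is a sketch in OCaml rather than F#, with hypothetical names eval_flat and bool_parser. It recomputes the positions on every pass instead of mutating a string inside a for loop, which sidesteps both out-of-range problems, and it evaluates whatever remains once no parentheses are left, so an input like (T)&(F) also works. Like ComparisonSolver, it only understands groups containing at most one operator.

```ocaml
(* Hypothetical helper: evaluates a parenthesis-free group with at most one
   operator, the same cases ComparisonSolver handles (with F&F giving F). *)
let eval_flat = function
  | "T" | "!F" | "T&T" | "T|T" | "T|F" | "F|T" -> Some "T"
  | "F" | "!T" | "F&F" | "T&F" | "F&T" | "F|F" -> Some "F"
  | _ -> None

(* Repeatedly replaces the innermost "(...)" group with its value until no
   parentheses remain, then evaluates what is left. Returns None for input
   it cannot handle instead of raising an exception. *)
let rec bool_parser s =
  match String.index_opt s ')' with
  | None ->
      (* No parentheses left: "T", "F" or a single flat operation. *)
      Option.map (fun v -> v = "T") (eval_flat s)
  | Some close ->
      (* The innermost group starts at the last '(' before the first ')'. *)
      (match String.rindex_from_opt s close '(' with
       | None -> None
       | Some open_ ->
           let inner = String.sub s (open_ + 1) (close - open_ - 1) in
           (match eval_flat inner with
            | None -> None
            | Some v ->
                let before = String.sub s 0 open_ in
                let after =
                  String.sub s (close + 1) (String.length s - close - 1) in
                bool_parser (before ^ v ^ after)))
```

For example, bool_parser "((T|T)&(T&F))" rewrites the string to "(T&(T&F))", then "(T&F)", then "F", and returns Some false.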
There are many ways to parse a boolean expression. It might be a good idea to look at the excellent library FParsec.
http://www.quanttec.com/fparsec/
Another way to implement parsers in F# is to use Active Patterns, which can make for readable code:
https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/active-patterns
It's hard to provide good error reporting through Active Patterns, but perhaps you can find some inspiration in the following example:
let next s i = struct (s, i) |> Some

// Skips whitespace characters
let (|SkipWhitespace|_|) struct (s, i) =
    let rec loop j =
        if j < String.length s && s.[j] = ' ' then
            loop (j + 1)
        else
            next s j
    loop i

// Matches a specific character: ch
let (|Char|_|) ch struct (s, i) =
    if i < String.length s && s.[i] = ch then
        next s (i + 1)
    else
        None

// Matches a specific character: ch
// and skips trailing whitespaces
let (|Token|_|) ch =
    function
    | Char ch (SkipWhitespace ps) -> Some ps
    | _ -> None

// Parses the boolean expressions
let parse s =
    let rec term =
        function
        | Token 'T' ps -> Some (true, ps)
        | Token 'F' ps -> Some (false, ps)
        | Token '(' (Parse (v, Token ')' ps)) -> Some (v, ps)
        | _ -> None
    and opReducer p ch reducer =
        let (|P|_|) ps = p ps
        let rec loop l =
            function
            | Token ch (P (r, ps)) -> loop (reducer l r) ps
            | Token ch _ -> None
            | ps -> Some (l, ps)
        function
        | P (l, ps) -> loop l ps
        | _ -> None
    and andExpression ps = opReducer term '&' (&&) ps
    and orExpression ps = opReducer andExpression '|' (||) ps
    and parse ps = orExpression ps
    and (|Parse|_|) ps = parse ps
    match (struct (s, 0)) with
    | SkipWhitespace (Parse (v, _)) -> Some v
    | _ -> None
module Tests =
    // FsCheck gives us more confidence that the parser actually works
    open FsCheck

    type Whitespace =
        | Space

    type Ws = Ws of (Whitespace [])*(Whitespace [])

    type Expression =
        | Term of Ws*bool
        | And of Expression*Ws*Expression
        | Or of Expression*Ws*Expression

        override x.ToString () =
            let orPrio = 1
            let andPrio = 2
            let sb = System.Text.StringBuilder 16
            let ch c = sb.Append (c : char) |> ignore
            let token (Ws (l, r)) c =
                sb.Append (' ', l.Length) |> ignore
                sb.Append (c : char) |> ignore
                sb.Append (' ', r.Length) |> ignore
            let enclose p1 p2 f =
                if p1 > p2 then ch '('; f (); ch ')'
                else f ()
            let rec loop prio =
                function
                | Term (ws, v) -> token ws (if v then 'T' else 'F')
                | And (l, ws, r) -> enclose prio andPrio <| fun () -> loop andPrio l; token ws '&' ;loop andPrio r
                | Or (l, ws, r) -> enclose prio orPrio <| fun () -> loop orPrio l ; token ws '|' ;loop orPrio r
            loop andPrio x
            sb.ToString ()

        member x.ToBool () =
            let rec loop =
                function
                | Term (_, v) -> v
                | And (l, _, r) -> loop l && loop r
                | Or (l, _, r) -> loop l || loop r
            loop x

    type Properties() =
        static member ``Parsing expression shall succeed`` (expr : Expression) =
            let expected = expr.ToBool () |> Some
            let str = expr.ToString ()
            let actual = str |> parse
            expected = actual

    let fscheck () =
        let config = { Config.Quick with MaxTest = 1000; MaxRejected = 1000 }
        Check.All<Properties> config
I need to write code that parses some language. I got stuck on parsing variable names: a name can be anything that is at least one character long, starts with a lowercase letter, and can contain the underscore character '_'. I think I made a good start with the following code:
identToken :: Parser String
identToken = do
    c <- letter
    cs <- letdigs
    return (c:cs)
  where letter = satisfy isLetter
        letdigs = munch isLetter +++ munch isDigit +++ munch underscore
        num = satisfy isDigit
        underscore = \x -> x == '_'
        lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?

ident :: Parser Ident
ident = do
    _ <- skipSpaces
    s <- identToken
    skipSpaces; return $ s

idents :: Parser Command
idents = do
    skipSpaces; ids <- many1 ident
    ...
However, this gives me weird results. If I call my test function
test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p =
    case readP_to_S prog p of
        [(j, "")] -> Right j
        [] -> Left InvalidParse
        multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
        result <- many ident
        eof
        return result
like this:
test_parseIdents "test"
I get this:
Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
(["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])
Note that Parser a is just a synonym for ReadP a.
I also want to encode in the parser that variable names should start with a lowercase character.
Thank you for your help.
Part of the problem is with your use of the +++ operator. The following code works for me:
import Data.Char
import Text.ParserCombinators.ReadP

type Parser a = ReadP a
type Ident = String

identToken :: Parser String
identToken = do c <- satisfy lowerCase
                cs <- letdigs
                return (c:cs)
  where lowerCase = \x -> x `elem` ['a'..'z']
        underscore = \x -> x == '_'
        letdigs = munch (\c -> isLetter c || isDigit c || underscore c)

ident :: Parser Ident
ident = do _ <- skipSpaces
           s <- identToken
           skipSpaces
           return s

test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left "Invalid parse"
    multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
  where prog :: Parser [Ident]
        prog = do result <- many ident
                  eof
                  return result

main = print $ test_parseIdents "test_1349_zefz"
So what went wrong:
+++ is symmetric choice: it does not prefer either argument, so every alternative that succeeds contributes a result, which is where the ambiguity comes from. <++ is left-biased: if the left parser produces any result, the right one is not tried. Using it would remove the ambiguity in the parse, but would still leave the next problem.
Your letdigs parser accepted a run of letters, or a run of digits, or a run of underscores, but never a mix of them, so an identifier such as test_1349 could not be read as a single token. The parser had to be changed to munch characters that are letters, digits or underscores in one run.
I also removed some functions that were unused and made an educated guess for the definition of your datatypes.
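The key point is that, after a lowercase first letter, the rest of an identifier is consumed as one maximal run of letters, digits and underscores, so there is only ever one way to split the input. Outside ReadP, the same rule can be sketched as a hand-written scanner. The OCaml below (hypothetical names ident and idents, ASCII-only character classes, unlike Haskell's Unicode-aware isLetter) only illustrates that rule; it is not a replacement for the ReadP version.

```ocaml
(* Hypothetical ASCII character classes for the identifier rule. *)
let is_lower c = c >= 'a' && c <= 'z'
let is_ident_char c =
  is_lower c || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c = '_'

(* Reads one identifier starting at index i: a lowercase letter followed by
   the longest run of letters, digits and underscores (one "munch").
   Returns the identifier and the index just after it. *)
let ident s i =
  let len = String.length s in
  if i < len && is_lower s.[i] then begin
    let j = ref (i + 1) in
    while !j < len && is_ident_char s.[!j] do incr j done;
    Some (String.sub s i (!j - i), !j)
  end else None

(* Reads space-separated identifiers until the end of the input;
   returns None as soon as something is not an identifier. *)
let idents s =
  let len = String.length s in
  let rec skip i = if i < len && s.[i] = ' ' then skip (i + 1) else i in
  let rec loop acc i =
    let i = skip i in
    if i = len then Some (List.rev acc)
    else
      match ident s i with
      | Some (id, j) -> loop (id :: acc) j
      | None -> None
  in
  loop [] 0
```

With this rule, idents "test_1349_zefz" gives Some ["test_1349_zefz"] and idents "Foo" gives None.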
sexp is defined as type sexp = Atom of string | List of sexp list, e.g., "((a b) ((c d) e) f)".
I have written a parser that turns a sexp string into this type:
let of_string s =
  let len = String.length s in
  let empty_buf () = Buffer.create 16 in
  let rec parse_atom buf i =
    if i >= len then failwith "cannot parse"
    else
      match s.[i] with
      | '(' -> failwith "cannot parse"
      | ')' -> Atom (Buffer.contents buf), i-1
      | ' ' -> Atom (Buffer.contents buf), i
      | c when i = len-1 -> (Buffer.add_char buf c; Atom (Buffer.contents buf), i)
      | c -> (Buffer.add_char buf c; parse_atom buf (i+1))
  and parse_list acc i =
    if i >= len || (i = len-1 && s.[i] <> ')') then failwith "cannot parse"
    else
      match s.[i] with
      | ')' -> List (List.rev acc), i
      | '(' ->
        let list, j = parse_list [] (i+1) in
        parse_list (list::acc) (j+1)
      | c ->
        let atom, j = parse_atom (empty_buf()) i in
        parse_list (atom::acc) (j+1)
  in
  if s.[0] <> '(' then
    let atom, j = parse_atom (empty_buf()) 0 in
    if j = len-1 then atom
    else failwith "cannot parse"
  else
    let list, j = parse_list [] 1 in
    if j = len-1 then list
    else failwith "cannot parse"
But I think it is too verbose and ugly.
Can someone help me find an elegant way to write such a parser?
In general, I always have trouble writing parsers, and the only thing I manage is an ugly one like this.
Are there any tricks for this kind of parsing? How can I deal effectively with symbols such as ( and ) that imply recursive parsing?
You can use a lexer+parser discipline to separate the details of lexical syntax (mostly skipping spaces) from the actual grammar structure. That may seem overkill for such a simple grammar, but it pays off as soon as the data you parse has the slightest chance of being wrong: you really want error locations (and you don't want to implement them yourself).
A technique that is easy and gives short parsers is to use stream parsers (using the Camlp4 extension for them described in the book Developing Applications with Objective Caml); you may even get a lexer for free by using the Genlex module.
If you really want to do it manually, as in your example above, here is my recommendation for a nice parser structure. Have mutually recursive parsers, one for each category of your syntax, with the following interface:
parsers take as input the index at which to start parsing
they return a pair of the parsed value and the first index not part of the value
nothing more
Your code does not respect this structure. For example, your parser for atoms fails if it sees a (. That is not its role and responsibility: it should simply consider that this character is not part of the atom, and return the atom parsed so far, indicating that this position is no longer in the atom.
Here is a code example in this style for your grammar. I have split the parsers with accumulators into triples (start_foo, parse_foo and finish_foo) to factor out multiple start or return points, but that is only an implementation detail.
Just for fun, I have used match with exception, a feature new in OCaml 4.02, instead of explicitly testing for the end of the string. It is of course trivial to revert to something less fancy.
Finally, the current parser does not fail if the valid expression ends before the end of the input; it just returns the unconsumed rest of the input alongside the result. That's helpful for testing, but you would do it differently in "production", whatever that means.
let of_string str =
  let rec parse i =
    match str.[i] with
    | exception _ -> failwith "unfinished input"
    | ')' -> failwith "extraneous ')'"
    | ' ' -> parse (i+1)
    | '(' -> start_list (i+1)
    | _ -> start_atom i
  and start_list i = parse_list [] i
  and parse_list acc i =
    match str.[i] with
    | exception _ -> failwith "unfinished list"
    | ')' -> finish_list acc (i+1)
    | ' ' -> parse_list acc (i+1)
    | _ ->
      let elem, j = parse i in
      parse_list (elem :: acc) j
  and finish_list acc i =
    List (List.rev acc), i
  and start_atom i = parse_atom (Buffer.create 3) i
  and parse_atom acc i =
    match str.[i] with
    | exception _ -> finish_atom acc i
    | ')' | ' ' -> finish_atom acc i
    | _ -> parse_atom (Buffer.add_char acc str.[i]; acc) (i + 1)
  and finish_atom acc i =
    Atom (Buffer.contents acc), i
  in
  let result, rest = parse 0 in
  result, String.sub str rest (String.length str - rest)
Note that it is an error to reach the end of input when parsing a valid expression (you must have read at least one atom or list) or when parsing a list (you must have encountered the closing parenthesis), yet it is valid at the end of an atom.
This parser does not return location information. All real-world parsers should, and this is reason enough to use a lexer/parser approach (or your preferred monadic parser library) instead of doing it by hand. Returning location information here is not terribly difficult, though: just split the i parameter into two, the index of the currently parsed character on one hand and the first index used for the current AST node on the other. Whenever you produce a result, its location is the pair (first index, last valid index).
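Following that recipe, here is a sketch that records, for every node, the pair (first index, last valid index). It is a variant of the parser above in which each constructor also carries a location; the located sexp type and the name of_string_loc are assumptions for illustration, and explicit bounds checks replace match with exception.

```ocaml
(* Assumed variant of the sexp type in which every node carries its
   location as (first index, last valid index) in the input string. *)
type loc = int * int
type sexp =
  | Atom of string * loc
  | List of sexp list * loc

let of_string_loc str =
  let len = String.length str in
  (* Skips spaces and dispatches on the first significant character. *)
  let rec parse i =
    if i >= len then failwith "unfinished input"
    else
      match str.[i] with
      | ')' -> failwith "extraneous ')'"
      | ' ' -> parse (i + 1)
      | '(' -> parse_list i [] (i + 1)
      | _ -> parse_atom i (i + 1)
  (* [first] is the index of the opening '('; [i] is the current index. *)
  and parse_list first acc i =
    if i >= len then failwith "unfinished list"
    else
      match str.[i] with
      | ')' -> List (List.rev acc, (first, i)), i + 1
      | ' ' -> parse_list first acc (i + 1)
      | _ ->
          let elem, j = parse i in
          parse_list first (elem :: acc) j
  (* [first] is the first character of the atom; the atom ends before a
     space, a ')' or the end of the input. *)
  and parse_atom first i =
    if i < len && str.[i] <> ')' && str.[i] <> ' ' then parse_atom first (i + 1)
    else (Atom (String.sub str first (i - first), (first, i - 1)), i)
  in
  let result, rest = parse 0 in
  result, String.sub str rest (len - rest)
```

For instance, in "(a (bc d))" the atom bc gets the location (4, 5) and the inner list (3, 8).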
I wrote an OCaml program that, given an infix expression like 1 + 2, outputs the prefix notation: + 1 2
My problem is that I can't find a way to enforce a rule like: all values, operators and brackets must always be separated by at least one space, so 1+ 1 would be wrong while 1 + 1 is OK. I would prefer not to use the Camlp4 grammar.
Here is the code:
open Genlex

type tree =
  | Leaf of string
  | Node of tree * string * tree

let my_lexer str =
  let kwds = ["("; ")"; "+"; "-"; "*"; "/"] in
  make_lexer kwds (Stream.of_string str)

let make_tree_from_stream stream =
  let op_parser operator_l higher_perm =
    let rec aux left higher_perm = parser
        [<'Kwd op when List.mem op operator_l; right = higher_perm; s >]
          -> aux (Node (left, op, right)) higher_perm s
      | [< >]
          -> left
    in
    parser [< left = higher_perm; s >] -> aux left higher_perm s
  in
  let rec high_perm l = op_parser ["*"; "/"] brackets l
  and low_perm l = op_parser ["+"; "-"] high_perm l
  and brackets = parser
    | [< 'Kwd "("; e = low_perm; 'Kwd ")" >] -> e
    | [< 'Ident n >] -> Leaf n
    | [< 'Int n >] -> Leaf (string_of_int n)
  in
  low_perm stream

let rec draw_tree = function
  | Leaf n -> Printf.printf "%s" n
  | Node(fg, r, fd) -> Printf.printf "(%s " (r);
    draw_tree fg;
    Printf.printf " ";
    draw_tree fd;
    Printf.printf ")"

let () =
  let line = read_line() in
  draw_tree (make_tree_from_stream (my_lexer line)); Printf.printf "\n"
Also, if you have any tips about the code or notice any programming-style mistakes, I would appreciate it if you let me know. Thanks!
The Genlex module provides a ready-made lexer that follows OCaml's lexical conventions, and in particular ignores the spaces in the positions you mention. I don't think you can implement what you want on top of it (it is not designed as a flexible solution, but as a quick way to get a prototype working).
If you want to keep writing stream parsers, you could write your own lexer for it: define a token type, and lex a char Stream.t into a token Stream.t, which you can then parse as you wish. Otherwise, if you don't want to use Camlp4, you may want to try an LR parser generator, such as menhir (a better ocamlyacc).
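A minimal sketch of such a hand-written lexer, which enforces the "separated by at least one space" rule by splitting the line on spaces and requiring every word to be exactly one token, so 1+ 1 is rejected while 1 + 1 is accepted. The token type and the function names are hypothetical, and it produces a plain list rather than a Stream.t; feeding the tokens to a parser is not shown.

```ocaml
(* Hypothetical token type for the infix-to-prefix program. *)
type token = LParen | RParen | Op of string | Int of int | Ident of string

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* true when every character of s satisfies p *)
let all p s =
  let rec go i = i >= String.length s || (p s.[i] && go (i + 1)) in
  go 0

(* Each space-separated word must be exactly one token, so "1+" raises
   Failure instead of being split into 1 and +. Words are never empty here. *)
let token_of_word w =
  match w with
  | "(" -> LParen
  | ")" -> RParen
  | "+" | "-" | "*" | "/" -> Op w
  | _ when all is_digit w -> Int (int_of_string w)
  | _ when is_alpha w.[0] && all (fun c -> is_alpha c || is_digit c) w -> Ident w
  | _ -> failwith ("invalid token: " ^ w)

(* Splits on spaces; runs of several spaces yield empty words, which are
   dropped before conversion. *)
let lex line =
  String.split_on_char ' ' line
  |> List.filter (fun w -> w <> "")
  |> List.map token_of_word
```

For example, lex "( a * 2 )" returns [LParen; Ident "a"; Op "*"; Int 2; RParen], while lex "1+ 1" raises Failure.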