How to insert an indentation check into the operator precedence parser? - parsing

I am working on the parsing stage for the language I am making and am having difficulty with the following.
let test2 = // I'd like this to be an error.
"""
2
+ 2
"""
let result = run (spaces >>. expr) test2
val result : ParserResult<CudaExpr,unit> =
Success: Add (LitInt32 2,LitInt32 2)
I already managed to make the following example when the terms are indented incorrectly
2 +
2
give me an error, but not when the operator is on the wrong indentation level. I need something like a before-parse check.
let operators expr i =
let f expr (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f expr
f opp.ExpressionParser
The above function is how the operators phase is structured and as you can see, the term parsers get wrapped in a function that does the indentation check, but the last line is faulty.
Here is a simplified example of the full parser.
#r "../../packages/FParsec.1.0.2/lib/net40-client/FParsecCS.dll"
#r "../../packages/FParsec.1.0.2/lib/net40-client/FParsec.dll"
open FParsec
type Expr =
| V of string
| Add of Expr * Expr
let identifier = many1Satisfy2L isAsciiLetter (fun x -> isAsciiLetter x || isDigit x || x = ''') "identifier" .>> spaces |>> V
let indentations expressions (s: CharStream<_>) =
let i = s.Column
let expr_indent expr (s: CharStream<_>) =
let expr (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
many1 expr s
expr_indent (expressions i) s
let expr =
let opp = new OperatorPrecedenceParser<_,_,_>()
opp.AddOperator(InfixOperator("+", spaces, 6, Associativity.Left, fun x y -> Add(x,y)))
let operators expr i =
let f (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f
f opp.ExpressionParser
let rec expr s = indentations (operators identifier) s
expr
let test2 = // I'd like this to be an error.
"""
a
+
b
"""
let result = run (spaces >>. expr) test2
The full parser so far can be found here.

let operators expr i =
let f (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f
f opp.ExpressionParser
I did not realize it 2.5 weeks ago, but what happens when a new block gets opened and expr s gets called is that the term parser gets overwritten with the new indentation and there is no way to back it up and restore it on exit. I did a bit of looking around and managed to adapt the Pratt top down parsing method for my purposes.
Here is a talk by Douglas Crockford on the method.
let poperator: Parser<_,_> =
let f c = (isAsciiIdContinue c || isAnyOf [|' ';'\t';'\n';'\"';'(';')';'{';'}';'[';']'|] c) = false
(many1Satisfy f .>> spaces)
>>= fun token ->
match dict_operator.TryGetValue token with
| true, x -> preturn x
| false, _ -> fail "unknown operator"
let rec led poperator term left (prec,asoc,m) =
match asoc with
| Associativity.Left | Associativity.None -> tdop poperator term prec |>> m left
| Associativity.Right -> tdop poperator term (prec-1) |>> m left
| _ -> failwith "impossible"
and tdop poperator term rbp =
let rec f left =
poperator >>= fun (prec,asoc,m as v) ->
if rbp < prec then led poperator term left v >>= loop
else pzero
and loop left = attempt (f left) <|>% left
term >>= loop
let operators expr i (s: CharStream<_>) =
let expr_indent expr (s: CharStream<_>) = expr_indent i (<=) expr s
let op s = expr_indent poperator s
let term s = expr_indent expr s
tdop op term 0 s
The led and tdop functions which do the actual precedence parsing are 10 lines long. The above is just a snippet of the full parser for the language I am making - in terms of syntax it is similar to F# and is indentation sensitive. Here is a more straightforward F# translation of Douglas Crockford's Javascript example.

Related

How to parse a recursive left syntax rule with FParsec?

I usually use FParsec for LL grammars, but sometimes it happens that in a whole grammar only one element requires left recursive parsing (so the grammar is no longer LL). Currently I have such a situation, I have a large LL grammar implemented with FParsec, but a small grammar element is bothering me because it obviously cannot be parsed correctly.
The syntax element in question is an access to an array index à la F#, e.g. myArray.[index] where myArray can be any expression and index can be any expression too. It turns out that my function calls use square brackets, not parentheses, and my identifiers can be qualified with dots.
An example of correct syntax for an expression is: std.fold[fn, f[myArray.[0]], std.tail[myArray]].
The .[] syntax element is obviously left recursive, but perhaps there is a trick that allows me to parse it anyway? My minimal code is as follows:
open FParsec
type Name = string list
type Expr =
(* foo, Example.Bar.fizz *)
| Variable of Name
(* 9, 17, -1 *)
| Integer of int
(* foo[3, 2], Std.sqrt[2] *)
| FunCall of Name * Expr list
(* (a + b), (a + (1 - c)) *)
| Parens of Expr
(* myArray.[0], table.[index - 1] *)
| ArrayAccess of Expr * Expr
(* a + b *)
| Addition of Expr * Expr
let opp =
new OperatorPrecedenceParser<Expr, _, _>()
let pExpr = opp.ExpressionParser
let pName =
let id =
identifier (IdentifierOptions(isAsciiIdStart = isAsciiLetter, isAsciiIdContinue = isAsciiLetter))
sepBy1 id (skipChar '.')
let pVariable = pName |>> Variable
let pInt = pint32 |>> Integer
let pFunCall =
pipe4
pName
(spaces >>. skipChar '[')
(sepBy (spaces >>. pExpr) (skipChar ','))
(spaces >>. skipChar ']')
(fun name _ args _ -> FunCall(name, args))
let pArrayAccess =
pipe5
pExpr
(spaces >>. skipChar '.')
(spaces >>. skipChar '[')
(spaces >>. pExpr)
(spaces >>. skipChar ']')
(fun expr _ _ index _ -> ArrayAccess(expr, index))
let pParens =
between (skipChar '(') (skipChar ')') (spaces >>. pExpr)
opp.TermParser <-
choice [ attempt pFunCall
pVariable
pArrayAccess
pInt
pParens ]
.>> spaces
let addInfixOperator str prec assoc mapping =
opp.AddOperator
<| InfixOperator(str, spaces, prec, assoc, (), (fun _ leftTerm rightTerm -> mapping leftTerm rightTerm))
addInfixOperator "+" 6 Associativity.Left (fun a b -> Addition(a, b))
let startParser = runParserOnString (pExpr .>> eof) () ""
printfn "%A" <| startParser "std.fold[fn, f[myArray.[0]], std.tail[myArray]]"
One way to do this is as follows: instead of making a list of parsing choices that also lists pArrayAccess like above, which will at some point cause an infinite loop, one can modify pExpr to parse the grammar element in question as an optional element following an expression:
let pExpr =
parse {
let! exp = opp.ExpressionParser
let pArrayAccess =
between (skipString ".[") (skipString "]") opp.ExpressionParser
match! opt pArrayAccess with
| None -> return exp
| Some index -> return ArrayAccess(exp, index)
}
After testing, it turns out that this works very well if the following two conditions are not met:
The contents of the square brackets must not contain access to another array ;
An array cannot be accessed a second time in succession (my2DArray.[x].[y]).
This restricts usage somewhat. How can I get away with this? Is there a way to do this or do I have to change the grammar?
Finally, a solution to this problem is quite simple: just expect a list of array access. If the list is empty, then return the initial expression, otherwise fold over all the array accesses and return the result. Here is the implementation:
let rec pExpr =
parse {
let! exp = opp.ExpressionParser
let pArrayAccess =
between (skipString ".[") (skipString "]") pExpr
match! many pArrayAccess with
| [] -> return exp
| xs -> return List.fold
(fun acc curr -> ArrayAccess(acc, curr)) exp xs
}
This way of doing things meets my needs, so I'd be happy with it, if anyone passes by and wants something more general and not applicable with the proposed solution, then I refer to #Martin Freedman comment, using createParserForwardedToRef().

How do parentheses work with custom data types?

Currently, I am working on a problem of parsing and showing expressions in Haskell.
type Name = String
data Expr = Val Integer
| Var Name
| Expr :+: Expr
| Expr :-: Expr
| Expr :*: Expr
| Expr :/: Expr
| Expr :%: Expr
This is the code of my data type Expr and this is how i define show function:
instance Show Expr where
show (Val x) = show x
show (Var y) = y
show (p :+: q) = par (show p ++ "+" ++ show q)
show (p :-: q) = par (show p ++ "-" ++ show q)
show (p :/: q) = par (show p ++ "/" ++ show q)
show (p :*: q) = par (show p ++ "*" ++ show q)
show (p :%: q) = par (show p ++ "%" ++ show q)
par :: String -> String
par s = "(" ++ s ++ ")"
Later i tried to transform string input into the expression but i encounter the following problem: I don't understand how parentheses in the second case are implemented in Haskell.
*Main> Val 2 :*:Val 2 :+: Val 3
((2*2)+3)
*Main> Val 2 :*:(Val 2 :+: Val 3)
(2*(2+3))
Because of that, i am a bit confused regarding how should i transform parentheses from my string into the expression. Currently i am using the following function for parsing, but for now, it just ignores parentheses which is not intended behavior:
toExpr :: String -> Expr
toExpr str = f (lexer str) (Val 0)
where
f [] expr = expr
f (c:cs) expr
|isAlpha (head c) = f cs (Var c)
|isDigit (head c) = f cs (Val (read c))
|c == "+" = (expr :+: f cs (Val 0))
|c == "-" = (expr :-: f cs (Val 0))
|c == "/" = (expr :/: f cs (Val 0))
|c == "*" = (expr :*: f cs (Val 0))
|c == "%" = (expr :%: f cs (Val 0))
|otherwise = f cs expr
Edit: few grammar mistakes
I don't understand how parentheses in the second case are implemented in Haskell.
The brackets just give precedence to a certain part of the expression to parse. The problem is not with the parenthesis you render. I think the problem is that you did not assign precedence to your operators. This thus means that, unless you specify brackets, Haskell will consider all operators to have the same precedence, and parse these left-to-right. This thus means that x ⊕ y ⊗ z is parsed as (x ⊕ y) ⊗ z.
You can define the precedence of your :+:, :*, etc. operators with infixl:
infixl 7 :*:, :/:, :%:
infixl 5 :+:, :-:
type Name = String
data Expr = Val Integer
| Var Name
| Expr :+: Expr
| Expr :-: Expr
| Expr :*: Expr
| Expr :/: Expr
| Expr :%: Expr
As for your parser (the toExpr), you will need a parsing mechanism like a LALR parser [wiki] that stores results on a stack, and thus makes proper operations.
This was my final parser which gave me the result I needed. To get the result i wanted proper grammar was added and i wrote a parses according to he grammar.
Thanks, everyone for the help.
{-
parser for the following grammar:
E -> T E'
E' -> + T E' | - T E' | <empty string>
T -> F T'
T' -> * F T' | / F T' | % F T' | <empty string>
F -> (E) | <integer> | <identifier>
-}
parseExpr :: String -> (Expr,[String])
parseExpr tokens = parseE (lexer tokens)
parseE :: [String] -> (Expr,[String])
parseE tokens = parseE' acc rest where (acc,rest) = parseT tokens
parseE' :: Expr -> [String] -> (Expr,[String])
parseE' accepted ("+":tokens) = let (acc,rest) = parseT tokens in parseE' (accepted :+: acc) rest
parseE' accepted ("-":tokens) = let (acc,rest) = parseT tokens in parseE' (accepted :-: acc) rest
parseE' accepted tokens = (accepted,tokens)
parseT :: [String] -> (Expr,[String])
parseT tokens = let (acc,rest) = parseF tokens in parseT' acc rest
parseT' :: Expr -> [String] -> (Expr,[String])
parseT' accepted ("*":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :*: acc) rest
parseT' accepted ("/":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :/: acc) rest
parseT' accepted ("%":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :%: acc) rest
parseT' accepted tokens = (accepted,tokens)
parseF :: [String] -> (Expr,[String])
parseF ("(":tokens) = (e, tail rest) where (e,rest) = parseE tokens
parseF (t:tokens)
| isAlpha (head t) = (Var t,tokens)
| isDigit (head t) = (Val (read t),tokens)
| otherwise = error ""
parseF [] = error ""
lexer :: String -> [String]
lexer [] = []
lexer (c:cs)
| elem c " \t\n" = lexer cs
| elem c "=+-*/%()" = [c]:(lexer cs)
| isAlpha c = (c:takeWhile isAlpha cs):lexer(dropWhile isAlpha cs)
| isDigit c = (c:takeWhile isDigit cs):lexer(dropWhile isDigit cs)
| otherwise = error ""

Parsing the signature of a function - Error with the arrow type - FParsec + indentation

I had already asked a question about how to parse the arrow type, this is not a duplicate, but rather an adaptation with the indentation based syntax.
Indeed, I would like to be able to analyze a syntax close to that of the ML family languages. I also introduced the syntax of the type signature of a function in Haskell, so this:
myFunction :: atype
My parser works very well for all kinds of signature types, except the arrow type when it is "alone":
foo :: a // ok
foo :: [a] // ok
foo :: (a, a) // ok
foo :: [a -> a] // ok
foo :: (a -> a, a) // ok
foo :: a -> a // error
Same for the creation of functions (to make it simple, I just expected a number as a value):
foo: a = 0 // ok
foo: [a] = 0 // ok
foo: (a, a) = 0 // ok
foo: [a -> a] = 0 // ok
foo: (a -> a, a) = 0 // ok
foo: a -> a = 0 // error
Without the indentation, all these cases work a priori.
I tried a module to parse the indentation other than the FParsec wiki, just to try and evaluate a little. It comes from there, and here is the necessary and sufficient module code for the question:
module IndentParser =
type Indentation =
| Fail
| Any
| Greater of Position
| Exact of Position
| AtLeast of Position
| StartIndent of Position
with
member this.Position = match this with
| Any | Fail -> None
| Greater p -> Some p
| Exact p -> Some p
| AtLeast p -> Some p
| StartIndent p -> Some p
type IndentState<'T> = { Indent : Indentation; UserState : 'T }
type CharStream<'T> = FParsec.CharStream<IndentState<'T>>
type IndentParser<'T, 'UserState> = Parser<'T, IndentState<'UserState>>
let indentState u = {Indent = Any; UserState = u}
let runParser p u s = runParserOnString p (indentState u) "" s
let runParserOnFile p u path = runParserOnFile p (indentState u) path System.Text.Encoding.UTF8
let getIndentation : IndentParser<_,_> =
fun stream -> match stream.UserState with
| {Indent = i} -> Reply i
let getUserState : IndentParser<_,_> =
fun stream -> match stream.UserState with
| {UserState = u} -> Reply u
let putIndentation newi : IndentParser<unit, _> =
fun stream ->
stream.UserState <- {stream.UserState with Indent = newi}
Reply(Unchecked.defaultof<unit>)
let failf fmt = fail << sprintf fmt
let acceptable i (pos : Position) =
match i with
| Any _ -> true
| Fail -> false
| Greater bp -> bp.Column < pos.Column
| Exact ep -> ep.Column = pos.Column
| AtLeast ap -> ap.Column <= pos.Column
| StartIndent _ -> true
let tokeniser p = parse {
let! pos = getPosition
let! i = getIndentation
if acceptable i pos then return! p
else return! failf "incorrect indentation at %A" pos
}
let indented<'a,'u> i (p : Parser<'a,_>) : IndentParser<_, 'u> = parse {
do! putIndentation i
do! spaces
return! tokeniser p
}
/// Allows to check if the position of the parser currently being analyzed (`p`)
/// is on the same line as the defined position (`pos`).
let exact<'a,'u> pos p: IndentParser<'a, 'u> = indented (Exact pos) p
/// Allows to check if the position of the parser currently being analyzed (`p`)
/// is further away than the defined position (`pos`).
let greater<'a,'u> pos p: IndentParser<'a, 'u> = indented (Greater pos) p
/// Allows to check if the position of the parser currently being analyzed (`p`)
/// is on the same OR line further than the defined position (`pos`).
let atLeast<'a,'u> pos p: IndentParser<'a, 'u> = indented (AtLeast pos) p
/// Simply check if the parser (`p`) exists, regardless of its position in the text to be analyzed.
let any<'a,'u> pos p: IndentParser<'a, 'u> = indented Any p
let newline<'u> : IndentParser<unit, 'u> = many (skipAnyOf " \t" <?> "whitespace") >>. newline |>> ignore
let rec blockOf p = parse {
do! spaces
let! pos = getPosition
let! x = exact pos p
let! xs = attempt (exact pos <| blockOf p) <|> preturn []
return x::xs
}
Now, here is the code I'm trying to fix for the problem I encountered:
module Parser =
open IndentParser
type Identifier = string
type Type =
| Typename of Identifier
| Tuple of Type list
| List of Type
| Arrow of Type * Type
| Infered
type Expression =
| Let of Identifier * Type * int
| Signature of Identifier * Type
type Program = Program of Expression list
// Utils -----------------------------------------------------------------
let private ws = spaces
/// All symbols granted for the "opws" parser
let private allowedSymbols =
['!'; '#'; '#'; '$'; '%'; '+'; '&'; '*'; '('; ')'; '-'; '+'; '='; '?'; '/'; '>'; '<'; '|']
/// Parse an operator and white spaces around it: `ws >>. p .>> ws`
let inline private opws str =
ws >>.
(tokeniser (pstring str >>?
(nextCharSatisfiesNot
(isAnyOf (allowedSymbols # ['"'; '''])) <?> str))) .>> ws
let private identifier =
(many1Satisfy2L isLetter
(fun c -> isLetter c || isDigit c) "identifier")
// Types -----------------------------------------------------------------
let rec typename = parse {
let! name = ws >>. identifier
return Type.Typename name
}
and tuple_type = parse {
let! types = between (opws "(") (opws ")") (sepBy (ws >>. type') (opws ","))
return Type.Tuple types
}
and list_type = parse {
let! ty = between (opws "[") (opws "]") type'
return Type.List ty
}
and arrow_type =
chainr1 (typename <|> tuple_type <|> list_type) (opws "->" >>% fun t1 t2 -> Arrow(t1, t2))
and type' =
attempt arrow_type <|>
attempt typename <|>
attempt tuple_type <|>
attempt list_type
// Expressions -----------------------------------------------------------------
let rec private let' = parse {
let! pos = getPosition
let! id = exact pos identifier
do! greater pos (opws ":")
let! ty = greater pos type'
do! greater pos (opws "=")
let! value = greater pos pint32
return Expression.Let(id, ty, value)
}
and private signature = parse {
let! pos = getPosition
let! id = exact pos identifier
do! greater pos (opws "::")
let! ty = greater pos type'
return Expression.Signature(id, ty)
}
and private expression =
attempt let'
and private expressions = blockOf expression <?> "expressions"
let private document = ws >>. expressions .>> ws .>> eof |>> Program
let private testType = ws >>. type' .>> ws .>> eof
let rec parse code =
runParser document () code
|> printfn "%A"
open Parser
parse #"
foo :: a -> a
"
Here is the error message obtained:
There is no reference to indentation in the error message, that's what troubles also, because if I implement an identical parser, except for indentation parsing, it works.
Could you put me on the right way?
EDIT
Here is the "fixed" code (the use of the function signature parser was missing + removal of unnecessary attempt):
open FParsec
// module IndentParser
module Parser =
open IndentParser
type Identifier = string
type Type =
| Typename of Identifier
| Tuple of Type list
| List of Type
| Arrow of Type * Type
| Infered
type Expression =
| Let of Identifier * Type * int
| Signature of Identifier * Type
type Program = Program of Expression list
// Utils -----------------------------------------------------------------
let private ws = spaces
/// All symbols granted for the "opws" parser
let private allowedSymbols =
['!'; '#'; '#'; '$'; '%'; '+'; '&'; '*'; '('; ')'; '-'; '+'; '='; '?'; '/'; '>'; '<'; '|']
/// Parse an operator and white spaces around it: `ws >>. p .>> ws`
let inline private opws str =
ws >>.
(tokeniser (pstring str >>?
(nextCharSatisfiesNot
(isAnyOf (allowedSymbols # ['"'; '''])) <?> str))) .>> ws
let private identifier =
(many1Satisfy2L isLetter
(fun c -> isLetter c || isDigit c) "identifier")
// Types -----------------------------------------------------------------
let rec typename = parse {
let! name = ws >>. identifier
return Type.Typename name
}
and tuple_type = parse {
let! types = between (opws "(") (opws ")") (sepBy (ws >>. type') (opws ","))
return Type.Tuple types
}
and list_type = parse {
let! ty = between (opws "[") (opws "]") type'
return Type.List ty
}
and arrow_type =
chainr1 (typename <|> tuple_type <|> list_type) (opws "->" >>% fun t1 t2 -> Arrow(t1, t2))
and type' =
attempt arrow_type <|>
typename <|>
tuple_type <|>
list_type
// Expressions -----------------------------------------------------------------
let rec private let' = parse {
let! pos = getPosition
let! id = exact pos identifier
do! greater pos (opws ":")
let! ty = greater pos type'
do! greater pos (opws "=")
let! value = greater pos pint32
return Expression.Let(id, ty, value)
}
and private signature = parse {
let! pos = getPosition
let! id = exact pos identifier
do! greater pos (opws "::")
let! ty = greater pos type'
return Expression.Signature(id, ty)
}
and private expression =
attempt let' <|>
signature
and private expressions = blockOf expression <?> "expressions"
let private document = ws >>. expressions .>> ws .>> eof |>> Program
let private testType = ws >>. type' .>> ws .>> eof
let rec parse code =
runParser document () code
|> printfn "%A"
open Parser
System.Console.Clear()
parse #"
foo :: a -> a
"
So, here are the new error messages:
and
At the moment, your code is failing on the :: signature because you haven't actually used your signature parser anywhere. You have defined expression as attempt let', but I think you meant to write attempt signature <|> attempt let'. That is why your test is failing on the second colon of ::, because it's matching the single colon of a let' and then not expecting the second colon.
Also, I think your chaining multiple attempt combinators together like attempt a <|> attempt b <|> attempt c is going to cause you problems somewhere, and that you should remove the final attempt, e.g., attempt a <|> attempt b <|> c. If you use attempt in all the possible choices, you'll end up with a parser that can succeed by parsing nothing, which is often not what you intended.
Update: I think I've found the cause and the solution.
Summary: In your opws parser, replace the line ws >>. with ws >>?.
Explanation: In all the sepBy variants (and chainr1 is a sepBy variant), FParsec expects that the separator parser will either succeed, or will fail without consuming input. (If the separator fails after consuming input, FParsec considers the entire sepBy-family parser to have failed in its entirety.) But your opws parser will consume whitespace, then fail if it doesn't find a correct operator. So when your arrow_type parser parses the string a -> a followed by a newline, the arrow after the first a is correctly matched, then it sees the second a, and then it tries to find another arrow. Since what follows next is at least one whitespace character (newlines count as whitespace), the opws "->" parser ends up consuming some input before it fails. (It fails because after that whitespace is the end of the file, not another -> token). This makes the chainr1 combinator fail, so arrow_type fails and your a -> a parser ends up being parsed as a single type a. (At which point the arrow is now unexpected).
By using >>? in your definition of opws, you ensure that if the second part of the parser fails, it will backtrack to before it matched any whitespace. That ensures that the separator parser will fail without matching input and without advancing the parse position in the character stream. Therefore, the chainr1 parser succeeds after parsing a -> a and you get the expected results.

Parsing an ML-like syntax based on indentation, and everything considered to be an instruction/expression

NOTE: Not long ago, I had already asked a similar question. This is not a duplication, but the clarifications to be requested did not fall within the scope of the subject itself. I therefore allow myself to open another position dealing with the analysis of an ML-like syntax based on indentation, and considering everything as an instruction / expression.
For example:
"Hello" is an expression,
let foo = 2 + 1 is an instruction using an expression (2 + 1),
print foo is an instruction, ...
In short, a syntax and semantics that is quite modular and dynamic. Like F#, or OCaml.
To do this, I use F#, with the API (available on nuget) FParsec. The FParsec wiki provides an example of a syntax based on indentation, so I have taken it up again. The module in the code below used is IndentationParserWithoutBacktracking.
The example code to be parsed uses an elementary indentation, not mixing "literal" and "instructions/expressions":
loop i 1 10
loop k 1 10
print k
print i
print j
A simple code, and without context (but this is not important at the moment).
My implementation allows codes such as:
let foo = a + b
let foo =
let a = 9
let b = 1
a + b
let foo = 7
let foo =
loop i 1 10
print i
For example. (The loop and print are there just for the tests...)
The problem I have been having for a long week now, and that I can't solve, is the fact that the indentation module asks me every time an instruction is expected in a parser for a new line... Here is a screenshot:
This applies to all the examples mentioned above. I don't really understand the problem, and therefore don't know how to solve it.
Here is the code tested for this question, it meets the minimum and functional code criteria, however, FParsec must be used:
open FParsec
// This module come from 'https://github.com/stephan-tolksdorf/fparsec/wiki/Parsing-indentation-based-syntax-with-FParsec'
// I used the second module: 'IndentationParserWithoutBacktracking'
module IndentationParserWithoutBacktracking =
let tabStopDistance = 8
type LastParsedIndentation() =
[<DefaultValue>]
val mutable Value: int32
[<DefaultValue>]
val mutable EndIndex: int64
type UserState =
{Indentation: int
// We put LastParsedIndentation into the UserState so that we
// can conveniently use a separate instance for each stream.
// The members of the LastParsedIndentation instance will be mutated
// directly and hence won't be affected by any stream backtracking.
LastParsedIndentation: LastParsedIndentation}
with
static member Create() = {Indentation = -1
LastParsedIndentation = LastParsedIndentation(EndIndex = -1L)}
type CharStream = CharStream<UserState>
type Parser<'t> = Parser<'t, UserState>
// If this function is called at the same index in the stream
// where the function previously stopped, then the previously
// returned indentation will be returned again.
// This way we can avoid backtracking at the end of indented blocks.
let skipIndentation (stream: CharStream) =
let lastParsedIndentation = stream.UserState.LastParsedIndentation
if lastParsedIndentation.EndIndex = stream.Index then
lastParsedIndentation.Value
else
let mutable indentation = stream.SkipNewlineThenWhitespace(tabStopDistance, false)
lastParsedIndentation.EndIndex <- stream.Index
lastParsedIndentation.Value <- indentation
indentation
let indentedMany1 (p: Parser<'t>) label : Parser<'t list> =
fun stream ->
let oldIndentation = stream.UserState.Indentation
let indentation = skipIndentation stream
if indentation <= oldIndentation then
Reply(Error, expected (if indentation < 0 then "newline" else "indented " + label))
else
stream.UserState <- {stream.UserState with Indentation = indentation}
let results = ResizeArray()
let mutable stateTag = stream.StateTag
let mutable reply = p stream // parse the first element
let mutable newIndentation = 0
while reply.Status = Ok
&& (results.Add(reply.Result)
newIndentation <- skipIndentation stream
newIndentation = indentation)
do
stateTag <- stream.StateTag
reply <- p stream
if reply.Status = Ok
|| (stream.IsEndOfStream && results.Count > 0 && stream.StateTag = stateTag)
then
if newIndentation < indentation || stream.IsEndOfStream then
stream.UserState <- {stream.UserState with Indentation = oldIndentation}
Reply(List.ofSeq results)
else
Reply(Error, messageError "wrong indentation")
else // p failed
Reply(reply.Status, reply.Error)
open IndentationParserWithoutBacktracking
let isBlank = fun c -> c = ' ' || c = '\t'
let ws = spaces
let ws1 = skipMany1SatisfyL isBlank "whitespace"
let str s = pstring s .>> ws
let keyword str = pstring str >>? nextCharSatisfiesNot (fun c -> isLetter c || isDigit c) <?> str
// AST
type Identifier = Identifier of string
// A value is just a literal or a data name, called here "Variable"
type Value =
| Int of int | Float of float
| Bool of bool | String of string
| Char of char | Variable of Identifier
// All is an instruction, but there are some differences:
type Instr =
// Arithmetic
| Literal of Value | Infix of Instr * InfixOp * Instr
// Statements (instructions needing another instructions)
| Let of Identifier * Instr list
| Loop of Identifier * Instr * Instr * Instr list
// Other - the "print" function, from the link seen above
| Print of Identifier
and InfixOp =
// Arithmetic
| Sum | Sub | Mul | Div
// Logic
| And | Or | Equal | NotEqual | Greater | Smaller | GreaterEqual | SmallerEqual
// Literals
let numberFormat = NumberLiteralOptions.AllowMinusSign ||| NumberLiteralOptions.AllowFraction |||
NumberLiteralOptions.AllowHexadecimal ||| NumberLiteralOptions.AllowOctal |||
NumberLiteralOptions.AllowBinary
let literal_numeric =
numberLiteral numberFormat "number" |>> fun nl ->
if nl.IsInteger then Literal (Int(int nl.String))
else Literal (Float(float nl.String))
let literal_bool =
(choice [
(stringReturn "true" (Literal (Bool true)))
(stringReturn "false" (Literal (Bool false)))
]
.>> ws) <?> "boolean"
let literal_string =
(between (pstring "\"") (pstring "\"") (manyChars (satisfy (fun c -> c <> '"')))
|>> fun s -> Literal (String s)) <?> "string"
let literal_char =
(between (pstring "'") (pstring "'") (satisfy (fun c -> c <> '''))
|>> fun c -> Literal (Char c)) <?> "character"
let identifier =
(many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "identifier"
|>> Identifier) <?> "identifier"
let betweenParentheses p =
(between (str "(") (str ")") p) <?> ""
let variable = identifier |>> fun id -> Literal (Variable id)
let literal = (attempt literal_numeric <|>
attempt literal_bool <|>
attempt literal_char <|>
attempt literal_string <|>
attempt variable)
// Instressions and statements
let pInstrs, pInstrimpl = createParserForwardedToRef()
// `ploop` is located here to force `pInstrs` to be of the type `Instr list`, `ploop` requesting an instression list.
let ploop =
pipe4
(keyword "loop" >>. ws1 >>. identifier)
(ws1 >>. literal)
(ws1 >>. literal)
(pInstrs)
(fun id min max stmts -> Loop(id, min, max, stmts))
// `singlepInstr` allows to use only one Instression, used just after.
let singlepInstr =
pInstrs |>> fun ex -> ex.Head
let term =
(ws >>. singlepInstr .>> ws) <|>
(betweenParentheses (ws >>. singlepInstr)) <|>
(ws >>. literal .>> ws) <|>
(betweenParentheses (ws >>. literal))
let infixOperator (p: OperatorPrecedenceParser<_, _, _>) op prec map =
p.AddOperator(InfixOperator(op, ws, prec, Associativity.Left, map))
let ops =
// Arithmetic
[ "+"; "-"; "*"; "/"; "%" ] #
// Logical
[ "&&"; "||"; "=="; "!="; ">"; "<"; ">="; "<=" ]
let opCorrespondance op =
match op with
// Arithmetic operators
| "+" -> Sum | "-" -> Sub
| "*" -> Mul | "/" -> Div
// Logical operators
| "&&" -> And | "||" -> Or
| "==" -> Equal | "!=" -> NotEqual
| ">" -> Greater | "<" -> Smaller
| ">=" -> GreaterEqual | "<=" -> SmallerEqual
| _ -> failwith ("Unknown operator: " + op)
let opParser = new OperatorPrecedenceParser<Instr, unit, UserState>()
for op in ops do
infixOperator opParser op 1 (fun x y -> Infix(x, opCorrespondance op, y))
opParser.TermParser <- term
// Statements
(*
- let:
let <identifier> = <instruction(s) / value>
- print:
print <identifier>
- loop:
loop <identifier> <literal> <literal> <indented statements>
*)
let plet =
pipe2
(keyword "let" >>. ws1 >>. identifier)
(ws >>. str "=" >>. ws >>. pInstrs)
(fun id exp -> Let(id, exp))
let print =
keyword "print" >>. ws1 >>. identifier
|>> Print
let instruction =
print <|> ploop <|> plet <|>
opParser.ExpressionParser <|>
literal
pInstrimpl := indentedMany1 instruction "instruction"
let document = pInstrs .>> spaces .>> eof
let test str =
match runParserOnString document (UserState.Create()) "" str with
| Success(result, _, _) -> printfn "%A" result
| Failure(errorMsg, _, _) -> printfn "%s" errorMsg
System.Console.Clear()
let code = test #"
let foo = a + b
"
I would like to understand first of all why it doesn't work, but also to be able to find a solution to my problem, and that this solution can be extended to the potential syntax additions of the parser.
Awaiting a salutary answer, thank you.
In order to understand why your parser doesn't work, you need to isolate the issues.
If I understand you correctly, you want your let parser to support either a single instruction on the same line or indented instructions on subsequent lines, e.g:
let x = instruction
let b =
instruction
instruction
If you can't get your existing implementation to work, I'd recommend going back to the implementation on the Wiki and trying to just add support for the let statement.
For example, I made the Wiki parser accept simple let statements with the following modifications:
type Statement = Loop of Identifier * int * int * Statement list
| Print of Identifier
| Let of Identifier * Statement list
let ws = skipManySatisfy isBlank
let str s = pstring s .>> ws
let statement, statementRef = createParserForwardedToRef()
let indentedStatements = indentedMany1 statement "statement"
let plet = keyword "let" >>. pipe2 (ws1 >>. identifier)
(ws >>. str "=" >>. ws
>>. (indentedStatements
<|> (statement |>> fun s -> [s])))
(fun id exp -> Let(id, exp))
statementRef := print <|> loop <|> plet
Note that in the modified version statement is now the parser forwarded to a ref cell, not indentedStatements.
Note also that ws is not implemented with spaces, like in your parser. This is important because spaces also consumes newlines, which would prevent the indentedMany1 from seeing the newline and properly calculating the indentation.
The reason your parser produced an "Expecting: newline" error is that indentedMany1 needs a newline at the beginning of the indented sequence in order to be able to calculate the indentation. You would have to modify the implementation of indentedMany1 if you wanted to support e.g. the following indentation pattern:
let x = instruction
instruction
instruction

Parsing "x y z" with the precedence of multiply

I'm trying to write a parser for the Mathematica language in F# using FParsec.
I have written one for a MiniML that supports the syntax f x y = (f(x))(y) with high precedence for function application. Now I need to use the same syntax to mean f*x*y and, therefore, have the same precedence as multiply. In particular, x y + 2 = x*y + 2 whereas x y ^ 2 = x * y^2.
How can this be accomplished?
As Stephan pointed out in a comment you can split the operator parser into two separate parsers and put your own parser in the middle for space-separated expressions. The following code demonstrates this:
#I "../packages/FParsec.1.0.1/lib/net40-client"
#r "FParsec"
#r "FParsecCS"
open FParsec
open System.Numerics
type Expr =
| Int of BigInteger
| Add of Expr * Expr
| Mul of Expr * Expr
| Pow of Expr * Expr
let str s = pstring s >>. spaces
let pInt : Parser<_, unit> = many1Satisfy isDigit |>> BigInteger.Parse .>> spaces
let high = OperatorPrecedenceParser<Expr,unit,unit>()
let low = OperatorPrecedenceParser<Expr,unit,unit>()
let pHighExpr = high.ExpressionParser .>> spaces
let pLowExpr = low.ExpressionParser .>> spaces
high.TermParser <-
choice
[ pInt |>> Int
between (str "(") (str ")") pLowExpr ]
low.TermParser <-
many1 pHighExpr |>> (function [f] -> f | fs -> List.reduce (fun f g -> Mul(f, g)) fs) .>> spaces
low.AddOperator(InfixOperator("+", spaces, 10, Associativity.Left, fun f g -> Add(f, g)))
high.AddOperator(InfixOperator("^", spaces, 20, Associativity.Right, fun f g -> Pow(f, g)))
run (spaces >>. pLowExpr .>> eof) "1 2 + 3 4 ^ 5 6"
The output is:
Add (Mul (Int 1,Int 2),Mul (Mul (Int 3,Pow (Int 4,Int 5)),Int 6))
which represents 1 * 2 + 3 * 4^5 * 6 as expected.

Resources