Backtracking Recursive Descent Parser for the following grammar


I am trying to figure out some details involving parsing expression grammars, and am stuck on the following question:
For the given grammar:
a = b Z
b = Z Z | Z
(where lower-case letters indicate productions, and uppercase letters indicate terminals).
Is the production "a" supposed to match against the string "Z Z"?
Here is the pseudo-code that I've seen the above grammar get translated to, where each production is mapped to a function that outputs two values. The first indicates whether the parse succeeded. And the second indicates the resulting position in the stream after the parse.
defn parse-a (i:Int) -> [True|False, Int] :
   val [r1, i1] = parse-b(i)
   if r1 : eat("Z", i1)
   else : [false, i]

defn parse-b1 (i:Int) -> [True|False, Int] :
   val [r1, i1] = eat("Z", i)
   if r1 : eat("Z", i1)
   else : [false, i]

defn parse-b2 (i:Int) -> [True|False, Int] :
   eat("Z", i)

defn parse-b (i:Int) -> [True|False, Int] :
   val [r1, i1] = parse-b1(i)
   if r1 : [r1, i1]
   else : parse-b2(i)
The above code fails when trying to parse the production "a" on the input "Z Z". As far as I can tell, this is because the parsing function for "b" is incorrect: it greedily consumes both Z's in the input and succeeds, leaving nothing for "a" to match. Is this what a parsing expression grammar is supposed to do? The pseudocode in Ford's thesis seems to indicate this.
Thanks very much.
-Patrick

In PEGs, disjunctions (alternatives) are indeed ordered. In Ford's thesis, the operator is written / and called "ordered choice", which distinguishes it from the | disjunction operator.
That makes PEGs fundamentally different from CFGs. In particular, given PEG rules a -> b Z and b -> Z Z / Z, a will not match Z Z.
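The difference can be sketched in Python. This is only an illustration, assuming the input is already split into a list of tokens; the names eat, peg_a, peg_b, cfg_a and cfg_b are made up for this example and do not come from Ford's thesis. The PEG version commits to the first alternative of b that succeeds, while the CFG-style version lets a retry with every way b could have matched.

```python
def eat(tok, tokens, i):
    """Consume one terminal; return (success, new position)."""
    if i < len(tokens) and tokens[i] == tok:
        return True, i + 1
    return False, i

def peg_b(tokens, i):
    """b <- Z Z / Z  (ordered choice: the first success is final)."""
    ok, j = eat("Z", tokens, i)
    if ok:
        ok2, k = eat("Z", tokens, j)
        if ok2:
            return True, k
    return eat("Z", tokens, i)

def peg_a(tokens, i):
    """a <- b Z  (never asks b for a different match)."""
    ok, j = peg_b(tokens, i)
    if not ok:
        return False, i
    ok, k = eat("Z", tokens, j)
    return (True, k) if ok else (False, i)

def cfg_b(tokens, i):
    """b = Z Z | Z  (yield every end position b can reach)."""
    ok, j = eat("Z", tokens, i)
    if ok:
        ok2, k = eat("Z", tokens, j)
        if ok2:
            yield k
        yield j

def cfg_a(tokens, i):
    """a = b Z  (try each way b can match, i.e. full backtracking)."""
    for j in cfg_b(tokens, i):
        ok, k = eat("Z", tokens, j)
        if ok:
            yield k

print(peg_a(["Z", "Z"], 0))        # PEG reading: fails
print(list(cfg_a(["Z", "Z"], 0)))  # CFG reading: one parse ending at 2
```

On the input Z Z, peg_b consumes both tokens, so peg_a has nothing left for its trailing Z and fails. cfg_a falls back to the one-token match of b and succeeds.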

Thanks for your reply Rici.
I re-read Ford's thesis much more closely, and it confirms what you said. The PEG / operator is both ordered and greedy, so the rule presented above is supposed to fail.
-Patrick

Related

Folding Set to make a new Set

Say I have a type Prop for propositions:
type Prop =
| P of string
| Disjunction of Prop * Prop
| Conjunction of Prop * Prop
| Negation of Prop
Where:
• A "p" representing the atom P,
• Disjunction(A "P", A "q") representing the proposition P ∨ q.
• Conjunction(A "P", A "q") representing the proposition P ∧ q.
• Negation(A "P") representing the proposition ¬P.
I'm supposed to use a set-based representation of formulas in disjunctive normal form. Since conjunction is commutative, associative and (a ∧ a) is equivalent to a it is convenient to represent a basic conjunct bc by its set of literals litOf(bc).
bc is defined as: A literal is an atom or the negation of an atom and a basic conjunct is a conjunction of literals
This leads me to the function for litOf:
let litOf bc =
    Set.fold (fun acc (Con(x, y)) -> Set.add (x, y) acc) Set.empty bc
I'm pretty sure my litOf is wrong, and I get an error on the (Con(x,y)) part saying: "Incomplete pattern matches on this expression. For example, the value 'Dis (_, _)' may indicate a case not covered by the pattern(s).", which I also have no clue what actually means in this context.
Any hints on how I can proceed?

I assume your example type Prop changed on the way from keyboard to here, and originally looked like this:
type Prop =
| P of string
| Dis of Prop * Prop
| Con of Prop * Prop
| Neg of Prop
There are several things that tripped you up:
Set.fold operates on input that is a set, and does something for each element in the set. In your case, the input is a boolean clause, and the output is a set.
You did not fully define what constitutes a literal. For a conjunction, the set of literals is the union of the literals on the left and on the right side. But what about a disjunction? The compiler error message means exactly that.
Here's what I think you are after:
let rec literals = function
| P s -> Set.singleton s
| Dis (x, y) -> Set.union (literals x) (literals y)
| Con (x, y) -> Set.union (literals x) (literals y)
| Neg x -> literals x
With that, you will get
> literals (Dis (P "A", Neg (Con (P "B", Con (P "A", P "C")))))
val it : Set<string> = set ["A"; "B"; "C"]
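Note that the function above collapses Neg x into x, so it returns atoms rather than literals. If the polarity matters (as it does for a basic conjunct in disjunctive normal form), one possible reading of litOf is sketched below in Python. The class names mirror the F# cases, but lit_of and the decision to reject anything that is not a basic conjunct are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class P:
    name: str

@dataclass(frozen=True)
class Neg:
    arg: object

@dataclass(frozen=True)
class Con:
    left: object
    right: object

@dataclass(frozen=True)
class Dis:
    left: object
    right: object

def lit_of(bc):
    """Return the set of literals of a basic conjunct, keeping polarity."""
    if isinstance(bc, P):
        return {bc}                      # an atom is a literal
    if isinstance(bc, Neg) and isinstance(bc.arg, P):
        return {bc}                      # a negated atom is a literal
    if isinstance(bc, Con):
        return lit_of(bc.left) | lit_of(bc.right)
    # Disjunctions and negated compound formulas are not basic conjuncts.
    raise ValueError(f"not a basic conjunct: {bc!r}")

# a AND (NOT b AND a) has exactly two literals: a and NOT b.
print(lit_of(Con(P("a"), Con(Neg(P("b")), P("a")))))
```

Because sets ignore order and duplicates, commutativity, associativity and (a ∧ a) = a come for free from this representation.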

In Agda is it possible to define a datatype that has equations?

I want to describe the integers:
data Integer : Set where
  Z : Integer
  Succ : Integer -> Integer
  Pred : Integer -> Integer
  ?? what else
The above does not define the Integers. We need Succ (Pred x) = x and Pred (Succ x) = x. However,
spReduce : (m : Integer) -> Succ (Pred m) = m
psReduce : (m : Integer) -> Pred (Succ m) = m
can't be added to the data type. A better definition of the integers is almost certainly:
data Integers : Set where
  Pos : Nat -> Integers
  Neg : Nat -> Integers
But I am curious if there is a way to add equations to a datatype.

I'd go about it by defining a record:
record Integer (A : Set) : Set where
  constructor integer
  field
    z : A
    succ : A -> A
    pred : A -> A
    spInv : (x : A) -> succ (pred x) == x
    psInv : (x : A) -> pred (succ x) == x
This record can be used as a proof that a certain type A behaves like an Integer should.

It seems that what you'd like to do is define your Integers type as a quotient type by the equivalence relation that identifies Succ (Pred m) with m, etc. Agda doesn't support that anymore. There was an experimental library that tried to do it (by forcing all functions over a quotient type to be defined via a helper function that requires a proof of representational invariance), but someone then discovered that the implementation wasn't watertight and could lead to inconsistencies (basically by accessing one of its postulates that was supposed to be inaccessible from the outside). For the details, see this message:
We were not sure if this hack was sound or not. Now, thanks to Dan
Doel, I know that it isn't.
[...]
Given these observations it is easy to prove that the postulate above
is unsound:
I think your best bet at the moment (if you want or need to stick to a loose representation with an equivalence to tighten it up) is to define a Setoid for your type.
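The idea behind the setoid approach (keep the loose Z/Succ/Pred terms and pair them with an explicit equivalence relation instead of relying on structural equality) can be illustrated outside Agda. The Python sketch below is only an analogy and carries no proofs; the helper names value and equivalent are invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Z:
    pass

@dataclass(frozen=True)
class Succ:
    arg: object

@dataclass(frozen=True)
class Pred:
    arg: object

def value(term):
    """Interpret a term as an int: +1 per Succ, -1 per Pred."""
    n = 0
    while not isinstance(term, Z):
        n += 1 if isinstance(term, Succ) else -1
        term = term.arg
    return n

def equivalent(a, b):
    """The equivalence relation that stands in for the missing equations."""
    return value(a) == value(b)

t = Succ(Pred(Z()))
print(t == Z())            # structural equality: False
print(equivalent(t, Z()))  # equality up to the equations: True
```

In Agda the corresponding obligation is that every function on the type respects the equivalence, which is what the Setoid machinery helps to state.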

What's the meaning of the `in` keyword in this example (F#)

I've been trying to get my head round various bits of F# (I'm coming from more of a C# background), and parsers interest me, so I jumped at this blog post about F# parser combinators:
http://santialbo.com/blog/2013/03/24/introduction-to-parser-combinators
One of the samples here was this:
/// If the stream starts with c, returns Success, otherwise returns Failure
let CharParser (c: char) : Parser<char> =
    let p stream =
        match stream with
        | x::xs when x = c -> Success(x, xs)
        | _ -> Failure
    in p //what does this mean?
However, one of the things that confused me about this code was the in p statement. I looked up the in keyword in the MSDN docs:
http://msdn.microsoft.com/en-us/library/dd233249.aspx
I also spotted this earlier question:
Meaning of keyword "in" in F#
Neither of those seemed to be the same usage. The only thing that seems to fit is that this is a pipelining construct.

The let x = ... in expr allows you to declare a binding for some variable x which can then be used in expr.
In this case p is a function which takes an argument stream and then returns either Success or Failure depending on the result of the match, and this function is returned by the CharParser function.
The F# light syntax automatically nests let .. in bindings, so for example
let x = 1
let y = x + 2
y * z
is the same as
let x = 1 in
let y = x + 2 in
y * z
Therefore, the in is not needed here and the function could have been written simply as
let CharParser (c: char) : Parser<char> =
    let p stream =
        match stream with
        | x::xs when x = c -> Success(x, xs)
        | _ -> Failure
    p

The answer from Lee explains the problem. In F#, the in keyword is heritage from earlier functional languages that inspired F# and required it - namely from ML and OCaml.
It might be worth adding that there is just one situation in F# where you still need in - that is, when you want to write let followed by an expression on a single line. For example:
let a = 10
if (let x = a * a in x = 100) then printfn "Ok"
This is a bit funky coding style and I would not normally use it, but you do need in if you want to write it like this. You can always split that to multiple lines though:
let a = 10
if ( let x = a * a
     x = 100 ) then printfn "Ok"

F# How to tokenise user input: separating numbers, units, words?

I am fairly new to F#, but have spent the last few weeks reading reference materials. I wish to process a user-supplied input string, identifying and separating the constituent elements. For example, for this input:
XYZ Hotel: 6 nights at 220EUR / night
plus 17.5% tax
the output should resemble something like a list of tuples:
[ ("XYZ", Word); ("Hotel:", Word);
  ("6", Number); ("nights", Word);
  ("at", Operator); ("220", Number);
  ("EUR", CurrencyCode); ("/", Operator);
  ("night", Word); ("plus", Operator);
  ("17.5", Number); ("%", PerCent);
  ("tax", Word) ]
Since I'm dealing with user input, it could be anything. Thus, expecting users to comply with a grammar is out of the question. I want to identify the numbers (could be integers, floats, negative...), the units of measure (optional, but could include SI or Imperial physical units, currency codes, counts such as "night/s" in my example), mathematical operators (as math symbols or as words including "at" "per", "of", "discount", etc), and all other words.
I have the impression that I should use active pattern matching -- is that correct? -- but I'm not exactly sure how to start. Any pointers to appropriate reference material or similar examples would be great.

I put together an example using the FParsec library. The example is not robust at all but it gives a pretty good picture of how to use FParsec.
open FParsec

type Element =
    | Word of string
    | Number of string
    | Operator of string
    | CurrencyCode of string
    | PerCent of string

let parsePerCent state =
    (parse {
        let! r = pstring "%"
        return PerCent r
    }) state

let currencyCodes = [|
    pstring "EUR"
|]

let parseCurrencyCode state =
    (parse {
        let! r = choice currencyCodes
        return CurrencyCode r
    }) state

let operators = [|
    pstring "at"
    pstring "/"
|]

let parseOperator state =
    (parse {
        let! r = choice operators
        return Operator r
    }) state

// Digits, an optional decimal point, then optional fractional digits
let parseNumber state =
    (parse {
        let! e1 = many1Chars digit
        let! r = opt (pchar '.')
        let! e2 = manyChars digit
        return Number (e1 + (if r.IsSome then "." else "") + e2)
    }) state

let parseWord state =
    (parse {
        let! r = many1Chars (letter <|> pchar ':')
        return Word r
    }) state

// Order matters: choice tries the parsers from top to bottom
let elements = [|
    parseOperator
    parseCurrencyCode
    parseWord
    parseNumber
    parsePerCent
|]

let parseElement state =
    (parse {
        do! spaces
        let! r = choice elements
        do! spaces
        return r
    }) state

let parseElements state =
    manyTill parseElement eof state

let parse (input:string) =
    let result = run parseElements input
    match result with
    | Success (v, _, _) -> v
    | Failure (m, _, _) -> failwith m

It sounds like what you really want is just a lexer. A good alternative to FParsec would be FsLex. (A good intro tutorial, albeit somewhat dated, can be found on my old blog here.) Using FsLex you can take your input text:
XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax
And get it properly tokenized into something like:
[ Word("XYZ"); Hotel; Int(6); Word("nights"); Word("at"); Int(220); EUR; ... ]
The next step, once you have a list of tokens, is to do some form of pattern matching / analysis to extract semantic information (which I assume is what you are really after). With the normalized token stream, it should be as simple as:
let rec processTokenList tokens =
    match tokens with
    | Float(x) :: Keyword("EUR") :: rest -> // Dollar amount x
    | Word(x) :: Keyword("Hotel") :: rest -> // Hotel x
    | hd :: rest -> // Couldn't find anything interesting...
        processTokenList rest
That should at least get you started. But note that as your input gets more 'formal', so will the usefulness of your lexing. (And if you only accept a very specific input, then you can use a proper parser and be done with it!)
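To make the lexer idea concrete without FsLex, here is a small regex-based tokenizer sketched in Python. It is not a translation of either answer: the token names follow the question, but the currency list, the operator words and the rule that a Word is any run of characters other than whitespace, digits, % and / are assumptions chosen for this example.

```python
import re

# Order matters: earlier rules win when several could match at a position.
TOKEN_SPEC = [
    ("Number", r"-?\d+(?:\.\d+)?"),
    ("CurrencyCode", r"(?:EUR|USD|GBP)\b"),
    ("PerCent", r"%"),
    ("Operator", r"(?:at|per|of|plus)\b|/"),
    ("Word", r"[^\s\d%/]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (lexeme, kind) pairs; whitespace between tokens is skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]

for token in tokenize("XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax"):
    print(token)
```

The \b after the operator words and currency codes keeps them from matching prefixes of longer words, so "attic" comes out as a single Word rather than Operator "at" followed by "tic".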

F-Sharp (F#) untyped infinity

I wonder why F-Sharp doesn't support infinity.
This would work in Ruby (but not in F#):
let numbers n = [1 .. 1/0] |> Seq.take(n)
-> System.DivideByZeroException: Attempted to divide by zero.
I can write the same functionality in a much more complex way:
let numbers n = 1 |> Seq.unfold (fun i -> Some (i, i + 1)) |> Seq.take(n)
-> works
However, I think the first one would be much clearer.
I can't find any easy way to use dynamically typed infinity in F#.
There is infinity keyword but it is float:
let a = Math.bigint +infinity;;
System.OverflowException: BigInteger cannot represent infinity.
at System.Numerics.BigInteger..ctor(Double value)
at .$FSI_0045.main#()
stopped due to error
Edit: also this seems to work in iteration:
let numbers n = Seq.initInfinite (fun i -> i+1) |> Seq.take(n)

First of all, F# lists are not lazy (I'm not sure whether Ruby lists are), so even with a general notion of infinity your first example can never work.
Second, there is no infinity value in Int32. Only MaxValue. There is a positive and negative infinity in Double though.
Putting it together, this works:
let numbers n = seq { 1. .. 1./0. } |> Seq.take(n)
I feel however Seq.initInfinite is your best option. The code above looks strange to me. (Or at least use Double.PositiveInfinity instead of 1./0.)
At first sight, a nice option to have in the language would be an infinite range operator like in Haskell: seq { 1.. } The problem is that it would only work for seq, so I guess the extra work to support postfix operators is not worth it for this feature alone.
Bottom line: in my opinion, use Seq.initInfinite.

I think the following is the best solution to infinite ranges in F#. By marking the function inline we do better than "dynamically typed infinity": we get structurally typed infinite ranges (it works with int32, int64, bigint, ... any type that has a static member + which takes two arguments of its own type and returns a value of its own type):
let inline infiniteRange start skip =
    seq {
        let n = ref start
        while true do
            yield n.contents
            n.contents <- n.contents + skip
    }
//val inline infiniteRange :
//   ^a ->  ^b -> seq< ^a>
//    when ( ^a or  ^b) : (static member ( + ) :  ^a *  ^b ->  ^a)
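For readers more familiar with other languages, the same lazy pattern can be sketched as a Python generator. This is only a comparison, not part of the F# answer: infinite_range is a hypothetical name, and the standard library's itertools.count(start, step) already provides the same behavior. Where F# checks the + constraint statically, Python relies on duck typing.

```python
from fractions import Fraction
from itertools import islice

def infinite_range(start, skip):
    """Lazily yield start, start + skip, start + 2*skip, ... forever."""
    n = start
    while True:
        yield n
        n = n + skip

# Nothing is computed until the consumer asks for it, so taking a finite
# prefix of the infinite sequence terminates.
print(list(islice(infinite_range(1, 1), 5)))
print(list(islice(infinite_range(Fraction(1, 2), Fraction(1, 2)), 3)))
```

As in the F# version, the sequence is only ever materialized up to the number of elements actually taken.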
