Grammar Precedence and associativity - parsing

If I am given the following grammar:
E -> E W T | T
T -> L S T | L
L -> a | b | c
W -> *
S -> + | -
I see that, since + and - are deeper down the tree, they have higher precedence than *. Am I correct about that?
Also, since this grammar uses left recursion, can I assume left associativity?
Since operators can have different associativity, I am confused about how to tell which operator has which.
I guess what I am asking is: how can I tell operator associativity from a grammar?

Start with
T -> L S T | L
and consider a+b+c, which can be produced from T as follows:
T -> L S T
  -> L S (L S T)
  -> L S (L S (L))
  -> L S (L S (c))
  -> L S (L + (c))
  -> L S (b + (c))
  -> L + (b + (c))
  -> a + (b + (c))
(The parentheses are only there as a shorthand for the parse tree.)
That rightmost derivation is unique; T cannot match (a + b) + c because a + b is not an L.
Consequently, + and - are "right-associative".
By contrast, we have
E -> E W T | T
so a*b*c will be produced as follows:
E -> E W T
  -> E W L
  -> E W c
  -> E * c
  -> (E W T) * c
  -> (E W L) * c
  -> (E W b) * c
  -> (E * b) * c
  -> ((T) * b) * c
  -> ((L) * b) * c
  -> ((a) * b) * c
Again, that parse is unambiguous.
I didn't do a+b*c; working through it would be a good exercise.
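To see the same fact operationally, here is a minimal hand-rolled sketch in Haskell (my own illustration, not from the question) that mirrors T -> L S T | L and L -> a | b | c; the recursive call on the right-hand side is exactly what makes + and - group to the right:

data Ast = Leaf Char | Node Ast Char Ast deriving Show

-- term mirrors T -> L S T | L: parse an L, and if a + or - follows,
-- recurse for the entire rest of the input on the right.
term :: String -> Maybe (Ast, String)
term s = do
  (l, rest) <- leaf s
  case rest of
    (op : rest') | op `elem` "+-" -> do
      (r, rest'') <- term rest'   -- right recursion = right associativity
      pure (Node l op r, rest'')
    _ -> pure (l, rest)

-- leaf mirrors L -> a | b | c.
leaf :: String -> Maybe (Ast, String)
leaf (c : rest) | c `elem` "abc" = Just (Leaf c, rest)
leaf _ = Nothing

Running term "a+b+c" yields Just (Node (Leaf 'a') '+' (Node (Leaf 'b') '+' (Leaf 'c')), ""), i.e. a + (b + c). (The left-recursive E production cannot be transcribed into recursive descent this directly; that is a separate, classic problem.)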

Related

Could not parse the left-hand side (m + 1) * n

I just started learning Agda by reading Programming Language Foundations in Agda. Right in the first chapter there's a definition of multiplication with one of the cases being (suc m) * n = n + (m * n). I assumed it could be more nicely expressed as (m + 1) * n = n + (m * n), but apparently this is not the case. The following program:
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

_+_ : ℕ → ℕ → ℕ
zero + n = n
(suc m) + n = suc (m + n)

{-# BUILTIN NATURAL ℕ #-}

_*_ : ℕ → ℕ → ℕ
zero * n = zero
-- This is fine:
-- (suc m) * n = n + (m * n)
-- This is not:
(m + 1) * n = n + (m * n)
fails with:
Could not parse the left-hand side (m + 1) * n
Operators used in the grammar:
* (infix operator, level 20) [_*_ (/Users/proxi/Documents/Projekty/Agda/multiply.agda:11,1-4)]
+ (infix operator, level 20) [_+_ (/Users/proxi/Documents/Projekty/Agda/multiply.agda:5,1-4)]
when scope checking the left-hand side (m + 1) * n in the
definition of _*_
I believe that, in Agda terms, one could say that the definition using a constructor works fine, but the definition using an operator does not. Why is that so? Is this never possible, or does it depend on how the operator (function) is defined?
Using functions in patterns is not supported.
Note also that if functions were allowed in patterns, it would be (1 + m) * n rather than (m + 1) * n, because _+_ is defined by pattern matching on its first argument, so 1 + m reduces to suc m and m + 1 is stuck.
As has been pointed out, you cannot just use a function to pattern-match. However, it is possible to declare pattern-matching extensions that express nearly the same thing:
open import Data.Nat.Base as Nat
  using (ℕ; zero) renaming (suc to 1+_)

pattern 2+_ n = 1+ 1+ n
Then you can use (1+ m) * n, and even (2+ m).
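For comparison only (a sketch of my own, not from this thread): GHC Haskell offers a similar facility with the PatternSynonyms extension, which likewise lets you match through a declared pattern where an ordinary function would be rejected:

{-# LANGUAGE PatternSynonyms #-}

data Nat = Zero | Suc Nat deriving Show

-- Plays the role of Agda's 1+_ above.
pattern OnePlus :: Nat -> Nat
pattern OnePlus n = Suc n

plus :: Nat -> Nat -> Nat
plus Zero    n = n
plus (Suc m) n = Suc (plus m n)

-- Matching through the synonym, analogous to (1+ m) * n.
times :: Nat -> Nat -> Nat
times Zero        _ = Zero
times (OnePlus m) n = plus n (times m n)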

How to tell whether a Parsec parser uses constant heap space in Haskell

In a recent question, I asked about the following parsec parser:
manyLength
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLength p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i = (p *> go (i + 1)) <|> pure i
This function is similar to many. However, instead of returning [a], it
returns the number of times it was able to successfully run p.
This works well, except for one problem. It doesn't run in constant heap
space.
In the linked question, Li-yao Xia gives an alternative way of writing manyLength that uses constant heap space:
manyLengthConstantHeap
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeap p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ((p *> pure True) <|> pure False) >>=
        \success -> if success then go (i + 1) else pure i
This is a significant improvement, but I don't understand why
manyLengthConstantHeap uses constant heap space, while my original manyLength doesn't.
If you inline (<|>) in manyLength, it looks somewhat like this:
manyLengthInline
  :: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthInline p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ParsecT $ \s cok cerr eok eerr ->
        let meerr :: ParseError -> m b
            meerr err =
              let neok :: Int -> State s u -> ParseError -> m b
                  neok y s' err' = eok y s' (mergeError err err')
                  neerr :: ParseError -> m b
                  neerr err' = eerr $ mergeError err err'
              in unParser (pure i) s cok cerr neok neerr
        in unParser (p *> go (i + 1)) s cok cerr eok meerr
If you inline (>>=) in manyLengthConstantHeap, it looks somewhat like this:
manyLengthConstantHeapInline
  :: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeapInline p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ParsecT $ \s cok cerr eok eerr ->
        let mcok :: Bool -> State s u -> ParseError -> m b
            mcok success s' err =
              let peok :: Int -> State s u -> ParseError -> m b
                  peok int s'' err' = cok int s'' (mergeError err err')
                  peerr :: ParseError -> m b
                  peerr err' = cerr (mergeError err err')
              in unParser
                   (if success then go (i + 1) else pure i)
                   s'
                   cok
                   cerr
                   peok
                   peerr
            meok :: Bool -> State s u -> ParseError -> m b
            meok success s' err =
              let peok :: Int -> State s u -> ParseError -> m b
                  peok int s'' err' = eok int s'' (mergeError err err')
                  peerr :: ParseError -> m b
                  peerr err' = eerr (mergeError err err')
              in unParser
                   (if success then go (i + 1) else pure i)
                   s'
                   cok
                   cerr
                   peok
                   peerr
        in unParser ((p *> pure True) <|> pure False) s mcok cerr meok eerr
Here is the ParsecT constructor for completeness:
newtype ParsecT s u m a = ParsecT
  { unParser
      :: forall b .
         State s u
      -> (a -> State s u -> ParseError -> m b) -- consumed ok
      -> (ParseError -> m b)                   -- consumed err
      -> (a -> State s u -> ParseError -> m b) -- empty ok
      -> (ParseError -> m b)                   -- empty err
      -> m b
  }
Why does manyLengthConstantHeap run with constant heap space, while
manyLength does not? It doesn't look like the recursive call to go is in
the tail-call position for either manyLengthConstantHeap or manyLength.
When writing parsec parsers in the future, how can I know the space
requirements for a given parser? How did Li-yao Xia know that
manyLengthConstantHeap would be okay?
I don't feel like I have any confidence in predicting which parsers will use a
lot of memory on a large input.
Is there an easy way to figure out whether a given function will be
tail-recursive in Haskell without running it? Or better yet, without compiling
it?
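(An aside that is not part of the original question: one empirical way to approach the "how can I know" part, assuming GHC and the parsec package, is to run each version of the parser on a large input with RTS statistics enabled and compare the reported maximum residency.)

{-# LANGUAGE BangPatterns #-}
-- Compile with: ghc -O2 -rtsopts Main.hs
-- Run with:     ./Main +RTS -s
-- "maximum residency" in the RTS summary approximates peak heap use.
module Main where

import Control.Applicative ((<|>))
import Text.Parsec (char, parse)
import Text.Parsec.String (Parser)

-- The parser under test, specialized to String input.
manyLength :: Parser a -> Parser Int
manyLength p = go 0
  where
    go :: Int -> Parser Int
    go !i = (p *> go (i + 1)) <|> pure i

main :: IO ()
main = do
  let input = replicate 1000000 'a'  -- a large input to stress the heap
  print (parse (manyLength (char 'a')) "" input)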

Parsing non binary operators with Parsec

Traditionally, arithmetic operators are considered binary (left- or right-associative), so most tools deal only with binary operators.
Is there an easy way to parse, with Parsec, arithmetic operators that can have an arbitrary number of arguments?
For example, the following expression should be parsed into a tree in which + and * nodes can have arbitrarily many children:
(a + b) + c + d * e + f
Yes! The key is to first solve a simpler problem, which is to model + and * as tree nodes with only two children. To add four things, we'll just use + three times.
This is a great problem to solve since there's a Text.Parsec.Expr module for just this problem. Your example is actually parseable by the example code in the documentation. I've slightly simplified it here:
module Lib where

import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens

data Expr
  = Identifier String
  | Multiply Expr Expr
  | Add Expr Expr

instance Show Expr where
  show (Identifier s) = s
  show (Multiply l r) = "(* " ++ show l ++ " " ++ show r ++ ")"
  show (Add l r)      = "(+ " ++ show l ++ " " ++ show r ++ ")"

-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens     = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved   = Tokens.reservedOp haskell

-- Infix parser.
infix_ operator func =
  Expr.Infix (reserved operator >> return func) Expr.AssocLeft

parser = Expr.buildExpressionParser table term <?> "expression"
  where
    table = [[infix_ "*" Multiply], [infix_ "+" Add]]
    term =
      parens parser
        <|> (Identifier <$> identifier)
        <?> "term"
Running this in GHCi:
λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)
There are lots of ways of converting this tree to the desired form. Here's a hacky gross slow one:
data Expr'
  = Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Show)

collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | not (f e)  = [e]
collect (Add l r) f      = collect l f ++ collect r f
collect (Multiply l r) f = collect l f ++ collect r f

isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _         = False

isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _              = False

optimize :: Expr -> Expr'
optimize (Identifier s)   = Identifier' s
optimize e@(Add _ _)      = Add' (map optimize (collect e isAdd))
optimize e@(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))
I will note, however, that almost always Expr is Good Enough™ for the purposes of a parser or compiler.
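If you do want the n-ary shape, here is an alternative sketch of my own (not from the original answer) that builds Expr' directly, reusing parens, identifier, and reserved from the module above; sepBy1 is a stock Parsec combinator that already returns the list of operands, so no binary intermediate tree or optimize pass is needed:

-- Precedence falls out of the layering: mulExpr is the unit that
-- addExpr separates with "+", so "*" binds tighter.
addExpr :: Parsec String () Expr'
addExpr = do
  ts <- mulExpr `sepBy1` reserved "+"
  pure $ case ts of
    [t] -> t
    _   -> Add' ts

mulExpr :: Parsec String () Expr'
mulExpr = do
  fs <- atom `sepBy1` reserved "*"
  pure $ case fs of
    [f] -> f
    _   -> Multiply' fs

atom :: Parsec String () Expr'
atom = parens addExpr <|> (Identifier' <$> identifier)

Note that, unlike optimize, this keeps an explicitly parenthesized (a + b) as its own nested node rather than flattening it into the surrounding sum.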

Extending Grammar for LR Parser

I have the following grammar for basic arithmetic expressions
E -> E + T
E -> T
T -> T * F
T -> F
F -> (E)
F -> id
where E is expression, T is term, and F is factor. I'm wondering how I can extend this grammar to support further arithmetic operations, such as exponentiation (possibly represented with ^) or logarithms.
Thanks
Since exponentiation has higher precedence, you could use the following grammar:
E -> E + T
E -> T
T -> T * F
T -> F
F -> G ^ F
F -> G
G -> log(E)
G -> (E)
G -> id
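Note that the production F -> G ^ F is right-recursive, so ^ comes out right-associative, which matches the usual convention for exponentiation. For example, id ^ id ^ id can only be derived as
F -> G ^ F
  -> G ^ (G ^ F)
  -> G ^ (G ^ G)
  -> id ^ (id ^ (id))
(parentheses again standing in for the parse tree), i.e. id ^ (id ^ id).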

How do I prove a "seemingly obvious" fact when relevant types are abstracted by a lambda in Idris?

I am writing a basic monadic parser in Idris, to get used to the syntax and differences from Haskell. I have the basics of that working just fine, but I am stuck on trying to create VerifiedSemigroup and VerifiedMonoid instances for the parser.
Without further ado, here's the parser type, the Semigroup and Monoid instances, and the start of a VerifiedSemigroup instance.
data ParserM a = Parser (String -> List (a, String))

parse : ParserM a -> String -> List (a, String)
parse (Parser p) = p

instance Semigroup (ParserM a) where
  p <+> q = Parser (\s => parse p s ++ parse q s)

instance Monoid (ParserM a) where
  neutral = Parser (const [])

instance VerifiedSemigroup (ParserM a) where
  semigroupOpIsAssociative (Parser p) (Parser q) (Parser r) = ?whatGoesHere
I'm basically stuck after intros, with the following prover state:
-Parser.whatGoesHere> intros
---------- Other goals: ----------
{hole3},{hole2},{hole1},{hole0}
---------- Assumptions: ----------
a : Type
p : String -> List (a, String)
q : String -> List (a, String)
r : String -> List (a, String)
---------- Goal: ----------
{hole4} : Parser (\s => p s ++ q s ++ r s) =
Parser (\s => (p s ++ q s) ++ r s)
-Parser.whatGoesHere>
It looks like I should be able to use rewrite together with appendAssociative somehow,
but I don't know how to "get inside" the lambda \s.
Anyway, I'm stuck on the theorem-proving part of the exercise - and I can't seem to find much Idris-centric theorem proving documentation. I guess maybe I need to start looking at Agda tutorials (though Idris is the dependently-typed language I'm convinced I want to learn!).
The simple answer is that you can't. Reasoning about functions is fairly awkward in intensional type theories. For example, Martin-Löf's type theory is unable to prove that addition defined by recursion on the first argument equals addition defined by recursion on the second:
0 + y = y
S x + y = S (x + y)

x +′ 0 = x
x +′ S y = S (x +′ y)

_+_ ≡ _+′_ -- ???
(as far as I know, this is an actual theorem and not just "proof by lack of imagination"; however, I couldn't find the source where I read it). This also means that there is no proof for the more general:
ext : ∀ {A : Set} {B : A → Set} {f g : (x : A) → B x} →
      (∀ x → f x ≡ g x) → f ≡ g
This is called function extensionality: if you can prove that the results are equal for all arguments (that is, the functions are equal extensionally), then the functions are equal as well.
This would work perfectly for the problem you have:
<+>-assoc : {A : Set} (p q r : ParserM A) →
            (p <+> q) <+> r ≡ p <+> (q <+> r)
<+>-assoc (Parser p) (Parser q) (Parser r) =
  cong Parser (ext λ s → ++-assoc (p s) (q s) (r s))
where ++-assoc is your proof of the associativity of _++_. I'm not sure how it would look in tactics, but it's going to be fairly similar: apply congruence for Parser and the goal should be:
(\s => p s ++ q s ++ r s) = (\s => (p s ++ q s) ++ r s)
You can then apply extensionality to get assumption s : String and a goal:
p s ++ q s ++ r s = (p s ++ q s) ++ r s
However, as I said before, we don't have function extensionality (note that this is not true of type theories in general: extensional type theories, homotopy type theory, and others are able to prove this statement). The easy option is to assume it as an axiom. As with any other axiom, you risk:
- losing consistency (i.e. being able to prove falsehood; though I think function extensionality is OK), and
- breaking reduction (what does a function that does case analysis only for refl do when given this axiom?).
I'm not sure how Idris handles axioms, so I won't go into details. Just beware that axioms can mess up some stuff if you are not careful.
The hard option is to work with setoids. A setoid is basically a type equipped with a custom equality. The idea is that instead of having a Monoid (or VerifiedSemigroup in your case) that works with the built-in equality (= in Idris, ≡ in Agda), you have a special monoid (or semigroup) with a different underlying equality. This is usually done by packing the monoid (semigroup) operations together with the equality and a bunch of proofs, namely (in pseudocode):

=   : A → A → Set -- equality
_*_ : A → A → A   -- associative binary operation
1   : A           -- neutral element

=-refl  : x = x
=-trans : x = y → y = z → x = z
=-sym   : x = y → y = x

*-cong  : x = y → u = v → x * u = y * v -- the operation respects our equality
*-assoc : x * (y * z) = (x * y) * z
1-left  : 1 * x = x
1-right : x * 1 = x
The choice of equality for parsers is clear: two parsers are equal if their outputs agree for all possible inputs.
-- Parser equality
_≡p_ : {A : Set} (p q : ParserM A) → Set
Parser p ≡p Parser q = ∀ x → p x ≡ q x
This solution comes with different tradeoffs, namely that the new equality cannot fully substitute the built-in one (this tends to show up when you need to rewrite some terms). But it's great if you just want to show that your code does what it's supposed to do (up to some custom equality).
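For instance (a sketch in the same Agda-style notation as above, assuming the ++-assoc lemma mentioned earlier), the associativity proof under this parser equality needs no extensionality at all, because _≡p_ only ever compares the functions' outputs:

<+>-assoc′ : {A : Set} (p q r : ParserM A) →
             ((p <+> q) <+> r) ≡p (p <+> (q <+> r))
<+>-assoc′ (Parser p) (Parser q) (Parser r) =
  λ x → ++-assoc (p x) (q x) (r x)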
