How would this grammar be parsed?

I am studying grammars, and my book states that this grammar can unambiguously parse subtraction and division of numbers.
This is the grammar:
S -> E
E -> E - F
E -> F
F -> F / NUM
F -> NUM
NUM -> 0-9
Say you have some input like 1 - 2 - 3. From my understanding, the parse tree would go something like this:
      S
      |
      E
    / | \
   E  -  F
 / | \
E  -  F
...
Here we get into an infinite loop, since E -> E - F and that left E again expands to another E - F. We can't just magically know when to choose E -> F instead (the production that leads down to the terminals we want).
I feel that I am understanding something incorrectly here. Can someone please explain a bit about how this actually works?

As far as I know, instead of a top-down parser that starts at the root, a bottom-up parser, which starts with the leaves of the tree (in this case 1 - 2 - 3), would be able to construct a valid parse like this:
1  -  2  -  3
NUM - NUM - NUM     (each digit reduced by NUM -> 0-9)
F  -  F  -  F       (F -> NUM)
E  -  F  -  F       (E -> F, leftmost F only)
E  -  F             (E -> E - F)
E                   (E -> E - F)
S                   (S -> E)
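Either way, a top-down parser can also handle this grammar once the left recursion is turned into iteration. A minimal Python sketch (function names and tokenization are my own, assuming single-digit numbers as in NUM -> 0-9):

```python
def parse_F(tokens, i):
    # F -> F / NUM, handled iteratively (left-associative division)
    val = int(tokens[i])
    i += 1
    while i < len(tokens) and tokens[i] == '/':
        val = val / int(tokens[i + 1])
        i += 2
    return val, i

def parse_E(tokens, i=0):
    # E -> E - F, handled iteratively (left-associative subtraction)
    val, i = parse_F(tokens, i)
    while i < len(tokens) and tokens[i] == '-':
        rhs, i = parse_F(tokens, i + 1)
        val -= rhs
    return val, i

print(parse_E(list("1-2-3"))[0])  # (1 - 2) - 3 = -4
```

The while loops perform exactly the repeated E - F reductions the diagram above shows, which is why the result groups to the left.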


Is it possible to transform this grammar to be LR(1)?

The following grammar generates the sentences "a,a", "a,b", "b,b", ..., "h,b". Unfortunately it is not LR(1), so it cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you have the nonterminal C unchanged preceding comma in some rule.
In that case it is clear that a parser cannot decide, having seen an "a", and having lookahead "comma", whether to reduce or shift. So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases, "having seen an 'a'" and "C unchanged". You asked if there is a fix that doesn't expand C. There isn't, but you can expand C "a little bit" by removing "a" from C, since that is the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So, we did not "significantly" increase the number of productions.

Proving MC/DC unique cause definition compliance

I'm reading the following paper on MC/DC: http://shemesh.larc.nasa.gov/fm/papers/Hayhurst-2001-tm210876-MCDC.pdf.
I have the source code: Z := (A or B) and (C or D) and the following test cases:
-----------------
| A | F F T F T |
| B | F T F T F |
| C | T F F T T |
| D | F T F F F |
| Z | F T F T T |
-----------------
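As a sanity check, the Z row of the table follows mechanically from the decision; a quick sketch (variable names are mine):

```python
def Z(A, B, C, D):
    return (A or B) and (C or D)

T, F = True, False
# one (A, B, C, D) tuple per column of the table
cases = [(F, F, T, F), (F, T, F, T), (T, F, F, F), (F, T, T, F), (T, F, T, F)]
print([Z(*case) for case in cases])  # the Z row: F T F T T
```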
I want to prove that the mentioned test cases comply with unique cause definition.
I started by eliminating masked tests:
A or B = F T T T T, so the first test case is masked for C or D, since F and (C or D) = F.
C or D = T T F T T, so the third test case is masked for A or B, since (A or B) and F = F.
I then determined MC/DC:
Required test cases for A or B:
F F (first case)
T F (fifth case)
F T (second or fourth case)
Required test cases for C or D:
F F (third case)
T F (fourth or fifth case)
F T (second case)
Required test cases for (A or B) and (C or D):
T T (second, fourth or fifth case)
F T (first case)
T F (third case)
According to the paper, this example does not comply with the unique cause definition. Instead, they propose changing the second test case from F T F T to T F F T.
-----------------
| A | F T T F T |
| B | F F F T F |
| C | T F F T T |
| D | F T F F F |
| Z | F T F T T |
-----------------
I determined MC/DC for A or B again:
F F (first case)
T F (fifth case)
F T (fourth case)
Then, they introduce the following independence pairs table that shows the difference between both examples (in page 38):
I understand that for the first example, the independence pair that they show changes two variables instead of one, however I don't understand how they are computing the independence pairs.
In the A column, I can infer they take F F T F from the test cases table's A row, and they compute the independence pair as the same test case with only A changed (T F T F).
In B's column, however, they pick F F T F again. According to my thinking, this should instead be B's row: F T F T.
The rest of the letters show the same dilemma.
Also, for D's first-example column, they show that the independence pair of F T F T is T F F F, which ruins my theory that they are computing the independence pair from the first value, and proves that they are picking it from somewhere else.
Can someone explain better how (and from where) they construct such an independence pair table?
First, let's re-read the definitions:
(From www.faa.gov/aircraft/air_cert/design_approvals/air_software/cast/cast_papers/media/cast-10.pdf)
DO-178B/ED-12B includes the following definitions:
Condition
A Boolean expression containing no Boolean operators.
Decision
A Boolean expression composed of conditions and zero or more Boolean operators.
A decision without a Boolean operator is a condition.
If a condition appears more than once in a decision, each occurrence is a
distinct condition.
Decision Coverage
Every point of entry and exit in the program has been invoked at least once
and every decision in the program has taken on all possible outcomes at least once.
Modified Condition/Decision Coverage
Every point of entry and exit in the program has been invoked at least once,
every condition in a decision in the program has taken all possible outcomes
at least once, every decision in the program has taken all possible outcomes
at least once, and each condition in a decision has been shown to independently
affect that decision's outcome.
A condition is shown to independently affect a decision's outcome by varying just
that condition while holding fixed all other possible conditions.
So, for the decision '(A or B) and (C or D)' we have four conditions: A,B,C and D
For each condition we must find a pair of test vectors that shows that the condition
'independently affect that decision's outcome'.
For unique cause MC/DC, only the value of the condition considered can vary in the pair of test vectors.
For example let's consider condition A. The following pair of test vectors covers condition A:
(A or B) and (C or D) = Z
T F T F T
F F T F F
With this pair of test vectors (TFTF, FFTF) only the value of A and Z (the decision) change.
We then search pairs for conditions B, C and D.
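Because the decision has only four conditions, the unique-cause pairs can be found by brute force; a sketch (not from the paper, names are my own):

```python
from itertools import product

def Z(A, B, C, D):
    return (A or B) and (C or D)

def unique_cause_pairs(i):
    # all vector pairs that differ only in condition i and flip the decision
    pairs = []
    for v in product([False, True], repeat=4):
        if v[i]:                       # enumerate each pair once, from the False side
            continue
        w = v[:i] + (True,) + v[i + 1:]
        if Z(*v) != Z(*w):
            pairs.append((v, w))
    return pairs

# condition A (index 0) admits three pairs, including (FFTF, TFTF) from the text
print(unique_cause_pairs(0))
```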
Using the RapiCover GUI (Qualifiable Code coverage tool from Rapita Systems - www.rapitasystems.com/products/rapicover) we can see the full set of test vectors (observed or missing) to fully cover all conditions of the decision.
RapiCover screenshot
Vector V3 (in yellow in the screenshot above) isn't used in any independence pair.
Vector V6 (in red in the screenshot) is missing for MC/DC coverage of condition D.
This is for the definition of 'unique cause' MC/DC.
Now for 'masking MC/DC':
For 'masking MC/DC' the requirement that the value of a single condition may vary in a pair
of test vectors is relaxed provided that any other change is masked by the boolean
operators in the expression.
For example, let's consider the pair of vectors for condition D:
(A or B) and (C or D) = Z
T F F T T
T F F F F
We can represent these two test vectors on the expression tree:
        and
       /   \
     or1   or2
    /  \   /  \
   A    B C    D

First vector (T F F T):        Second vector (T F F F):

       and [T]                     and [F]
      /     \                     /     \
  or1 [T]  or2 [T]            or1 [T]  or2 [F]
  /  \      /  \              /  \      /  \
 A    B    C    D            A    B    C    D
[T]  [F]  [F]  [T]          [T]  [F]  [F]  [F]
This is a pair for unique cause MC/DC.
Let's now consider a new pair of test vectors for condition D:
(A or B) and (C or D) = Z
F T F T T
T F F F F
Again we can represent these two test vectors on the expression tree:
First vector (F T F T):        Second vector (T F F F):

       and [T]                     and [F]
      /     \                     /     \
  or1 [T]  or2 [T]            or1 [T]  or2 [F]
  /  \      /  \              /  \      /  \
 A    B    C    D            A    B    C    D
[F]  [T]  [F]  [T]          [T]  [F]  [F]  [F]
This is a pair for masking MC/DC because, although the values of three conditions (A, B and D) have changed, the change in conditions A and B is masked by the boolean operator or1 (i.e. the value of A or B is unchanged).
So, for masking MC/DC, the independence pairs for condition D can be:
RapiCover screenshot
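The masking argument for the pair (FTFT, TFFF) can be checked directly: A or B keeps the same value across the pair, so only D's change reaches the decision. A small sketch:

```python
def Z(A, B, C, D):
    return (A or B) and (C or D)

v1 = (False, True, False, True)   # F T F T -> Z = T
v2 = (True, False, False, False)  # T F F F -> Z = F
# A and B both change, but 'A or B' is True for both vectors,
# so their change is masked by or1; C is unchanged, and only
# D's change propagates to Z.
print((v1[0] or v1[1]) == (v2[0] or v2[1]), Z(*v1), Z(*v2))  # True True False
```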

Transform a grammar G into LL(1)

I have the following grammar and I need to convert it to LL(1) grammar
G = (N, T, P, S)
N = {S, A, B, C}
T = {a, b, c, d}
P = {
S -> CbSb | adB | bc
A -> BdA | b
B -> aCd | ε
C -> Cca | bA | a
}
The point is that I know how to convert a single production, but I can't find any clear method for solving a whole grammar like this on the internet.
Thanks in advance!
Remove left recursion, direct and indirect.
Build an LA(k) table. If there's no ambiguity, the grammar (and the language) is LL(k).
The obvious left recursion in the grammar is:
The obvious left recursion in the grammar is in C:
C ==> Cca ==> Ccaca ==> ...
and S inherits it through S -> CbSb.
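For the direct case, the textbook elimination A -> Aα | β becomes A -> β A', A' -> α A' | ε, and it can be mechanized; a sketch using my own representation (right-hand sides as symbol lists):

```python
def remove_direct_left_recursion(nt, prods):
    # prods: list of right-hand sides, each a list of symbols
    rec = [p[1:] for p in prods if p and p[0] == nt]    # A -> A alpha
    base = [p for p in prods if not p or p[0] != nt]    # A -> beta
    if not rec:
        return {nt: prods}
    new = nt + "'"
    return {
        nt: [b + [new] for b in base],                  # A  -> beta A'
        new: [a + [new] for a in rec] + [[]],           # A' -> alpha A' | epsilon
    }

# C -> Cca | bA | a  becomes  C -> bA C' | a C' and C' -> ca C' | epsilon
print(remove_direct_left_recursion("C", [["C", "c", "a"], ["b", "A"], ["a"]]))
```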

Parse string with lex in Haskell

I'm following the Gentle Introduction to Haskell tutorial, and the code presented there seems to be broken. I need to understand whether that is so, or whether my understanding of the concept is wrong.
I am implementing a parser for a custom type:
data Tree a = Leaf a | Branch (Tree a) (Tree a)
printing function for convenience
showsTree :: Show a => Tree a -> String -> String
showsTree (Leaf x) = shows x
showsTree (Branch l r) = ('<':) . showsTree l . ('|':) . showsTree r . ('>':)
instance Show a => Show (Tree a) where
showsPrec _ x = showsTree x
this parser is fine but breaks when there are spaces
readsTree :: (Read a) => String -> [(Tree a, String)]
readsTree ('<':s) = [(Branch l r, u) | (l, '|':t) <- readsTree s,
(r, '>':u) <- readsTree t ]
readsTree s = [(Leaf x, t) | (x,t) <- reads s]
this one is said to be a better solution, but it does not work without spaces
readsTree_lex :: (Read a) => String -> [(Tree a, String)]
readsTree_lex s = [(Branch l r, x) | ("<", t) <- lex s,
(l, u) <- readsTree_lex t,
("|", v) <- lex u,
(r, w) <- readsTree_lex v,
(">", x) <- lex w ]
++
[(Leaf x, t) | (x, t) <- reads s ]
next I pick one of parsers to use with read
instance Read a => Read (Tree a) where
readsPrec _ s = readsTree s
then I load it in ghci using Leksah debug mode (this is irrelevant, I guess), and try to parse two strings:
read "<1|<2|3>>" :: Tree Int -- succeeds with readsTree
read "<1| <2|3> >" :: Tree Int -- succeeds with readsTree_lex
when lex encounters the |<2... part of the former string, it splits off ("|<", _). That does not match the ("|", v) <- lex u part of the parser, and parsing fails to complete.
There are two questions arising:
how do I define a parser that really ignores spaces rather than requiring them?
how can I define rules for splitting encountered literals with lex?
The second question is asked more out of curiosity, as defining my own lexer seems more correct than redefining the rules of the existing one.
lex splits into Haskell lexemes, skipping whitespace.
This means that since Haskell permits |< as a lexeme, lex will not split it into two lexemes, since that's not how it parses in Haskell.
You can only use lex in your parser if you're using the same (or similar) syntactic rules to Haskell.
If you want to ignore all whitespace (as opposed to making any whitespace equivalent to one space), it's much simpler and more efficient to first run filter (not.isSpace).
The answer to this seems to be a small gap between the text of the Gentle Introduction to Haskell and its code samples, plus an error in the sample code.
There should also be one more lexer, but there is no working example (satisfying my need) in the codebase, so I wrote one. Please point out any flaws in it:
lexAll :: ReadS String
lexAll s = case lex s of
    [("", _)] -> []                  -- nothing left to parse
    [(c, r)]  -> if length c == 1
                 then [(c, r)]       -- single-character lexeme: keep as is
                 else [(c, r), ([head c], tail c ++ r)]
                                     -- also offer just the first character of
                                     -- the lexeme, so tokens like "|<" can split
    any_else  -> any_else
The author says:
Finally, the complete reader. This is not sensitive to white space as
were the previous versions. When you derive the Show class for a data
type the reader generated automatically is similar to this in style.
but lexAll should be used instead of lex (which seems to be the error mentioned above):
readsTree' :: (Read a) => ReadS (Tree a)
readsTree' s = [(Branch l r, x) | ("<", t) <- lexAll s,
(l, u) <- readsTree' t,
("|", v) <- lexAll u,
(r, w) <- readsTree' v,
(">", x) <- lexAll w ]
++
[(Leaf x, t) | (x, t) <- reads s]

How to remove left-recursion in the following grammar?

Unfortunately, ANTLR cannot support direct left recursion when the rule has parameters passed to it. The only viable option is to remove the left recursion. Is there a way to remove the left recursion in the following grammar?
a[int x]
: b a[$x] c
| a[$x - 1]
(
c a[$x - 1]
| b c
)
;
The problem is in the second alternative involving left recursion. Any kind of help would be much appreciated.
Without the parameters and easier formatting, it would look like this:
a
: b a c
| a (c a | b c)
;
When a's left-recursive alternative is matched n times, (c a | b c) will be matched n times, preceded by the terminating b a c (the first alternative). That means this rule always starts with b a c, followed by zero or more occurrences of (c a | b c):
a
: b a c (c a | b c)*
;
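This is the standard A -> A α | β ⇒ A -> β α* rewrite. Treating α and β as plain strings (a simplification here, since both actually contain the nonterminal a), the equivalence can be sanity-checked:

```python
def derive_left_recursive(n, alpha, beta):
    # n applications of A -> A alpha, terminated by A -> beta
    s = "A"
    for _ in range(n):
        s = s.replace("A", "A" + alpha, 1)
    return s.replace("A", beta, 1)

def derive_iterative(n, alpha, beta):
    # the rewritten rule A -> beta alpha*, with n repetitions
    return beta + alpha * n

print(all(derive_left_recursive(n, "x", "y") == derive_iterative(n, "x", "y")
          for n in range(6)))  # True
```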
