I want to design a CFG for the language defined by
L = { w | w ∈ {a,b,c}*, w = a^i b^j c^k and i+j > k }
The case where i+j=k was easy, but I cannot figure out the case i+j>k.
We can start with a grammar for i + j = k:
S := aSc | T
T := bTc | e
If we want i + j > k, we need either at least one extra a or at least one extra b. We can assume each in turn and combine them into one grammar:
A := aW
W := aW | aWc | X
X := bX | bXc | e
B := aB | aBc | Y
Y := bZ
Z := bZ | bZc | e
S := A | B
The production for S nondeterministically chooses between guaranteeing an extra a or an extra b. A/W/X guarantee at least one extra a and allow any number of extra a and b. B/Y/Z guarantee at least one extra b and allow any number of extra a and b.
A/Y require the extra symbol. W/X/B/Z guarantee at least as many a+b as c.
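One way to sanity-check the grammar above is to enumerate every string it derives up to a small length and compare that set with the definition i + j > k. The sketch below is only a brute-force check, not a proof; the helper names (language, expected) are made up for illustration, and e (the empty string) is represented as an empty tuple.

```python
from collections import deque

def language(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    seen, results = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Expand the leftmost nonterminal, if there is one.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            # Terminals never disappear, so forms with too many can be pruned.
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

# The grammar from the answer; () stands for e (the empty string).
GRAMMAR = {
    "S": [("A",), ("B",)],
    "A": [("a", "W")],
    "W": [("a", "W"), ("a", "W", "c"), ("X",)],
    "X": [("b", "X"), ("b", "X", "c"), ()],
    "B": [("a", "B"), ("a", "B", "c"), ("Y",)],
    "Y": [("b", "Z")],
    "Z": [("b", "Z"), ("b", "Z", "c"), ()],
}

def expected(max_len):
    """All strings a^i b^j c^k with i + j > k and total length <= max_len."""
    n = max_len + 1
    return {"a" * i + "b" * j + "c" * k
            for i in range(n) for j in range(n) for k in range(n)
            if i + j > k and i + j + k <= max_len}

print(language(GRAMMAR, "S", 6) == expected(6))
```

The bound of 6 is arbitrary; a larger bound checks more strings at the cost of a longer run.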
Basically, I have N rows in which each unique value repeats three times. This is col_1. Then I have a range of values that I want repeated as many times as there are unique values in col_1. This needs to be dynamic, since col_1 is automatically generated from a list.
col_1 | values
------- ------
a | d
a | e
a | f
b |
b |
b |
c |
c |
c |
So this is what I want to end up with:
col_1 | col_2
----------------
a | d
a | e
a | f
b | d
b | e
b | f
c | d
c | e
c | f
Edit: as noted in a comment, my data is completely dynamic, so I can't make any assumptions about how many rows there will be. Here I have a list [a,b,c], repeated once for each item in Values, so [a,b,c] & [d,e,f] results in 9 rows. If I add "g" to [d,e,f], I then have 12 rows, and if I then add "h" to [a,b,c] I would have 16 rows. The dynamic part is the important bit here.
I want to answer my own question, because I spent way too long looking for an answer and couldn't find one, so I came up with one myself. Here it is:
=ArrayFormula(TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~")))
You can just copy it and change the ranges for it to work, but let me explain how it works.
First we combine the values we want to repeat into one string with CONCATENATE. The three values are defined in the range of C2:C4.
CONCATENATE(C2:C4&"~") → "d~e~f~"
~ is used here as a delimiter, so there are no special tricks here. Next we repeat the string we just made as many times as there are unique values in col_1. For this we use a combination of COUNTA, UNIQUE and REPT.
COUNTA(UNIQUE(A2:A500)) ← Count how many unique occurrences there are in a range ( 3 )
REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500)))
Basically this is converted into:
REPT("d~e~f~",3) → "d~e~f~d~e~f~d~e~f~"
Now we have as many d, e and f as we want. Next we need to turn them into cells. We'll do this with a combination of SPLIT and TRANSPOSE.
TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~"))
We split the string on "~", so we end up with an array looking like [d,e,f,d,e,f,d,e,f]. SPLIT returns its pieces across columns in a single row, so we transpose the result to get one value per row.
The last part is to wrap everything in ArrayFormula, so that the formula actually works.
=ArrayFormula(TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~")))
Now the array will look like:
col_1 | col_2
----------------
a | d
a | e
a | f
b | d
b | e
b | f
c | d
c | e
c | f
Now any time you add a new unique value to col_1, three new values are added.
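For readers who find the nested formula hard to follow, the same pipeline can be traced step by step outside the spreadsheet. This is only a rough Python analogy of the formula above; the function name repeat_values and its parameters are made up for illustration, and blank cells are assumed to arrive as empty strings.

```python
def repeat_values(values, col_1, delimiter="~"):
    """Mimic TRANSPOSE(SPLIT(REPT(CONCATENATE(values & "~"), n), "~"))."""
    # CONCATENATE(C2:C4&"~") -> "d~e~f~"
    joined = "".join(value + delimiter for value in values)
    # COUNTA(UNIQUE(A2:A500)) -> number of distinct non-blank keys
    unique_count = len({key for key in col_1 if key != ""})
    # REPT("d~e~f~", 3) -> "d~e~f~d~e~f~d~e~f~"
    repeated = joined * unique_count
    # SPLIT(..., "~") drops empty pieces by default, so the trailing one is removed
    return [piece for piece in repeated.split(delimiter) if piece]

col_1 = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(repeat_values(["d", "e", "f"], col_1))
```

As in the formula, the delimiter must not occur inside any of the values, otherwise the split produces extra pieces.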
There is a new function that we discovered on the Google Product forums due to a user's post. That function is called FLATTEN().
In your scenario, this should work:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(A2:A&"|"&TRANSPOSE(C2:C4)),"|",0,0),"where Col1<>''"))
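Conceptually, FLATTEN over a column joined with a transposed range builds every (key, value) combination row by row, and the QUERY clause then drops rows whose first column is blank. A rough Python analogy is sketched below; cross_join is a made-up name, and the sketch assumes the key list holds each col_1 value once, with blanks given as empty strings.

```python
from itertools import product

def cross_join(keys, values):
    """Pair every non-blank key with every value, keys varying slowest."""
    return [(key, value) for key, value in product(keys, values) if key != ""]

# The trailing "" stands for the blank cells of an open-ended range such as A2:A.
for key, value in cross_join(["a", "b", "c", ""], ["d", "e", "f"]):
    print(key, value)
```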
I'm having problems understanding an online explanation of how to remove the left recursion in this grammar. I know how to remove direct recursion, but I'm not clear how to handle the indirect. Could anyone explain it?
A --> B x y | x
B --> C D
C --> A | c
D --> d
The way I learned to do this is to replace one of the offending non-terminal symbols with each of its expansions. In this case, we first replace B with its only expansion, the sequence C D:
A --> B x y | x
B --> C D
becomes
A --> C D x y | x
Now, we do the same for non-terminal symbol C, which has two expansions:
A --> C D x y | x
C --> A | c
becomes
A --> A D x y | c D x y | x
The only other remaining grammar rule is
D --> d
so you can also make that replacement, leaving your entire grammar as
A --> A d x y | c d x y | x
There is no indirect left recursion now, since there is nothing indirect at all.
To eliminate left recursion altogether (not merely indirect left recursion), introduce the A' symbol from your own materials (credit to OP for this clarification and completion):
A -> x A' | c d x y A'
A' -> d x y A' | epsilon
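A transformation like this can be checked by brute force: enumerate all short strings of the original grammar from the question and of a candidate rewrite, then compare the two sets. The sketch below tests the candidate A -> x A' | c d x y A', A' -> d x y A' | epsilon, which is what substitution gives when B --> C D is read as the sequence C followed by D. The helper name language is made up for illustration, and agreement up to a bounded length is evidence rather than a proof.

```python
from collections import deque

def language(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    seen, results = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Expand the leftmost nonterminal, if there is one.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            # Terminals never disappear, so forms with too many can be pruned.
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

# Grammar from the question (indirectly left-recursive through B and C).
ORIGINAL = {
    "A": [("B", "x", "y"), ("x",)],
    "B": [("C", "D")],
    "C": [("A",), ("c",)],
    "D": [("d",)],
}

# Candidate rewrite without left recursion; () stands for epsilon.
REWRITTEN = {
    "A": [("x", "A'"), ("c", "d", "x", "y", "A'")],
    "A'": [("d", "x", "y", "A'"), ()],
}

print(sorted(language(ORIGINAL, "A", 10)))
print(language(ORIGINAL, "A", 10) == language(REWRITTEN, "A", 10))
```

The enumeration terminates on the left-recursive grammar because every trip around the A, B, C cycle adds the terminals x y, so the length bound eventually prunes it.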
Response to naomik's comments
Yes, grammars have interesting properties, and you can characterize certain semantic capabilities in terms of constraints on grammar rules. There are transformation algorithms to handle certain types of parsing problems.
In this case, we want to remove left-recursion: one desirable property of a grammar is that the use of any rule must consume at least one input token (terminal symbol). Left-recursion opens a door to infinite recursion in the parser.
I learned these things in my "Foundations of Computing" and "Compiler Construction" classes many years ago. Instead of writing a parser to adapt to a particular grammar, we'd transform the grammar to fit the parser style we wanted.
The following grammar generates the sentences "a, a", "a, b", "b, b", ..., "h, b". Unfortunately it is not LR(1), so it cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you have the nonterminal C unchanged preceding comma in some rule.
In that case it is clear that a parser cannot decide, having seen an "a", and having lookahead "comma", whether to reduce or shift. So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases, "having seen an 'a'" and "C unchanged". You asked if there's a fix that doesn't expand C. There isn't, but you could expand C "a little bit" by removing "a" from C, since that's the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So, we did not "significantly" increase the number of productions.
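Because both grammars are finite, it is easy to confirm that the rewrite leaves the set of sentences unchanged. The snippet below is only an illustrative check; the function names are made up, and each sentence is written as a tuple of tokens.

```python
LETTERS = "abcdefgh"

def original_sentences():
    # S -> a comma a | C comma b, with C -> a | b | ... | h
    return {("a", "comma", "a")} | {(c, "comma", "b") for c in LETTERS}

def rewritten_sentences():
    # S -> a comma a | a comma b | C comma b, with C -> b | ... | h
    fixed = {("a", "comma", "a"), ("a", "comma", "b")}
    return fixed | {(c, "comma", "b") for c in LETTERS if c != "a"}

print(original_sentences() == rewritten_sentences())
print(len(rewritten_sentences()))
```

This only confirms that the language is preserved; whether the result is LR(1) still has to be checked with the parser generator itself.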
I'm reading the following paper on MC/DC: http://shemesh.larc.nasa.gov/fm/papers/Hayhurst-2001-tm210876-MCDC.pdf.
I have the source code: Z := (A or B) and (C or D) and the following test cases:
-----------------
| A | F F T F T |
| B | F T F T F |
| C | T F F T T |
| D | F T F F F |
| Z | F T F T T |
-----------------
I want to prove that the mentioned test cases comply with unique cause definition.
I started by eliminating masked tests:
A or B = F T T T T, meaning it masks the first test case from C or D as F and (C or D) = F.
C or D = T T F T T, meaning it masks the third test case from A or B as (A or B) and F = F.
I then determined MC/DC:
Required test cases for A or B:
F F (first case)
T F (fifth case)
F T (second or fourth case)
Required test cases for C or D:
F F (third case)
T F (fourth or fifth case)
F T (second case)
Required test cases for (A or B) and (C or D):
T T (second, fourth or fifth case)
F T (first case)
T F (third case)
According to the paper, this example doesn't comply with the unique cause definition. Instead, they propose changing the second test case from F T F T to T F F T.
-----------------
| A | F T T F T |
| B | F F F T F |
| C | T F F T T |
| D | F T F F F |
| Z | F T F T T |
-----------------
I determined MC/DC for A or B again:
F F (first case)
T F (fifth case)
F T (fourth case)
Then, they introduce the following independence pairs table that shows the difference between both examples (on page 38):
I understand that for the first example, the independence pair that they show changes two variables instead of one, however I don't understand how they are computing the independence pairs.
In the A column, I can infer they take F F T F from the test cases table's A row, and they compute the independence pair as the same test case with only A changed (T F T F).
In B's column, however, they pick F F T F again. According to my thinking, this should be the test cases table's B row instead: F T F T.
The rest of the letters show the same dilemma.
Also, in D's column for the first example, they show that the independence pair of F T F T is T F F F, which ruins my theory that they compute the independence pair from the first value and suggests that they are picking it from somewhere else.
Can someone explain how (and from where) they construct this independence pair table?
First, let's re-read the definitions:
(From www.faa.gov/aircraft/air_cert/design_approvals/air_software/cast/cast_papers/media/cast-10.pdf)
DO-178B/ED-12B includes the following definitions:
Condition
A Boolean expression containing no Boolean operators.
Decision
A Boolean expression composed of conditions and zero or more Boolean operators.
A decision without a Boolean operator is a condition.
If a condition appears more than once in a decision, each occurrence is a
distinct condition.
Decision Coverage
Every point of entry and exit in the program has been invoked at least once
and every decision in the program has taken on all possible outcomes at least once.
Modified Condition/Decision Coverage
Every point of entry and exit in the program has been invoked at least once,
every condition in a decision in the program has taken all possible outcomes
at least once, every decision in the program has taken all possible outcomes
at least once, and each condition in a decision has been shown to independently
affect that decision's outcome.
A condition is shown to independently affect a decision's outcome by varying just
that condition while holding fixed all other possible conditions.
So, for the decision '(A or B) and (C or D)' we have four conditions: A, B, C and D.
For each condition we must find a pair of test vectors that shows that the condition
'independently affect that decision's outcome'.
For unique cause MC/DC, only the value of the condition considered can vary in the pair of test vectors.
For example let's consider condition A. The following pair of test vectors covers condition A:
(A or B) and (C or D) = Z
T F T F T
F F T F F
With this pair of test vectors (TFTF, FFTF) only the value of A and Z (the decision) change.
We then search pairs for conditions B, C and D.
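The search for unique-cause pairs can be automated for a small decision like this one. The sketch below is a minimal illustration, not the paper's or any tool's algorithm; the names decision, parse and unique_cause_pairs are made up, and the two vector lists are the test cases from the question, written in A B C D order.

```python
from itertools import combinations

def decision(a, b, c, d):
    return (a or b) and (c or d)

def parse(vector):
    """Turn a string such as "TFTF" into a tuple of booleans (A, B, C, D)."""
    return tuple(ch == "T" for ch in vector)

def unique_cause_pairs(vectors):
    """Map each condition to the pairs that differ only in that condition
    and produce different decision outcomes."""
    pairs = {name: [] for name in "ABCD"}
    for v1, v2 in combinations(vectors, 2):
        p1, p2 = parse(v1), parse(v2)
        differing = [i for i in range(4) if p1[i] != p2[i]]
        if len(differing) == 1 and decision(*p1) != decision(*p2):
            pairs["ABCD"[differing[0]]].append((v1, v2))
    return pairs

FIRST_SET = ["FFTF", "FTFT", "TFFF", "FTTF", "TFTF"]   # original test cases
SECOND_SET = ["FFTF", "TFFT", "TFFF", "FTTF", "TFTF"]  # second case changed

print(unique_cause_pairs(FIRST_SET))
print(unique_cause_pairs(SECOND_SET))
```

With the first set, no pair is found for D, because neither F T F F nor T F F T is among the test cases; the second set supplies T F F T, which pairs with T F F F.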
Using the RapiCover GUI (Qualifiable Code coverage tool from Rapita Systems - www.rapitasystems.com/products/rapicover) we can see the full set of test vectors (observed or missing) to fully cover all conditions of the decision.
RapiCover screenshot
Vector V3 (in yellow in the screenshot above) isn't used in any independence pair.
Vector V6 (in red in the screenshot) is missing for MC/DC coverage of condition D.
This is for the definition of 'unique cause' MC/DC.
Now for 'masking MC/DC':
For 'masking MC/DC' the requirement that the value of a single condition may vary in a pair
of test vectors is relaxed provided that any other change is masked by the boolean
operators in the expression.
For example, let's consider the pair of vectors for condition D:
(A or B) and (C or D) = Z
T F F T T
T F F F F
We can represent these two test vectors on the expression tree:
         and
        /   \
     or1     or2
    /  \    /  \
   A    B  C    D

         and                      and
         [T]                      [F]
        /   \                    /   \
     or1     or2              or1     or2
     [T]     [T]              [T]     [F]
    /  \    /  \             /  \    /  \
   A    B  C    D           A    B  C    D
  [T]  [F][F]  [T]         [T]  [F][F]  [F]
This is a pair for unique cause MC/DC.
Let's now consider a new pair of test vectors for condition D:
(A or B) and (C or D) = Z
F T F T T
T F F F F
Again we can represent these two test vectors on the expression tree:
         and                      and
         [T]                      [F]
        /   \                    /   \
     or1     or2              or1     or2
     [T]     [T]              [T]     [F]
    /  \    /  \             /  \    /  \
   A    B  C    D           A    B  C    D
  [F]  [T][F]  [T]         [T]  [F][F]  [F]
This is a pair for masking MC/DC because although the values for 3 conditions (A, B and D) have changed
the change for conditions A and B is masked by the boolean operator 'or1' (i.e. the value of 'A or B' is unchanged).
So, for masking MC/DC, the independence pairs for condition D can be:
RapiCover screenshot
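Masking pairs can be searched for in a similar way. The sketch below uses one common formalization, which is an assumption here and is not taken from the paper or from RapiCover: a vector counts for a condition only if flipping that condition alone would change the decision (the condition is not masked), and a masking pair is two such vectors with different values of the condition and different outcomes. The function names are made up for illustration.

```python
from itertools import combinations

def decision(a, b, c, d):
    return (a or b) and (c or d)

def parse(vector):
    """Turn a string such as "FTFT" into a tuple of booleans (A, B, C, D)."""
    return tuple(ch == "T" for ch in vector)

def flips_outcome(values, index):
    """True if toggling only the condition at index changes the decision."""
    toggled = list(values)
    toggled[index] = not toggled[index]
    return decision(*values) != decision(*toggled)

def masking_pairs(vectors, index):
    """Pairs in which the condition at index differs, is unmasked in both
    vectors, and the decision outcome differs."""
    found = []
    for v1, v2 in combinations(vectors, 2):
        p1, p2 = parse(v1), parse(v2)
        if (p1[index] != p2[index]
                and decision(*p1) != decision(*p2)
                and flips_outcome(p1, index)
                and flips_outcome(p2, index)):
            found.append((v1, v2))
    return found

FIRST_SET = ["FFTF", "FTFT", "TFFF", "FTTF", "TFTF"]  # test cases from the question
print(masking_pairs(FIRST_SET, 3))  # index 3 is condition D
```

Under this reading, the original test cases from the question do contain a masking pair for D (F T F T with T F F F), even though they contain no unique-cause pair for it.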
Unfortunately, ANTLR cannot support direct left recursion when the rule takes parameters. The only viable option is to remove the left recursion. Is there a way to remove the left recursion in the following grammar?
a[int x]
: b a[$x] c
| a[$x - 1]
(
c a[$x - 1]
| b c
)
;
The problem is in the second alternative involving left recursion. Any kind of help would be much appreciated.
Without the parameters and easier formatting, it would look like this:
a
: b a c
| a (c a | b c)
;
When a's left recursive alternative is matched n times, it would just mean that (c a | b c) will be matched n times, pre-pended with the terminating b a c (the first alternative). That means that this rule will always start with b a c, followed by zero or more occurrences of (c a | b c):
a
: b a c (c a | b c)*
;