Checking for set existence in YACC - parsing

I am trying to write rules in YACC (a .y file). I want to make sure that certain tokens either occur together or don't appear at all.
I tried writing this:
rule_pilot : NULL | rule_pilot rule
rule : A | B | C | D | E | F
Some valid strings are:
ABCD
ACEDB
FCDAB
AE
FA
EFA
As the example valid strings show, what I want is for all of B, C and D to appear in the string, or for none of them to appear. But rule_pilot can't ensure that: it will also accept strings like ABC, B, etc.
How should I write rule_pilot so that it fulfills the above-mentioned criteria?
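One possible approach (my own sketch, not from the thread; the %token declarations, the start rule and the saw_* flags are assumptions) is to keep the permissive rule_pilot and enforce the "all of B, C and D, or none of them" constraint with semantic actions, checking the flags once the whole input has been reduced:
%{
#include <stdio.h>
int saw_b = 0, saw_c = 0, saw_d = 0;
int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
%}
%token A B C D E F
%%
start : rule_pilot {
            /* accept only if B, C and D all occurred, or none of them did */
            int n = saw_b + saw_c + saw_d;
            if (n != 0 && n != 3) {
                yyerror("B, C and D must appear together or not at all");
                YYABORT;
            }
        }
      ;
rule_pilot : /* empty */
           | rule_pilot rule
           ;
rule : A | E | F
     | B { saw_b = 1; }
     | C { saw_c = 1; }
     | D { saw_d = 1; }
     ;
%%
Enforcing the constraint purely in the grammar is possible too, but it means spelling out separate sub-grammars for "none of B, C, D seen yet" through "all three seen", which gets verbose quickly; the semantic-action version keeps the grammar as small as the original rule_pilot.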

Related

How to repeat row range N times in Google Sheets

Basically, I have N rows with one unique value always repeating three times. This is col_1. Then I have a range of values I want repeated as many times as there are unique values in col_1. This needs to be dynamic, since col_1 is automatically generated from a list.
col_1 | values
------- ------
a | d
a | e
a | f
b |
b |
b |
c |
c |
c |
So this is what I want to end up with:
col_1 | col_2
----------------
a | d
a | e
a | f
b | d
b | e
b | f
c | d
c | e
c | f
Edit: as noted in a comment, my data is completely dynamic so I can't make any assumptions about how many rows there will be. Here I have a list of [a,b,c], multiplied as many times as there are items in values, so [a,b,c] & [d,e,f] results in 9 rows. If I add "g" to [d,e,f], I then have 12 rows, and if I then add "h" to [a,b,c] I would have 16 rows. The dynamic part is the important bit here.
So I want to answer my own question, because I spent way too long looking for the answer and couldn't find one, so I just came up with one myself. Here's the answer:
=ArrayFormula(TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~")))
You can just copy it and change the ranges for it to work, but let me explain how it works.
First we combine the values we want to repeat into one string with CONCATENATE. The three values are defined in the range of C2:C4.
CONCATENATE(C2:C4&"~") → "d~e~f~"
~ is used here as a delimiter, so there are no special tricks here. Next we repeat the string we just made as many times as there are unique values in col_1. For this we use a combination of COUNTA, UNIQUE and REPT.
COUNTA(UNIQUE(A2:A500)) ← Count how many unique occurrences there are in a range ( 3 )
REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500)))
Basically this is converted into:
REPT("d~e~f~",3) → "d~e~f~d~e~f~d~e~f~"
Now we have as many d, e and f as we want. Next we need to turn them into cells. We'll do this with a combination of SPLIT and TRANSPOSE.
TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~"))
We split the string on "~" so we end up with an array looking like [d,e,f,d,e,f,d,e,f]. We then need to transpose it to turn it into rows instead of columns.
The last part is to wrap everything in an ARRAYFORMULA so the formula actually works.
=ArrayFormula(TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~")))
Now the array will look like:
col_1 | col_2
----------------
a | d
a | e
a | f
b | d
b | e
b | f
c | d
c | e
c | f
Now any time you add a new unique value to col_1, three new rows of values are added.
There is a new function that was discovered on the Google product forums thanks to a user's post. That function is called FLATTEN().
In your scenario, this should work:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(A2:A&"|"&TRANSPOSE(C2:C4)),"|",0,0),"where Col1<>''"))

F# is it possible to "upcast" discriminated union value to a "superset" union?

Let's say there are two unions where one is a strict subset of another.
type Superset =
    | A of int
    | B of string
    | C of decimal
type Subset =
    | A of int
    | B of string
Is it possible to automatically upcast a Subset value to Superset value without resorting to explicit pattern matching? Like this:
let x : Subset = A 1
let y : Superset = x // this won't compile :(
Also it's ideal if Subset type was altered so it's no longer a subset then compiler should complain:
type Subset =
    | A of int
    | B of string
    | D of bool // - no longer a subset of Superset!
I believe it's not possible to do, but it's still worth asking (at least to understand why it's impossible).
WHY I NEED IT
I use this style of set/subset typing extensively in my domain to restrict valid parameters in different states of entities and to make invalid states non-representable, and I find the approach very beneficial; the only downside is the very tedious upcasting between subsets.
Sorry, no
Sorry, but this is not possible. Take a look at https://fsharpforfunandprofit.com/posts/fsharp-decompiled/#unions — you'll see that F# compiles discriminated unions to .NET classes, each one separate from the others, with no common ancestor (apart from Object, of course). The compiler makes no effort to identify subsets or supersets between different DUs. If it did work the way you suggested, it would be a breaking change, because the only way to do this would be to make the subset DU a base class and the superset DU its derived class with an extra property. And that would make the following code change behavior:
type PhoneNumber =
    | Valid of string
    | Invalid
type EmailAddress =
    | Valid of string
    | ValidButOutdated of string
    | Invalid

let identifyContactInfo (info : obj) =
    // This came from external code we don't control, but it should be contact info
    match info with
    | :? PhoneNumber as phone -> // Do something
    | :? EmailAddress as email -> // Do something
Yes, this is bad code and should be written differently, but it illustrates the point. Under current compiler behavior, if identifyContactInfo gets passed an EmailAddress object, the :? PhoneNumber test will fail, so it will enter the second branch of the match and treat that object (correctly) as an email address. If the compiler were to guess supersets/subsets based on DU cases as you're suggesting here, then PhoneNumber would be considered a subset of EmailAddress and so would become its base class. And then when this function received an EmailAddress object, the :? PhoneNumber test would succeed (because an instance of a derived class can always be cast to the type of its base class). The code would then enter the first branch of the match expression, and your code might try to send a text message to an email address.
But wait...
What you're trying to do might be achievable by pulling out the subsets into their own DU category:
type AorB =
    | A of int
    | B of string
type ABC =
    | AorB of AorB
    | C of decimal
type ABD =
    | AorB of AorB
    | D of bool
Then your match expressions for an ABC might look like:
match foo with
| AorB (A num) -> printfn "%d" num
| AorB (B s) -> printfn "%s" s
| C num -> printfn "%M" num
And if you need to pass data between an ABC and an ABD:
let (bar : ABD option) =
    match foo with
    | AorB data -> Some (AorB data)
    | C _ -> None
That's not a huge savings if your subset has only two common cases. But if your subset is a dozen cases or so, being able to pass those dozen around as a unit makes this design attractive.
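To illustrate that last point, here is a small sketch (my own example, not from the original answer, assuming only the AorB/ABC/ABD definitions above are in scope): one handler for the shared AorB cases, reused from both the ABC and the ABD side. The case is written as ABC.AorB / ABD.AorB because both types declare a case with that name.
// one function handles the shared cases
let handleAorB = function
    | A num -> printfn "%d" num
    | B s -> printfn "%s" s

// ...and both "superset" handlers delegate to it
let handleABC = function
    | ABC.AorB data -> handleAorB data
    | C num -> printfn "%M" num

let handleABD = function
    | ABD.AorB data -> handleAorB data
    | D flag -> printfn "%b" flag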

What is the | symbol for in F#?

I'm pretty new to functional programming and I've started looking at the documentation for match statements; the example I came across (here, on gitpages) is cut and pasted into my question below:
let rec fib n =
    match n with
    | 0 -> 0
    | 1 -> 1
    | _ -> fib (n - 1) + fib (n - 2)
I understand that let is for binding, in this case a recursive function called fib which takes a parameter n. It tries to match n against 3 cases: 0, 1, or anything else.
What I don't understand is what the | symbol is called in this context or why it is used. Anything I search for pertaining to the F# pipe takes me to |>, which is the pipe operator in F#.
What is this | used for in this case? Is it required or optional? And when should and shouldn't I be using |?
The | symbol is used for several things in F#, but in this case, it serves as a separator of cases of the match construct.
The match construct lets you pattern match on some input and handle different values in different ways - in your example, you have one case for 0, one for 1 and one for all other values.
Generally, the syntax of match looks like this:
match <input> with <case_1> | ... | <case_n>
Where each <case> has the following structure:
<case> = <pattern> -> <expression>
Here, the | symbol simply separates multiple cases of the pattern matching expression. Each case then has a pattern and an expression that is evaluated when the input matches the pattern.
To expand on Tomas's excellent answer, here are some more of the various uses of | in F#:
Match expressions
In match expressions, | separates the various patterns, as Tomas has pointed out. While you can write the entire match expression on a single line, it's conventional to write each pattern on a separate line, lining up the | characters so that they form a visual indicator of the scope of the match expression:
match n with
| 0 -> "zero"
| 1 -> "one"
| 2 -> "two"
| 3 -> "three"
| _ -> "something else"
Discriminated Unions
Discriminated Unions (or DUs, since that's a lot shorter to type) are very similar to match expressions in style: defining them means listing the possibilities, and | is used to separate the possibilities. As with match expressions, you can (if you want to) write DUs on a single line:
type Option<'T> = None | Some of 'T
but unless your DU has just two possibilities, it's usually better to write it on multiple lines:
type ContactInfo =
    | Email of string
    | PhoneNumber of areaCode : string * number : string
    | Facebook of string
    | Twitter of string
Here, too, the | ends up forming a vertical line that draws the eye to the possibilities of the DU, and makes it very clear where the DU definition ends.
Active patterns
Active patterns also use | to separate the possibilities, but they also are wrapped inside an opening-and-closing pair of | characters:
let (Even|Odd) n = if n % 2 = 0 then Even else Odd // <-- Wrong!
let (|Even|Odd|) n = if n % 2 = 0 then Even else Odd // <-- Right!
Active patterns are usually written in the way I just showed, with the | coming immediately inside the parentheses, which is why some people talk about "banana clips" (because the (| and |) pairs look like bananas if you use your imagination). But in fact, it's not necessary to write the (| and |) characters together: it's perfectly valid to have spaces separating the parentheses from the | characters:
let (|Even|Odd|) n = if n % 2 = 0 then Even else Odd // <-- Right!
let ( |Even|Odd| ) n = if n % 2 = 0 then Even else Odd // <-- ALSO right!
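As a quick usage sketch (my own example, not from the original answer): once (|Even|Odd|) is defined, Even and Odd can be used directly as patterns in a match expression.
let describe n =
    match n with
    | Even -> sprintf "%d is even" n
    | Odd -> sprintf "%d is odd" n

describe 4   // "4 is even"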
Unrelated things
The pipe operator |> and the Boolean-OR operator || are separate operators in their own right, not uses of the | symbol. F# allows operators to be any combination of symbols, and they can have very different meanings from an operator that looks almost the same. For example, >= is a standard operator that means "greater than or equal to". And many F# programs will define a custom operator >>=. But although >>= is not defined in the F# core library, it has a standard meaning, and that standard meaning is NOT "much greater than or equal to". Rather, >>= is the standard way to write an operator for the bind function. I won't get into what bind does right now, as that's a concept that could take a whole answer all on its own. But if you're curious about how bind works, you can read Scott Wlaschin's series on computation expressions, which explains it all very well.
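If you're curious what such a >>= definition typically looks like, here is a minimal sketch for Option values (a common user definition, not something shipped in FSharp.Core):
// bind for Option, written as the custom >>= operator mentioned above
let (>>=) opt f = Option.bind f opt

// a small function that can fail: halve only even numbers
let half x = if x % 2 = 0 then Some (x / 2) else None

Some 8 >>= half >>= half   // Some 2
Some 3 >>= half            // None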

Converting given ambiguous arithmetic expression grammar to unambiguous LL(1)

This term I am taking a course on Compilers, and we are currently studying syntax: different grammars and types of parsers. I came across a problem which I can't quite figure out, or at least I can't be sure I'm doing it correctly. I have already made 2 attempts and counterexamples were found for both.
I am given this ambiguous grammar for arithmetic expressions:
E → E+E | E-E | E*E | E/E | E^E | -E | (E) | id | num, where ^ stands for power.
I figured out what the priorities should be: parentheses have the highest priority, followed by power, then unary minus, then multiplication and division, and finally addition and subtraction. I am asked to convert this into an equivalent LL(1) grammar. So I wrote this:
E → E+A | E-A | A
A → A*B | A/B | B
B → -C | C
C → D^C | D
D → (E) | id | num
The problem seems to be that this is not an equivalent grammar to the first one, although it is unambiguous. For example, the given grammar can recognize the input --5 while my grammar can't. How can I make sure I'm covering all cases? How should I modify my grammar to be equivalent to the given one? Thanks in advance.
Edit: Also, I would of course do elimination of left recursion and left factoring to make this LL(1), but first I need to figure out the main part I asked about above.
Here's one that should work for your case
E = E+A | E-A | A
A = A*C | A/C | C
C = C^B | B
B = -B | D
D = (E) | id | num
As a side note: also pay attention to the requirements of your task, since some applications might assign higher priority to the unary minus operator than to the binary power operator.
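For reference, applying the left-recursion elimination mentioned in the question's edit to this grammar gives something like the following sketch (not part of the original answer; <e> denotes the empty string). Note that eliminating the left recursion also loses the left-to-right grouping in the parse tree, which then has to be restored when building the AST:
E  = A E'
E' = +A E' | -A E' | <e>
A  = C A'
A' = *C A' | /C A' | <e>
C  = B C'
C' = ^B C' | <e>
B  = -B | D
D  = (E) | id | num
No left factoring is needed here, since the alternatives of each nonterminal already start with distinct terminals.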

Can a table-based LL parser handle repetition without right-recursion?

I understand how an LL recursive descent parser can handle rules of this form:
A = B*;
with a simple loop that checks whether to continue looping or not based on whether the lookahead token matches a terminal in the FIRST set of B. However, I'm curious about table based LL parsers: how can rules of this form work there? As far as I know, the only way to handle repetition like this in one is through right-recursion, but that messes up associativity in cases where a right-associative parse tree is not desired.
I'd like to know because I'm currently attempting to write an LL(1) table-based parser generator and I'm not sure how to handle a case like this without changing the intended parse tree shape.
The Grammar
Let's expand your EBNF grammar to simple BNF and assume that b is a terminal and <e> is the empty string:
A -> X
X -> BX
X -> <e>
B -> b
This grammar produces strings of terminal b's of any length.
The LL(1) Table
To construct the table, we will need to generate the first and follow sets (constructing an LL(1) parsing table).
First sets
First(α) is the set of terminals that begin strings derived from any string of grammar symbols α.
First(A) : b, <e>
First(X) : b, <e>
First(B) : b
Follow sets
Follow(A) is the set of terminals a that can
appear immediately to the right of a nonterminal A.
Follow(A) : $
Follow(X) : $
Follow(B) : b, $
Table
We can now construct the table based on the sets, $ is the end of input marker.
+---+---------+----------+
| | b | $ |
+---+---------+----------+
| A | A -> X | A -> X |
| X | X -> BX | X -> <e> |
| B | B -> b | |
+---+---------+----------+
The parser action always depends on the top of the parse stack and the next input symbol.
Terminal on top of the parse stack:
Matches the input symbol: pop stack, advance to the next input symbol
No match: parse error
Nonterminal on top of the parse stack:
Parse table contains production: apply production to stack
Cell is empty: parse error
$ on top of the parse stack:
$ is the input symbol: accept input
$ is not the input symbol: parse error
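As a compact illustration of that driver loop, here is a sketch in F# (my own example, not part of the original answer); grammar symbols are plain strings and "$" is the end marker:
// hypothetical table-driven LL(1) driver for the table above
let table =
    dict [ (("A", "b"), [ "X" ]);
           (("A", "$"), [ "X" ]);
           (("X", "b"), [ "B"; "X" ]);
           (("X", "$"), [ ]);          // X -> <e>
           (("B", "b"), [ "b" ]) ]

let parse (tokens: string list) =
    let rec loop stack input =
        match stack, input with
        | [ "$" ], [ "$" ] -> true                                    // accept
        | top :: rest, tok :: more when top = tok -> loop rest more   // match terminal, consume
        | top :: rest, tok :: _ when table.ContainsKey ((top, tok)) ->
            loop (table.[(top, tok)] @ rest) input                    // apply production
        | _ -> false                                                  // parse error
    loop [ "A"; "$" ] (tokens @ [ "$" ])

parse [ "b"; "b" ]   // true, following the same stack/input steps as the sample parse below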
Sample Parse
Let us analyze the input bb. The initial parse stack contains the start symbol and the end marker A $.
+-------+-------+-----------+
| Stack | Input | Action |
+-------+-------+-----------+
| A $ | bb$ | A -> X |
| X $ | bb$ | X -> BX |
| B X $ | bb$ | B -> b |
| b X $ | bb$ | consume b |
| X $ | b$ | X -> BX |
| B X $ | b$ | B -> b |
| b X $ | b$ | consume b |
| X $ | $ | X -> <e> |
| $ | $ | accept |
+-------+-------+-----------+
Conclusion
As you can see, rules of the form A = B* can be parsed without problems. The resulting concrete parse tree for input bb would be:
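A
+-- X
    +-- B
    |   +-- b
    +-- X
        +-- B
        |   +-- b
        +-- X
            +-- <e>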
Yes, this is definitely possible. The standard method of rewriting to BNF and constructing a parse table is useful for figuring out how the parser should work – but as far as I can tell, what you're asking is how you can avoid the recursive part, which would mean that you'd get the slanted binary tree/linked list form of AST.
If you're hand-coding the parser, you can simply use a loop, using the lookaheads from the parse table that indicate a recursive call to decide to go around the loop once more. (I.e., you could just use while with those lookaheads as the condition.) Then for each iteration, you simply append the constructed subtree as a child of the current parent. In your case, then, A would get several direct B-children.
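A rough sketch of that loop (my own example in F#; the Tree type and the character-token representation are made up), where each parsed B is appended directly under the A node:
type Tree =
    | Node of name: string * children: Tree list
    | Leaf of char

let parseA (tokens: char list) =
    let children = ResizeArray<Tree> ()
    let mutable rest = tokens
    // keep going around the loop while the lookahead is in FIRST(B) = { 'b' }
    while (match rest with 'b' :: _ -> true | _ -> false) do
        children.Add (Node ("B", [ Leaf 'b' ]))   // one B child appended directly under A
        rest <- List.tail rest
    Node ("A", List.ofSeq children), rest         // flat A node, no helper nodes

parseA [ 'b'; 'b' ]   // Node ("A", [Node ("B", ...); Node ("B", ...)]), []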
Now, as I understand it, you're building a parser generator, and it might be easiest to follow the standard procedure, going via plain BNF. However, that's not really an issue; there is no substantive difference between iteration and recursion, after all. You simply have to have a class of “helper rules” that don't introduce new AST nodes, but instead append their result to the node of the nonterminal that triggered them. So when turning the repetition into X -> BX, rather than constructing X nodes, you have your X rule extend the child list of the A or X (whichever triggered it) with its own children. You'll still end up with A having several B children, and no X nodes in sight.
