Basically, I have N rows in which each unique value always repeats three times. This is col_1. Then I have a range of values that I want repeated as many times as there are unique values in col_1. This needs to be dynamic, since col_1 is automatically generated from a list.
col_1 | values
------- ------
a | d
a | e
a | f
b |
b |
b |
c |
c |
c |
So this is what I want to end up with:
col_1 | col_2
----------------
a | d
a | e
a | f
b | d
b | e
b | f
c | d
c | e
c | f
Edit: as noted in a comment, my data is completely dynamic, so I can't make any assumptions about how many rows there will be. Here I have the list [a,b,c], multiplied by the number of items in Values, so [a,b,c] & [d,e,f] results in 9 rows. If I add "g" to [d,e,f], I then have 12 rows, and if I then add "h" to [a,b,c], I would have 16 rows. The dynamic part is the important bit here.
I'm answering my own question because I spent far too long looking for an answer and couldn't find one, so I came up with one myself. Here it is:
=ArrayFormula(TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~")))
You can just copy it and change the ranges for it to work, but let me explain how it works.
First we combine the values we want to repeat into one string with CONCATENATE. The three values are defined in the range of C2:C4.
CONCATENATE(C2:C4&"~") → "d~e~f~"
~ is used here as a delimiter, so there are no special tricks here. Next we repeat the string we just made as many times as there are unique values in col_1. For this we use a combination of COUNTA, UNIQUE and REPT.
COUNTA(UNIQUE(A2:A500)) ← Count how many unique occurrences there are in a range ( 3 )
REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500)))
Basically this is converted into:
REPT("d~e~f~",3) → "d~e~f~d~e~f~d~e~f~"
Now we have as many d, e and f as we want. Next we need to turn them into cells. We'll do this with a combination of SPLIT and TRANSPOSE.
TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~"))
We split the string on "~", so we end up with an array that looks like [d,e,f,d,e,f,d,e,f]. SPLIT returns it as a single row, so we transpose it to get one value per row.
The last part is to wrap everything in ArrayFormula so that the formula actually works.
=ArrayFormula(TRANSPOSE(SPLIT(REPT(CONCATENATE(C2:C4&"~"),COUNTA(UNIQUE(A2:A500))),"~")))
Now the array will look like:
col_1 | col_2
----------------
a | d
a | e
a | f
b | d
b | e
b | f
c | d
c | e
c | f
Now, any time you add a new unique value to col_1, three new values are added.
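For readers who find it easier to follow in code, here is a rough Python sketch of the same join, repeat and split steps. The function name repeat_values is made up for this illustration, and Python lists stand in for the spreadsheet ranges.

```python
def repeat_values(col_1, values, delimiter="~"):
    """Repeat `values` once per unique non-empty entry in `col_1`."""
    # CONCATENATE(C2:C4 & "~") -> "d~e~f~"
    joined = "".join(value + delimiter for value in values)
    # COUNTA(UNIQUE(A2:A500)) -> number of distinct non-empty entries
    unique_count = len({entry for entry in col_1 if entry != ""})
    # REPT(joined, unique_count) -> "d~e~f~d~e~f~d~e~f~"
    repeated = joined * unique_count
    # SPLIT(..., "~"): drop the empty piece left by the trailing delimiter
    return [piece for piece in repeated.split(delimiter) if piece]


col_1 = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
col_2 = repeat_values(col_1, ["d", "e", "f"])
rows = list(zip(col_1, col_2))
```

As with the formula, this breaks if any of the values contains the delimiter, so pick a character that cannot occur in your data.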
There is a new function that we discovered on the Google Product forums due to a user's post. That function is called FLATTEN().
In your scenario, this should work:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(A2:A&"|"&TRANSPOSE(C2:C4)),"|",0,0),"where Col1<>''"))
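To see what that formula is doing, here is a small Python sketch of the same idea. It assumes column A holds each source item once (that is my reading of the formula, not something stated above), and cross_join is a made-up name for this illustration.

```python
def cross_join(items, values, sep="|"):
    """Pair every item with every value, the way FLATTEN does on the joined grid."""
    # A2:A & "|" & TRANSPOSE(C2:C4) -> a grid with one row per item
    grid = [[f"{item}{sep}{value}" for value in values] for item in items]
    # FLATTEN(...) -> read the grid row by row into a single column
    flat = [cell for row in grid for cell in row]
    # SPLIT(..., "|", 0, 0) -> two columns, keeping empty text
    pairs = [tuple(cell.split(sep)) for cell in flat]
    # QUERY(..., "where Col1<>''") -> drop rows produced by blank cells in A
    return [pair for pair in pairs if pair[0] != ""]


result = cross_join(["a", "b", "c", ""], ["d", "e", "f"])
```

The blank entry at the end of the list plays the role of the empty cells below the data in A2:A.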
I have a grammar and I am asked to build a parsing table and to verify that it is an LL(1) grammar. After building the parsing table, I noticed that some of the columns were empty and had no production rules in them. Does this mean that the table was built incorrectly? Or that the grammar is not LL(1)? Or maybe something is missing?
Thank you.
That’s nothing to worry about. An empty column indicates that there’s a terminal symbol that never starts a production (among other things). For example, take this simple LL(1) grammar:
S → abcdefg
Here, in the row for S, the columns for b, c, d, e, f, and g will all be empty, and the column for a will be the rule S → abcdefg. More specifically, the augmented grammar looks like this:
S' → S
S → abcdefg
The parsing table looks like this:
   | a       | b | c | d | e | f | g
---+---------+---+---+---+---+---+---
S' | S       |   |   |   |   |   |
S  | abcdefg |   |   |   |   |   |
Notice that most columns are empty.
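If you want to check this mechanically, here is a minimal Python sketch that fills the table for this grammar and lists the empty columns. It is deliberately simplified: it assumes the grammar has no empty productions and no left recursion, so FOLLOW sets are not needed. The names first_of and build_table are my own.

```python
def first_of(symbol, grammar):
    """FIRST set of a symbol (assumes no empty productions, no left recursion)."""
    if symbol not in grammar:  # anything without productions is a terminal
        return {symbol}
    result = set()
    for body in grammar[symbol]:
        result |= first_of(body[0], grammar)
    return result


def build_table(grammar):
    """Map (nonterminal, terminal) to the production body to apply."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            for terminal in first_of(body[0], grammar):
                if (head, terminal) in table:
                    raise ValueError(f"conflict at ({head}, {terminal}): not LL(1)")
                table[(head, terminal)] = body
    return table


grammar = {"S'": [["S"]], "S": [list("abcdefg")]}
table = build_table(grammar)
empty_columns = [t for t in "abcdefg"
                 if all((head, t) not in table for head in grammar)]
```

A grammar fails the LL(1) test when two productions land in the same cell, which is what the ValueError signals. Empty cells never cause that.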
I am trying to write rules in YACC (.y file). I want to make sure that certain tokens occur together or they don't appear at all.
I tried writing this:
rule_pilot : NULL | rule_pilot rule
rule : A | B | C | D | E | F
Some valid strings are:
ABCD
ACEDB
FCDAB
AE
FA
EFA
As the example valid strings show, what I want is that either all of B, C and D appear in the string or none of them do. But rule_pilot can't ensure that. It will also accept strings like ABC, B, etc.
How should I write rule_pilot so that it fulfills the above mentioned criteria?
I understand how an LL recursive descent parser can handle rules of this form:
A = B*;
with a simple loop that checks whether to continue looping or not based on whether the lookahead token matches a terminal in the FIRST set of B. However, I'm curious about table based LL parsers: how can rules of this form work there? As far as I know, the only way to handle repetition like this in one is through right-recursion, but that messes up associativity in cases where a right-associative parse tree is not desired.
I'd like to know because I'm currently attempting to write an LL(1) table-based parser generator and I'm not sure how to handle a case like this without changing the intended parse tree shape.
The Grammar
Let's expand your EBNF grammar into plain BNF, assuming that b is a terminal and <e> is the empty string:
A -> X
X -> BX
X -> <e>
B -> b
This grammar produces strings of terminal b's of any length.
The LL(1) Table
To construct the table, we need the First and Follow sets (see "constructing an LL(1) parsing table").
First sets
First(α) is the set of terminals that begin strings derived from any string of grammar symbols α.
First(A) : b, <e>
First(X) : b, <e>
First(B) : b
Follow sets
Follow(A) is the set of terminals a that can appear immediately to the right of a nonterminal A.
Follow(A) : $
Follow(X) : $
Follow(B) : b, $
Table
We can now construct the table from these sets; $ is the end-of-input marker.
+---+---------+----------+
| | b | $ |
+---+---------+----------+
| A | A -> X | A -> X |
| X | X -> BX | X -> <e> |
| B | B -> b | |
+---+---------+----------+
The parser action always depends on the top of the parse stack and the next input symbol.
Terminal on top of the parse stack:
    Matches the input symbol: pop stack, advance to the next input symbol
    No match: parse error
Nonterminal on top of the parse stack:
    Parse table contains production: apply production to stack
    Cell is empty: parse error
$ on top of the parse stack:
    $ is the input symbol: accept input
    $ is not the input symbol: parse error
Sample Parse
Let us analyze the input bb. The initial parse stack contains the start symbol and the end marker A $.
+-------+-------+-----------+
| Stack | Input | Action |
+-------+-------+-----------+
| A $ | bb$ | A -> X |
| X $ | bb$ | X -> BX |
| B X $ | bb$ | B -> b |
| b X $ | bb$ | consume b |
| X $ | b$ | X -> BX |
| B X $ | b$ | B -> b |
| b X $ | b$ | consume b |
| X $ | $ | X -> <e> |
| $ | $ | accept |
+-------+-------+-----------+
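The table and the three rules above can be put together in a few lines. This is only a minimal Python sketch with the table for this one grammar hard-coded; the names TABLE and parse are made up for the example. It returns the list of actions so you can compare it with the trace above.

```python
EPSILON = "<e>"
NONTERMINALS = {"A", "X", "B"}
# (nonterminal, lookahead) -> production body; an empty list means X -> <e>
TABLE = {
    ("A", "b"): ["X"],
    ("A", "$"): ["X"],
    ("X", "b"): ["B", "X"],
    ("X", "$"): [],
    ("B", "b"): ["b"],
}


def parse(text, start="A"):
    """Run the table-driven LL(1) parser and return the actions taken."""
    tokens = list(text) + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    pos = 0
    actions = []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "$":
            if look != "$":
                raise SyntaxError(f"unexpected {look!r} at end of input")
            actions.append("accept")
            return actions
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:  # empty cell
                raise SyntaxError(f"no rule for {top} on {look!r}")
            stack.pop()
            stack.extend(reversed(body))  # leftmost symbol ends up on top
            actions.append(f"{top} -> {' '.join(body) or EPSILON}")
        elif top == look:
            stack.pop()
            pos += 1
            actions.append(f"consume {look}")
        else:
            raise SyntaxError(f"expected {top!r}, got {look!r}")


trace = parse("bb")
```

The empty input is accepted as well, since X can derive the empty string.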
Conclusion
As you can see, rules of the form A = B* can be parsed without problems. The resulting concrete parse tree for input bb would be:
A
    X
        B
            b
        X
            B
                b
            X
                <e>
Yes, this is definitely possible. The standard method of rewriting to BNF and constructing a parse table is useful for figuring out how the parser should work – but as far as I can tell, what you're asking is how you can avoid the recursive part, which would mean that you'd get the slanted binary tree/linked list form of AST.
If you're hand-coding the parser, you can simply use a loop, using the lookaheads from the parse table that indicate a recursive call to decide to go around the loop once more. (I.e., you could just use while with those lookaheads as the condition.) Then for each iteration, you simply append the constructed subtree as a child of the current parent. In your case, then, A would get several direct B-children.
Now, as I understand it, you're building a parser generator, and it might be easiest to follow the standard procedure, going via plain BNF. However, that's not really an issue; there is no substantive difference between iteration and recursion, after all. You simply have to have a class of “helper rules” that don't introduce new AST nodes, but that rather append their result to the node of the nonterminal that triggered them. So when turning the repetition into X -> BX, rather than constructing X nodes, you have your X rule extend the child-list of the A or X (whichever triggered it) by its own children. You'll still end up with A having several B children, and no X nodes in sight.
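Here is one way the helper-rule idea could look in a table-driven parser, as a rough Python sketch. It assumes the BNF expansion A -> X, X -> BX, X -> <e>, B -> b with a hard-coded table. The HELPERS set and the (label, children) node shape are my own choices for the illustration, not a standard.

```python
NONTERMINALS = {"A", "X", "B"}
HELPERS = {"X"}  # rules generated from B*; they never get a node of their own
TABLE = {
    ("A", "b"): ["X"],
    ("A", "$"): ["X"],
    ("X", "b"): ["B", "X"],
    ("X", "$"): [],  # X -> <e>
    ("B", "b"): ["b"],
}


def parse_tree(text, start="A"):
    """Build a tree of (label, children) nodes with helper rules spliced out."""
    tokens = list(text) + ["$"]
    holder = []  # receives the root node
    # each stack entry is (symbol, child list that the symbol's result goes into)
    stack = [("$", None), (start, holder)]
    pos = 0
    while True:
        symbol, target = stack.pop()
        look = tokens[pos]
        if symbol == "$":
            if look != "$":
                raise SyntaxError(f"unexpected {look!r} at end of input")
            return holder[0]
        if symbol in NONTERMINALS:
            body = TABLE.get((symbol, look))
            if body is None:
                raise SyntaxError(f"no rule for {symbol} on {look!r}")
            if symbol in HELPERS:
                children = target  # keep appending to the triggering node
            else:
                node = (symbol, [])
                target.append(node)
                children = node[1]
            stack.extend((s, children) for s in reversed(body))
        elif symbol == look:
            target.append(look)
            pos += 1
        else:
            raise SyntaxError(f"expected {symbol!r}, got {look!r}")


tree = parse_tree("bb")
```

For the input bb this gives A two direct B children and no X nodes. The parser still runs on the right-recursive table, so only the tree construction changes.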
I am considering learning graph databases (like neo4j), and I am curious whether the following facility is available in them. For example, if I do:
Step 1: create: A --> B --> C
Step 2: create: D --> B --> E
Step 3: create: F --> G --> E
This should automatically result in a graph stored something like:
A ---> B ----> C
      /|\  \
D -----|    \--> E
                /|\
F ---> G --------|
Here the common nodes B and E are coalesced (without my having to check programmatically for the prior existence of these nodes). In a real-world example, there would be thousands of such B's and E's, which would be implemented in a relational DB as follows:
FK = Foreign Key; X, Y and Z are keys for three primary tables.
___________     _________     _____________     _________
X  | FK(Y)      Y  | ...      FK(Y) | FK(Z)     Z  | ...
---|-------     ---|-----     ------|------     ---|-----
A  | FK(B)      B  | ...      FK(B) | FK(C)     C  | ...
D  | FK(B)      G  | ...      FK(B) | FK(E)     E  | ...
F  | FK(G)                    FK(G) | FK(E)
In an RDB (e.g., when I insert the relation D-->B), I would have to programmatically search for a duplicate object B in the 2nd table (or look for a failure code when trying to insert an identical object into it) and then get B's foreign key to store along with D. I am hoping that in a graph DB such things are taken care of by the DB.
You should look at v2.0's new MERGE clause, which allows you to have a follow-on ON MATCH and ON CREATE clause, so you can take a specific action when a node is found vs created.
See the 2.0M3 blog post for an intro (2.0M4 is the latest build but MERGE was intro'd in M3), as well as this "What's new in 2.0" video.
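To make the "found vs created" behaviour concrete, here is a toy in-memory sketch in Python. It is not Neo4j's API and it only matches nodes by name; TinyGraph, merge_node and merge_path are hypothetical names used purely to illustrate match-or-create semantics.

```python
class TinyGraph:
    """Toy graph in which nodes are matched by name, or created if absent."""

    def __init__(self):
        self.nodes = {}     # name -> property dict
        self.edges = set()  # (source name, target name)

    def merge_node(self, name):
        """Return (node, created); an existing node is reused, never duplicated."""
        created = name not in self.nodes
        if created:
            self.nodes[name] = {"name": name}
        return self.nodes[name], created

    def merge_path(self, *names):
        """Match or create every node and every directed edge along a path."""
        for name in names:
            self.merge_node(name)
        for source, target in zip(names, names[1:]):
            self.edges.add((source, target))


graph = TinyGraph()
graph.merge_path("A", "B", "C")  # step 1
graph.merge_path("D", "B", "E")  # step 2 reuses B
graph.merge_path("F", "G", "E")  # step 3 reuses E
```

After the three steps there are seven nodes and six edges, with B and E stored once each, which is the coalescing the question asks about.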
How can I find a relation direction with regards to a containing path? I need this to do a weighted graph search that takes into account relation direction (weighing "wrong" direction with a 0, see also comments).
Let's say:
START a=node({param})
MATCH a-[*]-b
WITH a, b
MATCH p = allshortestpaths(a-[*]-b)
RETURN extract(r in rels(p): flows_with_path(r)) as in_flow
where
flows_with_path = 1 if sp = (a)-[*0..]-[r]->[*0..]-(b), otherwise 0
EDIT: corrected query
So, here's a way to do it with existing cypher functions. I don't promise it's super performant, but give it a shot. We're building our collection with reduce, using an accumulator tuple with a collection and the last node we looked at, so we can check that it's connected to the next node. This requires 2.0's case/when syntax--there may be a way to do it in 1.9 but it's probably even more complex.
START a=node:node_auto_index(name="Trinity")
MATCH a-[*]-b
WHERE a <> b
WITH distinct a,b
MATCH p = allshortestpaths(a-[*]-b)
RETURN extract(x in nodes(p): x.name?), // a concise representation of the path we're checking
  head(
    reduce(acc=[[], head(nodes(p))], x IN tail(nodes(p)): // pop the first node off, traverse the tail
      CASE WHEN ALL (y IN tail(acc) WHERE y-->x) // a bit of a hack because tail(acc)-->x doesn't parse right, so I had to wrap it so I can have a bare identifier in the pattern predicate
        THEN [head(acc) + 0, x] // add a 0 to our accumulator collection
        ELSE [head(acc) + 1, x] // add a 1 to our accumulator collection
      END )) AS in_line
http://console.neo4j.org/r/v0jx03
Output:
+---------------------------------------------------------------------------+
| extract(x in nodes(p): x.name?) | in_line |
+---------------------------------------------------------------------------+
| ["Trinity","Morpheus"] | [1] |
| ["Trinity","Morpheus","Cypher"] | [1,0] |
| ["Trinity","Morpheus","Cypher","Agent Smith"] | [1,0,0] |
| ["Trinity","Morpheus","Cypher","Agent Smith","The Architect"] | [1,0,0,0] |
| ["Trinity","Neo"] | [1] |
| ["Trinity","Neo",<null>] | [1,1] |
+---------------------------------------------------------------------------+
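If the reduce is hard to read, the same per-step check looks roughly like this in plain Python. The edge directions below are assumptions I picked to be consistent with the output table, and direction_flags is a made-up name. The convention follows the query as written: 0 when the relationship points from the previous node to the next one, 1 otherwise.

```python
def direction_flags(path, directed_edges):
    """One flag per hop: 0 if an edge points previous -> next, else 1."""
    return [0 if (previous, following) in directed_edges else 1
            for previous, following in zip(path, path[1:])]


# assumed directions, chosen to be consistent with the output table
edges = {
    ("Morpheus", "Trinity"),
    ("Morpheus", "Cypher"),
    ("Cypher", "Agent Smith"),
    ("Agent Smith", "The Architect"),
    ("Neo", "Trinity"),
}

flags = direction_flags(
    ["Trinity", "Morpheus", "Cypher", "Agent Smith", "The Architect"], edges)
```

If you need 1 for "flows with the path", as in the question, swap the two values.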
Note: Thanks #boggle for the brainstorming session.