Google Sheets: Repeat a row based on COUNTA of a cell, ARRAYFORMULA - google-sheets

I have a table as below. The table is updated from time to time, so the exact number of rows is not known:
+--+-------+-------------+
|a |red |1, 1, 1, |
+--+-------+-------------+
|b |green |2, 2, |
+--+-------+-------------+
|c |blue |3, |
+--+-------+-------------+
I need to repeat each row based on the COUNTA in the Column 3 as follows:
+--+-------+-------------+
|a |red |1 |
+--+-------+-------------+
|a |red |1 |
+--+-------+-------------+
|a |red |1 |
+--+-------+-------------+
|b |green |2 |
+--+-------+-------------+
|b |green |2 |
+--+-------+-------------+
|c |blue |3 |
+--+-------+-------------+
I wrote a formula but to ensure it addresses enough rows I have to manually add another row to that formula (consider the columns are E, F, and G):
={
if(len(E2)>0,{
transpose(split(rept(E2&"****",COUNTA(split(G2,", "))),"****")),transpose(split(rept(F2&"****",COUNTA(split(G2,", "))),"****")),TRANSPOSE(split(G2,", "))}
,{"","",""});
if(len(E3)>0,{
transpose(split(rept(E3&"****",COUNTA(split(G3,", "))),"****")),transpose(split(rept(F3&"****",COUNTA(split(G3,", "))),"****")),TRANSPOSE(split(G3,", "))}
,{"","",""});
if(len(E4)>0,{
transpose(split(rept(E4&"****",COUNTA(split(G4,", "))),"****")),transpose(split(rept(F4&"****",COUNTA(split(G4,", "))),"****")),TRANSPOSE(split(G4,", "))}
,{"","",""});
if(len(E5)>0,{
transpose(split(rept(E5&"****",COUNTA(split(G5,", "))),"****")),transpose(split(rept(F5&"****",COUNTA(split(G5,", "))),"****")),TRANSPOSE(split(G5,", "))}
,{"","",""})
}
etc.
Example sheet.
Since the exact number of rows is not known, I would like to convert this into an ARRAYFORMULA for rows 2-1000.
Would that be possible at all? If yes, what would be the formula? Thanks!

Paste this script in the script editor.
/**
* Splits the array by commas in the column with given index, by given delimiter
* @param {A2:B20} range Range reference
* @param {2} colToSplit Column index
* @param {","} delimiter Character by which to split
* @customfunction
*/
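function advancedSplit(range, colToSplit, delimiter) {
  var resArr = [], row;
  range.forEach(function (r) {
    // Treat line breaks inside the cell as separators, keeping the character that follows them.
    // String() guards against numeric cells, which have no replace() method.
    String(r[colToSplit-1]).replace(/(?:\r\n|\r|\n)(\d|\w)/g, ", $1").split(delimiter)
      .forEach(function (s) {
        // Copy the row, substituting the current piece for the split column
        row = [];
        r.forEach(function (c, k) {
          row.push( (k === colToSplit-1) ? s.trim() : c);
        })
        resArr.push(row);
      })
  })
  // Drop rows in which every cell is empty
  return resArr.filter(function (r) {
    return r.toString()
      .replace(/,/g, "")
  })
}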
Then in the spreadsheet use this script as a custom formula
=advancedSplit(E2:G, 3, ",")
I hope this helps.
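Note that with cells ending in a trailing comma (such as "1, 1, 1,"), splitting on "," also produces an empty last piece. As a minimal sketch only, and not the answerer's code, the hypothetical helper below repeats each row once per non-empty piece. The name expandRows and the sample data are assumptions made for illustration; it runs in plain JavaScript outside of Sheets.

```javascript
// Hypothetical helper (not part of the answer above): repeat each row once
// per non-empty piece of the column at 1-based index colToSplit.
function expandRows(rows, colToSplit, delimiter) {
  const out = [];
  rows.forEach(function (row) {
    // String() guards against numeric cells, which have no split() method.
    String(row[colToSplit - 1])
      .split(delimiter)
      .map(function (piece) { return piece.trim(); })
      .filter(function (piece) { return piece !== ""; }) // drop empty pieces
      .forEach(function (piece) {
        const copy = row.slice();      // copy the other columns unchanged
        copy[colToSplit - 1] = piece;  // substitute the current piece
        out.push(copy);
      });
  });
  return out;
}

// Sample data mirroring the table in the question, plus one blank row.
const sample = [
  ["a", "red", "1, 1, 1,"],
  ["b", "green", "2, 2,"],
  ["c", "blue", "3,"],
  ["", "", ""],
];
console.log(expandRows(sample, 3, ","));
```

Rows whose split column is blank produce no output at all, so an open-ended range such as E2:G would not add empty rows at the bottom.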

Related

gSheets: How to use SPLIT in ARRAYFORMULA over columns

For numbers x and y, I have cell data formatted as x#y.
An example row:
| A | B | C | D |
| ------ | ------ | ----- | ------ |
|10#100 | 10#120 | 8#150 | 5#175 |
I want to parse this type of row into two quantities: the sum of the x's and sum of y's.
With my example, I should have two cells:
33 and 545
Basically, I want to SUM the resulting array of SPLIT applied to each cell in A1:D1.
My attempt
=SUM(ARRAYFORMULA(SPLIT(A1:D1, "#")))
Unfortunately, this approach doesn't allow me to specify whether I want x or y (when I call SPLIT) and it seems to be returning x + y, rather than sum(i=1 to 4) x_i.
Try this:
=index(query(arrayformula(split(transpose(A1:D1), "#")),"select sum(Col1),sum(Col2) ",0),2)
Another option:
=ArrayFormula({SUM(INDEX(SPLIT(TRANSPOSE(A1:D1),"#"),0,1)),SUM(INDEX(SPLIT(TRANSPOSE(A1:D1),"#"),0,2))})
use:
=SUMPRODUCT(SPLIT(JOIN("#",A1:D1),"#"),ISEVEN(SEQUENCE(1,COUNTA(A1:D1)*2)-1))
For the sum of the y values (cell F3), use the same formula with ISEVEN replaced by ISODD.
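For comparison, here is what these formulas compute, written as a plain JavaScript sketch. The function name sumPairs is an assumption made for illustration, not something from the answers; it could be adapted into an Apps Script custom function, but that is not shown here.

```javascript
// Hypothetical helper: given cells formatted as "x#y", return [sum of x, sum of y].
function sumPairs(cells, separator) {
  let sumX = 0;
  let sumY = 0;
  cells.forEach(function (cell) {
    const parts = String(cell).split(separator);
    if (parts.length !== 2) return; // skip blank or malformed cells
    sumX += Number(parts[0]);
    sumY += Number(parts[1]);
  });
  return [sumX, sumY];
}

// The example row from the question.
console.log(sumPairs(["10#100", "10#120", "8#150", "5#175"], "#")); // [ 33, 545 ]
```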
Or, in a locale that uses semicolons as argument separators:
=ARRAYFORMULA(QUERY(QUERY(SPLIT(TRANSPOSE(A1:D1); "#");
"select sum(Col1),sum(Col2)"); "offset 1"; 0))

How to translate COUNTIFS formula in ARRAYFORMULA to automatically insert the formula in each row

With the COUNTIFS formula in column C I want to number, as a running count, every occurrence of an identical string in column A (e.g. Apple or Orange), but only on rows where column B is of a certain type, e.g. "Fruit". The numbering starts over for each distinct string of that type, and rows of any other type get 0.
The outcome should be like this:
+---+-----------+-------+---+--+
| | A | B | C | |
+---+-----------+-------+---+--+
| 1 | Apple | Fruit | 1 | |
| 2 | Apple | Fruit | 2 | |
| 3 | Mercedes | Car | 0 | |
| 4 | Mercedes | Car | 0 | |
| 5 | Orange | Fruit | 1 | |
| 6 | Orange | Fruit | 2 | |
| 7 | Apple | Fruit | 3 | |
+---+-----------+-------+---+--+
The formula in column C:
=COUNTIFS($A$1:$A1;A1;$B$1:$B1;"Fruit")
=COUNTIFS($A$1:$A2;A2;$B$1:$B2;"Fruit")
=COUNTIFS($A$1:$A3;A3;$B$1:$B3;"Fruit")
…and so on…
I want to translate this formula into an array formula and put this into the header so the formula will automatically expand.
No matter what I've tried it won't work.
Any help is truly appreciated!
Here's a link to a sheet: https://docs.google.com/spreadsheets/d/1lgbuLbTSnyKkqr33NdVuDEv5eoXFwatX1rgeF9YpIks/edit?usp=sharing
={"ARRAYFORMULA HERE"; ARRAYFORMULA(IF(LEN(B2:B), IF(B2:B="Fruit",
MMULT(N(ROW(B2:B)>=TRANSPOSE(ROW(B2:B))), N(B2:B="Fruit"))-
HLOOKUP(0, MMULT(N(ROW(B2:B)>TRANSPOSE(ROW(B2:B))), N(B2:B="Fruit")),
MATCH(VLOOKUP(ROW(B2:B), IF(N(B2:B<>B1:B), ROW(B2:B), ), 1, 1),
VLOOKUP(ROW(B2:B), IF(N(B2:B<>B1:B), ROW(B2:B), ), 1, 1), 0), 0), 0), ))}
demo spreadsheet
=ARRAYFORMULA(IF(LEN(B2:B), IF(B2:B="Fruit",
MMULT(N(ROW(B2:B)>=TRANSPOSE(ROW(B2:B))), N(B2:B="Fruit")), 0), ))
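To make the target behaviour concrete, the expected column C from the question can be described procedurally. The following is only a sketch in plain JavaScript, assuming the data arrives as [name, type] pairs; runningCount is a hypothetical name and is not a translation of the formulas above.

```javascript
// Hypothetical helper: for each [name, type] row, return the running count of
// that name among rows of the wanted type, or 0 for rows of any other type.
function runningCount(rows, wantedType) {
  const seen = new Map(); // name -> occurrences so far
  return rows.map(function (row) {
    const name = row[0];
    const type = row[1];
    if (type !== wantedType) return 0;
    const next = (seen.get(name) || 0) + 1;
    seen.set(name, next);
    return next;
  });
}

// The table from the question.
const fruitRows = [
  ["Apple", "Fruit"],
  ["Apple", "Fruit"],
  ["Mercedes", "Car"],
  ["Mercedes", "Car"],
  ["Orange", "Fruit"],
  ["Orange", "Fruit"],
  ["Apple", "Fruit"],
];
console.log(runningCount(fruitRows, "Fruit")); // [ 1, 2, 0, 0, 1, 2, 3 ]
```

This gives the same numbers as the per-row COUNTIFS in the question: each row counts the earlier-or-equal rows with the same name and the wanted type.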

Google sheets wrap cell content in other content

I'm looking to create a formula in one column that takes the content from the adjacent column and wraps it inside some other content. Can anyone help with this?
For example, given:
A | B
1| | someText1
2| | someText2
3| | someText3
4| | someText4
expected outcome content for Col A, after applying appropriate formula:
A | B
1| wrap("someText1") | someText1
2| wrap("someText2") | someText2
3| wrap("someText3") | someText3
4| wrap("someText4") | someText4
I hope this makes sense, any help would be appreciated. Thanks
What I ended up doing: I added a function and applied it to the whole of column A
function getAdjacentValue() {
var range = SpreadsheetApp.getActiveRange();
var col = range.getColumn();
var row = range.getRow();
var range2 = SpreadsheetApp.getActiveSheet().getRange(row,col+1);
return 'wrap("'+range2.getValue()+'")';
}
By combining MewX's suggestion with arrayformula, one can achieve the same for the whole column with one formula:
=arrayformula("wrap(""" & B1:B4 & """)")
Explanation: & is the string concatenation operator, quote marks within a string are escaped by doubling them.
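If a script is still preferred over the formula, one alternative to reading the active range is to pass the range as an argument: an Apps Script custom function called with a range receives its values as a 2D array of rows. The sketch below assumes that calling convention; wrapColumn is a hypothetical name, and the code also runs as plain JavaScript.

```javascript
// Hypothetical custom function: wrap every non-empty cell of the given values.
// When called from a sheet with a range, values is a 2D array (rows of cells).
function wrapColumn(values) {
  return values.map(function (row) {
    return row.map(function (cell) {
      // Leave blank cells blank instead of emitting wrap("").
      return cell === "" ? "" : 'wrap("' + cell + '")';
    });
  });
}

console.log(wrapColumn([["someText1"], ["someText2"], [""]]));
```

In a sheet this would be entered once, for example as =wrapColumn(B1:B4), and the result would fill down the column.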

Spark: Join dataframe column with an array

I have two DataFrames with two columns
df1 with schema (key1:Long, Value)
df2 with schema (key2:Array[Long], Value)
I need to join these DataFrames on the key columns (find matching values between key1 and the values in key2). The problem is that the two columns do not have the same type. Is there a way to do this?
The best way to do this (and the one that doesn't require any casting or exploding of dataframes) is to use the array_contains spark sql expression as shown below.
import org.apache.spark.sql.functions.expr
import spark.implicits._
val df1 = Seq((1L,"one.df1"), (2L,"two.df1"),(3L,"three.df1")).toDF("key1","Value")
val df2 = Seq((Array(1L,1L),"one.df2"), (Array(2L,2L),"two.df2"), (Array(3L,3L),"three.df2")).toDF("key2","Value")
df1.join(df2, expr("array_contains(key2, key1)")).show()
+----+---------+------+---------+
|key1| Value| key2| Value|
+----+---------+------+---------+
| 1| one.df1|[1, 1]| one.df2|
| 2| two.df1|[2, 2]| two.df2|
| 3|three.df1|[3, 3]|three.df2|
+----+---------+------+---------+
Please note that you cannot use the org.apache.spark.sql.functions.array_contains function directly as it requires the second argument to be a literal as opposed to a column expression.
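You can cast key1 and key2 to strings and then use the contains function, as follows. Note that this is a substring match on the string form of the array, so a key of 1 would also match an array that contains 11.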
val df1 = sc.parallelize(Seq((1L,"one.df1"),
(2L,"two.df1"),
(3L,"three.df1"))).toDF("key1","Value")
DF1:
+----+---------+
|key1|Value |
+----+---------+
|1 |one.df1 |
|2 |two.df1 |
|3 |three.df1|
+----+---------+
val df2 = sc.parallelize(Seq((Array(1L,1L),"one.df2"),
(Array(2L,2L),"two.df2"),
(Array(3L,3L),"three.df2"))).toDF("key2","Value")
DF2:
+------+---------+
|key2 |Value |
+------+---------+
|[1, 1]|one.df2 |
|[2, 2]|two.df2 |
|[3, 3]|three.df2|
+------+---------+
import org.apache.spark.sql.functions.col
val joinedRDD = df1.join(df2, col("key2").cast("string").contains(col("key1").cast("string")))
JOIN:
+----+---------+------+---------+
|key1|Value |key2 |Value |
+----+---------+------+---------+
|1 |one.df1 |[1, 1]|one.df2 |
|2 |two.df1 |[2, 2]|two.df2 |
|3 |three.df1|[3, 3]|three.df2|
+----+---------+------+---------+
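The two join conditions are not equivalent in general: array membership compares whole elements, while a string contains is a substring test. The sketch below illustrates only that difference in plain JavaScript; it is not Spark code. The sample rows and the "[11, 12]" string form are assumptions chosen to resemble how the arrays are displayed above.

```javascript
// Hypothetical in-memory stand-ins for the two DataFrames.
const left = [
  { key1: 1, value: "one.df1" },
  { key1: 2, value: "two.df1" },
];
const right = [
  { key2: [11, 12], value: "x.df2" },
  { key2: [2, 2], value: "two.df2" },
];

// Naive nested-loop join: keep every pair for which the predicate holds.
function joinRows(l, r, predicate) {
  const out = [];
  l.forEach(function (a) {
    r.forEach(function (b) {
      if (predicate(a, b)) out.push([a.value, b.value]);
    });
  });
  return out;
}

// Element-wise membership, analogous to array_contains(key2, key1).
const byMembership = function (a, b) { return b.key2.includes(a.key1); };

// Substring test on a string rendering of the array, analogous to the cast approach.
const bySubstring = function (a, b) {
  return ("[" + b.key2.join(", ") + "]").includes(String(a.key1));
};

console.log(joinRows(left, right, byMembership)); // only 2 matches [2, 2]
console.log(joinRows(left, right, bySubstring));  // 1 and 2 also match "[11, 12]"
```

With the sample data in the answers the two approaches agree, because no key is a substring of another array's contents.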

How to get the index of FOREACH iterations

Within a FOREACH statement [e.g. day in range(dayX, dayY)], is there an easy way to find out the index of the iteration?
Yes, you can.
Here is an example query that creates 8 Day nodes that contain an index and day:
WITH 5 AS day1, 12 AS day2
FOREACH (i IN RANGE(0, day2-day1) |
CREATE (:Day { index: i, day: day1+i }));
This query prints out the resulting nodes:
MATCH (d:Day)
RETURN d
ORDER BY d.index;
and here is an example result:
+--------------------------+
| d |
+--------------------------+
| Node[54]{day:5,index:0} |
| Node[55]{day:6,index:1} |
| Node[56]{day:7,index:2} |
| Node[57]{day:8,index:3} |
| Node[58]{day:9,index:4} |
| Node[59]{day:10,index:5} |
| Node[60]{day:11,index:6} |
| Node[61]{day:12,index:7} |
+--------------------------+
FOREACH does not yield the index during iteration. If you want the index you can use a combination of range and UNWIND like this:
WITH ["some", "array", "of", "things"] AS things
UNWIND range(0,size(things)-2) AS i
// Do something for each element in the array. In this case connect two Things
MERGE (t1:Thing {name:things[i]})-[:RELATED_TO]->(t2:Thing {name:things[i+1]})
This example iterates a counter i over the range, which you can use to access the item at index i in the array.
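The index arithmetic is the part that is easy to get wrong: Cypher's range is inclusive at both ends, so range(0, size(things)-2) yields one index per consecutive pair. As an illustration only, the same iteration written in plain JavaScript looks like this (consecutivePairs is a hypothetical name):

```javascript
// Hypothetical helper: pair each item with its successor, keeping the index.
function consecutivePairs(items) {
  const pairs = [];
  // i runs from 0 to items.length - 2 inclusive, like range(0, size(things)-2).
  for (let i = 0; i <= items.length - 2; i++) {
    pairs.push({ index: i, from: items[i], to: items[i + 1] });
  }
  return pairs;
}

console.log(consecutivePairs(["some", "array", "of", "things"]));
```

For the four-element list this produces three pairs, with indexes 0, 1 and 2.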
