How to create a comma-separated aggregate in Google Sheets? - google-sheets

Given the following data set:
https://docs.google.com/spreadsheets/d/1wr7v93CM_kWygRNHyqMWcBFvd1XXkC5SYbjLjauS4SM/edit?usp=sharing
**people** **channel**
person1 channel1
person2 channel1
person1 channel2
person3 channel2
How could I write a QUERY (or anything else that makes sense) so that column C shows a comma-separated list of the channels that a given person is in?
For example, I'd like the following output.
**people** **channel**
person1 channel1 channel1, channel2
person2 channel1 channel1
person1 channel2 channel1, channel2
person3 channel2 channel2

try:
=ARRAYFORMULA(REGEXREPLACE(TRIM(SPLIT(FLATTEN(QUERY(QUERY({A2:A&"×", B2:B&",", ROW(A2:A)},
"select max(Col2) where Col2 <> ',' group by Col3 pivot Col1"),,9^9)), "×")), ",$", ))

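For readers who want to check the grouping logic outside Sheets, here is a minimal Python sketch of the same idea, not a translation of the formula. It assumes the four sample rows from the question; the function names `channels_per_person` and `with_aggregate_column` are made up for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (person, channel).
rows = [
    ("person1", "channel1"),
    ("person2", "channel1"),
    ("person1", "channel2"),
    ("person3", "channel2"),
]


def channels_per_person(rows):
    """Map each person to a comma-separated list of their channels."""
    grouped = defaultdict(list)
    for person, channel in rows:
        # Keep first-seen order and skip repeated channels.
        if channel not in grouped[person]:
            grouped[person].append(channel)
    return {person: ", ".join(channels) for person, channels in grouped.items()}


def with_aggregate_column(rows):
    """Append the aggregate as a third column on every original row."""
    lookup = channels_per_person(rows)
    return [(person, channel, lookup[person]) for person, channel in rows]


for row in with_aggregate_column(rows):
    print(*row)
```

The per-row lookup mirrors the desired output in the question, where the aggregate is repeated on every row of the same person.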
Related

How to query/filter duplicate rows with multiple criteria?

I'm trying to query/filter rows from a dataset structured like this:
Creator | Title | Barcode | Inv. No.
springer | Cellbio | 014678 | POL02P14x
springer | Cellbio | 026938 | POL02P26r
springer | Cellbio | 038745 |
nature | Cellular | 026672 | POL02P26h
elsevier | Biomed | 026678 | POL02P26g
elsevier | Biomed | 026678 | POL02P26g
spring | Cellbit | | POL02P147
spring | Cellbit | 026938 | POL02P26j
spring | Cellbit | 038745 |
I need to return all rows where the value in column B (Title) is duplicated and, among those duplicate rows, at least one value in column C (Barcode) starts with 014 and at least one starts with 026. If that criterion is not met in column C, a similar check applies to column D (Inv. No.): at least one value starts with POL02P14 and at least one starts with POL02P26.
So the basic logic would be something like this:
Select all rows where B is duplicated and
((at least one value in C starts with x and one with y) or (at least one value in D starts with z and one with w)).
So the desired output should be like this:
Creator | Title | Barcode | Inv. No.
springer | Cellbio | 014678 | POL02P14x
springer | Cellbio | 026938 | POL02P26r
springer | Cellbio | 038745 |
spring | Cellbit | | POL02P147
spring | Cellbit | 026938 | POL02P26j
spring | Cellbit | 038745 |
Here is a sample spreadsheet more similar to the actual dataset which is fairly large:
https://docs.google.com/spreadsheets/d/1xj5LnOxIwEmcjnXD0trmvcCKJIGIcfDkARV80Hx5Fvc/edit?usp=sharing
I tried adapting formulas with similar logic but always get errors or unexpected results: either the query logic/syntax is wrong or there is a filter/array dimension mismatch.
Some examples (the column references are mixed up here because I was trying to reduce the number of columns):
=FILTER(query(list!A1:AR, "Select * where C starts with 'POL02P'"), list!B1:B<>"",COUNTIF(list!B1:B,list!B1:B)>1)
={results!A1:AR1;array_constrain(
query(
{Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26"));
countif(index(Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26")),0,45),
index(Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26")),0,45))}
,"Select * where Col46>1")
,9^9,44)}
=query(FILTER({list!A2:A&list!J2:J,list!A2:J,
iferror(
vlookup(list!A2:A&list!J2:J,query(query(filter(list!A2:A&
list!J2:J,REGEXMATCH(list!C2:C, "^POL02P14|POL02P26")),
"select Col4, count(Col4) where Col4 <> '' group by Col4"),
"select Col4 where Col2 >1 "),1,false))},REGEXMATCH(list!C2:C, "^POL02P14|POL02P26")),
"select Col1, Col2, Col3, Col5, Col6, Col7, Col8, Col9, Col10, Col11 where Col12 <> ''
order by Col3 asc, Col11 asc")
Please try this out in your sample sheet:
={results!A1:AR1;FILTER(results!A2:AR,REGEXMATCH(results!B2:B,JOIN("|","^"&LAMBDA(z,LAMBDA(x,y,z,{filter(filter(x,y="014"),xmatch(filter(x,y="014"),filter(x,y="026")));filter(filter(x,z="POL02P14"),xmatch(filter(x,z="POL02P14"),filter(x,z="POL02P26")))})(INDEX(z,,1),INDEX(z,,2),INDEX(z,,3)))((UNIQUE(FILTER({results!B2:B,LEFT(results!C2:C,3),LEFT(results!D2:D,8)},results!B2:B<>"",results!D2:D<>""))))&"$")))}
Formula logic at a glance:
1. Filter Col_B (Title) in 4 ways (matches to 014, 026, POL02P14, POL02P26).
2. Capture the Col_B values which have both 014 and 026.
3. Capture the Col_B values which have both POL02P14 and POL02P26.
4. Shortlist the Col_B values which are TRUE for either step 2 OR step 3 above.
5. Once the list is finalised, join them all for REGEXMATCH with Col_B for the final output.
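The steps above can be sketched in Python to make the selection rule easier to test on a small sample. This is only an illustration of the logic as stated in the question, not a port of the formula; the helper names `has_both` and `select_rows` are hypothetical, and the rows are the sample data from the question with empty cells written as empty strings.

```python
from collections import defaultdict

# Sample rows from the question: (creator, title, barcode, inv_no).
rows = [
    ("springer", "Cellbio", "014678", "POL02P14x"),
    ("springer", "Cellbio", "026938", "POL02P26r"),
    ("springer", "Cellbio", "038745", ""),
    ("nature", "Cellular", "026672", "POL02P26h"),
    ("elsevier", "Biomed", "026678", "POL02P26g"),
    ("elsevier", "Biomed", "026678", "POL02P26g"),
    ("spring", "Cellbit", "", "POL02P147"),
    ("spring", "Cellbit", "026938", "POL02P26j"),
    ("spring", "Cellbit", "038745", ""),
]


def has_both(values, first, second):
    """True if at least one value starts with `first` and one with `second`."""
    return (any(v.startswith(first) for v in values)
            and any(v.startswith(second) for v in values))


def select_rows(rows):
    """Keep all rows of duplicated titles that satisfy either prefix rule."""
    by_title = defaultdict(list)
    for row in rows:
        by_title[row[1]].append(row)

    keep = set()
    for title, group in by_title.items():
        if len(group) < 2:
            continue  # the title is not duplicated
        barcodes = [r[2] for r in group]
        inv_nos = [r[3] for r in group]
        if (has_both(barcodes, "014", "026")
                or has_both(inv_nos, "POL02P14", "POL02P26")):
            keep.add(title)

    # Preserve the original row order in the output.
    return [r for r in rows if r[1] in keep]


for row in select_rows(rows):
    print(row)
```

On the sample data this keeps the Cellbio rows (barcode rule) and the Cellbit rows (inventory-number rule), which matches the desired output in the question.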

Finding duplicates, concatenating and adding text

I am working with a dataset of 7.5K email addresses, each with the name of the list it is in, and I need to format the list names as a JSON array such as ["apples","bananas","oranges"].
I used =countif(A:A,A1)>1 and colour to see the duplicates. But how do I combine the list names and get ["list 1","list 2"] from cell B2 if there is a duplicate?
Current data:
Column A | Column B
email 1 | list 1
email 1 | list 2
email 2 | list 1
email 2 | list 2
I want it:
email 1 | ["list 1","list 2"]
try:
=INDEX(REGEXREPLACE(TEXT(TRIM(SPLIT(FLATTEN(
QUERY(QUERY({A:A&"×", IF(B:B="",, B:B&",")},
"select max(Col2)
where Col2 is not null
group by Col2
pivot Col1"),,9^9)), "×")), {"#", "\[#\]"}), ";]", "]"))
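As a cross-check of the expected result, the grouping can also be sketched in Python. This is a minimal illustration that assumes the four sample rows from the question; `lists_as_json` is a hypothetical helper name, and `json.dumps` with compact separators is used only to reproduce the exact `["list 1","list 2"]` spacing.

```python
import json

# Sample rows from the question: (email, list name).
rows = [
    ("email 1", "list 1"),
    ("email 1", "list 2"),
    ("email 2", "list 1"),
    ("email 2", "list 2"),
]


def lists_as_json(rows):
    """Map each email to a JSON array string of its list names."""
    grouped = {}
    for email, list_name in rows:
        if not email or not list_name:
            continue  # skip blank cells
        grouped.setdefault(email, []).append(list_name)
    # Compact separators avoid the space json.dumps adds after commas.
    return {email: json.dumps(names, separators=(",", ":"))
            for email, names in grouped.items()}


for email, payload in lists_as_json(rows).items():
    print(email, payload)
```

Using a JSON encoder rather than string concatenation also escapes any quotes that might appear inside a list name.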

Google Sheets to transform a table into a nested tree output or hierarchical structure output

I just wonder if there is an easy way to transform the Google Sheets Original Input cells into a more readable table as in Output A or even Output B?
Original Input
Group | Points
A | BBBB
A | CCCC
A | DDDD
A | EEEE
B | FFFF
B | GGGG
B | HHHH
Output A
Group | Points
A |
 | BBBB
 | CCCC
 | DDDD
 | EEEE
B |
 | FFFF
 | GGGG
 | HHHH
Output B
Points
A
- BBBB
- CCCC
- DDDD
- EEEE
B
- FFFF
- GGGG
- HHHH
I know I can do it with Apps Script, but I am curious whether there is a smart solution, such as a combination of QUERY/FILTER/SORT, that does not need Apps Script.
or:
=ARRAYFORMULA(TRIM(SPLIT(FLATTEN(SPLIT(QUERY(FLATTEN(QUERY(TRANSPOSE(
QUERY(QUERY({A3:A&"×", "¤"&B3:B&"×", B3:B},
"select Col1,max(Col2) where Col3 is not null group by Col1 pivot Col3"),
"offset 1", 0)),, 9^9)),,9^9), "×")), "¤")))
Another approach (for your preferred arrangement, given that the non-header data runs in A3:B):
=ArrayFormula(SPLIT(QUERY({VLOOKUP(UNIQUE(FILTER(A3:A,A3:A<>""))&"*",{A3:A,ROW(A3:A)&"-"&COLUMN(A3:A)},{1,2},FALSE);FILTER({"|"&B3:B,ROW(B3:B)&"-"&COLUMN(B3:B)},A3:A<>"")},"Select Col1 ORDER BY Col2"),"|",1,0))
ADDENDUM (based on additional comment from poster):
This version of the formula will work on unsorted raw data:
=ArrayFormula(SPLIT(QUERY({VLOOKUP(SORT(UNIQUE(FILTER(A3:A,A3:A<>"")))&"*",{SORT(FILTER(A3:A,A3:A<>"")),SEQUENCE(COUNTA(A3:A),1)&"-"&COLUMN(A1)},{1,2},FALSE);SORT(FILTER("|"&B3:B,A3:A<>""),1,1,2,1),SEQUENCE(COUNTA(A3:A),1)&"-"&COLUMN(B1)},"Select Col1 ORDER BY Col2"),"|",1,0))
try:
=ARRAYFORMULA(TRIM(FLATTEN(SPLIT(QUERY(FLATTEN(QUERY(TRANSPOSE(
QUERY(QUERY({A3:A&"×", "- "&B3:B&"×", B3:B},
"select Col1,max(Col2) where Col3 is not null group by Col1 pivot Col3"),
"offset 1", 0)),, 9^9)),,9^9), "×"))))

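To make the target shape of Output B explicit, here is a small Python sketch of the transformation. It is an illustration only and assumes the sample rows from the question; `to_tree_lines` is a hypothetical helper name, and the "- " prefix follows the Output B layout.

```python
# Sample rows from the question: (group, point).
rows = [
    ("A", "BBBB"),
    ("A", "CCCC"),
    ("A", "DDDD"),
    ("A", "EEEE"),
    ("B", "FFFF"),
    ("B", "GGGG"),
    ("B", "HHHH"),
]


def to_tree_lines(rows, bullet="- "):
    """Emit each group once, followed by its points as bulleted lines."""
    grouped = {}
    for group, point in rows:
        if group and point:  # ignore blank cells
            grouped.setdefault(group, []).append(point)

    lines = []
    for group, points in grouped.items():
        lines.append(group)
        lines.extend(bullet + point for point in points)
    return lines


print("\n".join(to_tree_lines(rows)))
```

Because the grouping uses a dictionary keyed by group, the input rows do not need to be sorted; groups appear in first-seen order.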
Randomly splitting a dynamic range into equal groups

I have a dynamic range of values that I want to split into N groups, where N is specified by the user.
I want to do the following:
Split the values into N equal groups.
Specify which values in the range must be in separate groups.
Have a "checker" to see if the math works out and the split is possible (e.g. a group of 11 cannot be split into 2 equal groups).
Here is a scenario:
I have a list of 26 values (an array of letters, A to Z).
I want two groups, randomly split.
I specify 2 of the values that must be kept apart (e.g. the letters B and X).
This should give me two groups, 13 values of "Group 1" and 13 values of "Group 2".
"Group 1" can contain something like ("B", "N", "V", "C", "T", ..... x 13)
"Group 2" contains ("X", "A", .... x 13)
The variables in this case are the # of values, # of groups to split, and specific values to split.
EDIT: google sheet example:
https://docs.google.com/spreadsheets/d/1baZr8QAkFjw1UwsMOyphma6v_MNkWz-aCLx57yPsViE/edit#gid=1454808593
try:
=ARRAYFORMULA(IF(B4<>TRUE,,TRANSPOSE(SPLIT(FLATTEN(QUERY(QUERY(QUERY({
ROUNDUP(SEQUENCE(COUNTA(B7:B))/(COUNTA(B7:B)/B2)),
SORT({RANDARRAY(COUNTA(B7:B)), FILTER(B7:B, B7:B<>"")})},
"select max(Col3) group by Col2 pivot Col1"), "offset 1", 0),,9^9)), " "))))
UPDATE:
=ARRAYFORMULA(IFNA(CHAR(96+VLOOKUP(B7:B, {QUERY({B7:B,
COUNTIFS(C7:C, C7:C, ROW(C7:C), "<="&ROW(C7:C))*C7:C}, "where Col2<>0");
{FILTER(B7:B, C7:C=FALSE, B7:B<>""), QUERY(SORT({RANDARRAY(SEQUENCE(B1-B2)),
ROUNDUP(SEQUENCE(B1-B2)/B3), QUERY({
COUNTIFS(C7:C, C7:C, ROW(C7:C), "<="&ROW(C7:C))*C7:C, B7:B},
"select Col2 where Col1=0 and Col2 is not null")}), "select Col2")}}, 2, 0))))

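The three requirements from the question (equal split, separated values, and a feasibility check) can be sketched in Python as follows. This is one possible approach under stated assumptions, not a translation of the formulas above: the values are assumed to be unique, and the names `can_split` and `random_split` are hypothetical.

```python
import random
import string


def can_split(values, n_groups, separate=()):
    """Checker: equal groups are possible and separated values fit."""
    return (
        n_groups > 0
        and len(values) % n_groups == 0
        and len(set(separate)) == len(separate)   # no repeated entries
        and len(separate) <= n_groups             # one per group at most
        and all(s in values for s in separate)
    )


def random_split(values, n_groups, separate=(), rng=None):
    """Randomly split `values` into equal groups, keeping `separate` apart."""
    rng = rng or random.Random()
    if not can_split(values, n_groups, separate):
        raise ValueError("values cannot be split under these constraints")

    size = len(values) // n_groups
    groups = [[] for _ in range(n_groups)]

    # Give each separated value its own randomly chosen group.
    targets = rng.sample(range(n_groups), len(separate))
    for value, index in zip(separate, targets):
        groups[index].append(value)

    # Shuffle everything else and top each group up to the common size.
    apart = set(separate)
    rest = [v for v in values if v not in apart]
    rng.shuffle(rest)
    for group in groups:
        while len(group) < size:
            group.append(rest.pop())
    return groups


letters = list(string.ascii_uppercase)
for number, group in enumerate(random_split(letters, 2, ["B", "X"]), start=1):
    print(f"Group {number}:", group)
```

Passing a seeded `random.Random` as `rng` makes a split reproducible, which can help when comparing results against the sheet.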
Google Spreadsheet Query count based on column header

I have a spreadsheet that looks like this:
Name | 8/13/2020 | 8/17/2020 | 8/20/2020
John | OT | OT | OT
Bob | OT | AL | OT
Echo | A | LE | OT
I would like to enter a date in one cell and have it output how many people have "OT", "AL", or "LE" in that date's column. I thought about using QUERY to do this:
=QUERY(MySheet!B:P, "SELECT COUNT(D) WHERE D = 'OT' OR D = 'AL' OR D = 'LE' LABEL COUNT(D) ''")
However, I always have to specify the column to count. Is there a way to avoid hard-coding the column and instead have it match the correct column based on a date I enter in another cell?
If you know of a different way to do this that does not use QUERY, that is fine too as an answer.
try:
=COUNTIF(INDIRECT(
ADDRESS(6, MATCH(B2, 5:5, 0))&":"&
ADDRESS(ROWS(A:A), MATCH(B2, 5:5, 0))), "OT")
or:
=ARRAYFORMULA({A6:A, INDIRECT(
ADDRESS(6, MATCH(B2, 5:5, 0))&":"&
ADDRESS(ROWS(A:A), MATCH(B2, 5:5, 0)))})
or:
=ARRAYFORMULA(QUERY({A6:A, INDIRECT(
ADDRESS(6, MATCH(B2, 5:5, 0))&":"&
ADDRESS(ROWS(A:A), MATCH(B2, 5:5, 0)))},
"select Col1,count(Col1)
where Col1 is not null
group by Col1
pivot Col2"))
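The idea behind these formulas (find the column whose header matches the entered date, then count the codes in it) can be sketched in Python. This is an illustration only, assuming the sample table from the question with dates kept as plain strings; `count_codes` is a hypothetical helper name.

```python
from collections import Counter

# Sample table from the question; the header row holds the dates as text.
header = ["Name", "8/13/2020", "8/17/2020", "8/20/2020"]
table = [
    ["John", "OT", "OT", "OT"],
    ["Bob", "OT", "AL", "OT"],
    ["Echo", "A", "LE", "OT"],
]


def count_codes(header, table, date, codes=("OT", "AL", "LE")):
    """Count each code in the column whose header equals `date`."""
    column = header.index(date)  # raises ValueError if the date is missing
    counts = Counter(row[column] for row in table)
    return {code: counts[code] for code in codes}


print(count_codes(header, table, "8/17/2020"))
```

`header.index` plays the role that MATCH plays in the formulas: it turns the entered date into a column position, so the column never has to be hard-coded.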
