Finding duplicates, concatenating and adding text - google-sheets

I am working with a dataset of about 7.5K email addresses and the names of the lists they are in, and I need to format the list names as a JSON array, e.g. ["apples","bananas","oranges"].
I used =countif(A:A,A1)>1 and colour to highlight the duplicates. But how do I combine the list names into ["list 1","list 2"] in column B (starting at B2) when an email is duplicated?
Current data:
Column A | Column B
email 1 | list 1
email 1 | list 2
email 2 | list 1
email 2 | list 2
I want it to look like this:
email 1 | ["list 1","list 2"]

try:
=INDEX(REGEXREPLACE(TEXT(TRIM(SPLIT(FLATTEN(
QUERY(QUERY({A:A&"×", IF(B:B="",, """"&B:B&""",")},
"select max(Col2)
where Col2 is not null
group by Col2
pivot Col1"),,9^9)), "×")), {"@", "\[@\]"}), ",]", "]"))
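To check the grouping logic outside Sheets (for example on an exported CSV), here is a minimal Python sketch of the same idea: collect the list names for each email, drop blanks and repeats, and serialize them as a JSON array. The function name `group_lists` and the alphabetical ordering of list names are assumptions for illustration, not part of the answer above.

```python
import json
from collections import defaultdict


def group_lists(rows):
    """Group (email, list_name) rows into {email: JSON array string}.

    List names are de-duplicated and sorted alphabetically (an assumption
    about the desired order); blank emails or list names are skipped.
    """
    lists_by_email = defaultdict(set)
    for email, list_name in rows:
        if email and list_name:  # similar to "where Col2 is not null"
            lists_by_email[email].add(list_name)
    # separators=(",", ":") gives the compact form shown in the question.
    return {
        email: json.dumps(sorted(names), separators=(",", ":"))
        for email, names in lists_by_email.items()
    }


rows = [("email 1", "list 1"), ("email 1", "list 2"),
        ("email 2", "list 1"), ("email 2", "list 2")]
for email, array in group_lists(rows).items():
    print(email, array)
```

Using a set per email means the same email appearing twice in the same list is only reported once, which matches what a spreadsheet GROUP BY on the list name would do.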

Related

Aggregating rows with query in google sheets

I have a data set that looks something like this:
Column A | Column B
category 1 | Team 1
1.category 1 | Team 1
2.category 2 | Team 1
category 2 | Team 1
category 3 | Team 1
3.category 3 | Team 1
I am trying to use the QUERY function with a pivot statement to count the occurrences of each category for Team 1 (I have several other teams in the data set, but for simplicity I only wrote out my example with Team 1). Unfortunately the category names are not consistent in the original data, and I cannot change them.
So I need a way to combine the counts for 'category 1' and '1.category 1', and so on.
How could I rewrite this to get a result like the one below?
Category | Team 1
category 1 | 2
category 2 | 2
category 3 | 2
The formula I have now is as follows:
=query('sheet1'!A:B,"Select A, count(B) where B='Team 1' group by A pivot B label B 'Team 1'",1)
If the category names all have a format similar to those in your example (extraneous text only at the beginning, followed by 'category N'), and you don't mind zero counts per category being left blank, then a more compact approach than the LAMBDA-based answer is the following (it works for any number of teams/categories):
=arrayformula(query({regexextract(A2:A,"category.+"),B2:B},"select Col1,count(Col1) where Col2 is not null group by Col1 pivot Col2 label Col1 'Category'",0))
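The formula above normalizes each category with REGEXEXTRACT and then counts per team with a pivot. As a rough Python sketch of that same two-step logic (the function name `count_by_category` is hypothetical, and rows without "category" are simply skipped here rather than producing #N/A):

```python
import re
from collections import Counter


def count_by_category(rows):
    """Count (category, team) pairs after normalizing the category text.

    re.search(r"category.+", ...) plays the role of
    REGEXEXTRACT(A2:A, "category.+"): it keeps everything from the first
    "category" to the end of the cell.
    """
    counts = Counter()
    for raw_category, team in rows:
        if not team:  # like "where Col2 is not null"
            continue
        match = re.search(r"category.+", raw_category)
        if match:  # rows without "category" are skipped in this sketch
            counts[(match.group(0), team)] += 1
    return counts


rows = [("category 1", "Team 1"), ("1.category 1", "Team 1"),
        ("2.category 2", "Team 1"), ("category 2", "Team 1"),
        ("category 3", "Team 1"), ("3.category 3", "Team 1")]
for (category, team), n in sorted(count_by_category(rows).items()):
    print(category, team, n)
```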
formula:
=ArrayFormula(
LAMBDA(DATA,CATEGORY,
LAMBDA(RESULT,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(QUERY(SPLIT(TRANSPOSE(SPLIT(RESULT,"&")),"|"),"SELECT Col1,SUM(Col3) GROUP BY Col1 PIVOT Col2 LABEL Col1'Category'",0))
)(
JOIN("&",
BYROW(CATEGORY,LAMBDA(CAT,
JOIN("&",CAT&"|"&BYROW(TRANSPOSE(QUERY(DATA,"SELECT COUNT(Col1) WHERE lower(Col1) CONTAINS'"&CAT&"' PIVOT Col2",0)),LAMBDA(ROW,JOIN("|",ROW))))
))
)
)
)({ASC($A$2:$B$7)},{"category 1";"category 2";"category 3"})
)
use ASC() to normalize number-like values (ASC converts full-width characters to their half-width counterparts),
use {} to create the match conditions,
iterate over the conditions with BYROW() and...
use QUERY() with CONTAINS to COUNT the matches for each condition,
use TRANSPOSE() to turn the match results of each row sideways,
convert the results into strings with JOIN(), which makes it easier to rearrange rows and columns,
SPLIT() the data back into an array in the format we need,
use QUERY() to PIVOT the SUM of the COUNT results as the final output.
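The steps above boil down to "for each known category, count the rows whose text contains it". A minimal Python sketch of that idea, assuming the category list is supplied up front just as `{"category 1";"category 2";"category 3"}` is in the formula (the function name `count_by_contains` is hypothetical):

```python
from collections import Counter


def count_by_contains(rows, categories):
    """For each known category, count rows whose lower-cased text contains it.

    This mirrors running QUERY(..., "WHERE lower(Col1) CONTAINS '<cat>'")
    once per category and keeping the per-team counts.
    """
    counts = Counter()
    for cat in categories:
        for raw_category, team in rows:
            if cat in raw_category.lower():
                counts[(cat, team)] += 1
    return counts


rows = [("category 1", "Team 1"), ("1.category 1", "Team 1"),
        ("2.category 2", "Team 1"), ("category 2", "Team 1"),
        ("category 3", "Team 1"), ("3.category 3", "Team 1")]
print(count_by_contains(rows, ["category 1", "category 2", "category 3"]))
```

One thing to keep in mind with substring matching: "category 1" is also contained in "category 10", so if the category list grows past 9, a stricter match (such as the REGEXEXTRACT approach) avoids double counting.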
Another approach works on a slightly different concept:
=ArrayFormula(
LAMBDA(DATA,CAT,
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(COLA,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(COLA,JOIN("|",CAT)))
)(INDEX(DATA,,1),INDEX(DATA,,2))
)(ASC(DATA))
)($A$2:$B$7,{"category 1","category 2","category 3"})
)
We can modify the Category column of the input data with REGEXEXTRACT() before sending it into QUERY, which in this case makes the formula look a bit cleaner.
Inspired by The God of Biscuits's answer, we can now get rid of the CAT variable, which makes the formula more flexible for your data.
This REGEXEXTRACT() extracts the category value from the first 'category' match through the end of the first number after it, allowing one or more spaces between the two.
=ArrayFormula(
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(LOWER(INDEX(DATA,,1)),"(category +[0-9]+)"),INDEX(DATA,,2))
)($A$2:$B)
)
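The two answers use different patterns: "category.+" keeps everything after "category", while the last formula stops at the end of the first number. This small Python sketch compares them on lower-cased text (matching the LOWER() in the last formula); `"category +[0-9]+"` is equivalent to the pattern used there, and the value "3.category 3 (old)" is a hypothetical example with trailing text.

```python
import re

# Both patterns from the answers above, applied to lower-cased text.
LOOSE = re.compile(r"category.+")         # "category" through the end of the cell
STRICT = re.compile(r"category +[0-9]+")  # "category", one or more spaces, digits


def extract(pattern, text):
    """Return the first match, or None where REGEXEXTRACT would return #N/A."""
    match = pattern.search(text.lower())
    return match.group(0) if match else None


for value in ["1.category 1", "Category 2", "3.category 3 (old)"]:
    print(f"{value!r}: loose={extract(LOOSE, value)!r}, "
          f"strict={extract(STRICT, value)!r}")
```

With trailing text, the loose pattern would create a separate category such as "category 3 (old)", while the strict one still groups it under "category 3".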
You can also use FILTER with COUNTA like this:
=counta(filter(Sheet1!A:A,(Sheet1!A:A="category 1")+(Sheet1!A:A="1.category 1"),Sheet1!B:B="Team 1"))

How to Get the Sum Total of a Category if Data is Split Between Multiple Adjacent Tables

I have attached a sample of the format my data is in. The actual data set has many more columns, so I am looking for a single formula that will get the totals for a category from the whole table. As you can see in the photo, "Test 2" appears in columns B and D with values of 1 and 9 respectively, a total of 10. Is there a single formula that would return 10? Thank you.
try:
=QUERY({FLATTEN(FILTER(B2:G10, MOD(COLUMN(B:G), 2)=0)),
FLATTEN(FILTER(B2:G10, MOD(COLUMN(B:G)-1, 2)=0))},
"select Col1,sum(Col2)
where Col1 is not null
group by Col1
label Col1'Name',sum(Col2)'Totals'")
for unknown number of columns try:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(
FILTER(B2:1000, MOD(COLUMN(B2:2), 2)=0)&"×"&
FILTER(B2:1000, MOD(COLUMN(B2:2)-1, 2)=0)), "×"),
"select Col1,sum(Col2)
where Col2 is not null
group by Col1
label Col1'Name',sum(Col2)'Totals'"))
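Both formulas treat the table as repeated (name, value) column pairs: names in every other column, values in the column to their right, then a grouped SUM. A minimal Python sketch of that pairing, where the function name and all sample rows except the two "Test 2" values (1 and 9) are hypothetical:

```python
def totals_from_paired_columns(table):
    """Sum values by name when each row holds repeated (name, value) pairs.

    Cells at even positions (0, 2, 4, ...) are names and the cell to their
    right is the value, like pairing columns B/C, D/E, F/G in the formula.
    """
    totals = {}
    for row in table:
        for i in range(0, len(row) - 1, 2):
            name, value = row[i], row[i + 1]
            # Skip empty pairs, similar to "where ... is not null".
            if name and isinstance(value, (int, float)):
                totals[name] = totals.get(name, 0) + value
    return totals


table = [
    ["Test 1", 4, "Test 2", 9],
    ["Test 2", 1, "Test 3", 2],
]
print(totals_from_paired_columns(table)["Test 2"])  # 10
```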

Filter by array with 2 fields if text contains (filter or query)

I need help as I don't know how to make this formula. I tried multiple variations with FILTER and QUERY functions, but still no success.
I have 2 sheets:
USERS - contains user email, and 2 location columns: city and state
LOCATIONS - contains 2 columns: city and state - it's a list of locations
I need a third sheet that would list all users whose location is listed in LOCATIONS sheet. Each user should be in its own row.
Conditions:
Extracted users must match both city and state columns to those in LOCATIONS sheet, to avoid getting users from multiple locations like Portland, OR, and Portland, TX, when I need just one of them
City column in USERS might have multiple cities separated by ", " inside a single cell if the user is in multiple locations, so city needs to be filtered by "if text contains" condition
Here's a copy of an example sheet: https://docs.google.com/spreadsheets/d/1XruYIMq0nklFInqcGtzN7nd26rXTNnudsZNMI70uG4I/copy
try:
=ARRAYFORMULA(IFNA(VLOOKUP(E2:E&"♥"&F2:F;
REGEXREPLACE(TRIM(SPLIT(FLATTEN(QUERY(TRANSPOSE(QUERY(QUERY(TRIM(SPLIT(
FLATTEN(IF(IFERROR(SPLIT(USERS!B2:B; ","))="";;
SPLIT(USERS!B2:B; ",")&"♥"&USERS!C2:C&"♠♦"&USERS!A2:A&",♦"&USERS!A2:A)); "♦"));
"select Col1,max(Col2) where Col2 is not null group by Col1 pivot Col3");
"offset 1"; 0));;9^9)); "♠")); ",$"; ); 2; 0)))
update:
=ARRAYFORMULA(QUERY({USERS!A:C, TRIM(FLATTEN(QUERY(TRANSPOSE(
IF(IFERROR(SPLIT(USERS!B:B, ","))="",,
SPLIT(USERS!B:B, ",")&"♥"&USERS!C:C)),,9^9)))},
"select Col1,Col2,Col3 where Col4 matches '.*"&TEXTJOIN(".*|.*", 1,
IF(LOCATIONS!A2:A="",,LOCATIONS!A2:A&"♥"&LOCATIONS!B2:B))&".*'", 1))
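The core idea in both formulas is to turn each user into a set of "city♥state" keys (one per city in the comma-separated cell) and keep the user if any key appears in LOCATIONS. A minimal Python sketch of that matching, with hypothetical function name and sample emails:

```python
def users_in_locations(users, locations):
    """Return users whose (city, state) pairs include any listed location.

    users: (email, cities, state) tuples; cities may be "City A, City B".
    locations: (city, state) pairs. Matching on the pair keeps Portland, OR
    separate from Portland, TX.
    """
    wanted = {(city.strip(), state.strip()) for city, state in locations}
    matches = []
    for email, cities, state in users:
        # A user's city cell may hold several cities separated by ", ".
        pairs = {(city.strip(), state.strip())
                 for city in cities.split(",") if city.strip()}
        if pairs & wanted:  # at least one shared (city, state) pair
            matches.append((email, cities, state))
    return matches


users = [
    ("a@example.com", "Portland, Salem", "OR"),
    ("b@example.com", "Portland", "TX"),
    ("c@example.com", "Austin", "TX"),
]
print(users_in_locations(users, [("Portland", "OR")]))
```

This sketch compares whole pairs for equality, whereas the `matches '.*...*'` regex in the update is a substring match, so there a city like "South Portland" in OR would also match "Portland♥OR".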

Grouping by Name in Query Formula not working

So I am using a query function to count the number of instances a particular name appears in column A of another sheet, and display that result in Column B of this sheet with the respective name in Column A. Here is the function:
=ArrayFormula(QUERY(Attendance!A:A&{"",""},"select Col1, count(Col2) where Col1 != '' group by Col1 label count(Col2) 'Count'",1))
The problem is, while it works for the most part, some of the names appear twice, for instance Fred Jones appears as:
Col A | Col B
Fred Jones | 5
Fred Jones | 2
I have looked at the names and there is no discernible difference between them, so I do not understand why they are not grouping. Is there a way I can use a wildcard or something to get Google Sheets to combine names that are nearly identical? Any help would be appreciated, thanks as always.
try:
=ARRAYFORMULA(QUERY(TRIM({Attendance!A:A}),
"select Col1,count(Col1)
where Col1 is not null
group by Col1
label count(Col1)'Count'", 1))
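The most likely reason "identical" names refuse to group is invisible whitespace, such as a trailing space or a doubled space in the middle: the strings look the same but are not equal, so QUERY treats them as separate groups. TRIM() removes that difference before grouping. This Python sketch shows the effect; `sheets_trim` is a hypothetical helper that only approximates Sheets TRIM by handling ordinary spaces.

```python
import re
from collections import Counter


def sheets_trim(text):
    """Approximate Sheets TRIM(): strip outer spaces, collapse inner runs."""
    return re.sub(r" +", " ", text).strip(" ")


def count_names(names):
    """Count non-blank names after trimming, like the TRIM + QUERY answer."""
    trimmed = (sheets_trim(name) for name in names)
    return Counter(name for name in trimmed if name)


names = ["Fred Jones", "Fred Jones ", "Fred  Jones", ""]
print(len(set(names) - {""}))  # 3 distinct strings before trimming
print(count_names(names))      # Counter({'Fred Jones': 3})
```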

How could I form a string of items grouped by a matching value? (e.g. "Red: Apples, Cherries")

I'm wondering if there is a decent way to do this (without scripts) - if not, I can attempt creating a script for it but some users of this sheet will be using Excel on their computers so I'm trying to keep it scriptless as much as possible.
I have a sheet set up to display text based on certain conditions that is meant to be copied and pasted into an external program.
There is a column for months jan-dec and a column next to that where the user can input a number from 1-10 (and those numbers are associated with strings that are found with Vlookup on another sheet. They're basically "error codes" just to keep the sheet clean. But I'm just omitting this part because it's not needed for this question)
Right now, the text that populates shows:
Jan: 1
Feb: 2
Apr: 1
How could I group these by the value instead of listing them separately? Something like:
1: Jan, Apr
2: Feb
Is it possible to grab the items from that months list and put them in their own lists?
This is the current formula for reference:
=if(countif(Calculator!B2:B13,">0"),CONCATENATE(C2:C13),"None")
(Calculator sheet)B2:B13 --> column with the numbers
(Data sheet)C2:C13 --> a concatenated string that contains the month name from one cell and the number (or technically the string associated with that number as I mentioned before)
Each cell in the C column has the Jan: 1, Feb: 2 data and any month without data is left blank. When I concatenate the C cells together, it automatically omits the blank cells which is helpful but now I'd really like to group them by that value instead.
Here is the example sheet that reflects this
delete A15 and paste this in A14:
={""; ARRAYFORMULA(TEXTJOIN(CHAR(10), 1, REGEXREPLACE(TRIM(
TRANSPOSE(QUERY(QUERY({A2:A13&",", B2:B13&":"},
"select max(Col1)
where not Col2 matches ':'
group by Col1
pivot Col2"),,9^9))), ",$", )))}
UPDATE:
if order matters...
={""; ARRAYFORMULA(TEXTJOIN(CHAR(10), 1, REGEXREPLACE(TRIM(
TRANSPOSE(QUERY(QUERY({"♦"&ROW(A2:A13)&"♦"&A2:A13&",", B2:B13&":"},
"select max(Col1)
where not Col2 matches ':'
group by Col1
pivot Col2"),,9^9))), "♦\d+♦|,$", )))}
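The formulas above group the months by their number and join each group into one line. For readers checking the logic outside Sheets, here is a rough Python sketch (the function name `group_months_by_code` is hypothetical); months keep their original row order, and codes appear in order of first use rather than in the sorted order a QUERY pivot produces.

```python
def group_months_by_code(entries):
    """Build lines like "1: Jan, Apr" from (month, code) pairs.

    Blank codes are skipped. Months keep their row order because Python
    dicts preserve insertion order (3.7+).
    """
    groups = {}
    for month, code in entries:
        if code in (None, ""):  # months without data are left out
            continue
        groups.setdefault(code, []).append(month)
    return "\n".join(f"{code}: {', '.join(months)}"
                     for code, months in groups.items())


entries = [("Jan", 1), ("Feb", 2), ("Mar", ""), ("Apr", 1)]
print(group_months_by_code(entries))
# 1: Jan, Apr
# 2: Feb
```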
UPDATE:
={""; ARRAYFORMULA(JOIN(CHAR(10), SUBSTITUTE(REGEXREPLACE(TRIM(QUERY(QUERY({
SORT(FILTER({SUBSTITUTE(A1:A12, "'", "/"&20)*1, B1:B12&":"}, B1:B12<>""), 2, 1, 1, 1)},
"select max(Col1)
group by Col1
pivot Col2
format max(Col1) 'Mmm♦yy,'"),,99^99)), ",$", ), "♦", CHAR(39))))}
