Google Spreadsheets: How do you concat strings in an aggregation function - google-sheets

Say I have a table:
A, 1
B, 1
C, 2
D, 1
E, 2
How do I view the table grouped by the 2nd column, aggregating the first column with a comma-separated concat function, i.e.:
1, "A,B,D"
2, "C,E"
In both defining a pivot table and using the QUERY syntax, it seems that the only aggregation functions available are numerical aggregations like MIN, MAX, SUM, etc. Can I define my own aggregation function?

You have to add a "Calculated Field" to the pivot table, and then select "Summarize by > Custom". This makes the column names in your formula refer to an array of values (instead of a single value). Then you can type a formula like:
= JOIN(", ", MyStringColumn)
More specifically, if you have the following table:
Create a pivot table by going to "Data > Pivot table", with the following configuration. Ensure "Summarize by" is set to "Custom"!

Another option: if the data is in A2:B, then, say, in D2:
=UNIQUE(B2:B)
and then in E2:
=JOIN(",",FILTER(A$2:A,B$2:B=D2))
which is filled down as required.
There are one-formula, auto-expanding solutions, although they get quite convoluted.
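To check the expected result outside the spreadsheet, the UNIQUE + FILTER + JOIN idea can be sketched in Python. This is only an illustration of the logic, not something Sheets runs; the function name group_concat and the sample list are assumptions made for the example.

```python
def group_concat(rows, sep=","):
    """Group (value, key) rows by key and join the values in input order."""
    groups = {}  # dicts keep first-seen key order, like UNIQUE does
    for value, key in rows:
        groups.setdefault(key, []).append(value)
    # Equivalent of JOIN(",", FILTER(values, keys = key)) for each unique key
    return {key: sep.join(values) for key, values in groups.items()}


# Sample data from the question: first column is the value, second the group
table = [("A", 1), ("B", 1), ("C", 2), ("D", 1), ("E", 2)]
result = group_concat(table)
print(result)
```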

You're right, there's no easy way with pivot tables. This though, will do the trick. Inspired by this brilliant answer here.
First, have a header row and run a sort on column A to group by category.
So far, in your example, we have
   | A         | B
---+-----------+-----------
 1 | CATEGORY  | ATTRIBUTE
 2 | 1         | A
 3 | 1         | B
 4 | 1         | D
 5 | 2         | C
 6 | 2         | E
In column C, let's prep the concatenated strings. Start in cell C2 with the following formula, and fill down.
=IF(A2<>A1, B2, C1 & "," & B2)
...looking good...
   | A         | B         | C
---+-----------+-----------+-----------
 1 | CATEGORY  | ATTRIBUTE | STRINGS
 2 | 1         | A         | A
 3 | 1         | B         | A,B
 4 | 1         | D         | A,B,D
 5 | 2         | C         | C
 6 | 2         | E         | C,E
In column D, let's flag the rows we want to select in a later step, with the following formula, starting in cell D2 and filling down. Basically we are marking the last row of each category, which carries the full concatenated string.
=A2<>A3
...almost there now
   | A         | B         | C        | D
---+-----------+-----------+----------+-----------
 1 | CATEGORY  | ATTRIBUTE | STRINGS  | VALIDATOR
 2 | 1         | A         | A        | FALSE
 3 | 1         | B         | A,B      | FALSE
 4 | 1         | D         | A,B,D    | TRUE
 5 | 2         | C         | C        | FALSE
 6 | 2         | E         | C,E      | TRUE
Now, let's copy columns C and D and paste special as values in the same place. Then add a filter on the whole table and use column D to keep only the rows labeled TRUE, deleting the rows labeled FALSE. Now remove the filter and delete columns B and D and row 1.
   | A         | B
---+-----------+-----------
 1 | 1         | A,B,D
 2 | 2         | C,E
Done. Get ice cream. Watch Road House.
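The helper-column steps above can be sketched in Python, mainly to show why only the last row of each category is kept. It assumes, as the answer does, that the rows are already sorted by category; the name running_concat is hypothetical.

```python
def running_concat(rows, sep=","):
    """rows: (category, attribute) pairs, already sorted by category."""
    out = []
    prev_cat = object()  # sentinel that never equals a real category
    acc = ""
    for i, (cat, attr) in enumerate(rows):
        # Column C: =IF(A2<>A1, B2, C1 & "," & B2)
        acc = attr if cat != prev_cat else acc + sep + attr
        # Column D: =A2<>A3 (TRUE on the last row of each category)
        is_last = i + 1 == len(rows) or rows[i + 1][0] != cat
        if is_last:
            out.append((cat, acc))
        prev_cat = cat
    return out


rows = [(1, "A"), (1, "B"), (1, "D"), (2, "C"), (2, "E")]
grouped = running_concat(rows)
print(grouped)
```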

Related

Google Sheets Query where column matches data on another sheets column range

I am trying to query only the results that match a pre-defined list in Google Sheets.
This formula is working...but what if my WHERE list contains thousands of entries? Is there a better way?
=QUERY(DATA!A2:B,"SELECT B WHERE A = 'Data 1' OR A = 'Data 6' OR A = 'Data 8'")
Sheet #1 - Contains the data I would like to query.
| A | B |
| Data 1 | Something 1 |
| Data 2 | Something 2 |
| Data 3 | Something 3 |
| Data 4 | Something 4 |
| Data 5 | Something 5 |
Sheet #2 - Contains a pre-defined list (Column A) & a query formula (Column B).
| A | B |
| Data1 | Something1 |
| Data6 | Something6 |
| Data8 | Something8 |
Here is the formula I have tried:
=QUERY(DATA!A2:B,"SELECT B WHERE A = '"&A2:A&"'")
Try this in cell Formula!C2:
=arrayformula( iferror( vlookup(A2:A, DATA!A2:B, columns(DATA!A2:B), false) ) )

List of the most frequently occurring words in the row

I'm looking for a way to show (in a Google Spreadsheet) the most frequently occurring word in each row. If several words are tied for the highest count, I want to display all of them, separated by semicolons.
Explanation:
For example, I want to fill the last column with values as below:
+---+------+------+------+------+------+-------------------+
| | A | B | C | D | E | F |
+---+------+------+------+------+------+-------------------+
| 1 | Col1 | Col2 | Col3 | Col4 | Col5 | Expected response |
| 2 | A | A | C | D | E | A |
| 3 | A | A | B | B | B | B |
| 4 | A | A | B | B | E | A, B |
| 5 | A | B | C | D | E | A, B, C, D, E |
+---+------+------+------+------+------+-------------------+
Here's what I have achieved (formula for cell F2):
=INDEX(A2:E2; MODE(MATCH(A2:E2; A2:E2; 0)))
but it doesn't work for the 4th and 5th rows as I expect.
This works in Office 365 Excel, but probably will not in Excel online, as it is an array formula.
=TEXTJOIN(", ",TRUE,INDEX(A2:E2,,N(IF({1},MODE.MULT(IF(((MATCH(A2:E2,A2:E2,0)=COLUMN(A2:E2))*(COUNTIF(A2:E2,A2:E2)=MAX(COUNTIF(A2:E2,A2:E2)))),COLUMN(A2:E2)*{1;1}))))))
Being an array formula, it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly, Excel will put {} around the formula.
EDIT:
To do it with Google Sheets as you now want:
=join(", ",filter(A2:E2,column(A2:E2)=match(A2:E2,A2:E2,0),countif(A2:E2,A2:E2)=max(countif(A2:E2,A2:E2))))
F2:
=JOIN(",",SORTN(TRANSPOSE(A2:E2),1,1,ARRAY_CONSTRAIN(FREQUENCY(MATCH(A2:E2,A2:E2,0),COLUMN(A2:E2)),COUNTA(A2:E2),1),0))
See syntax at https://support.google.com/docs/table/25273
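As a cross-check for the expected column, the "all words tied for the highest count" rule can be sketched in Python. This is an illustration only, assuming each row is a non-empty list of cell values; the name most_frequent is hypothetical.

```python
from collections import Counter


def most_frequent(cells, sep=", "):
    """Return every value tied for the highest count, in first-seen order."""
    counts = Counter(cells)  # keeps insertion order of first occurrences
    top = max(counts.values())
    return sep.join(word for word, n in counts.items() if n == top)


sample_rows = [
    ["A", "A", "C", "D", "E"],
    ["A", "A", "B", "B", "B"],
    ["A", "A", "B", "B", "E"],
    ["A", "B", "C", "D", "E"],
]
answers = [most_frequent(row) for row in sample_rows]
print(answers)
```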

Coloring cells based on numbers in string

In my sheet, on column A I have a string of numbers divided by commas. Then there are 12 numbered columns, each with its own hour. How can I shade the cells if their column's number is listed in column A?
Here is an example of what I am hoping to achieve.
  | A     | B        | C        | D        | E
1 |       | 1        | 2        | 3        | 4
2 | 1,2,4 | (shaded) | (shaded) |          | (shaded)
3 | 2,3   |          | (shaded) | (shaded) |
Select the whole range.
Apply conditional formatting.
Select custom formula: =MATCH(B$1,SPLIT($A2,","),0)>0
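The test that the custom formula applies to each cell can be sketched in Python: split the comma-separated string in column A and check whether the column's header number is among the parts. The function name should_shade is an assumption for this example, and it assumes every part of the string is a whole number.

```python
def should_shade(listed, column_number):
    """True if column_number appears in the comma-separated string."""
    # Like SPLIT($A2, ","), then MATCH(B$1, ..., 0) against the parts
    numbers = {int(part) for part in listed.split(",") if part.strip()}
    return column_number in numbers


headers = [1, 2, 3, 4]
row2 = [should_shade("1,2,4", h) for h in headers]
row3 = [should_shade("2,3", h) for h in headers]
print(row2, row3)
```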

Query completed with an empty output

https://docs.google.com/spreadsheets/d/1033hNIUutMjjdwiZZ40u59Q8DvxBXYr7pcWyRRHAdXk
That's a link to the file in which it is not working! If you open it, go to sheet named "My query stinks".
The sheet called Deposits has data like this in columns A (date), B (description), and C (amount):
+---+-----------+-----------------+---------+
| | A | B | C |
+---+-----------+-----------------+---------+
| 1 | 6/29/2016 | 1000000044 | 480 |
| 2 | 6/24/2016 | 1000000045 | 359.61 |
| 3 | 8/8/2016 | 201631212301237 | 11.11 |
+---+-----------+-----------------+---------+
The sheet "My Query Stinks" has data in columns A (check number), B (failing query) and C (amount):
+---+-----------------+------+--------+
| | A | B | C |
+---+-----------------+------+--------+
| 1 | 1000000044 | #N/A | 480 |
| 2 | 1000000045 | #N/A | 359.61 |
| 3 | 201631212301237 | #N/A | 11.11 |
+---+-----------------+------+--------+
In Column B on My Query Stinks, I want to enter a query. Here's what I'm trying:
=query(Deposits!A:C,"select A where A =" & A2)
For some reason, it returns "#N/A Error Query completed with an empty output." I want it to find that 1000000044 (the value in C4) matches 1000000044 over on Deposits and return the date.
Try
=query(Deposits!A:C,"select A where B ='" &A2&"'")
Explanation
Values like 1000000044 in column B of the Deposits sheet and column A of the My Query Stinks sheet are stored as text (string) values, so they should be enclosed in single quotes (apostrophes); otherwise QUERY treats them as numbers or variable names. Note also that the check numbers are in column B of Deposits, so the WHERE clause compares B rather than A.
Try this:
=query(Deposits!A:C,"select A where B = '"&A2&"' LIMIT 1")
You'll need LIMIT 1 as you have multiple deposits for the same value in your second column.
Another solution for this problem could be to replace '=' with 'contains':
=query(Deposits!A:C,"select A where B contains '" &A2&"'")
Simple, but this error cost me half a morning.
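The text-versus-number mismatch is easy to reproduce in Python, which may help explain the empty output. In this sketch the deposits list and the find_date helper are hypothetical stand-ins for the Deposits sheet and the query; the check numbers are stored as strings, as in the sheet.

```python
deposits = [
    ("6/29/2016", "1000000044", 480),
    ("6/24/2016", "1000000045", 359.61),
    ("8/8/2016", "201631212301237", 11.11),
]


def find_date(rows, check_number):
    """Return the date of the first row whose description matches as text."""
    wanted = str(check_number)  # the equivalent of wrapping the value in quotes
    for date, description, _amount in rows:
        if description == wanted:
            return date  # first match only, like LIMIT 1
    return None


# A number never equals the text "1000000044", so an unquoted match finds nothing
unquoted_hits = [row for row in deposits if row[1] == 1000000044]
first_date = find_date(deposits, 1000000044)
print(unquoted_hits, first_date)
```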

Comparing Values on the Same Row

Sorry if this has been answered but hours of Google-ing has revealed no elegant solution.
I have a sheet that looks like this only there are hundreds of rows.
+---+---+---+---+-----+
| A | B | C | D | E |
+---+---+---+---+-----+
| X | a | Y | b | 1.2 |
| X | b | Y | c | 1.5 |
| Y | c | Z | c | 1.8 |
+---+---+---+---+-----+
My goal is to count rows where for example the character in column A="X", character in column C="X" and characters in columns B and D are not the same (B!=D). The first part is working...
COUNTIFS(A:A ,"X" , C:C, "X")
but I can't figure out how to compare two cells that are both part of a range but on the same line. The following seems to compare the whole ranges...
COUNTIFS(A:A ,"X" , C:C, "X", B:B, D:D)
Additionally, I'd like to sum the values in column E for similarly defined groups.
Thanks in advance!
Solved it! Added the following formula to each row in column F...
=(B:B=D:D)+0
That will return 1 or 0 depending on whether the contents of B and D match. And that is something I can add to my existing formula.
COUNTIFS(A:A ,"X" , C:C, "X", F:F, 0)
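The same row-wise condition, along with the sum over column E that the question also asks for, can be sketched in Python. The function name count_and_sum is an assumption, and the sample rows are the three shown in the question (so the example filters on A="X" and C="Y", since no sample row has "X" in both columns).

```python
import math


def count_and_sum(rows, a_value, c_value):
    """Count rows where A and C match the given values and B differs from D,
    and sum column E over those same rows."""
    matching = [
        r for r in rows
        if r[0] == a_value and r[2] == c_value and r[1] != r[3]
    ]
    return len(matching), sum(r[4] for r in matching)


rows = [
    ("X", "a", "Y", "b", 1.2),
    ("X", "b", "Y", "c", 1.5),
    ("Y", "c", "Z", "c", 1.8),
]
count, total = count_and_sum(rows, "X", "Y")
print(count, total)
```

In the sheet, the helper column F plays the role of the `r[1] != r[3]` test, so a SUMIFS over column E with the same criteria would give the matching total.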
