Grouping by Name in Query Formula not working - google-sheets

So I am using a query function to count the number of instances a particular name appears in column A of another sheet, and display that result in Column B of this sheet with the respective name in Column A. Here is the function:
=ArrayFormula(QUERY(Attendance!A:A&{"",""},"select Col1, count(Col2) where Col1 != '' group by Col1 label count(Col2) 'Count'",1))
It works for the most part, but some of the names appear twice. For instance, Fred Jones appears as:
Col A | Col B
Fred Jones | 5
Fred Jones | 2
I have looked at the names and there is no discernible difference between them, so I do not understand why they are not being grouped. Is there a way I can use a wildcard or something similar to get Google to combine the names if they are nearly identical? Any help would be appreciated, thanks as always.

Try wrapping the range in TRIM, in case stray leading or trailing spaces make some of the names differ:
=ARRAYFORMULA(QUERY(TRIM({Attendance!A:A}),
"select Col1,count(Col1)
where Col1 is not null
group by Col1
label count(Col1)'Count'", 1))
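To see why TRIM can merge the duplicate rows, here is a minimal Python sketch (not Sheets code). It assumes the cause is an invisible trailing space on some entries, which the question does not confirm. The sample data and the names normalize and count_by_name are made up for illustration.

```python
from collections import Counter


def normalize(name):
    # Mimic TRIM: drop leading/trailing whitespace and collapse internal runs.
    return " ".join(name.split())


def count_by_name(names, trim=True):
    """Count occurrences per name, ignoring empty cells."""
    cleaned = (normalize(n) if trim else n for n in names)
    return Counter(n for n in cleaned if n != "")


# Hypothetical column: two entries carry a trailing space, one cell is empty.
names = ["Fred Jones"] * 5 + ["Fred Jones "] * 2 + [""]

untrimmed = count_by_name(names, trim=False)
trimmed = count_by_name(names, trim=True)
print(dict(untrimmed))  # {'Fred Jones': 5, 'Fred Jones ': 2}
print(dict(trimmed))    # {'Fred Jones': 7}
```

Without normalization the two spellings are different keys, which matches the 5 and 2 split in the question. After normalization they collapse into one group.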

Related

How to get unique values in a column, including cells with multiple values separated by commas, in Google Sheets?

I have a column in a Google Sheet, which in some cases, includes multiple values separated by commas — like this:
Value
A example
B example
C example
D example
A example, E example
A example, F example
G example, D example, C example
I would like to count all occurrences of the unique values in this column, so the count should look like:
Unique value | Occurrences
A example | 3
B example | 1
C example | 2
D example | 2
E example | 1
F example | 1
G example | 1
Currently, however, when I use =UNIQUE(A2:A), the result gives this:
Unique value | Occurrences
A example | 1
B example | 1
C example | 1
D example | 1
A example, E example | 1
A example, F example | 1
G example, D example, C example | 1
Is there a way I can count all of the instances of each value, whether it appears on its own in a cell or alongside other values in a cell (comma-separated)?
(This looks like a useful answer in Python, but I'm trying to do this in Google Sheets)
try:
Formula in C1:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ")),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 label count(Col1) ''"))
Or, as per the comments, pass 0 as the third argument so that SPLIT treats ", " as a single delimiter instead of splitting on commas and spaces separately:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ",0)),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 label count(Col1) ''"))
2nd EDIT: To order descending by count use:
=INDEX(QUERY(IFERROR(FLATTEN(SPLIT(A1:A,", ",0)),""),"Select Col1, count(Col1) where Col1 is not null group by Col1 Order By count(Col1) desc label count(Col1) ''"))
Assuming data in A1:A7:
In C1:
=SORT(UNIQUE(FLATTEN(ARRAYFORMULA(SPLIT(A1:A7,", ",0)))))
In D1:
=ARRAYFORMULA(MMULT(0+ISNUMBER(SEARCH(", "&ColumnCSpilledRange&", ",", "&TRANSPOSE(A1:A7)&", ")),ROW(A1:A7)^0))
Replace ColumnCSpilledRange appropriately.
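The split, flatten and count steps used by the formulas above can be sketched in Python as follows. This only illustrates the logic and is not Sheets code. The function name count_split_values is hypothetical, and the sample cells are the ones from the question.

```python
from collections import Counter


def count_split_values(cells, delimiter=", "):
    """Split each cell on the whole delimiter, then count every resulting value."""
    counts = Counter()
    for cell in cells:
        for part in cell.split(delimiter):
            part = part.strip()
            if part:  # skip empty cells and empty fragments
                counts[part] += 1
    return counts


cells = [
    "A example", "B example", "C example", "D example",
    "A example, E example", "A example, F example",
    "G example, D example, C example",
]

for value, occurrences in sorted(count_split_values(cells).items()):
    print(value, occurrences)
```

Splitting on the full ", " string (rather than on each character) is what keeps "A example" together as one value.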

Efficient way to collate/aggregate specific data in Google Sheets

I'm looking for an efficient way to gather and aggregate some data in Google Sheets. I've been looking at the query function, pivot tables, and Index + Match formulas, but so far I've not found a way that brings me to the result I'm looking for. I have a set of data which looks more or less as follows.
The fields with an X represent irrelevant data which I don't want to show up in my end result. They only serve to illustrate that there are columns of data that I don't want in between the columns of data that I do want. The data in those columns is of varying types and of varying values per type; they are not actually fields with an "X" in them. Only the fields with numbers are of interest, along with the related names at the top and left of those. The intent is to create a list that looks more or less like this.
I've highlighted those yellow fields because that data has been aggregated. For example, in the original file field D3 shows a relation between Laura and Pete with the number 1, and field L3 also shows a relation between Laura and Pete, so the number in that field is to be added to the number in the other field resulting in an aggregated total of 2 for that particular combination.
I would really appreciate any suggestions that can help me get to an elegant and efficient solution for this. The only solutions I can come up with would involve multiple "in-between" sheets and there just has to be a better way.
UPDATE:
Solved by applying the solution in player0's answer. I just had to switch around the order of Col1 and Col2 in the formula to get the table sorted the way I needed it. Formula looks like below now. Many thanks to both player0 and Erik Tyler for their efforts.
=INDEX(QUERY(SPLIT(FLATTEN(A2:A&"×"&D1:N1&"×"&D2:N), "×"),
"select Col2,Col1,sum(Col3)
where Col2 is not null
and Col3 is not null
group by Col2,Col1
label sum(Col3)''", ))
try:
=INDEX(QUERY(SPLIT(FLATTEN(A2:A&"×"&D1:N1&"×"&D2:N), "×"),
"where Col3 is not null and Col2 is not null", ))
update:
=INDEX(QUERY(SPLIT(FLATTEN(A2:A&"×"&D1:N1&"×"&D2:N), "×"),
"select Col1,Col2,sum(Col3)
where Col3 is not null
and Col2 is not null
group by Col1,Col2
label sum(Col3)''", ))
Given your current data set (which only appears to extend to Col N), place the following somewhere to the right of Col N:
=ArrayFormula(SPLIT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(SPLIT(QUERY(FLATTEN(FILTER(IF(NOT(ISNUMBER(D2:N)),,D1:N1&"~ "&A2:A&"|"&D2:N),A2:A<>"")),"Select * WHERE Col1 Is Not Null"),"|"),"Select Col1, SUM(Col2) GROUP BY Col1 LABEL SUM(Col2) ''")&"~ "),,2)),"~ ",0,1))
It would be better if this were placed in a different sheet from the original data. Supposing that your original data sheet is named Sheet1, place the following version of the above formula into a new sheet:
=ArrayFormula(SPLIT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(SPLIT(QUERY(FLATTEN(FILTER(IF(NOT(ISNUMBER(INDIRECT("Sheet1!D2:"&ROWS(Sheet1!A:A)))),,Sheet1!D1:1&"~ "&Sheet1!A2:A&"|"&INDIRECT("Sheet1!D2:"&ROWS(Sheet1!A2:A))),Sheet1!A2:A<>"")),"Select * WHERE Col1 Is Not Null"),"|"),"Select Col1, SUM(Col2) GROUP BY Col1 LABEL SUM(Col2) ''")&"~ "),,2)),"~ ",0,1))
This separate-sheet approach and formula allows for the original data to extend indefinitely past Col N.
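Conceptually, the formulas in both answers unpivot the grid into (row name, column name, number) triples and then sum per name pair. A minimal Python sketch of that idea follows. The grid is invented because the original screenshots are not available, and aggregate_pairs is a hypothetical name. Only the Laura/Pete total of 2 comes from the question.

```python
from collections import defaultdict


def aggregate_pairs(row_names, col_names, grid):
    """Sum numeric cells per (row name, column name) pair, ignoring everything else."""
    totals = defaultdict(int)
    for row_name, row in zip(row_names, grid):
        for col_name, cell in zip(col_names, row):
            # Keep only real numbers; text such as "X" and empty cells are skipped.
            if isinstance(cell, (int, float)) and not isinstance(cell, bool):
                totals[(row_name, col_name)] += cell
    return dict(totals)


# Hypothetical layout: the header "Pete" appears twice, with an unwanted column between.
col_names = ["Pete", "notes", "Pete", "Anna"]
row_names = ["Laura", "Mark"]
grid = [
    [1, "X", 1, None],
    [None, "X", 3, 2],
]

print(aggregate_pairs(row_names, col_names, grid))
# {('Laura', 'Pete'): 2, ('Mark', 'Pete'): 3, ('Mark', 'Anna'): 2}
```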

How can I avoid having to put 0s into the NULL fields to get a correct query calculation in Google Sheets

I have a Google Sheets question, which I have not been able to figure out yet with Google-Fu and RTFM:
Take the following spreadsheet as an example:
https://docs.google.com/spreadsheets/d/1IvMVaUdUDfYOoKyG0Uwd2n0M1mLjOTE5yZQ9K2R3q2M/edit?usp=sharing
In case the sheet gets lost in time, I am going to post its contents here:
Sheet1:
foo | withdrawal | deposit
C | 4 | 10
D | 10 |
E | 10 | 4
As you see here, the deposit field for the row where foo is D is empty, i.e. null.
Sheet2:
foo | balance
C | =INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A2&"'"), 2)
D | =INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A3&"'"), 2)
E | =INDEX(QUERY({Sheet1!$A$2:C}, "SELECT SUM(Col3) - SUM(Col2) WHERE Col1 = '"&A4&"'"), 2)
The result is
foo | balance
C | 6
D |
E | -6
As you see, the balance field for the category D is null, although it should be -10.
The fix for that is to put a 0 into the deposit field in Sheet1 explicitly.
In my example, I get that data using a csv-export, and fields are generally empty and not 0, and it is cumbersome to add the 0 there. Is there a way to have something like COALESCE in that sum there (like in SQL)?
Please let me know.
It seems like something quite a bit simpler would avoid the problem:
=SUMPRODUCT(Sheet1!C2:C-Sheet1!B2:B,Sheet1!A2:A=A2)
for cell B2.
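The idea behind this fix, treating an empty cell as 0 before subtracting (the COALESCE behaviour the question asks about), can be sketched in Python. This is an illustration only, not Sheets code. The rows mirror Sheet1 from the question with the empty deposit represented as None, and the function name balances is hypothetical.

```python
def balances(rows):
    """Return deposit minus withdrawal per category, counting empty cells as 0."""
    totals = {}
    for name, withdrawal, deposit in rows:
        if not name:
            continue  # skip blank rows
        # `value or 0` plays the role of COALESCE(value, 0).
        totals[name] = totals.get(name, 0) + (deposit or 0) - (withdrawal or 0)
    return totals


# (foo, withdrawal, deposit); D has no deposit, as in the CSV export.
rows = [("C", 4, 10), ("D", 10, None), ("E", 10, 4)]

print(balances(rows))  # {'C': 6, 'D': -10, 'E': -6}
```

Because the missing deposit is replaced by 0 before the arithmetic, D comes out as -10 instead of blank.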
Why don't you just add this in cell A1 of Sheet2 instead of all the Query:
=arrayformula({Sheet1!A1,"balance";if(Sheet1!A2:A<>"",{Sheet1!A2:A,Sheet1!C2:C-Sheet1!B2:B},)})
Obviously ensure cells Sheet2!A2:A and Sheet2!B1:B are empty.
If you have duplicate values of foo, try:
=arrayformula(query({Sheet1!A1,"balance";if(Sheet1!A2:A<>"",{Sheet1!A2:A,Sheet1!C2:C-Sheet1!B2:B},)},"select Col1,sum(Col2) where Col1 is not null group by Col1 label sum(Col2) 'balance'",1))
A better option for a single-cell formula that references multiple sheets would be:
=arrayformula(query(
{Sheet1!A:A,n(Sheet1!B:C);Sheet2!A2:A,n(Sheet2!B2:C);Sheet3!A2:A,n(Sheet3!B2:C)},
"select Col1,sum(Col3)-sum(Col2) where Col1 is not null group by Col1 label sum(Col3)-sum(Col2) 'balance' ",1))

How to Get the Sum Total of a Category if Data is Split Between Multiple Adjacent Tables

I have attached a sample of the format the data I am working with is in. The actual data set has many more columns. So I am looking for a single formula that will get the totals for a category from the whole table. As you can see in the photo we have "Test 2" in columns B and D with values of 1 and 9 respectively. That is a total of 10. Is there a singular formula that would return 10? Thank you.
try:
=QUERY({FLATTEN(FILTER(B2:G10, MOD(COLUMN(B:G), 2)=0)),
FLATTEN(FILTER(B2:G10, MOD(COLUMN(B:G)-1, 2)=0))},
"select Col1,sum(Col2)
where Col1 is not null
group by Col1
label Col1'Name',sum(Col2)'Totals'")
For an unknown number of columns, try:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(
FILTER(B2:1000, MOD(COLUMN(B2:2), 2)=0)&"×"&
FILTER(B2:1000, MOD(COLUMN(B2:2)-1, 2)=0)), "×"),
"select Col1,sum(Col2)
where Col2 is not null
group by Col1
label Col1'Name',sum(Col2)'Totals'"))
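Both formulas pair every name column with the value column to its right and then sum by name. A small Python sketch of that pairing, for illustration only, is below. The table is invented apart from "Test 2" having the values 1 and 9, and totals_from_pairs is a hypothetical name.

```python
from collections import defaultdict


def totals_from_pairs(table):
    """Sum values per name when each row alternates name, value, name, value, ..."""
    totals = defaultdict(int)
    for row in table:
        # Even positions hold names, odd positions hold the matching values.
        for name, value in zip(row[0::2], row[1::2]):
            if name and value is not None:
                totals[name] += value
    return dict(totals)


# Hypothetical data: two adjacent name/value tables side by side.
table = [
    ["Test 1", 4, "Test 2", 9],
    ["Test 2", 1, "Test 3", 5],
]

print(totals_from_pairs(table))  # {'Test 1': 4, 'Test 2': 10, 'Test 3': 5}
```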

Create a Query Formula

How can I create a query formula that, given a user code in the "respostas" sheet ("1", for example), returns the names for that user that have fewer than 5 countries? In other words, it should give me the result of the formulas in columns F, G and H combined.
Result wanted: Nathan, Sam and Anna.
Is it possible to write a query that relates both sheets, with the user as the condition?
https://docs.google.com/spreadsheets/d/1GdL5psaLKDix7282AZXGvZJUlm7rfhU2x5KlKTjp6ig/edit?usp=sharing
Thank you in advance
Yes, you could do it with a query combining columns F, G and H like this:
=ArrayFormula(if(A2:A="",,C2:C-vlookup(A2:A,query(filter({A2:A,COUNTIF(Respostas!B2:B,B2:B)},A2:A<>""),"select Col1,sum(Col2) group by Col1 label sum(Col2) ''"),2,false)<5))
Or, if you want to see the people with fewer than five responses:
=ArrayFormula(query(query(filter({A2:A,COUNTIF(Respostas!B2:B,B2:B)},A2:A<>""),"select Col1,sum(Col2) group by Col1 label sum(Col2) ''"),"select Col1, Col2 where Col2<5"))
