I have a data set that looks something like this:
Column A
Column B
category 1
Team 1
1.category 1
Team 1
2.category 2
Team 1
category 2
Team 1
category 3
Team 1
3.category 3
Team 1
I am trying to use query function with a pivot statement to calculate the occurrence of each category for team 1 (I have several other teams in the data set, but for simplicity I just wrote out my example with team 1). Unfortunately the naming of the categories are not consistent in the original data, and I cannot change them.
So I need a way to combine the results of the sum of category 1 and 1.category1, and so on.
How could I handle rewrite this to get the type of result as listed below?
Category
Team 1
category 1
2
category 2
2
category 3
2
The formula I have now is as following:
query('sheet1!A:B,"Select A, count(B) where B='Team 1' group by A pivot B label B 'Team 1'",1)
If the category names all have a similar format to those in your example (with extraneous data only at the beginning, followed by 'category N', and you don't care if zero counts per category are left blank then a more compact approach then the previous answer is (for any number of teams/categories):
=arrayformula(query({regexextract(A2:A,"category.+"),B2:B},"select Col1,count(Col1) where Col2 is not null group by Col1 pivot Col2 label Col1 'Category'",0))
formula:
=ArrayFormula(
LAMBDA(DATA,CATEGORY,
LAMBDA(RESULT,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(QUERY(SPLIT(TRANSPOSE(SPLIT(RESULT,"&")),"|"),"SELECT Col1,SUM(Col3) GROUP BY Col1 PIVOT Col2 LABEL Col1'Category'",0))
)(
JOIN("&",
BYROW(CATEGORY,LAMBDA(CAT,
JOIN("&",CAT&"|"&BYROW(TRANSPOSE(QUERY(DATA,"SELECT COUNT(Col1) WHERE lower(Col1) CONTAINS'"&CAT&"' PIVOT Col2",0)),LAMBDA(ROW,JOIN("|",ROW))))
))
)
)
)({ASC($A$2:$B$7)},{"category 1";"category 2";"category 3"})
)
use ASC() to format all numbers-like values into number,
use {} to create the match conditions,
iterate the conditions with BYROW() and...
use QUERY() with CONTAINS to COUNT matches of the given conditions,
use TRANSPOSE() to turn the match results of each row sideway,
change the results into string with JOIN(), this helps to modify the row and column arrangment,
SPLIT() the data to create the correct array format we can use,
use QUERY() to PIVOT the SUM of the COUNT result as our final output.
Another approch works in a slightly different concept:
=ArrayFormula(
LAMBDA(DATA,CAT,
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(COLA,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(COLA,JOIN("|",CAT)))
)(INDEX(DATA,,1),INDEX(DATA,,2))
)(ASC(DATA))
)($A$2:$B$7,{"category 1","category 2","category 3"})
)
We can modify the Category column of the input data with REGEXEXTRACT() before sending it into query, which in this case, do make the formula looks a bit cleaner.
Inspired by #The God of Biscuits 's answer, we can now get rid of the CAT variable, which makes the formula more elastic to fit into your condition.
This REGEXEXTRACT() will extract Category value from the 1st 'category' match found to the end of the 1st 'number' after it, with any spacing in between the two value.
=ArrayFormula(
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(LOWER(INDEX(DATA,,1)),"((?:category)(?: +?)(?:[0-9]|[0-9])+)"),INDEX(DATA,,2))
)($A$2:$B)
)
You can also use filter with a count a like this:
=counta(filter(Sheet1!A:A,(Sheet1!A:A="category 1")+(Sheet1!A:A="1.category 1"),Sheet1!B:B="Team 1"))
I would like to query in google sheets and sort the query by a specific column in accending order, and have a secondary sort that is also in ascending order. I already know how to do this by
=QUERY(A:C,"select * where month(A)+1 = 1 order by A,B ",0)
Here i queried 3 columns month, unique ID, and name. I selected the data with the necessary month, and sorted it by month, followed by a secondary sort of unique ID. But this query outputs 3 columns. How would i change the formula so the output does not include the month column anymore.
Your question is:
How would i change the formula so the output does not include the month column anymore.
Try wrapping your QUERY formula within another QUERY. Like:
=QUERY(QUERY(your_query_here),"select Col2, Col3")
For your given example it would be:
=QUERY(QUERY(A1:C22,"select * where month(A)+1 = 1 order by A,B ",0),
"select Col1, Col3")
I am trying to test some basic sums with large data sets with duplicated words in a row in this Gsheets Test duplicates
I like to get the top 4 suppliers by country, based on volume but I have the condition that suppliers are duplicated in different positions in different rows. A first approach was to remove duplicates by grouping within a query and just sum the Kgs, but that's missing some of the values as seen on the query cell result.
=query($A$3:$C$6;"SELECT SUM(C) WHERE A matches '.*Walmart.*' and B='USA' ORDER BY SUM(C) DESC LABEL SUM(C) 'Kgs'";0)
Any more efficient approaches?
try with group by:
=QUERY(A3:C6;
"select B,sum(C)
where A matches '.*Walmart.*'
and B='USA'
group by B
order by sum(C) desc
label sum(C) 'Kgs'"; 0)
I have searched on a lot of pages but I cannot find a solution to my problem except in reverse order. I have simplified what I do, but I have a query that comes looking for information in my data sheet. Here there are 3 columns, the date, the amount and the source.
I would like, with a query function, to be able to make different columns which counts the information of column C based on the values of its cells per month, like this
I'm okay with the start of the formula
=QUERY(A2:C,"select month(A)+1, sum(B), count(C) where A is not null group by month(A)+1")
But as soon as I try a little different things by putting 2 query together in an arrayformula, obviously the row count doesn't match as some minus are 0 for some sources.
Do you have a solution for what I'm trying to do? Thank you in advance :)
Solution:
It's not possible in Google Query Language to have a single query statement that has one result grouped by one column and another result grouped by another.
The first two columns can be like this:
=QUERY(A2:C,"select month(A)+1, sum(B) where A is not null group by month(A)+1 label month(A)+1 'Month', sum(B) 'Amount'")
To create the column labels for the succeeding columns, use in the first row, in my example, I1:
=TRANSPOSE(UNIQUE(C2:C))
Then from cell I2, enter this:
=COUNTIFS(arrayformula(month($A$2:$A)),$G2,$C$2:$C,I$1)
Then drag horizontally and vertically to apply to the entire table.
Results:
try:
=INDEX({
QUERY({MONTH(A2:A), B2:C},
"select Col1,sum(Col2) where Col2 is not null group by Col1 label Col1'month',sum(Col2)'amount'"),
QUERY({MONTH(A2:A), B2:C, C2:C},
"select count(Col3) where Col2 is not null group by Col1 pivot Col4")})
Is it possible to use the WHERE clause in combination with the PIVOT clause in the Google query language?
I started with a pivot table using the following query:
select B, sum(C) group by B pivot A
Now I would like to alter the pivot table so that only the columns of the years 2018 and after get displayed. I tried the following query:
select B, sum(C) group by B pivot A where A>=2018
Why do I get an error here and is there a way to achieve the goal of filtering a pivot table according to specific criteria?
You need to replace the where clause.
Please use the following formula:
=QUERY(A3:C9,"select B, sum(C) where A>=2018 group by B pivot A")
The order of the language clauses in a query are:
select, where, group by, pivot, order by, limit, offset, label, format, options