QUERY Function: How to count with multiple counting criteria - google-sheets

[Goal]
In one column, I'm trying to count the number of tickets per employee whose status is "Finished," "Finished (Scope)," or "Routed (Sales)" for a specific week. In another column, I also want to count the number of tickets for a specific week without any status criteria. The data I'm counting from has the following column names.
Column A: Date,
Column B: Ticket ID,
Column E: Employee,
Column H: Finished Week
Column K: Week
You'll notice that the formula refers to cell H1, which contains the current week, calculated with this formula: =TODAY()-MOD(TODAY()-2,7)-1
[Current Formula]
=QUERY('Data'!$A$3:$J,"Select E,
COUNT(B) where D matches 'Finished|Finished \(Scope\)|Routed \(Sales\)'
AND H = "&H1&" GROUP BY E LABEL COUNT(B) 'Total Finished Tickets'",0)
[What it should look like]
I've created a sample spreadsheet that you can refer to.
Link: https://docs.google.com/spreadsheets/d/1MQLgt_SSbUIKv1rEwx-Y21hooxNOOgcUm_j1rFehHdg/edit?usp=sharing
[Issue]
I was able to create a table that counts the number of tickets per employee with the status "Finished" OR "Finished (Scope)" OR "Routed (Sales)." This is the "Current Result" table (Link: https://docs.google.com/spreadsheets/d/1MQLgt_SSbUIKv1rEwx-Y21hooxNOOgcUm_j1rFehHdg/edit#gid=0).
However, when I tried to add a second count criterion, I got errors, and I don't understand how to make this work properly. I want the result to look like the table titled "Ideal Result" in the shared link. Can someone please help?

You can use the pivot clause to get a breakdown by the Status column like this:
=query(
Data!A3:J,
"select E, count(E)
where H = " & E4 & "
group by E
pivot D
label E 'Employee' ",
0
)
The downside is that the grand total must then be calculated separately, but that can be done with a simple sum() formula.
Alternatively, get the totals first, and then do a lookup to get the number of finished tickets, like this:
=query(
Data!A2:J,
"select E, count(D)
where H = " & E4 & "
group by E
label E 'Employee', count(D) 'Total new tickets' ",
0
)
=arrayformula(
iferror(
vlookup(
E12:E,
query(
Data!A2:J,
"select E, count(D)
where H = " & E4 & "
and (D = 'Finished' or D = 'Finished (Scope)')
group by E
label count(D) 'Finished tickets' ",
1
),
2,
false
)
)
)
Note that this serves just to illustrate how to aggregate the data into a report. Your question leaves it unclear as to which status values should be counted for each type of aggregation. No rows with status Routed (Sales) appear in the data, and I cannot see how the expected results you show could be derived from the data.
See your sample spreadsheet.
Regarding "H1, which the cell contains the current week which is this formula: =TODAY()-MOD(TODAY()-2,7)-1": you may want to try the weeknum() function.
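For reference, the date arithmetic in =TODAY()-MOD(TODAY()-2,7)-1 can be sketched outside Sheets. The Python below is only an illustration; the function name week_anchor is hypothetical. It relies on the fact that Sheets date serial 2 is 1900-01-01, a Monday, so MOD(TODAY()-2,7) is the number of days since Monday.

```python
from datetime import date, timedelta

def week_anchor(today):
    """Mirror =TODAY()-MOD(TODAY()-2,7)-1 for a given date."""
    # date.weekday() is 0 for Monday, matching MOD(TODAY()-2,7) in Sheets.
    days_since_monday = today.weekday()
    # Step back to Monday, then one more day, landing on a Sunday.
    return today - timedelta(days=days_since_monday + 1)
```

Under this reading, the formula returns the Sunday before the current Monday-based week; when today is itself a Sunday, it returns the Sunday seven days earlier.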

To get two independent counts, you can't use a Where clause, because that would exclude rows from both counts. Instead, you can use the fact that Query's count() ignores empty cells, something like this:
=ArrayFormula(query({if(regexmatch(D3:D,"Finished$|Finished \(Scope\)$|Routed \(Sales\)$"),true,),E3:E,if(K3:K>=H1,true,)},"select Col2,count(Col3),count(Col1) where Col2 is not null group by Col2 label count(Col1) 'Finished', count(Col3) 'New'",1))
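The idea behind this formula can be sketched outside Sheets. The Python below is only an illustration under assumed data: the row layout (status, employee, week) and the function name count_per_employee are hypothetical. Each row contributes to each count independently, which is what the true-or-blank helper columns achieve in the formula.

```python
import re
from collections import defaultdict

# Same pattern as in the formula: a status ending in one of the three values.
FINISHED = re.compile(r"Finished$|Finished \(Scope\)$|Routed \(Sales\)$")

def count_per_employee(rows, current_week):
    """rows: iterable of (status, employee, week) tuples (assumed layout).

    Returns {employee: (new_count, finished_count)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for status, employee, week in rows:
        if not employee:
            continue  # mirrors "where Col2 is not null"
        entry = counts[employee]  # employees with no matches still appear
        if week >= current_week:
            entry[0] += 1  # "New": counted regardless of status
        if FINISHED.search(status):
            entry[1] += 1  # "Finished": counted regardless of week
    return {employee: tuple(c) for employee, c in counts.items()}
```

Because neither test filters the rows seen by the other, the two totals stay independent, unlike a single Where clause.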

Related

Aggregating rows with query in google sheets

I have a data set that looks something like this:
Column A        Column B
category 1      Team 1
1.category 1    Team 1
2.category 2    Team 1
category 2      Team 1
category 3      Team 1
3.category 3    Team 1
I am trying to use the query function with a pivot clause to count the occurrences of each category for Team 1 (I have several other teams in the data set, but for simplicity I wrote out my example with Team 1 only). Unfortunately, the naming of the categories is not consistent in the original data, and I cannot change it.
So I need a way to combine the counts for "category 1" and "1.category 1", and so on.
How could I rewrite this to get the type of result listed below?
Category        Team 1
category 1      2
category 2      2
category 3      2
The formula I have now is as following:
=query('sheet1'!A:B,"Select A, count(B) where B='Team 1' group by A pivot B label B 'Team 1'",1)
If the category names all have a format similar to those in your example (with extraneous data only at the beginning, followed by 'category N'), and you don't mind zero counts per category being left blank, then a more compact approach than the previous answer is (for any number of teams/categories):
=arrayformula(query({regexextract(A2:A,"category.+"),B2:B},"select Col1,count(Col1) where Col2 is not null group by Col1 pivot Col2 label Col1 'Category'",0))
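As a rough sketch of what this formula does, the Python below normalizes each raw name with the same "category.+" pattern and then counts per (category, team) pair. It is an illustration only; the function name pivot_counts and the pair-based row layout are assumptions, not part of the sheet.

```python
import re
from collections import Counter

def pivot_counts(rows):
    """rows: iterable of (raw_category, team) pairs (assumed layout).

    Returns {(category, team): count} after stripping any prefix
    that precedes the word 'category'.
    """
    counts = Counter()
    for raw, team in rows:
        match = re.search(r"category.+", raw)  # same idea as regexextract
        if match and team:                     # mirrors "Col2 is not null"
            counts[(match.group(0), team)] += 1
    return dict(counts)
```

With the six sample rows from the question, every category ends up with a count of 2 for Team 1.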
formula:
=ArrayFormula(
LAMBDA(DATA,CATEGORY,
LAMBDA(RESULT,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(QUERY(SPLIT(TRANSPOSE(SPLIT(RESULT,"&")),"|"),"SELECT Col1,SUM(Col3) GROUP BY Col1 PIVOT Col2 LABEL Col1'Category'",0))
)(
JOIN("&",
BYROW(CATEGORY,LAMBDA(CAT,
JOIN("&",CAT&"|"&BYROW(TRANSPOSE(QUERY(DATA,"SELECT COUNT(Col1) WHERE lower(Col1) CONTAINS'"&CAT&"' PIVOT Col2",0)),LAMBDA(ROW,JOIN("|",ROW))))
))
)
)
)({ASC($A$2:$B$7)},{"category 1";"category 2";"category 3"})
)
use ASC() to convert any full-width characters to half-width, so that number-like values are written consistently,
use {} to create the match conditions,
iterate over the conditions with BYROW() and...
use QUERY() with CONTAINS to COUNT the matches for each condition,
use TRANSPOSE() to turn the match results of each row sideways,
turn the results into a string with JOIN(), which makes it easier to rearrange the rows and columns,
SPLIT() the data to create an array in the correct format,
use QUERY() to PIVOT the SUM of the COUNT result as our final output.
Another approach works on a slightly different concept:
=ArrayFormula(
LAMBDA(DATA,CAT,
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(COLA,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(COLA,JOIN("|",CAT)))
)(INDEX(DATA,,1),INDEX(DATA,,2))
)(ASC(DATA))
)($A$2:$B$7,{"category 1","category 2","category 3"})
)
We can modify the Category column of the input data with REGEXEXTRACT() before sending it into the query, which in this case makes the formula look a bit cleaner.
Inspired by the answer from The God of Biscuits, we can now get rid of the CAT variable, which makes the formula more flexible for your situation.
This REGEXEXTRACT() extracts the category value from the first 'category' match found to the end of the first number after it, allowing any amount of spacing between the two.
=ArrayFormula(
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(LOWER(INDEX(DATA,,1)),"((?:category)(?: +?)(?:[0-9]|[0-9])+)"),INDEX(DATA,,2))
)($A$2:$B)
)
You can also use filter with counta, like this:
=counta(filter(Sheet1!A:A,(Sheet1!A:A="category 1")+(Sheet1!A:A="1.category 1"),Sheet1!B:B="Team 1"))

Counting over aggregated columns in Google Sheets

I have the yellow table shown below, and I'm trying to get the blue table, which aggregates columns B:F by value, and then counts the number of 'x' symbols for each row value of column A.
Is there some basic SQL/array magic formula to get this, please? There must be.
Use this formula, which relies on the newer functions:
=BYROW(B2:4, LAMBDA(v, COUNTIF(v, "=x")))
Used:
BYROW, LAMBDA, COUNTIF
v is the array_or_range
Update
={ A2:A4, BYROW(B2:4, LAMBDA(vv, COUNTIF(vv, "=x")))}
For fun
Update 02
=ArrayFormula(TRANSPOSE(QUERY({
QUERY(TRANSPOSE(IF(A1:4<>"x",A1:4,1)),
" Select * Where Col1 is not null ", 1)},
" Select (Col1),sum(Col2),sum(Col3),sum(Col4) Group by Col1 ", 1)))

Google sheets Query function with Arrayformula

For each email ID, I want to get the latest 10 records by timestamp. How do I get the results with arrayformula? Using the query function is not important, as long as I can achieve this with arrayformula. Here is the sample data:
https://docs.google.com/spreadsheets/d/1YAHA02VM-5MXzVKhkxu_eODPKObpoz441mGX8lOFu5M/edit?usp=sharing
Try this on another sheet, row 1:
=arrayformula(query({query({Sheet1!$A:$C},"order by Col1 desc,Col2",1),{"Dupe position";countifs(query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),row(Sheet1!$A2:$C),"<="&row(Sheet1!$A2:$C))}},"select Col1,Col2,Col3 where Col1 is not null and Col4 <= 10 order by Col1",1))
You can adjust the number of records found by adjusting Col4 <= 10, and also the final sort by altering order by Col1 at the end of the formula.
Explanation
This gets the data from Sheet1, sorts it by date desc then email asc:
query({Sheet1!$A:$C},"order by Col1 desc,Col2",1)
Then to the side of this data, a COUNTIFS() is used to get the number each time an email appears in the list above (since it's sorted desc, 1 represents the most recent instance).
countifs(<EmailColumnData>,<EmailColumnData>,row(<EmailColumn>),"<="&row(<EmailColumn>))
In place of <EmailColumnData> in the COUNTIFS() above, we use:
query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0)
In place of <EmailColumn> above, we only want the row number so we don't need the actual data. We can use:
Sheet1!$A2:$C
Various {} work as arrays to bring the data together.
E.g., {a,b,c;d,e,f} would result in three columns, with a, b, c in row 1 and d, e, f in row 2. A comma starts a new column; a semicolon starts a new row.
A final query around everything gets the 3 columns we need, where the count number in col 4 is <=10, then sorts the output by Col1 (date asc).
On second thought, maybe this is a bit cheeky, but this might do it (taken from the conditional rank idea):
=ArrayFormula(filter(A2:C,countifs(A2:A,">="&A2:A,B2:B,B2:B)<=10,A2:A<>""))
EDIT
The above assumes (because the data is time-stamped) that duplicate timestamps shouldn't occur. If they do and the data is pre-sorted, you can use the row number as a proxy for the timestamp, as suggested by @Aresvik.
Alternatively, you could count separately
(a) only rows with a later timestamp
plus
(b) rows with the same time stamp but with earlier (or identical) row number
=ArrayFormula(filter(A2:C,countifs(A2:A,">"&A2:A,B2:B,B2:B)+countifs(A2:A,"="&A2:A,B2:B,B2:B,row(A2:A),"<="&row(A2:A))<=10,A2:A<>""))
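The ranking logic in this last formula can be sketched in Python as follows. This is an illustration only: the function name latest_per_email and the (timestamp, email, score) row layout are assumptions, and blank rows are assumed to have been removed already.

```python
def latest_per_email(rows, limit=10):
    """rows: list of (timestamp, email, score) tuples in sheet order.

    Keeps a row when (a) + (b) <= limit, where
    (a) = rows for the same email with a later timestamp, and
    (b) = rows for the same email with the same timestamp at or above this row.
    """
    kept = []
    for i, (ts, email, _score) in enumerate(rows):
        later = sum(1 for t, e, _ in rows if e == email and t > ts)
        same_up_to_here = sum(
            1
            for j, (t, e, _) in enumerate(rows)
            if e == email and t == ts and j <= i
        )
        if later + same_up_to_here <= limit:
            kept.append(rows[i])
    return kept
```

The row-number tie-break in (b) gives every row a distinct rank within its email, so rows sharing a timestamp cannot all be given the same rank.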
I have added a new sheet ("Erik Help") with the following formula in A1:
=ArrayFormula({"Submitted Time","Email","Score";SORT(SPLIT(FLATTEN(QUERY(SORT(TRANSPOSE(SPLIT(TRANSPOSE(QUERY(IF(Sheet1!B2:B=TRANSPOSE(UNIQUE(FILTER(Sheet1!B2:B,Sheet1!B2:B<>""))),Sheet1!A2:A&"|"&Sheet1!B2:B&"|"&Sheet1!C2:C,),,COUNTA(Sheet1!A2:A)))," ",0,1)),SEQUENCE(MAX(COUNTIF(Sheet1!B2:B,Sheet1!B2:B))),0),"LIMIT 10")),"|",1,0),1,0)})
The number of records is set after LIMIT.
The order is set by the final two numbers: 1,0 (meaning "sort by column 1 in reverse order," which, as currently set, is sorting in reverse order by date/time).

Google Query SELECT statement concatenated with a NESTED IF result

Is it possible to return a Nested IF result from a CELL that will be concatenated to the SELECT statement in the QUERY function?
For example, I am trying to return the result for the following Nested IF function into the Query Function:
https://docs.google.com/spreadsheets/d/15i1E8AZHORRmPlu1VQqFRN1_7-aUyAz-hlYMOUtIlY4/edit?usp=sharing
I'd appreciate it if anyone could take a look.
Regards
JVA
It's done like this:
=QUERY(TESTDATA!A1:D16, "SELECT A, D, SUM(C) WHERE 1=1 "&
IF(AND(M3="NAME",N3="Customer"), " GROUP BY A, D PIVOT B",
IF(N3 = "Customer",
" AND A = '"&M3&"' GROUP BY A, D PIVOT B",
" AND A = '"&M3&"'
AND D = '"&N3&"' GROUP BY A, D PIVOT B")), 1)
Sometimes, it's easier to FILTER the results before applying QUERY:
=ArrayFormula(QUERY(FILTER(A1:D16, A1:A16=M3, D1:D16=N3), "SELECT Col1, Col4, SUM(Col3) GROUP BY Col1, Col4 PIVOT Col2 LABEL Col1 'Name', Col4 'Customer'",0))
As you can see, this requires using Colx notation instead of letters to indicate columns in the SELECT clause; but this is actually (in my opinion) more versatile, since you don't have to rewrite the QUERY if you ever insert columns before the existing source data.
You'll also notice that I needed to LABEL the first two columns, since FILTER will have FILTERed out the headers. (In fact, for this reason, the ranges in the formula could just as easily have begun with row 2, e.g., A2:A16, etc.)
Finally, at least in your sample spreadsheet, you didn't need the sheet name to reference the source ranges, since the result is in the same sheet.

How can I adjust this query to add conditionals from other columns? Formula / sample sheet included

https://docs.google.com/spreadsheets/d/1TjkR3TEg_eSei-25zUm8yRimftQ6ocRKQNEfrN-9Ogc/edit?usp=sharing
^ Sample sheet with my current formula, sample data, and description of the problem/current situation.
The current formula calculates the average of the last 10 appearances (going from the bottom of the sheet upwards) of columns D or E when "New York" (cell K1) is in columns B or C.
If New York appears in column B then it uses the value in column D, and if New York appears in column C it uses the value in column E.
The improvement I want to make is that it only uses the values (within those last 10 appearances of "New York" / cell K1) based on conditionals of columns G/F. In this case, let's say >10 as the conditional.
When "New York" is in columns B/C, for the last 10 appearances, it should bring the value in D into the equation if the value in F is >10 (and New York is in column B), and it should bring E into the equation if the value in G >10 (and New York is in column C).
Any ideas?
range construct:
={A:A, B:B, D:D, G:G;
A:A, C:C, E:E, F:F}
or shorter:
={A:B, D:D, G:G;
A:A, C:C, E:F}
use:
=AVERAGE(QUERY(SORT({A:B, D:D, G:G; A:A, C:C, E:F}, 1, ),
"select Col3
where Col4 > 10
and Col2 = '"&K1&"'
limit 10"))
I won't calculate the average, just so you can see the data records the query is pulling, and confirm the records. But I think my formula works.
=query(
query(
{query(A1:G,"select A,B,D,G where B='"&K1&"' ",0);
query(A1:G,"select A,C,E,F where C='"&K1&"' ",0)},
"select * order by Col1 desc limit 10",0),
"select * where Col4 > 10",0)
To get the average, change the last line of the formula to:
"select avg(Col3) where Col4 > 10",0)
Note: my understanding is that you want to take the ten latest records with New York, and then filter those ten records down to just those that have a value > 10 in the right column. This is different from taking the ten latest records that are New York AND have a value > 10 in the right column. But either solution can be provided.
I've stacked two queries together, to make the correct columns align vertically. So the first inner query gets column A,B,D and G, checking for New York (ie equal to K1) in B. Then the second query stacks columns A,C,E, and F underneath, checking for New York in C.
An outer query then sorts them in descending order by the date column, Col1 (column A). By setting a limit of ten, we get the ten latest records.
A final query is used to select the records with Col4>10. By changing this query to just return the avg(Col3), you should have your desired result.
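The difference between the two orderings (take the latest ten and then apply the criterion, versus apply the criterion and then take the latest ten) can be sketched in Python. This is an illustration only; the function names and the (date, city, value, check) record layout are assumptions standing in for the stacked Col1 to Col4.

```python
def average_latest_then_filter(records, city, n=10, threshold=10):
    """Take the n latest records for the city, then keep those with check > threshold."""
    matching = [r for r in records if r[1] == city]
    latest = sorted(matching, key=lambda r: r[0], reverse=True)[:n]
    values = [r[2] for r in latest if r[3] > threshold]
    return sum(values) / len(values) if values else None

def average_filter_then_latest(records, city, n=10, threshold=10):
    """Keep records for the city with check > threshold, then take the n latest."""
    matching = [r for r in records if r[1] == city and r[3] > threshold]
    latest = sorted(matching, key=lambda r: r[0], reverse=True)[:n]
    values = [r[2] for r in latest]
    return sum(values) / len(values) if values else None
```

The first function follows the nested-query formula above, where the limit is applied before the Col4 > 10 test; the second follows a single query in which both conditions sit in the same where clause ahead of the limit.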
It should be easy to modify this formula to get what you need.
Note also I believe that you missed a couple of records to be blue - G21 and F28? And E21 should be green also?
Update
When using the final version of the formula, to extract the Average, you can add the LABEL parameter to the QUERY statement to rename, or remove, the header label for that average. So in my example, the SELECT statement would become:
"select avg(Col3) where Col4 > 10 label avg(Col3) '' ",0)
or
"select avg(Col3) where Col4 > 10 label avg(Col3) 'New Label Name Here' ",0)
Update #2
I have provided a sample sheet, which has the enhancements you requested. The formula that calculates the result, the average, is in J3. The formula looks to a variable cell, I3, for the city name. I3 uses data validation, from a list in K2:L, to present the drop down list of city names to pick from.
The selection criteria are located in J6 and J7. If you had standard values you wanted to pick from here, maybe between 10 and 20, they could also be presented with a drop down list. But otherwise, just type in the desired limit values.
As an enhancement, I used conditional formatting to color the active cells in the data. Note that all matching rows will get colored, not just the latest ten. But the formula calculating the average should be using just the ten latest, THEN applying the criteria, before calculating the average. Test this carefully to be sure it is doing what you expect.
Note that the correct placement of the single and double quotes is very important when referencing criteria cells with the SELECT ... WHERE ... statements. Comparison to text values requires single quotes, whereas comparison to numeric values excludes the single quotes.
Valid QUERY Select statements for a numeric comparison:
"select * where A >= " & $B$5 & " limit 5 "
Valid QUERY Select statements for a text/string comparison:
"select * where A >= 'New York' limit 5 "
"select * where A >= '" & $B$5 & "' limit 5 "
<<== Do not have any spaces between the single and double quotes!
Invalid QUERY Select statements for a text/string comparison:
"select * where A >= "New York" limit 5 "
<<== Text values need single quotes inside the query string, not double quotes!
"select * where A >= ' " & $B$5 & " ' limit 5 "
<<== Syntactically valid, but matches " New York " (with the extra spaces), not "New York"! Do not have any spaces between the single and double quotes.
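The quoting rule can be summarized with a small sketch. The Python helper below is hypothetical (where_clause is not a Sheets or library function); it only shows which text the concatenation must produce: no quotes around numbers, and single quotes with no padding spaces around text.

```python
def where_clause(column, op, value):
    """Build the WHERE part of a query string for a number or a text value."""
    if isinstance(value, (int, float)):
        # Numeric comparison: the value is inserted without quotes.
        return f"where {column} {op} {value}"
    # Text comparison: single quotes directly around the value, no extra spaces.
    return f"where {column} {op} '{value}'"
```

For example, a numeric criterion of 10 gives where A >= 10, while the text New York gives where A = 'New York'.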

Resources