Google Sheets: select top N cases within groups using QUERY - google-sheets

I want to make a report: "select salesmen who have N recent Status codes = 2,3,4,5".
A structural example of my data has 35 rows (including 1 header row):
link.
This file has a Date, Sales code (id of a salesman), Status code (id of how successful a transaction was) and other fields which are not necessary for the purpose.
I ended up using three formulas:
a QUERY function with IMPORTRANGE.In the example data it is slightly simpler - take only Sales Codes, Status Code, and Date from another sheet, then order by Date, Sales Code. Formula in A9:
=QUERY({Source!D1:E, Source!A1:A}, $B$4, 1)
an additional column with a sequentional numbering. Formula in D10:
=ArrayFormula(if(len(A10:A), ROW(A10:A) - MATCH(A10:A,A10:A,0) - 8, ))
a QUERY function to extract only N cases (let's say 5). Formula in F9:
=QUERY(A9:D, "select A, B where D <="&B3, 1)
Is there a way to combine all 3 steps into one so I get the output like in F10:G24 using one (and hopefully fast :)) formula? The formula I tried (expanded for readability):
=QUERY(
{
QUERY({Source!D1:E, Source!A1:A}, $B$4, 1),
ArrayFormula(
IF(len(J10:J),
ROW(J10:J) - MATCH(J10:J, J10:J, 0) - 8,
)
)
},
"select Col1, Col2 where Col4 <="&B3,
1
)
It gives me the error:
"Function ARRAY_ROW parameter 2 has mismatched row size. Expected: 28. Actual: 991."
I also tried finite data ranges in ROW() and MATCH() but that yields an empty table.
In the original database there are ~3500 rows and they will expand, so I think I should stick to infinite ranges to automate data extraction.

Hm well it's a bit of a nightmare TBH - a similar question has cropped up before but no easy answer. Here is a draft which involves repeating the basic query several times-
=ArrayFormula(query({query(A2:E,"select * where E>=2 and E<=5 order by D, A desc"),
row(indirect("2:"&count(filter(E2:E,E2:E>=2,E2:E<=5))+1))-
match(query(A2:E,"select D where E>=2 and E<=5 order by D, A desc"),
query(A2:E,"select D where E>=2 and E<=5 order by D, A desc"),0)
},"select * where Col6<=5"))
But needs looking at more to see if it can be simplified.
This is the full-column version including headers - I think it's OK
=ArrayFormula(query({query(A:E,"select * where E>=2 and E<=5 order by D, A desc"),
row(indirect("1:"&count(filter(E:E,E:E>=2,E:E<=5))+1))+1-
match(query(A:E,"select D where E>=2 and E<=5 order by D, A desc"),
query(A:E,"select D where E>=2 and E<=5 order by D, A desc"),0)
},"select * where Col6<=5"))

Related

Google sheets Query function with Arrayformula

For each of the email id, I want to get latest 10 records by timestamp. How do I get the results with arrayformula? Query function is not important as long as I can still achieve this with arrayformula. Here is the sample data:
https://docs.google.com/spreadsheets/d/1YAHA02VM-5MXzVKhkxu_eODPKObpoz441mGX8lOFu5M/edit?usp=sharing
Try this on another sheet, row 1:
=arrayformula(query({query({Sheet1!$A:$C},"order by Col1 desc,Col2",1),{"Dupe position";countifs(query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),row(Sheet1!$A2:$C),"<="&row(Sheet1!$A2:$C))}},"select Col1,Col2,Col3 where Col1 is not null and Col4 <= 10 order by Col1",1))
You can adjust the number of records found by adjusting Col4 <= 10, and also the final sort by altering order by Col1 at the end of the formula.
Explanation
This gets the data from Sheet1, sorts it by date desc then email asc:
query({Sheet1!$A:$C},"order by Col1 desc,Col2",1)
Then to the side of this data, a COUNTIFS() is used to get the number each time an email appears in the list above (since it's sorted desc, 1 represents the most recent instance).
countifs(<EmailColumnData>,<EmailColumnData>,row(<EmailColumn>),"<="&row(<EmailColumn>))
In place of <EmailColumnData> in the COUNTIF() is:
query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0)
In place of <EmailColumn> above, we only want the row number so we don't need the actual data. We can use:
Sheet1!$A2:$C
Various {} work as arrays to bring the data together.
Eg., {a,b,c;d,e,f} would result in three columns, with a, b, c in row 1 and d, e, f in row 2. , is a new column, ; is a return for a new row.
A final query around everything gets the 3 columns we need, where the count number in col 4 is <=10, then sorts the output by Col1 (date asc).
On second thoughts, maybe this is bit cheeky, but this might do it ( taken from conditional rank idea )
=ArrayFormula(filter(A2:C,countifs(A2:A,">="&A2:A,B2:B,B2:B)<=10,A2:A<>""))
EDIT
The above assumes (because the data is time-stamped) dups shouldn't occur. If they do and the data is pre-sorted, you can use row number as a proxy for time stamp as suggested by #Aresvik.
Alternatively, you could count separately
(a) only rows with a later timestamp
plus
(b) rows with the same time stamp but with earlier (or identical) row number
=ArrayFormula(filter(A2:C,countifs(A2:A,">"&A2:A,B2:B,B2:B)+countifs(A2:A,"="&A2:A,B2:B,B2:B,row(A2:A),"<="&row(A2:A))<=10,A2:A<>""))
I have added a new sheet ("Erik Help") with the following formula in A1:
=ArrayFormula({"Submitted Time","Email","Score";SORT(SPLIT(FLATTEN(QUERY(SORT(TRANSPOSE(SPLIT(TRANSPOSE(QUERY(IF(Sheet1!B2:B=TRANSPOSE(UNIQUE(FILTER(Sheet1!B2:B,Sheet1!B2:B<>""))),Sheet1!A2:A&"|"&Sheet1!B2:B&"|"&Sheet1!C2:C,),,COUNTA(Sheet1!A2:A)))," ",0,1)),SEQUENCE(MAX(COUNTIF(Sheet1!B2:B,Sheet1!B2:B))),0),"LIMIT 10")),"|",1,0),1,0)})
The number of records is set after LIMIT.
The order is set by the final two numbers: 1,0 (meaning "sort by column 1 in reverse order," which, as currently set, is sorting in reverse order by date/time).

QUERY Function: How to count with multiple counting criterias

[Goal]
I'm trying to count the number of tickets per employee in one column that has a status with either "Finished," "Finished (Scope)," or "Routed (Sales)" for a specific week. In another column I also want to count the number of tickets for a specific week without criteria. The data that I'm pulling from to count the tickets has the following column names.
Column A: Date,
Column B: Ticket ID,
Column E: Employee,
Column H: Finished Week
Column K: Week
In the formula, you'll notice that it's referring to cell H1, which the cell contains the current week which is this formula: =TODAY()-MOD(TODAY()-2,7)-1
[Current Formula]
=QUERY('Data'!$A$3:$J,"Select E,
COUNT(B) where D matches 'Finished|Finished \(Scope\)|Routed \(Sales\)'
AND H = "&H1&" GROUP BY E LABEL COUNT(B) 'Total Finished Tickets'",0)
[What it should look like]
I've created a sample spreadsheet that you can refer to.
Link: https://docs.google.com/spreadsheets/d/1MQLgt_SSbUIKv1rEwx-Y21hooxNOOgcUm_j1rFehHdg/edit?usp=sharing
[Issue]
I was able to create a table that counts the number of tickets per employee with the status as "Finished" OR "Finished (Scope)" OR "Routed (Sales)." Which is the "Current Result" table (Link: https://docs.google.com/spreadsheets/d/1MQLgt_SSbUIKv1rEwx-Y21hooxNOOgcUm_j1rFehHdg/edit#gid=0).
However, as I tried to add another count criteria, it gave me errors and I don't understand how to properly make this work. I wanted to look like the table of the title "Ideal Result" in the shared link. Can someone please help?
You can use the pivot clause to get a breakdown by the Status column like this:
=query(
Data!A3:J,
"select E, count(E)
where H = " & E4 & "
group by E
pivot D
label E 'Employee' ",
0
)
The downside is that the grand total must then be calculated separately, but that can be done with a simple sum() formula.
Alternatively, get the totals first, and then do a lookup to get the number of finished tickets, like this:
=query(
Data!A2:J,
"select E, count(D)
where H = " & E4 & "
group by E
label E 'Employee', count(D) 'Total new tickets' ",
0
)
=arrayformula(
iferror(
vlookup(
E12:E,
query(
Data!A2:J,
"select E, count(D)
where H = " & E4 & "
and (D = 'Finished' or D = 'Finished (Scope)')
group by E
label count(D) 'Finished tickets' ",
1
),
2,
false
)
)
)
Note that this serves just to illustrate how to aggregate the data into a report. Your question leaves it unclear as to which status values should be counted for each type of aggregation. No rows with status Routed (Sales) appear in the data, and I cannot see how the expected results you show could be derived from the data.
See your sample spreadsheet.
H1, which the cell contains the current week which is this formula
=TODAY()-MOD(TODAY()-2,7)-1
You may want to try the weeknum() function.
To get two independent counts, you can't use a Where clause because that would exclude cases from both counts, but you could use the fact that Query does not count empty cells something like this:
=ArrayFormula(query({if(regexmatch(D3:D,"Finished$|Finished \(Scope\)$|Routed \(Sales\)$"),true,),E3:E,if(K3:K>=H1,true,)},"select Col2,count(Col3),count(Col1) where Col2 is not null group by Col2 label count(Col1) 'Finished', count(Col3) 'New'",1))

Joining 2 dynamic ranges using {} operator in Google Sheets

I have 2 tables resulting from the query formulas
formula1
=query(BuchungSystem!A2:AZ,"Select C, F, H, I, J, K, L where Q = "& $B$10)
formula2
=arrayformula({{"MWST ",""}} & QUERY(query(BuchungSystem!A2:AZ,"Select M, N, L where Q = "& $B$10),"SELECT Col1*100, SUM(Col2) GROUP BY Col1, Col3 LABEL SUM(Col2) '' , Col1*100 ''"))
and the result is shown as below
I need to combine 2 ranges since the number of rows in table1 keeps changing and so in table2. Since both are query formulas, they would not expand if the underlying cells have data. To avoid such issues, I am thinking both tables can be joined together with 2 empty lines between tables.
I have tried to join the query result ranges using {formula1;formula2} but it gives me an error since both tables have differing columns. How can I merge the tables one below other?
Mind sharing an example sheet? Here is a quick example of how to include empty columns to make the ranges equal in column size.
={{QUERY(G1:K4,"SELECT G,H,I,J,K")};{"","","","",SUM(QUERY(G1:K4,"SELECT G,H,I,J,K"))}}
If you use 6 columns in one range and only 2 in the next, then you need 4 empty columns in the second {} so their sizes remain the same.

SUM last N values with criteria

I have simple table that looks like this:
All i need is to SUM points for specific player (John) in his last 3 matches.
I was able to come with this formula:
SUMPRODUCT(LARGE((A2:B="John")*(C2:D);{1;2;3}))
The problem is that instead of what I was looking for, it sums the highest 3 values, that can be anywhere in that range.
Is there some similar formula, that can do only the last 3 matches?
I think a SUMPRODUCT can get you there with some constructed arrays using a COUNTIFS() and ROW() to get the most recent 3.
This formula:
=SUMPRODUCT((COUNTIFS(A:B,G2,ROW(A:B)*{1,1},">="&ROW(A:B)*{1,1})<=3)*(A:B=G2),C:D)
on this sheet I made seems to work.
I thnk I have a formula that gives what you want. It's not pretty, and I'm sure it can be made simpler, but this works:
=query( query(
{ arrayformula( {ROW(A1:A) } ),
query(A1:D,"select A, B, C, D",1)
} , "select * order by Col1 desc",1),
"select Col2, Col3, Col4, Col5
where (Col2 ='John' or Col3 = 'John')
order by Col1 desc limit 3",1)
Basically, it adds the row number as an extra column to the data, so that we can sort the data in reverse order by row number. Then we query the result to find the first three occurences of 'John', in either Col A or Col B.
Here is a sample sheet:
https://docs.google.com/spreadsheets/d/1-mhTb5Cpp3D-1OltlmCfwlmM-vc2OknHxfJAyHD7BjI/edit?usp=sharing
Credit to Erik Tyler for a previous answer on a different question, on how to add the row number to a query.
Edit: Updated the sheet to provide the SUM of John's (or any player's) scores from the last three matches. This can be combined with the previous formula, if you want a single formula to place somewhere. Or will you have a list of all the players, and you'll want their last three scores beside each of their names?
If I can simplify the formula, I'll update it here.
Let me know if you need something more than this, or if this has answered your question.
Approach
I would use the query formula to get the cells that you need so that you can leverage the limit statement.
You should put a column with the indexes so that you can order the cells in descending order and take the first 3.
Given that your table headers are:
+-----------------------------------------------+
| INDEX | NAME 1 | NAME 2 | POINTS 1 | POINTS 2 |
+-----------------------------------------------+
I would use this query to get your desired result:
=SUMPRODUCT(QUERY(A2:E, "Select D * E where B = 'John' or C = 'John'" order by A desc limit 3"))

IF statement in query

I have a table with two columns A and B the first is a tag and the second is an amount. I am trying to write a query with two columns, one summing up negative values while the other summing up positive ones.
Coming from SQL, I tried the following
=QUERY(A1:B100,
"SELECT A, SUM( B * IF(B>0, 0, 1) ),
SUM( B * IF(B<0, 0, 1) ) GROUP BY A ")
But it seems that the IF function is not supported in a query. I know I can create two intermediate columns in my sheet (one for positive value and one for negative ones), but I was wondering if it's possible to achieve what I want with a query or somehow without intermediate columns.
If you must use the query function, assuming your Tag Data is in Column A, and your Values in Column B:
=arrayformula(query({A1:A100,if(B1:B100>0,B1:B100,),if(B1:B100<0,B1:B100,)},"Select Col1, sum(Col2), sum(Col3) where Col1 <>'' group by Col1 label Col1 'Tag', sum(Col2) 'Positive', sum(Col3) 'Negative'"))
Here's the example output: https://docs.google.com/spreadsheets/d/1DW5CyPCC71CopW48uKy6basn-WP4hMfh7kuuJXT-C4o/edit#gid=1606239479
=arrayformula(query({a1:a100,if(b1:b100>0,b1:b100,),if(b1:b100<0,b1:b100,)},"Select Col1,sum(Col2),sum(Col3) group by Col1"))
Please see this sheet for an example of using the FILTER function which is probably better than your query function for this use case:
https://docs.google.com/spreadsheets/d/1DW5CyPCC71CopW48uKy6basn-WP4hMfh7kuuJXT-C4o/edit?usp=sharing
I didn't know what you meant by tag, but I just created a list of random words, 10 negative, and 10 positive
With Tags in Column A and Numbers in Column B. Then in Column D I put this for the "positive" filter:
=filter($A$2:$A,$B$2:$B>0)
And for the Positive Sum:
=sum(filter($B$3:$B,$B$3:$B>0))
And in Column E for the Negative filter:
=filter($A$2:$A,$B$2:$B<0)
And for the Negative Sum:
=sum(filter($B$3:$B,$B$3:$B<0))
EDIT: I added another sheet in the workbook that shows you how to list the sum next to each tag in a filtered list of the tags:
On this sheet, I created examples of how to list the total sums of each particular tag: https://docs.google.com/spreadsheets/d/1DW5CyPCC71CopW48uKy6basn-WP4hMfh7kuuJXT-C4o/edit#gid=1784614303
This formula will look at the list of tags/values in Columns A & B, and then match and sum all tags that are in the cell to the left in Column D:
=sum(filter($B$3:$B,$B$3:$B>0,$A$3:$A=D3))

Resources