How to query spreadsheet column data, matching and summarising values? - google-sheets

I want to get the total of entries in a column for a particular value. Each value is a number such as 0.5 or 1 followed by a code such as H, S, O or WFH, e.g. "0.5 H" or "1 S".
It is an absence spreadsheet recording holiday, sickness, appointments and working from home. One sheet has a row for every day in the year, and the columns represent the staff members. I wish to be able to query the values under the columns and then summarise them per staff member and per month.
I have googled but not found anything similar enough to put me in the right direction.
Any help would be greatly appreciated.

The QUERY function might help you; see the Query Language Reference.
Please provide a data sample to get more help.
Your query will look something like this:
=query({A:D}, "select Col1, Col2, sum(Col4) where Col3 = '1 S' group by Col1, Col2")
In this sample, the query formula summarises the data in A:D per columns A and B (Col1, Col2) and keeps only the rows where the value in column C (Col3) equals '1 S'.
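For readers who want to see the underlying logic outside of Sheets, here is a minimal Python sketch of the summary the question asks for. It assumes each cell holds text like "0.5 H" (a number, a space, then a code) and that the month for each row is already known; the function name `summarise` and the sample data are hypothetical, not taken from the asker's sheet.

```python
from collections import defaultdict

def summarise(entries):
    """Sum absence amounts per (month, code).

    entries: iterable of (month, cell) pairs, where cell is text such as
    "0.5 H" or "1 S". Blank cells are skipped.
    """
    totals = defaultdict(float)
    for month, cell in entries:
        if not cell or not cell.strip():
            continue  # no absence recorded for that day
        amount, code = cell.split()  # assumed format: "<number> <code>"
        totals[(month, code)] += float(amount)
    return dict(totals)

# Hypothetical column of entries for one staff member.
sample = [(1, "0.5 H"), (1, "1 H"), (1, "1 S"), (2, "1 WFH"), (2, "")]
print(summarise(sample))
```

The same split-then-group idea is what a Sheets solution has to do: separate the number from the code, then sum the numbers per code and month.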

Related

How to Query in google sheets, sort by a column and not include that column in the output

I would like to query in Google Sheets and sort the result by a specific column in ascending order, with a secondary sort that is also in ascending order. I already know how to do this with:
=QUERY(A:C,"select * where month(A)+1 = 1 order by A,B ",0)
Here I queried 3 columns: month, unique ID, and name. I selected the rows for the required month and sorted them by month, with a secondary sort by unique ID. But this query outputs 3 columns. How would I change the formula so the output does not include the month column anymore?
Your question is:
How would I change the formula so the output does not include the month column anymore?
Try wrapping your QUERY formula within another QUERY. Like:
=QUERY(QUERY(your_query_here),"select Col2, Col3")
For your given example it would be:
=QUERY(QUERY(A1:C22,"select * where month(A)+1 = 1 order by A,B ",0),
"select Col2, Col3")
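The nested QUERY works because the inner query filters and sorts while the sort column is still present, and the outer query only picks which columns to show. A rough Python sketch of that two-step idea follows; the function name and the sample rows are made up for illustration, and rows are assumed to be (month, unique_id, name) tuples.

```python
def ids_and_names_for_month(rows, month):
    """Filter and sort with the month column present, then drop it."""
    # Step 1 (inner query): keep the requested month, sort by month then ID.
    selected = [r for r in rows if r[0] == month]
    selected.sort(key=lambda r: (r[0], r[1]))
    # Step 2 (outer query): output only the ID and name columns.
    return [(uid, name) for _, uid, name in selected]

# Hypothetical data: (month, unique_id, name)
rows = [(1, 3, "Cara"), (2, 1, "Xavier"), (1, 1, "Alice")]
print(ids_and_names_for_month(rows, 1))
```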

Google sheets Query function with Arrayformula

For each of the email id, I want to get latest 10 records by timestamp. How do I get the results with arrayformula? Query function is not important as long as I can still achieve this with arrayformula. Here is the sample data:
https://docs.google.com/spreadsheets/d/1YAHA02VM-5MXzVKhkxu_eODPKObpoz441mGX8lOFu5M/edit?usp=sharing
Try this on another sheet, row 1:
=arrayformula(query({query({Sheet1!$A:$C},"order by Col1 desc,Col2",1),{"Dupe position";countifs(query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),row(Sheet1!$A2:$C),"<="&row(Sheet1!$A2:$C))}},"select Col1,Col2,Col3 where Col1 is not null and Col4 <= 10 order by Col1",1))
You can adjust the number of records found by adjusting Col4 <= 10, and also the final sort by altering order by Col1 at the end of the formula.
Explanation
This gets the data from Sheet1, sorts it by date desc then email asc:
query({Sheet1!$A:$C},"order by Col1 desc,Col2",1)
Then, to the side of this data, a COUNTIFS() numbers each occurrence of an email in the list above (since the list is sorted descending, 1 represents the most recent instance).
countifs(<EmailColumnData>,<EmailColumnData>,row(<EmailColumn>),"<="&row(<EmailColumn>))
In place of <EmailColumnData> in the COUNTIFS() is:
query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0)
In place of <EmailColumn> above, we only want the row number so we don't need the actual data. We can use:
Sheet1!$A2:$C
Various {} work as arrays to bring the data together.
Eg., {a,b,c;d,e,f} would result in three columns, with a, b, c in row 1 and d, e, f in row 2. , is a new column, ; is a return for a new row.
A final query around everything gets the 3 columns we need, where the count number in col 4 is <=10, then sorts the output by Col1 (date asc).
On second thoughts, maybe this is a bit cheeky, but this might do it (taken from the conditional rank idea):
=ArrayFormula(filter(A2:C,countifs(A2:A,">="&A2:A,B2:B,B2:B)<=10,A2:A<>""))
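The FILTER/COUNTIFS formula above keeps a row when the number of rows with the same email and an equal or later timestamp is at most 10. As a sketch of that ranking logic only (not of how Sheets evaluates it), the Python below does the same count per row. The function name, the tuple layout (timestamp, email, score) and the sample data are assumptions, and timestamps are assumed unique per email.

```python
def latest_n_per_email(rows, n=10):
    """Keep each row that is among the n most recent for its email."""
    kept = []
    for ts, email, score in rows:
        # Rank = how many rows for this email are at least as recent.
        rank = sum(1 for ts2, email2, _ in rows
                   if email2 == email and ts2 >= ts)
        if rank <= n:
            kept.append((ts, email, score))
    return kept

# Hypothetical data, using small integers as timestamps.
rows = [(1, "a@x.com", 10), (2, "a@x.com", 20),
        (3, "a@x.com", 30), (1, "b@x.com", 5)]
print(latest_n_per_email(rows, n=2))
```

Like the formula, this compares every row with every other row, so it is quadratic in the number of rows; that is fine for small sheets but slow for very large ones.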
EDIT
The above assumes that, because the data is time-stamped, duplicate timestamps shouldn't occur. If they do and the data is pre-sorted, you can use the row number as a proxy for the timestamp, as suggested by @Aresvik.
Alternatively, you could count separately
(a) only rows with a later timestamp
plus
(b) rows with the same time stamp but with earlier (or identical) row number
=ArrayFormula(filter(A2:C,countifs(A2:A,">"&A2:A,B2:B,B2:B)+countifs(A2:A,"="&A2:A,B2:B,B2:B,row(A2:A),"<="&row(A2:A))<=10,A2:A<>""))
I have added a new sheet ("Erik Help") with the following formula in A1:
=ArrayFormula({"Submitted Time","Email","Score";SORT(SPLIT(FLATTEN(QUERY(SORT(TRANSPOSE(SPLIT(TRANSPOSE(QUERY(IF(Sheet1!B2:B=TRANSPOSE(UNIQUE(FILTER(Sheet1!B2:B,Sheet1!B2:B<>""))),Sheet1!A2:A&"|"&Sheet1!B2:B&"|"&Sheet1!C2:C,),,COUNTA(Sheet1!A2:A)))," ",0,1)),SEQUENCE(MAX(COUNTIF(Sheet1!B2:B,Sheet1!B2:B))),0),"LIMIT 10")),"|",1,0),1,0)})
The number of records is set after LIMIT.
The order is set by the final two numbers: 1,0 (meaning "sort by column 1 in reverse order," which, as currently set, is sorting in reverse order by date/time).

Google Sheets Combine a column with duplicates and update total sum in another column

This might be something fairly simple but struggling to find a way to do it.
In Column B, I have a list of foods required.
In Column C, I have the amount needed.
In Column D, I have g (for grams), ml (for millilitres) etc.
I would like to combine the duplicates in Column B and update the totals from Column C, with the g or ml in Column D beside it.
The list I have has been created by using an array formula based on dropdowns in another sheet.
I have seen people using UNIQUE formula in 1 column (this works) and then a SUMIF formula in another column and then a JOIN formula in another... I tried this but the SUMIF is always returning 0.
Would someone please be able to advise on how I can do this?
TIA :D
It's hard to be sure exactly what you need without seeing the data. But based on my understanding of solely what you've posted, this QUERY formula should generate a condensed mini-report:
=QUERY({B2:D},"Select Col1, SUM(Col2), Col3 WHERE Col1 Is Not Null GROUP BY Col1, Col3 LABEL SUM(Col2) ''")
In plain English, this means "Arrange the data from the range B2:D in the same order as the raw data, but sum the second column's data according to matches in both the first and third columns. Only return results for the raw data where the first column is not blank. Replace the default 'sum' header on the second column with nothing; I don't need it."
This formula assumes that every ingredient will always be attached to the same measurement (e.g., 'salt' in Col B is always paired with 'mg' in Col D, etc.). If this is not the case, you will wind up with ingredients being listed as many times as there are different measures in Col D.

Sum based on multiple row + header criteria

I have a problem with summing up my values from a data set, that's structured like this:
The goal is to sum the revenues separated by company and split by month, so the result is output in this way
I have tried it with some =sumifs + index/match and =sumproduct solutions, but can't seem to make it work.
Here's the sample file:
https://docs.google.com/spreadsheets/d/16xOoPCHDtcSRRojCkwcBorUc5dstgkXFPR6M_d5uY2U/edit#gid=0
On the "revenues" tab, in cell B4, try using the formula:
=SUMIFS(indirect(address(1,match(A4,Overview!$3:$3,0)-1,,,"Overview")&":"&address(1000,match(A4,Overview!$3:$3,0)-1)),Overview!A1:A1000,">="&B$2,Overview!A1:A1000,"<="&B$3)
To break it down, this bit helps figure out which revenue column to use by matching the name of the company and then taking the column before that:
match(A4,Overview!$3:$3,0)-1
This bit creates an address "Overview!$G$1":
address(1,match(A4,Overview!$3:$3,0)-1,,,"Overview")
This bit creates the 2nd part of the address i.e.":$G$1000":
"&":"&address(1000,match(A4,Overview!$3:$3,0)-1)
The rest is a SUMIFS that sums the revenue column for dates on or after the first day of the month (B$2) and on or before the last day of the month (B$3).
Be careful: your data is for 2020 and your summary table is using dates in 2021.
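The breakdown above boils down to two steps: find the revenue column from the company name, then sum it over a date range. The Python sketch below mirrors those steps under an assumed layout, where the date is in the first column of each row and each company name sits one column to the right of its revenue column (matching the MATCH(...)-1 in the formula). The function name, headers and figures are invented for illustration.

```python
from datetime import date

def revenue_for(headers, rows, company, start, end):
    """Sum one company's revenue for dates between start and end inclusive."""
    # Like MATCH(...)-1: the revenue column is just left of the company name.
    col = headers.index(company) - 1
    return sum(r[col] for r in rows if start <= r[0] <= end)

# Hypothetical layout: date, then (revenue, company-name) column pairs.
headers = ["Date", "Revenue", "Acme", "Revenue", "Beta"]
rows = [
    [date(2020, 1, 15), 100, None, 40, None],
    [date(2020, 1, 31), 50, None, 10, None],
    [date(2020, 2, 1), 70, None, 5, None],
]
print(revenue_for(headers, rows, "Acme", date(2020, 1, 1), date(2020, 1, 31)))
```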
Reference:
SUMIFS
ADDRESS
MATCH
INDIRECT
Use in A4:
=ARRAYFORMULA(QUERY(QUERY({SPLIT(FLATTEN(IF(
FILTER(Overview!G7:1000, MOD(COLUMN(Overview!G7:1000)+2, 3)=0)="",,
TEXT(Overview!A7:A25, "m")&"×"&
FILTER(Overview!G7:1000, MOD(COLUMN(Overview!G7:1000)+2, 3)=0)&"×"&
FILTER(Overview!G3:3, MOD(COLUMN(Overview!G3:3)+1, 3)=0))), "×");
SEQUENCE(12), SEQUENCE(12, 2,,)},
"select Col3,sum(Col2)
where Col1 is not null
group by Col3
pivot Col1", 0),
"offset 2", 0))

SUM last N values with criteria

I have simple table that looks like this:
All I need is to SUM the points for a specific player (John) in his last 3 matches.
I was able to come up with this formula:
=SUMPRODUCT(LARGE((A2:B="John")*(C2:D);{1;2;3}))
The problem is that instead of what I was looking for, it sums the highest 3 values, that can be anywhere in that range.
Is there some similar formula, that can do only the last 3 matches?
I think a SUMPRODUCT can get you there with some constructed arrays using a COUNTIFS() and ROW() to get the most recent 3.
This formula:
=SUMPRODUCT((COUNTIFS(A:B,G2,ROW(A:B)*{1,1},">="&ROW(A:B)*{1,1})<=3)*(A:B=G2),C:D)
on this sheet I made seems to work.
I think I have a formula that gives what you want. It's not pretty, and I'm sure it can be made simpler, but this works:
=query( query(
{ arrayformula( {ROW(A1:A) } ),
query(A1:D,"select A, B, C, D",1)
} , "select * order by Col1 desc",1),
"select Col2, Col3, Col4, Col5
where (Col2 ='John' or Col3 = 'John')
order by Col1 desc limit 3",1)
Basically, it adds the row number as an extra column to the data, so that we can sort the data in reverse order by row number. Then we query the result to find the first three occurrences of 'John' in either Col A or Col B.
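As a sketch of the end goal (the sum, rather than the three rows), the Python below walks the matches in sheet order, collects the player's own points from whichever side he played on, and sums the last three. It assumes rows are (name1, name2, points1, points2) with the oldest match first; the function name and sample matches are hypothetical.

```python
def last_n_points(matches, player, n=3):
    """Sum a player's points from his n most recent matches."""
    points = []
    for name1, name2, p1, p2 in matches:
        if name1 == player:
            points.append(p1)  # player listed first: take points1
        elif name2 == player:
            points.append(p2)  # player listed second: take points2
    return sum(points[-n:])  # rows are oldest first, so the tail is latest

# Hypothetical match list.
matches = [
    ("John", "Amy", 5, 1),
    ("Bob", "John", 2, 7),
    ("Amy", "Bob", 3, 3),
    ("John", "Bob", 1, 4),
    ("Amy", "John", 0, 9),
]
print(last_n_points(matches, "John"))
```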
Here is a sample sheet:
https://docs.google.com/spreadsheets/d/1-mhTb5Cpp3D-1OltlmCfwlmM-vc2OknHxfJAyHD7BjI/edit?usp=sharing
Credit to Erik Tyler for a previous answer on a different question, on how to add the row number to a query.
Edit: Updated the sheet to provide the SUM of John's (or any player's) scores from the last three matches. This can be combined with the previous formula, if you want a single formula to place somewhere. Or will you have a list of all the players, and you'll want their last three scores beside each of their names?
If I can simplify the formula, I'll update it here.
Let me know if you need something more than this, or if this has answered your question.
Approach
I would use the query formula to get the cells that you need so that you can leverage the limit statement.
You should put a column with the indexes so that you can order the cells in descending order and take the first 3.
Given that your table headers are:
+-----------------------------------------------+
| INDEX | NAME 1 | NAME 2 | POINTS 1 | POINTS 2 |
+-----------------------------------------------+
I would use this query to get your desired result:
=SUMPRODUCT(QUERY(A2:E, "Select D * E where B = 'John' or C = 'John' order by A desc limit 3"))
